Thursday, November 10, 2005

The Power of the Spatial to Attribute Join-Part 2

This is part two on my series on the mapping and analysis of complex multi-dimensional datasets. Today we will look at historical sea surface temperatures (SST). I am by no means an expert on SST models our on the effect of warming sea surface temperatures on oceanic biological habitat. So take this example with a grain of salt...

IMPORTANCE: The number of consecutive records above the max baseline can adversely effect the benthic habitat of the oceanic environment. For example, El Nino can cause coral reef bleaching. This bleaching occurs because sea surface temperatures are continuously above the historical upper limit for the species.

GOAL: For every unique ID_ONEDG (unique spatial location), select the maximum number of CONSECUTIVE months that are above the long term average maximum yearly value.

EXAMPLE DATASET: HADL_SST
TABLE COMMENTS: Sea Surface Temperatures: OI version 2 SST Analysis from NCEP - Reynolds, Smith, and Stokes. One degree resolution. Monthly from 1871 to 1999. Extent: 40N Latitude to 40S Latitude No land. Modeled data.


SQL> desc hadl_sst
Name Null? Type
----------------------------------------- -------- ------------
ID_ONEDG NOT NULL NUMBER(6)
DATE_RECORDED NOT NULL DATE
SST_CENTIGRADE NOT NULL NUMBER(6,2)

Total number of records: 33676740


For speed purposes, for this example, we need to compute the max_baseline_value.

create table HADL_SST_AVG_YR_MAX_MNTH as (
select ID_ONEDG, avg(yearly_max) MAX_BASELINE_VALUE
from (select ID_ONEDG, max(SST_CENTIGRADE) yearly_max
from hadl_sst
group by ID_ONEDG, to_char(DATE_RECORDED, 'YYYY')
)
group by ID_ONEDG
);

SQL> desc HADL_SST_AVG_YR_MAX_MNTH
Name Null? Type
----------------------------------------- -------- ---------------------
ID_ONEDG NOT NULL NUMBER(6)
MAX_BASELINE_VALUE NUMBER

Total number of records: 21755


Spatial attribute table that contains spatial information about the center of the one degree cell (latitude and longitude). This is a table that is registered with ESRI's spatial database engine (SDE). This can be visualized with traditional GIS software.


SQL> desc one_degree
Name Null? Type
----------------------------------------- -------- --------------------
OBJECTID NOT NULL NUMBER(38)
LONG_1 NUMBER(6,1)
LAT_1 NUMBER(6,1)
ID_ONEDG NUMBER(6)
SHAPE NUMBER(38)

Total number of records: 64800


The objectives of my approach to this query are:
-Select all records that have a monthly temperature greater than the baseline over a predefined three year period centered on an elnino year (or la nina).
-Group the selected rows for each ID_ONEDG where the previous record is within a month AND has a temperature value greater than the baseline.

This is where we take advantage of the LAG/LEAD Analytical function provided by Oracle.


select A.ID_ONEDG ID_ONEDG,
a.date_recorded date_recorded,
case
when date_recorded -
lag(a.date_recorded)
over (order by a.ID_ONEDG, a.date_recorded)
not between 0 and 35
then rownum
end rn,
a.sst_centigrade sst_centigrade,
b.MAX_BASELINE_VALUE MAX_BASELINE_VALUE
from hadl_sst a, HADL_SST_AVG_YR_MAX_MNTH B, ONE_DEGREE C
where C.LAT_1 between -40 and 40
and B.ID_ONEDG = C.ID_ONEDG
and A.ID_ONEDG = B.ID_ONEDG
and A.date_recorded between
to_date('01-jan-1957', 'dd-mon-yyyy')
and to_date('31-dec-1959', 'dd-mon-yyyy')
and a.sst_centigrade > b.MAX_BASELINE_VALUE
order by a.ID_ONEDG, a.date_recorded


ID_ONEDG DATE_RECO RN SST_CENTIGRADE MAX_BASELINE_VALUE
---------- --------- ---------- -------------- ------------------
18041 01-AUG-98 35 20.47 20.1232558
18042 01-AUG-98 20.66 19.9818605
18043 01-AUG-98 20.7 19.8624806
18044 01-AUG-98 20.56 19.6863566
18044 01-SEP-98 19.87 19.6863566
18045 01-AUG-97 40 19.63 19.465969
18045 01-SEP-97 19.58 19.465969
18045 01-AUG-98 42 20.26 19.465969
18045 01-SEP-98 19.91 19.465969
18046 01-AUG-97 44 19.72 19.2837209
18046 01-SEP-97 19.65 19.2837209
18046 01-AUG-98 46 20.05 19.2837209
18046 01-SEP-98 19.87 19.2837209


Notice that ID_ONEDG of 18045 has two groups of consecutive months where the temperature is greater than the baseline value (Aug-97 to Sep-97 & Aug-98 to Sep-98).


-Once the groups have been recognized we need to carry the group id forward.

select ID_ONEDG,
nvl(max(rn)
over (order by ID_ONEDG,date_recorded),
-1) max_rn
from ( QUERY-FROM-ABOVE )
order by 1


ID_ONEDG MAX_RN
---------- ----------
18041 35
18042 35
18043 35
18044 35
18044 35
18045 40
18045 40
18045 42
18045 42
18046 44
18046 44
18046 46
18046 46


Notice that for ID_ONEDG of 18045 we have two groups each with two values (40 and 42).

-Now we group id_onedg and the max_rn to compute the count of records that have similar values for id_onedg and max_rn.


select ID_ONEDG, count(max_rn) consec_months
from ( QUERY-FROM-ABOVE )
group by ID_ONEDG, max_rn

ID_ONEDG CONSEC_MONTHS
---------- -------------
18041 1
18042 1
18043 1
18044 2
18045 2
18045 2
18046 2
18046 2



-Now that we have the count for all consecutive months above the baseline for each id_onedg we need to select the maximum number of consecutive months.


select max(consec_months) max_consecutive, ID_ONEDG
from ( QUERY-FROM-ABOVE )
group by ID_ONEDG

MAX_CONSECUTIVE ID_ONEDG
--------------- ----------
1 18041
1 18042
1 18043
2 18044
2 18045
2 18046



WRAP-UP: Analytical functions can be very powerful. The only way to do the previous query without analytical functions would be to build a PL/SQL procedure (email me if you want to see that version) that does the max consecutive computation for each spatial location.


select max(consec_months) max_consecutive, ID_ONEDG
from
(
select ID_ONEDG, count(max_rn) consec_months
from
(
select ID_ONEDG, nvl(max(rn)
over(order by ID_ONEDG,date_recorded), -1) max_rn
from
(
select A.ID_ONEDG ID_ONEDG,
a.date_recorded date_recorded,
case
when date_recorded - lag(a.date_recorded) over (order by a.ID_ONEDG, a.date_recorded) not between 0 and 35
then rownum
end rn,
a.sst_centigrade sst_centigrade,
b.MAX_BASELINE_VALUE MAX_BASELINE_VALUE
from hadl_sst a, HADL_SST_AVG_YR_MAX_MNTH B, ONE_DEGREE C
where C.LAT_1 between -40 and 40
and B.ID_ONEDG = C.ID_ONEDG
and A.ID_ONEDG = B.ID_ONEDG
and A.date_recorded between to_date('01-jan-1997', 'dd-mon-yyyy') and to_date('31-dec-1999', 'dd-mon-yyyy')
and a.sst_centigrade > b.MAX_BASELINE_VALUE
order by a.ID_ONEDG, a.date_recorded
)
order by 1
)
group by ID_ONEDG, max_rn
)
group by ID_ONEDG

21516 rows selected.

Elapsed: 00:00:24.06

So what does it look like?

Conesecutive months above the baseline maximum sea surface temperature for the three year period of 1979 to 1981 (1980 was considered a normal year) and the three year period of 1982 to 1984 (1984 was considered a El Nino period).


Conesecutive months above the baseline maximum sea surface temperature for the three year period of 1988 to 1990 (1989 was considered a La Nina year) and the three year period of 1996 to 1998 (1997 was considered a strong El Nino period).

Questions? Comments?

Cheers,

Jeremy

Tuesday, November 01, 2005

The Power of the Spatial to Attribute Join-Part 1

This is the start of a series of posts related to the mapping and analysis of complex multi-dimensional datasets. Traditionally it has been difficult to map multiple dimensional datasets in a GIS. In this series we are going to look at several database driven GIS projects related to multi-dimensional mapping.

The first project relates to the mapping and analysis of Kansas Oil and Gas production data.

Oil and Gas Production Through Time
A dynamic web page that allows a user to map production data by section through time. For each product (oil or gas), you can map production by year, range of years (average or cumulative), or the percent change from one year to another. This is a good example of time-series gis within an enterprise database system.
From... KGS Energy Resources Interactive Maps


The image to the right contains a table schematic of the main lease table in our database. This table identifies how much oil and gas was produced from a designated lease. As you can see we have:
While this is useful information we need to structure it a bit more to make useful real-time mapping visualization. Because of the complex nature of lease information, we cannot currently match leases to particular wells in Kansas. We can aggregate the spatial location of leases up to public land survey boundaries (PLSS). Once we do this aggregation we have a much cleaner table to work with.


Where "RECNMBR" is the unique id for a particular 640-acre section in Kansas.

From this table can construct queries that identify the amount of production for a particular product for a particular year or for the sum over a range of years.

In the query above we are asking the database to create a view that gives us the sum of gas production between from 1990 to 2000. Within this view we are using the Group By function to group the recnmbr column into a distinct number. Each distinct recnmbr will have multiple production values. First we eliminate all the rows that do not produce gas (where product = 'G') and then we eliminate all the rows where the year is not between 1990 and 2000. We are then left with rows that have produced gas between 1990 and 2000. Take RECNMBR 2555, it has produced gas 10 out of the 12 possible years in that range. We need to summarize these 10 values for this unique RECNMBR. We can perform functions like: count, average, min, max, and sum. In our case we are going to sum the values that fit our criteria.

Now we have reduced that multi-dimensional table to one value per spatial polygon (RECNMBR). We can now join that view to a spatial dataset in either ArcMap or ArcIMS. Now we can easily produce a map like this...


Let's take it a step further and say we want to produce a map of the percent change in gas production between the years 1990 and 2000. To do that we need to formulate a bit more complicated query.

CREATE OR REPLACE VIEW OIL_GAS_WEB_PROD163922 AS
(select a.recnmbr as RECNMBR,
decode(sign(avg(b.x1)-avg(c.x2)),
1, ROUND((-100 * (avg(b.x1) - avg(c.x2))/avg(b.x1)),0),
-1,ROUND((-100 * (avg(b.x1) - avg(c.x2))/avg(b.x1)),0),
0) as PERCENTCHANGE
from ACRES_640_PRODUCTION a,
(select recnmbr, PRODUCTION x1
from ACRES_640_PRODUCTION
where year = 1990
and product = 'G') b,
(select recnmbr, PRODUCTION x2
from ACRES_640_PRODUCTION
where year = 2000
and product = 'G') c
where a.recnmbr = b.recnmbr
and a.recnmbr = c.recnmbr
and a.year in (1990,2000)
and a.product = 'G'
group by a.recnmbr
union
select distinct b.recnmbr RECNMBR,
19901 as PERCENTCHANGE
from (select recnmbr,
PRODUCTION x1
from ACRES_640_PRODUCTION
where year = 1990
and product = 'G') b
left outer join
(select recnmbr, PRODUCTION x2
from ACRES_640_PRODUCTION
where year = 2000
and product = 'G') c
on b.recnmbr = c.recnmbr
where b.x1 is not null
and c.x2 is null
union
select distinct c.recnmbr RECNMBR,
20001 as PERCENTCHANGE
from (select recnmbr,
PRODUCTION x1
from ACRES_640_PRODUCTION
where year = 1990
and product = 'G') b
right outer join
(select recnmbr,
PRODUCTION x2
from ACRES_640_PRODUCTION
where year = 2000
and product = 'G') c
on c.recnmbr = b.recnmbr
where c.x2 is not null
and b.x1 is null )

Don't be frightened! The query should be broken up into three parts. The first part will compute the percent change between year 1990 and year 2000 for all the sections where we have gas production in 1990 and in 2000 (and a.year in (1990,2000)). Parts 2 and 3 of the union are specifically for sections that produce in one year, but not the other. The final map looks like...


Areas in red are places where gas production increased from 1990 to 2000. Areas in blue show a precent decreases. Areas in green did not produce in 1990. Areas in orange stopped producing by 2000. Email me if you have any questions.

Next time we will look at sea-surface temeratures through time. I know you can't wait!

This page is powered by Blogger. Isn't yours?