Thursday, November 10, 2005

The Power of the Spatial to Attribute Join-Part 2

This is part two on my series on the mapping and analysis of complex multi-dimensional datasets. Today we will look at historical sea surface temperatures (SST). I am by no means an expert on SST models our on the effect of warming sea surface temperatures on oceanic biological habitat. So take this example with a grain of salt...

IMPORTANCE: The number of consecutive records above the max baseline can adversely effect the benthic habitat of the oceanic environment. For example, El Nino can cause coral reef bleaching. This bleaching occurs because sea surface temperatures are continuously above the historical upper limit for the species.

GOAL: For every unique ID_ONEDG (unique spatial location), select the maximum number of CONSECUTIVE months that are above the long term average maximum yearly value.

EXAMPLE DATASET: HADL_SST
TABLE COMMENTS: Sea Surface Temperatures: OI version 2 SST Analysis from NCEP - Reynolds, Smith, and Stokes. One degree resolution. Monthly from 1871 to 1999. Extent: 40N Latitude to 40S Latitude No land. Modeled data.


SQL> desc hadl_sst
Name Null? Type
----------------------------------------- -------- ------------
ID_ONEDG NOT NULL NUMBER(6)
DATE_RECORDED NOT NULL DATE
SST_CENTIGRADE NOT NULL NUMBER(6,2)

Total number of records: 33676740


For speed purposes, for this example, we need to compute the max_baseline_value.

create table HADL_SST_AVG_YR_MAX_MNTH as (
select ID_ONEDG, avg(yearly_max) MAX_BASELINE_VALUE
from (select ID_ONEDG, max(SST_CENTIGRADE) yearly_max
from hadl_sst
group by ID_ONEDG, to_char(DATE_RECORDED, 'YYYY')
)
group by ID_ONEDG
);

SQL> desc HADL_SST_AVG_YR_MAX_MNTH
Name Null? Type
----------------------------------------- -------- ---------------------
ID_ONEDG NOT NULL NUMBER(6)
MAX_BASELINE_VALUE NUMBER

Total number of records: 21755


Spatial attribute table that contains spatial information about the center of the one degree cell (latitude and longitude). This is a table that is registered with ESRI's spatial database engine (SDE). This can be visualized with traditional GIS software.


SQL> desc one_degree
Name Null? Type
----------------------------------------- -------- --------------------
OBJECTID NOT NULL NUMBER(38)
LONG_1 NUMBER(6,1)
LAT_1 NUMBER(6,1)
ID_ONEDG NUMBER(6)
SHAPE NUMBER(38)

Total number of records: 64800


The objectives of my approach to this query are:
-Select all records that have a monthly temperature greater than the baseline over a predefined three year period centered on an elnino year (or la nina).
-Group the selected rows for each ID_ONEDG where the previous record is within a month AND has a temperature value greater than the baseline.

This is where we take advantage of the LAG/LEAD Analytical function provided by Oracle.


select A.ID_ONEDG ID_ONEDG,
a.date_recorded date_recorded,
case
when date_recorded -
lag(a.date_recorded)
over (order by a.ID_ONEDG, a.date_recorded)
not between 0 and 35
then rownum
end rn,
a.sst_centigrade sst_centigrade,
b.MAX_BASELINE_VALUE MAX_BASELINE_VALUE
from hadl_sst a, HADL_SST_AVG_YR_MAX_MNTH B, ONE_DEGREE C
where C.LAT_1 between -40 and 40
and B.ID_ONEDG = C.ID_ONEDG
and A.ID_ONEDG = B.ID_ONEDG
and A.date_recorded between
to_date('01-jan-1957', 'dd-mon-yyyy')
and to_date('31-dec-1959', 'dd-mon-yyyy')
and a.sst_centigrade > b.MAX_BASELINE_VALUE
order by a.ID_ONEDG, a.date_recorded


ID_ONEDG DATE_RECO RN SST_CENTIGRADE MAX_BASELINE_VALUE
---------- --------- ---------- -------------- ------------------
18041 01-AUG-98 35 20.47 20.1232558
18042 01-AUG-98 20.66 19.9818605
18043 01-AUG-98 20.7 19.8624806
18044 01-AUG-98 20.56 19.6863566
18044 01-SEP-98 19.87 19.6863566
18045 01-AUG-97 40 19.63 19.465969
18045 01-SEP-97 19.58 19.465969
18045 01-AUG-98 42 20.26 19.465969
18045 01-SEP-98 19.91 19.465969
18046 01-AUG-97 44 19.72 19.2837209
18046 01-SEP-97 19.65 19.2837209
18046 01-AUG-98 46 20.05 19.2837209
18046 01-SEP-98 19.87 19.2837209


Notice that ID_ONEDG of 18045 has two groups of consecutive months where the temperature is greater than the baseline value (Aug-97 to Sep-97 & Aug-98 to Sep-98).


-Once the groups have been recognized we need to carry the group id forward.

select ID_ONEDG,
nvl(max(rn)
over (order by ID_ONEDG,date_recorded),
-1) max_rn
from ( QUERY-FROM-ABOVE )
order by 1


ID_ONEDG MAX_RN
---------- ----------
18041 35
18042 35
18043 35
18044 35
18044 35
18045 40
18045 40
18045 42
18045 42
18046 44
18046 44
18046 46
18046 46


Notice that for ID_ONEDG of 18045 we have two groups each with two values (40 and 42).

-Now we group id_onedg and the max_rn to compute the count of records that have similar values for id_onedg and max_rn.


select ID_ONEDG, count(max_rn) consec_months
from ( QUERY-FROM-ABOVE )
group by ID_ONEDG, max_rn

ID_ONEDG CONSEC_MONTHS
---------- -------------
18041 1
18042 1
18043 1
18044 2
18045 2
18045 2
18046 2
18046 2



-Now that we have the count for all consecutive months above the baseline for each id_onedg we need to select the maximum number of consecutive months.


select max(consec_months) max_consecutive, ID_ONEDG
from ( QUERY-FROM-ABOVE )
group by ID_ONEDG

MAX_CONSECUTIVE ID_ONEDG
--------------- ----------
1 18041
1 18042
1 18043
2 18044
2 18045
2 18046



WRAP-UP: Analytical functions can be very powerful. The only way to do the previous query without analytical functions would be to build a PL/SQL procedure (email me if you want to see that version) that does the max consecutive computation for each spatial location.


select max(consec_months) max_consecutive, ID_ONEDG
from
(
select ID_ONEDG, count(max_rn) consec_months
from
(
select ID_ONEDG, nvl(max(rn)
over(order by ID_ONEDG,date_recorded), -1) max_rn
from
(
select A.ID_ONEDG ID_ONEDG,
a.date_recorded date_recorded,
case
when date_recorded - lag(a.date_recorded) over (order by a.ID_ONEDG, a.date_recorded) not between 0 and 35
then rownum
end rn,
a.sst_centigrade sst_centigrade,
b.MAX_BASELINE_VALUE MAX_BASELINE_VALUE
from hadl_sst a, HADL_SST_AVG_YR_MAX_MNTH B, ONE_DEGREE C
where C.LAT_1 between -40 and 40
and B.ID_ONEDG = C.ID_ONEDG
and A.ID_ONEDG = B.ID_ONEDG
and A.date_recorded between to_date('01-jan-1997', 'dd-mon-yyyy') and to_date('31-dec-1999', 'dd-mon-yyyy')
and a.sst_centigrade > b.MAX_BASELINE_VALUE
order by a.ID_ONEDG, a.date_recorded
)
order by 1
)
group by ID_ONEDG, max_rn
)
group by ID_ONEDG

21516 rows selected.

Elapsed: 00:00:24.06

So what does it look like?

Conesecutive months above the baseline maximum sea surface temperature for the three year period of 1979 to 1981 (1980 was considered a normal year) and the three year period of 1982 to 1984 (1984 was considered a El Nino period).


Conesecutive months above the baseline maximum sea surface temperature for the three year period of 1988 to 1990 (1989 was considered a La Nina year) and the three year period of 1996 to 1998 (1997 was considered a strong El Nino period).

Questions? Comments?

Cheers,

Jeremy

Comments:
Thanks for this very interesting new blog. Unusual and great topic. I hope it won't get too Oracle centric as time passes.
 
Hi,

In 2003, I initiated some threads in comp.infosystems.gis on much the same topic as this blog, i.e., why don't GIS/Digital Cartography folks get into SQL more?! It seemed to me then, and now, that the GIS field is too map-centric, i.e., we are preoccupied with software that has the do-it-all approach, and we buy it because it has graphic tools to make nice maps. We need to focus more on the database side and deal with it through SQL. The map making, fun though it is, comes last.

I think when we get into the habit of thinking this way, lots of things become possible. In the USA,, it can bring freedom from the sluggish performance of ESRI software as we deal with our data elswhere (Oracle, MySQL, MSAccess...whatever) and then use the GIS apps to do what they do best.

Thanks for getting the word out with this blog!
 
You bet Lichanos-

My organization (Kansas Geological Survey--Academic/Research/Service orgainization) has TONS of data. It has been a gold mine for me to "spatially enable" the database (oil, gas, water, structure, chemistry, bigeoinformatics, etc) with GIS and good old fashion SQL. The visualization potentials are staggering. The ability to spatially aggregate large tabular datasets and join those results to spatial layers is extremely powerful. I hope to get back on it and post more examples over the next couple of weeks. I would love to see (or find) a book on database driven mapping applications. I can think of dozens just in my discipline....

Jeremy
 
Post a Comment



<< Home

This page is powered by Blogger. Isn't yours?