Monday, March 06, 2006

Summary Statistics

One of my most used web pages is a page that summarizes a numeric column for min, max, standard deviation, count, and produces a 10 class equal binned histogram. One way to do this within Oracle is to execute two queries. The first query gives us the minimum and maximum value to establish how big our binned histogram buckets need to be. The second query counts the number of rows that fall within each bucket.

Take the following interactive map from the Kansas Well Top Stratigraphy Viewer:



The map above is displaying the depth of the Chase formation wells in Kansas relative to sea level. The statistical summary of that dataset can be viewed by selecting the "View Current Viewing Stats" link.



The first query gives us the min, max, standard deviation, and count.

SELECT avg(WELL_TOP_SEALEVEL) average,
min(WELL_TOP_SEALEVEL) minimum,
max(WELL_TOP_SEALEVEL) maximum,
stddev(WELL_TOP_SEALEVEL) standard_dev,
count(WELL_TOP_SEALEVEL) count
FROM PLSS.strat_well_top_1MAR2006
WHERE FORMATION_NAME = 'Chase Group'
and WELL_TOP is not null
and WELL_TOP <> 9999
and GROUND_ELEVATION > 0
and LONGITUDE >= -102.25798
and LONGITUDE <= -94.201246 and LATITUDE >= 34.4665145
and LATITUDE <= 42.5232570


All we are really doing is letting the database summarize (avg, min, max, etc.) a numeric column (WELL_TOP_SEALEVEL) based on the where criteria (the formation is Chase Group, the well value exists, the well ground elevation is above 0, and the well is within the current viewing extent).

Once we know the min and max value we can produce a query that counts the number of rows that fall within an equal 10 class range.


SELECT avg(WELL_TOP_SEALEVEL) average,
stddev(WELL_TOP_SEALEVEL) stddevvalue, '1' C0
,sum(case
when WELL_TOP_SEALEVEL between -643 AND -410.7
then 1
else 0
end) C1
,sum(case
when WELL_TOP_SEALEVEL between -410.7 AND -178.4
then 1
else 0
end) C2
,sum(case
when WELL_TOP_SEALEVEL between -178.4 AND 53.9
then 1
else 0
end) C3
,sum(case
when WELL_TOP_SEALEVEL between 53.9 AND 286.2
then 1
else 0
end) C4
,sum(case
when WELL_TOP_SEALEVEL between 286.2 AND 518.5
then 1
else 0
end) C5
,sum(case
when WELL_TOP_SEALEVEL between 518.5 AND 750.8
then 1
else 0
end) C6
,sum(case
when WELL_TOP_SEALEVEL between 750.8 AND 983.1
then 1
else 0
end) C7
,sum(case
when WELL_TOP_SEALEVEL between 983.1 AND 1215.4
then 1
else 0
end) C8
,sum(case
when WELL_TOP_SEALEVEL between 1215.4 AND 1447.7
then 1
else 0
end) C9
,sum(case
when WELL_TOP_SEALEVEL between 1447.7 AND 1680
then 1
else 0
end) C10
FROM PLSS.strat_well_top_1MAR2006
WHERE FORMATION_NAME = 'Chase Group'
AND WELL_TOP is not null
AND WELL_TOP <> 9999


Here we are using the case function on a row by row basis.

sum(case
when WELL_TOP_SEALEVEL between 1447.7 AND 1680
then 1
else 0
end)

If the value is between x and y then give it a value of 1, otherwise give it a value of 0. When we sum up that count for all rows we end up with a number of rows where the value is within the pre-defined bin.

Cheers,

Jeremy

Comments: Post a Comment



<< Home

This page is powered by Blogger. Isn't yours?