### Monday, March 06, 2006

## Summary Statistics

One of my most used web pages is a page that summarizes a numeric column for min, max, standard deviation, count, and produces a 10 class equal binned histogram. One way to do this within Oracle is to execute two queries. The first query gives us the minimum and maximum value to establish how big our binned histogram buckets need to be. The second query counts the number of rows that fall within each bucket.

Take the following interactive map from the Kansas Well Top Stratigraphy Viewer:

The map above is displaying the depth of the Chase formation wells in Kansas relative to sea level. The statistical summary of that dataset can be viewed by selecting the "View Current Viewing Stats" link.

The first query gives us the min, max, standard deviation, and count.

All we are really doing is letting the database summarize (avg, min, max, etc.) a numeric column (WELL_TOP_SEALEVEL) based on the where criteria (the formation is Chase Group, the well value exists, the well ground elevation is above 0, and the well is within the current viewing extent).

Once we know the min and max value we can produce a query that counts the number of rows that fall within an equal 10 class range.

Here we are using the case function on a row by row basis.

If the value is between x and y then give it a value of 1, otherwise give it a value of 0. When we sum up that count for all rows we end up with a number of rows where the value is within the pre-defined bin.

Cheers,

Jeremy

Take the following interactive map from the Kansas Well Top Stratigraphy Viewer:

The map above is displaying the depth of the Chase formation wells in Kansas relative to sea level. The statistical summary of that dataset can be viewed by selecting the "View Current Viewing Stats" link.

The first query gives us the min, max, standard deviation, and count.

SELECT avg(WELL_TOP_SEALEVEL) average,

min(WELL_TOP_SEALEVEL) minimum,

max(WELL_TOP_SEALEVEL) maximum,

stddev(WELL_TOP_SEALEVEL) standard_dev,

count(WELL_TOP_SEALEVEL) count

FROM PLSS.strat_well_top_1MAR2006

WHERE FORMATION_NAME = 'Chase Group'

and WELL_TOP is not null

and WELL_TOP <> 9999

and GROUND_ELEVATION > 0

and LONGITUDE >= -102.25798

and LONGITUDE <= -94.201246 and LATITUDE >= 34.4665145

and LATITUDE <= 42.5232570

All we are really doing is letting the database summarize (avg, min, max, etc.) a numeric column (WELL_TOP_SEALEVEL) based on the where criteria (the formation is Chase Group, the well value exists, the well ground elevation is above 0, and the well is within the current viewing extent).

Once we know the min and max value we can produce a query that counts the number of rows that fall within an equal 10 class range.

SELECT avg(WELL_TOP_SEALEVEL) average,

stddev(WELL_TOP_SEALEVEL) stddevvalue, '1' C0

,sum(case

when WELL_TOP_SEALEVEL between -643 AND -410.7

then 1

else 0

end) C1

,sum(case

when WELL_TOP_SEALEVEL between -410.7 AND -178.4

then 1

else 0

end) C2

,sum(case

when WELL_TOP_SEALEVEL between -178.4 AND 53.9

then 1

else 0

end) C3

,sum(case

when WELL_TOP_SEALEVEL between 53.9 AND 286.2

then 1

else 0

end) C4

,sum(case

when WELL_TOP_SEALEVEL between 286.2 AND 518.5

then 1

else 0

end) C5

,sum(case

when WELL_TOP_SEALEVEL between 518.5 AND 750.8

then 1

else 0

end) C6

,sum(case

when WELL_TOP_SEALEVEL between 750.8 AND 983.1

then 1

else 0

end) C7

,sum(case

when WELL_TOP_SEALEVEL between 983.1 AND 1215.4

then 1

else 0

end) C8

,sum(case

when WELL_TOP_SEALEVEL between 1215.4 AND 1447.7

then 1

else 0

end) C9

,sum(case

when WELL_TOP_SEALEVEL between 1447.7 AND 1680

then 1

else 0

end) C10

FROM PLSS.strat_well_top_1MAR2006

WHERE FORMATION_NAME = 'Chase Group'

AND WELL_TOP is not null

AND WELL_TOP <> 9999

Here we are using the case function on a row by row basis.

sum(case

when WELL_TOP_SEALEVEL between 1447.7 AND 1680

then 1

else 0

end)

If the value is between x and y then give it a value of 1, otherwise give it a value of 0. When we sum up that count for all rows we end up with a number of rows where the value is within the pre-defined bin.

Cheers,

Jeremy