Terrestrial Precipitation:
1900-2014 Gridded Monthly Time Series

(Version 4.01)

interpolated and documented by

Kenji Matsuura and Cort J. Willmott
[with support from NASA's Innovation in Climate Education (NICE) Program]


For additional information concerning this archive,
please contact us at:

Department of Geography
University of Delaware
Newark, DE 19716
(302) 831-2294

or

kenjisan@udel.edu


Archive (Version 4.01) released in May, 2015


STATION DATA SOURCES:

Station data, monthly-total raingage-measured precipitation (P, mm), were compiled from several updated sources including a recent version of the Global Historical Climatology Network dataset (GHCN2); a version of the Daily Global Historical Climatology Network (GHCN-Daily) (Menne et al., 2012); an Atmospheric Environment Service/Environment Canada archive; data from the Hydrometeorological Institute in St. Petersburg, Russia (courtesy of Nikolay Shiklomanov); GC-Net data (Steffen et al., 1996); Greenland station records from the Automatic Weather Station Project (courtesy of Charles R. Stearns at the University of Wisconsin-Madison); daily data for India [1] from the National Center for Atmospheric Research (NCAR); Sharon Nicholson's archive of African precipitation data (2001) [2]; Webber and Willmott's (1998) South American monthly precipitation station records [3] (for notes 1, 2, and 3, see the documentation README file); and daily records from the Global Surface Summary of Day (GSOD). Station climatologies from Legates and Willmott's (1990) unadjusted (for raingage undercatch) archive also were used as a part of the background climatology (see Spatial Interpolation below). Station P values were not adjusted to reduce raingage undercatch bias.

Monthly values were derived from those archives that contained daily observations. From data within each daily station record, a monthly total P was calculated when the number of missing days within a month was no more than five. When a month had more than five missing daily values, its monthly total P value was considered to be missing. We noticed that some daily precipitation observations within the GSOD archive seemed to be unrealistic. To mitigate deleterious influences from such observations, we first applied filters (similar to ones described by Durre et al., 2010) to the original GSOD daily values. These filters seemed to remove many unrealistic values as well as duplicated months and years within the records.
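The five-missing-day rule above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the archive: the function name, the use of None to flag a missing day, and the choice to sum only the reported days are assumptions made for the example.

```python
import calendar
from typing import Optional, Sequence

# From the text: a month with more than five missing daily values is missing.
MAX_MISSING_DAYS = 5


def monthly_total(daily: Sequence[Optional[float]], year: int, month: int) -> Optional[float]:
    """Return the monthly total P (mm), or None when the month counts as missing.

    `daily` holds one entry per calendar day; None flags a missing day
    (an assumed convention for this sketch).
    """
    n_days = calendar.monthrange(year, month)[1]
    if len(daily) != n_days:
        raise ValueError(f"expected {n_days} daily values, got {len(daily)}")

    n_missing = sum(1 for value in daily if value is None)
    if n_missing > MAX_MISSING_DAYS:
        return None

    # Assumption: missing days simply contribute nothing to the total.
    return sum(value for value in daily if value is not None)
```

For example, a January record with five missing days still yields a total, while one with six missing days returns None.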

After filtering the original daily GSOD records and then calculating monthly GSOD values, our derived monthly GSOD P values were compared with the other monthly station records. First, using all monthly records (other than our GSOD-based records), composite monthly-total-P time series were created. Stations (with the same geographic coordinates) that appeared within more than one of the various archives were assumed to be the same station, and their records were merged into a single station time series. If multiple stations were merged for a single location, the median of each month's competing station values was used as the merged monthly P value. Second, from these non-GSOD composite records, a monthly P value was estimated at (interpolated to) each GSOD station location for each month. Each interpolated non-GSOD monthly time series then was compared to the corresponding GSOD observations at each GSOD station. Cross-validation errors also were estimated at each GSOD station location. In a third step, using a three-month window, all non-zero GSOD monthly values (over the entire observation period) were sorted in ascending order, and the differences between consecutive sorted values were calculated. The same procedure was applied to the non-GSOD base-line values over the same time period, and the maximum difference between two consecutive values was determined. Also, the maximum cross-validation error during the same time period was determined. When a difference in the sorted consecutive GSOD values became much larger than a threshold value (that consisted of the maximum difference and the maximum cross-validation error in the base-line estimated series), the GSOD values larger than the value that produced the difference were ignored because of the likelihood that they were erroneous observations. Please note that this procedure was subjected to a one-tailed test starting from the smallest non-zero value.
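A rough Python sketch of the median merge and the sorted-gap screen described above follows. Several details are assumptions made for illustration only: the threshold is taken to be the sum of the maximum base-line gap and the maximum cross-validation error, "much larger than" is reduced to a plain greater-than comparison, and the input values are assumed to have been selected already for one three-month window. All function names are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def merge_monthly(values: Sequence[Optional[float]]) -> Optional[float]:
    """Median of the competing station values for one month (None = missing)."""
    present = [v for v in values if v is not None]
    return median(present) if present else None


def baseline_threshold(baseline_values: Sequence[float], max_cv_error: float) -> float:
    """Assumed threshold: largest gap between sorted non-zero base-line values
    plus the largest cross-validation error."""
    nonzero = sorted(v for v in baseline_values if v > 0)
    gaps = [upper - lower for lower, upper in zip(nonzero, nonzero[1:])]
    return (max(gaps) if gaps else 0.0) + max_cv_error


def gap_cutoff(gsod_values: Sequence[float], threshold: float) -> Optional[float]:
    """Largest GSOD value kept by the one-tailed screen.

    Walk the non-zero values upward from the smallest; stop at the first
    consecutive gap that exceeds the threshold. Values above the returned
    cutoff would be ignored.
    """
    nonzero: List[float] = sorted(v for v in gsod_values if v > 0)
    if not nonzero:
        return None
    cutoff = nonzero[0]
    for lower, upper in zip(nonzero, nonzero[1:]):
        if upper - lower > threshold:
            break
        cutoff = upper
    return cutoff
```

With GSOD values of 10, 12, 15, 20 and 400 mm and a threshold of 10 mm, the cutoff is 20 mm, so only the 400 mm value would be ignored.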

Although the filtering procedures mentioned above relate to non-zero monthly observations in the GSOD archive, we also noticed that there are some unusual zero values in the GSOD dataset, especially in terms of their frequency of occurrence. An unrealistically long string of zero values, for instance, was observed. To assess monthly zero values and strings, zero-value GSOD observations were compared to the corresponding (in time) base-line estimates. When the GSOD value was zero while the base-line value was non-zero and above a certain threshold, the GSOD value was marked as missing because a true zero was unlikely.
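The zero-value check can be illustrated as below. The threshold value is not given in this documentation, so `min_baseline` is a hypothetical parameter, and None is an assumed marker for missing values.

```python
from typing import List, Optional, Sequence


def screen_zeros(
    gsod: Sequence[Optional[float]],
    baseline: Sequence[Optional[float]],
    min_baseline: float,
) -> List[Optional[float]]:
    """Mark suspicious GSOD zeros as missing (None).

    A zero is suspicious when the base-line estimate for the same month
    exceeds `min_baseline` (a hypothetical threshold, in mm).
    """
    screened: List[Optional[float]] = []
    for g, b in zip(gsod, baseline):
        suspicious = g == 0 and b is not None and b > min_baseline
        screened.append(None if suspicious else g)
    return screened
```

A zero paired with a 40 mm base-line estimate would be dropped under a 10 mm threshold, while a zero paired with a 1 mm estimate would be kept.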

After processing the GSOD records, composite monthly P station records were again created by merging all monthly records, including the GSOD series. One issue that we encountered was that the formats of the stations' geographic coordinates differed among the archives. To address related precision issues, station records were merged only when the estimated distance between two stations (or the centroids of two station clusters) was less than 2.5 km. New geographic coordinates for each composite station record were estimated after the merging. Once again, if there were multiple station P values available at a location, the median value was used. Within the resultant station-record archive (over the period 1900-2014), the number of available station and merged-station P values for a month ranges from about 7,500 to about 40,500.
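The 2.5 km merging test can be sketched with a great-circle distance. The haversine formula and the mean Earth radius used here are assumptions for illustration; the documentation does not say how the distance was estimated or how the composite coordinates were computed.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
MERGE_DISTANCE_KM = 2.5    # from the text

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def great_circle_km(a: LatLon, b: LatLon) -> float:
    """Haversine great-circle distance between two points, in km."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def should_merge(a: LatLon, b: LatLon, max_km: float = MERGE_DISTANCE_KM) -> bool:
    """True when two stations (or cluster centroids) are closer than max_km."""
    return great_circle_km(a, b) < max_km
```

Two records whose latitudes differ by 0.01 degree (about 1.1 km) would be merged; records 0.1 degree apart would not.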

SPATIAL INTERPOLATION:

Station values of monthly total raingage-measured precipitation (P) were interpolated to a 0.5 degree by 0.5 degree latitude/longitude grid, where the grid nodes are centered on 0.25 degree. Climatologically aided interpolation (CAI) (Willmott and Robeson, 1995) was used to estimate our monthly total precipitation fields. By using a background climatology based on a relatively dense network of stations, CAI can increase the accuracy of spatially interpolated time series of monthly climate variables. For the background climatology used here, two station climatologies were merged. The first was calculated at those of our precipitation time-series stations which had at least ten years of observations for each month. The second was the monthly station P (raw raingage) climatology of Legates and Willmott (1990). Only those Legates and Willmott stations which were not collocated with our own climatology were included in the background climatology for CAI. At each time-series station, a difference was calculated between the monthly P value and the climatologically averaged P for that month, which either was available at the station location or was interpolated spatially to it. Traditional interpolation (Willmott et al., 1985) then was performed on the monthly station differences to obtain a gridded difference field. Finally, each gridded monthly difference field was added to the gridded estimates of the month’s climatology at the corresponding set of grid points.
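The three CAI steps (difference at stations, interpolate the differences, add back the gridded climatology) can be outlined as follows. This is a schematic only: the interpolator is passed in as a function, and the nearest-station stand-in shown here is a placeholder, not the Shepard-type method used for the archive. The documentation does not say how negative sums are handled, so this sketch leaves them unchanged.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees
Interpolator = Callable[[Sequence[Point], Sequence[float], Sequence[Point]], List[float]]


def cai_field(
    station_pts: Sequence[Point],
    station_p: Sequence[float],     # monthly P at each station (mm)
    station_clim: Sequence[float],  # climatological P at each station (mm)
    grid_pts: Sequence[Point],
    grid_clim: Sequence[float],     # gridded climatology for the same month (mm)
    interpolate: Interpolator,
) -> List[float]:
    """Climatologically aided interpolation for one month."""
    # Step 1: station differences (monthly value minus climatology).
    diffs = [p - c for p, c in zip(station_p, station_clim)]
    # Step 2: interpolate the differences to the grid.
    grid_diffs = interpolate(station_pts, diffs, grid_pts)
    # Step 3: add the gridded differences to the gridded climatology.
    return [c + d for c, d in zip(grid_clim, grid_diffs)]


def nearest_station(
    station_pts: Sequence[Point], values: Sequence[float], targets: Sequence[Point]
) -> List[float]:
    """Placeholder interpolator: copy the value of the nearest station
    (planar distance in degrees, for illustration only)."""
    result = []
    for lat, lon in targets:
        nearest = min(
            range(len(station_pts)),
            key=lambda k: (station_pts[k][0] - lat) ** 2 + (station_pts[k][1] - lon) ** 2,
        )
        result.append(values[nearest])
    return result
```

Because only the differences are interpolated, spatial detail carried by the denser background climatology is preserved in the final field.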

Traditional interpolation was accomplished with the spherical version of Shepard’s algorithm, which employs an enhanced distance-weighting method (Shepard, 1968; Willmott et al., 1985). The number of nearby stations that influenced a grid-node estimate was increased to an average of 20, from an average of 7 in earlier applications. This resulted in smaller cross-validation errors (see below) and visually more realistic precipitation fields. A more robust neighbor-finding algorithm, based on spherical distance, also was used.
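For readers unfamiliar with distance weighting on a sphere, a heavily simplified sketch is given below. It is plain inverse-distance weighting over the nearest stations and omits the enhancements of Shepard's algorithm. The fixed neighbor count of 20 (the text describes an average of 20, not a fixed number) and the weighting exponent of 2 are assumptions for the example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def angular_distance(p: Point, q: Point) -> float:
    """Great-circle angle between two points, in radians (haversine form)."""
    lat1, lon1, lat2, lon2 = (math.radians(x) for x in (p[0], p[1], q[0], q[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def idw_sphere(
    station_pts: Sequence[Point],
    values: Sequence[float],
    target: Point,
    n_neighbors: int = 20,  # assumed fixed count for this sketch
    power: float = 2.0,     # assumed weighting exponent
) -> float:
    """Inverse-distance-weighted estimate at `target` from the nearest stations."""
    ranked = sorted(
        (angular_distance(s, target), v) for s, v in zip(station_pts, values)
    )[:n_neighbors]
    if not ranked:
        raise ValueError("at least one station is required")
    # A station exactly at the target determines the estimate.
    if ranked[0][0] == 0.0:
        return ranked[0][1]
    weights = [d ** -power for d, _ in ranked]
    return sum(w * v for w, (_, v) in zip(weights, ranked)) / sum(weights)
```

A target midway between two stations receives the average of their values; a target that coincides with a station receives that station's value.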
 
SPATIAL CROSS VALIDATION:

To indicate (roughly) the spatial interpolation errors, station-by-station cross validation was employed (Willmott and Matsuura, 1995). One station was removed at a time, and the precipitation value was then interpolated to the removed station location from the surrounding nearby stations. The difference between the real station value and the interpolated value is a local estimate of interpolation error. After each station cross validation was made, the removed station was put back into the network. To reduce network biases on cross-validation results, absolute values of the errors at the stations were interpolated to the same spatial resolution as the precipitation field.
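The station-by-station procedure above amounts to leave-one-out cross validation, which might be sketched as follows. The interpolator is passed in as a function; the mean-of-others stand-in is a placeholder chosen only to keep the example short and is not the interpolation method used for the archive.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees
PointEstimator = Callable[[Sequence[Point], Sequence[float], Point], float]


def loo_abs_errors(
    station_pts: Sequence[Point],
    values: Sequence[float],
    interpolate: PointEstimator,
) -> List[float]:
    """Absolute cross-validation error at each station.

    Each station is removed in turn, its value is estimated from the
    remaining stations, and the station is then returned to the network.
    """
    pts, vals = list(station_pts), list(values)
    errors = []
    for i in range(len(pts)):
        other_pts = pts[:i] + pts[i + 1:]
        other_vals = vals[:i] + vals[i + 1:]
        estimate = interpolate(other_pts, other_vals, pts[i])
        errors.append(abs(vals[i] - estimate))
    return errors


def mean_of_others(pts: Sequence[Point], vals: Sequence[float], target: Point) -> float:
    """Placeholder estimator: unweighted mean of the remaining stations."""
    return sum(vals) / len(vals)
```

The resulting station errors are the quantities that would then be interpolated to the grid to form the cross-validation error fields.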

ARCHIVE STRUCTURE:

precip_2014.tar.gz:

Monthly total precipitation for the years 1900-2014 interpolated to a 0.5 by 0.5 degree grid resolution (centered on 0.25 degree). The format of each record is:

 

Field   Columns    Variable                            Fortran Format
1       1 - 8      Longitude (decimal degrees)         F8.3
2       9 - 16     Latitude (decimal degrees)          F8.3
3-14    17 - 112   Monthly Total Precipitation (mm)    12F8.1
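A record in this fixed-width layout can be read in Python by slicing columns. The sketch below assumes one record per text line, as the column ranges imply; the function name is hypothetical, and nothing is assumed about file names inside the archive. The same layout is documented for the cross-validation archive.

```python
from typing import List, Tuple


def parse_record(line: str) -> Tuple[float, float, List[float]]:
    """Split one 112-column record into longitude, latitude and 12 monthly values.

    Columns 1-8 (F8.3): longitude; columns 9-16 (F8.3): latitude;
    columns 17-112 (12F8.1): January through December values (mm).
    """
    record = line.rstrip("\r\n")
    if len(record) < 112:
        raise ValueError(f"record has {len(record)} columns, expected 112")
    longitude = float(record[0:8])
    latitude = float(record[8:16])
    monthly = [float(record[16 + 8 * i: 24 + 8 * i]) for i in range(12)]
    return longitude, latitude, monthly


# Example with a synthetic record (the values are made up for illustration).
sample = f"{-179.75:8.3f}{89.75:8.3f}" + "".join(f"{v:8.1f}" for v in range(12))
lon, lat, values = parse_record(sample)
```

Slicing by column position rather than splitting on whitespace keeps the parse correct even if adjacent fields fill their full width and leave no separating blank.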

 

precip_cv2014.tar.gz:

Cross-validation errors (absolute values) associated with precipitation for the years 1900-2014 interpolated to a 0.5 by 0.5 degree grid resolution. The format of each record is:

 

Field   Columns    Variable                                                                        Fortran Format
1       1 - 8      Longitude (decimal degrees)                                                     F8.3
2       9 - 16     Latitude (decimal degrees)                                                      F8.3
3-14    17 - 112   Cross-validation errors (absolute values) of Monthly Total Precipitation (mm)   12F8.1

SELECTED REFERENCES:

Durre, I., M. J. Menne, B. E. Gleason, T. G. Houston, and R. S. Vose (2010). Comprehensive Automated Quality Assurance of Daily Surface Observations. Journal of Applied Meteorology and Climatology, 49, 1615-1633.

Lawrimore, J. H., M. J. Menne, B. E. Gleason, C. N. Williams, D. B. Wuertz, R. S. Vose, and J. Rennie (2011). An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. Journal of Geophysical Research, 116, D19121, doi:10.1029/2011JD016187.

Legates, D. R. and C. J. Willmott (1990). Mean seasonal and spatial variability in gauge-corrected, global precipitation. International Journal of Climatology, 10, 111-127.

Menne, M. J., I. Durre, R. S. Vose, B. E. Gleason, and T. G. Houston (2012). An overview of the Global Historical Climatology Network-Daily Database. Journal of Atmospheric and Oceanic Technology, 29, 897-910, doi:10.1175/JTECH-D-11-00103.1.

Peterson, T. C., R. S. Vose, R. Schmoyer, and V. Razuvaëv (1998). Global Historical Climatology Network (GHCN) Quality Control of Monthly Temperature Data. International Journal of Climatology, 18, 1169-1179.

Peterson, T. C. and R. S. Vose (1997). An overview of the Global Historical Climatology Network temperature database. Bulletin of the American Meteorological Society, 78, 2837-2849.

Shepard, D. (1968). A two-dimensional interpolation function for irregularly-spaced data. Proceedings, 1968 ACM National Conference, 517-523.

Steffen, K., J. E. Box, and W. Abdalati (1996). Greenland Climate Network: GC-Net. Colbeck, S. C. Ed. CRREL 96-27 Special Report on Glaciers, Ice Sheets and Volcanoes, trib. to M. Meier, 98-103.

Willmott, C. J. and K. Matsuura (1995). Smart interpolation of annually averaged air temperature in the United States. Journal of Applied Meteorology, 34, 2577-2586.

Willmott, C. J. and S. M. Robeson (1995). Climatologically aided interpolation (CAI) of terrestrial air temperature. International Journal of Climatology, 15(2), 221-229.


Willmott, C. J., C. M. Rowe and W. D. Philpot (1985). Small-scale climate maps: a sensitivity analysis of some common assumptions associated with grid-point interpolation and contouring. American Cartographer, 12, 5-16.