Effects of Ignoring Survey Design Information for Data Reuse

Created 17/10/2025

Updated 17/10/2025

Data is currently being used, and reused, in ecological research at unprecedented rates. To ensure appropriate reuse, however, we need to ask: “Are aggregated databases currently providing the right information to enable effective and unbiased reuse?” We investigate this question with a focus on designs that purposefully bias the selection of sampling locations, upweighting the probability that some locations are selected. Such designs are common; examples include designs with unequal inclusion probabilities and stratified designs. We perform a simulation experiment, creating datasets with progressively more bias, and examine the resulting statistical estimates. The effect of ignoring the survey design can be profound, with biases of up to 250% when naive analytical methods are used, and the bias is not reduced by adding more data. Fortunately, it can be mitigated by using an appropriate estimator or an appropriate model. These remedies are applicable only when essential information about the survey design is available: the randomisation structure (e.g. inclusion probabilities or stratification) and/or the covariates used in the randomisation process. The results suggest that such information must be stored and served with the data to support inference and reuse.

Citation: S.D. Foster, J. Vanhatalo, V.M. Trenkel, T. Schulz, E. Lawrence, R. Przeslawski, and G.R. Hosack. 2021. Effects of ignoring survey design information for data reuse. Ecological Applications 31(6): e02360. doi:10.1002/eap.2360
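A minimal sketch of the abstract's central point, built on assumptions that are ours rather than the paper's: a hypothetical population of 20,000 sites with a covariate `x` and a response `y` that rises with `x`, inclusion probabilities proportional to `0.1 + x` (so high-`x` sites are upweighted), and Poisson sampling. It compares the naive sample mean with two standard remedies chosen for illustration, not taken from the paper: the Hájek (inverse-inclusion-probability weighted) estimator, which needs the stored inclusion probabilities, and a linear model in `x` evaluated at the population mean of `x`, which needs the design covariate. All function names are hypothetical.

```python
import random
import statistics


def make_population(n_sites, seed=1):
    """Hypothetical population: covariate x in [0, 1) and a response y that increases with x."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_sites)]
    ys = [10.0 + 20.0 * x + rng.gauss(0.0, 2.0) for x in xs]
    return xs, ys


def inclusion_probs(xs, expected_n):
    """Assumed design: inclusion probability proportional to (0.1 + x), capped at 1.
    Below the cap the probabilities sum to expected_n."""
    size = [0.1 + x for x in xs]
    total = sum(size)
    return [min(1.0, expected_n * s / total) for s in size]


def poisson_sample(pis, rng):
    """Poisson sampling: each site is included independently with its own probability."""
    return [i for i, p in enumerate(pis) if rng.random() < p]


def naive_mean(ys, idx):
    """Ignores the design: plain mean of the sampled values."""
    return statistics.fmean([ys[i] for i in idx])


def hajek_mean(ys, pis, idx):
    """Design-based: inverse-inclusion-probability weighted (Hajek) mean."""
    weights = [1.0 / pis[i] for i in idx]
    return sum(w * ys[i] for w, i in zip(weights, idx)) / sum(weights)


def model_mean(xs, ys, idx):
    """Model-based: least-squares fit of y on the design covariate x using the sample,
    then a prediction at the population mean of x (x is known for every site)."""
    sx = [xs[i] for i in idx]
    sy = [ys[i] for i in idx]
    mx, my = statistics.fmean(sx), statistics.fmean(sy)
    slope = sum((a - mx) * (b - my) for a, b in zip(sx, sy)) / sum((a - mx) ** 2 for a in sx)
    intercept = my - slope * mx
    return intercept + slope * statistics.fmean(xs)


def estimate(expected_n, seed=42, n_sites=20_000):
    """Return (true mean, naive, Hajek, model-based) for one simulated survey."""
    xs, ys = make_population(n_sites)
    pis = inclusion_probs(xs, expected_n)
    idx = poisson_sample(pis, random.Random(seed))
    return (statistics.fmean(ys), naive_mean(ys, idx),
            hajek_mean(ys, pis, idx), model_mean(xs, ys, idx))


if __name__ == "__main__":
    # Larger expected sample sizes shrink the noise but not the naive estimator's bias.
    for n in (100, 500, 2000):
        true, naive, hajek, model = estimate(n)
        print(f"n~{n:5d}  true={true:6.2f}  naive={naive:6.2f}  "
              f"hajek={hajek:6.2f}  model={model:6.2f}")
```

Running it should show the naive mean sitting above the true mean at every expected sample size, while the weighted and model-based estimates stay close to it. For a stratified design the same weighting applies, with π = n_h / N_h for every site in stratum h. Either remedy is unavailable if `pis` or `xs` has been dropped from the shared dataset, which is the abstract's argument for storing design information alongside the data.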

Additional Info

Title: Effects of Ignoring Survey Design Information for Data Reuse
Language: eng
Licence: Not Specified
Landing Page: https://data.gov.au/data/en/dataset/ae84a9b2-6036-4525-8a83-5fe0611559f8
Contact Point: Geoscience Australia Data (clientservices@ga.gov.au)
Reference Period: 08/04/2019
Geospatial Coverage: polygon spanning longitudes 112° E to 154° E and latitudes 44° S to 9° S (GeoJSON)
{
  "coordinates": [
    [[112.0, -44.0], [154.0, -44.0], [154.0, -9.0], [112.0, -9.0], [112.0, -44.0]]
  ],
  "type": "Polygon"
}
Data Portal: Geoscience Australia

Data Source

This dataset was originally sourced from Geoscience Australia, where it is listed as "Effects of Ignoring Survey Design Information for Data Reuse". Please visit the source to access the original metadata of the dataset:
https://ecat.ga.gov.au/geonetwork/srv/eng/csw/dataset/effects-of-ignoring-survey-design-information-for-data-reuse