Deciding What Data to Collect
Many organizations have recognized that data should be viewed as an asset. Before acquiring new data, it is important to establish a clear statement of how the data will be used and what value it is expected to provide.
Deciding what data to collect involves identifying information needs, estimating the full costs of obtaining and managing new data and keeping it up to date, and then determining whether the cost is justified. Just as agencies don’t have unlimited resources to repair and replace their assets, there are also limitations on resources for data collection and management.
A 2007 World Bank study summarized three guiding principles for deciding what data to collect:
- Collect only the data you need;
- Collect data to the lowest level of detail sufficient to make appropriate decisions; and
- Collect data only when they are needed.
Chapter 6 can be used to help identify the information needed to track the state of the assets and investments to maintain and improve them. The basic questions one needs to answer to identify needed data are:
- What decisions do we need to make and what questions do we need to answer that require asset data? Typically, an organization needs to be able to answer questions about, among other things, its asset inventory, the condition and performance of that inventory, and how resources are being spent on its assets. An organization also needs to determine what work is needed and how much that work will cost.
- What specific data items are required or desired? Next, one must identify the data required to meet the established information needs. There may be other data items that are not strictly required, but that may be useful if collected in conjunction with the required data. For instance, to answer questions and make decisions regarding pavement, an organization would typically want an inventory of existing pavement, details on the paving materials used, and details on current conditions. Additional information on treatment history or substructure conditions might not be strictly required, but if available could enhance the decision-making process.
It is also important to incorporate standard data elements for location and asset identification into requirements, ensuring consistency with other asset data in the agency.
- What value will each data item provide? It is important to distinguish “nice to have” items from those that will clearly add significant value. The cost of collecting and maintaining a data element should be compared with the potential cost savings from the improved decisions that the element supports. Cost savings may be due to asset life extension, improved safety, reduced travel time, or internal agency efficiencies. In addition, proxy measures for information value can be considered, such as the number and type of anticipated users and the number and type of agency business processes affected.
- What level of detail is required in the data? Level of detail is an issue for all assets, but particularly for linear assets such as pavement, where data can be captured at any level of detail. For instance, to comply with Federal reporting requirements for pavement condition, a state must collect distress data at 1/10 mile intervals for one lane of a road (typically the outside lane in the predominant direction). For other applications it may be necessary to collect data for additional lanes, or at some other interval.
- What level of accuracy is needed? The degree of accuracy in the data may have a significant impact on the data collection cost and required update frequency. Ultimately, the degree of accuracy required is a function of how the data are used. For instance, when estimating clearances under a bridge for the purpose of performing a bridge inspection, it may be sufficient to estimate the clearance at the lowest point to the nearest inch using video imagery. However, more accurate data may be required when routing an oversize vehicle or planning work for a bridge or the roadway underneath it. If a high degree of accuracy is not required, it may be feasible to use sampling strategies to estimate overall conditions from data collected on a subset of assets.
- How often should data be updated? Is the data collection a one-time effort, or will the data need to be updated over time? If the data will need to be updated, should the updates occur annually, over a period of multiple years, or as work is performed on an asset?
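The sampling strategy mentioned under the accuracy question can be illustrated with a short sketch. It assumes a simple random sample of assets, each rated only as poor or not poor, and uses a normal approximation for the margin of error. The function name, the guardrail example, and all numbers are hypothetical and are not drawn from any agency's procedure.

```python
import math
import random


def estimate_poor_share(inspected, population_size, z=1.96):
    """Estimate the share of assets in poor condition from a simple random sample.

    inspected: list of booleans, True if the inspected asset is in poor condition.
    population_size: total number of assets in the network.
    Returns (estimate, margin_of_error), using a normal approximation with a
    finite population correction (the margin shrinks to zero for a full census).
    """
    n = len(inspected)
    if n < 2 or n > population_size:
        raise ValueError("sample size must be between 2 and the population size")
    p = sum(inspected) / n
    fpc = 1 - n / population_size
    std_err = math.sqrt(p * (1 - p) / (n - 1) * fpc)
    return p, z * std_err


if __name__ == "__main__":
    # Hypothetical network: 10,000 guardrail segments, 12% of them in poor condition.
    random.seed(7)
    network = [random.random() < 0.12 for _ in range(10_000)]
    sample = random.sample(network, 400)  # inspect 400 segments instead of all 10,000
    share, margin = estimate_poor_share(sample, len(network))
    print(f"Estimated poor share: {share:.1%} +/- {margin:.1%}")
```

A larger sample narrows the margin of error but raises collection cost, which is the trade-off the accuracy question asks an agency to weigh.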
Look for ways to "collect once, use multiple times" by leveraging existing data and planning data collection efforts to capture information about multiple assets.
Table 7.2 below illustrates examples of data collection strategies that might address different information needs.
Table 7.2 Example Data Collection Strategies
| Example Asset(s) | Type of Information | Example Decisions | Example Data Collection Strategies |
|---|---|---|---|
| Pavement Markings | Total asset quantity by type, district, and corridor or subnetwork | Budgeting for assets maintained cyclically | Estimation based on sampling; full inventory every 3-5 years with interim updates based on new asset installation |
| Roadside Signs | Inventory of individual assets – location and type | Work planning and scheduling for assets maintained cyclically | Full inventory every 3-5 years with interim updates based on new asset installation |
| Guardrail | Inventory + General Condition (e.g. pass/fail or good-fair-poor) | Work planning and scheduling for assets maintained based on condition | Inventory and condition assessment every 2-3 years; inventory and continuous monitoring (e.g. from maintenance crews or automated detection) |
| Bridges | Inventory + Detailed Condition | Treatment optimization for major, long life cycle assets | Inventory and condition assessment every 1-2 years + continuous monitoring (e.g. strain gages on bridges) |
Once a general approach has been established, more detailed planning for what data elements to collect is needed. Prior to selecting data elements, identify the intended users and uses for the data, keeping in mind that there may be several different uses for a given data set. Identify some specific scenarios describing people who will use the information, and then validate these scenarios by involving internal stakeholders.
One common pitfall in identifying information needs is failing to distinguish requirements for network level and project level data. While advances in data collection technology make it feasible to collect highly detailed and accurate information, it is not generally cost-effective to gather and maintain the level of information required for project design for an entire network of assets.
A second pitfall is failing to consider the ongoing costs of updating data. The data update cycle can have a dramatic impact on data maintenance costs. Update cycles should be based both on business needs for data currency and how frequently information is likely to change. For example, asset inventory data is relatively static, but condition data may change on a year-to-year basis.
A third common pitfall is taking an asset-by-asset approach rather than a systems approach in planning for both asset data collection and downstream management of asset information.
Even when there is a strong business case for data collection, it is sometimes necessary to prioritize what data are collected given budget and staffing constraints. Some agencies do this by establishing different “tiers” of assets. For example:
- Tier 1: Assets with high replacement values and substantial potential cost savings from life cycle management (such as pavements and bridges)
- Tier 2: Assets that must be inventoried and assessed to meet legal obligations (such as ADA ramps, stormwater management features)
- Tier 3: Assets with high to moderate likelihood and consequences of failure (such as traffic signals, unstable slopes, high mast lighting and sign structures)
- Tier 4: Other assets that would benefit from a managed approach to budgeting and work planning (such as roadside signs, pipes and drains)
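One way to apply tiers like those above under a fixed budget is sketched below. The funding rule (fund in tier order, cheapest first within a tier, skip anything that does not fit) is an assumption chosen for illustration, not a rule stated in this guide, and the program names and costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionProgram:
    asset: str
    tier: int           # 1 = highest priority, following the tiers listed above
    annual_cost: float  # hypothetical annualized data collection cost


def select_programs(programs, budget):
    """Fund programs in tier order until the budget runs out.

    Within a tier, cheaper programs are considered first. A program that does
    not fit is skipped so that a smaller, lower-priority one can still be funded.
    Returns (funded_programs, remaining_budget).
    """
    funded, remaining = [], budget
    for program in sorted(programs, key=lambda p: (p.tier, p.annual_cost)):
        if program.annual_cost <= remaining:
            funded.append(program)
            remaining -= program.annual_cost
    return funded, remaining


if __name__ == "__main__":
    candidates = [
        CollectionProgram("Pavements", 1, 900_000),
        CollectionProgram("Bridges", 1, 600_000),
        CollectionProgram("ADA ramps", 2, 150_000),
        CollectionProgram("Traffic signals", 3, 200_000),
        CollectionProgram("Roadside signs", 4, 250_000),
    ]
    funded, left_over = select_programs(candidates, budget=1_800_000)
    for program in funded:
        print(f"Tier {program.tier}: {program.asset} (${program.annual_cost:,.0f})")
    print(f"Unallocated budget: ${left_over:,.0f}")
```

An agency might prefer a different rule, for example partially funding a lower tier through sampling rather than skipping it entirely.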
While updating data can be expensive, various strategies are available for combining data collection activities to reduce the incremental cost of collecting additional data. For instance, one approach to collecting data on traffic signal systems is to update the data when personnel perform routine maintenance work. Also, in some cases data can be extracted from a video log captured as part of the pavement data collection process.
Given limited resources for data collection, it may be helpful to formally assess the return on investment from data collection or to prioritize competing data collection initiatives. A formal assessment may be of particular value when considering whether the benefits of collecting additional data using a new approach justify the data collection cost. NCHRP Report 866 details the steps for calculating the return on investment (ROI) from asset management system and process improvements, including asset data collection initiatives.
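As a rough illustration of the arithmetic behind such an assessment, the sketch below computes a generic discounted ROI from yearly cost and benefit streams. It is not the procedure from NCHRP Report 866. The function names, the 3 percent discount rate, and the dollar figures are assumptions made for the example.

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of yearly amounts (year 0 first) to present value."""
    return sum(
        amount / (1 + discount_rate) ** year
        for year, amount in enumerate(cash_flows)
    )


def return_on_investment(costs, benefits, discount_rate=0.03):
    """Return (PV of benefits - PV of costs) / PV of costs.

    costs and benefits are yearly amounts starting at year 0. A result of 0.25
    means discounted benefits exceed discounted costs by 25 percent.
    """
    pv_costs = present_value(costs, discount_rate)
    pv_benefits = present_value(benefits, discount_rate)
    if pv_costs <= 0:
        raise ValueError("discounted costs must be positive")
    return (pv_benefits - pv_costs) / pv_costs


if __name__ == "__main__":
    # Hypothetical initiative: an initial inventory followed by yearly updates,
    # with savings from better-targeted maintenance starting in year 1.
    costs = [400_000, 50_000, 50_000, 50_000, 50_000]
    benefits = [0, 150_000, 200_000, 200_000, 200_000]
    print(f"ROI: {return_on_investment(costs, benefits):.1%}")
```

The same function can be run on competing initiatives to compare them on a common basis, provided the benefit estimates are developed consistently.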