Standardized Data Collection: Legal Requirements, Guidelines, or Competition?
– Frank Fagan
Imagine that a U.S. bank wishes to develop a predictive model for granting credit to borrowers below the poverty line. With the current U.S. population at 325 million, of which nearly 40 million live in poverty, the universe of available data for making predictions may be limited. Even if a quarter of that population seeks loans, and the bank has experience lending to 10% of it (4 million borrowers), the data on past loans may still be inadequate to achieve the predictive precision needed to avoid bad loans and build a profitable lending business. In other national markets, where the population is larger (in particular, the number of those living in poverty), enough data may be available to develop a sufficiently precise predictive model. That model would obviously be profitable in its country of origin, but to be exported, the predictive patterns it identifies at home must also be present abroad. In terms of industrial strategy, developers of predictive models for export could collect and test general stocks of data alongside country-specific ones. In terms of policy, law could nurture low-cost data collection that stimulates the construction of models at home.
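The point can be illustrated with a minimal simulation, purely as a sketch: all numbers and the crude bucket-level "model" below are hypothetical stand-ins, not the bank's actual methods. The sketch shows how the precision of estimated default rates grows with the stock of past loans, and why a model fit on a larger foreign market is useful at home only if the foreign pattern matches the domestic one.

```python
# Hypothetical illustration: estimation error shrinks with sample size, and a model
# trained abroad transfers only when the underlying pattern is shared across markets.
import numpy as np

rng = np.random.default_rng(0)

def simulate_defaults(n_loans, default_rate_by_bucket):
    """Draw loans uniformly across income buckets and simulate defaults."""
    buckets = rng.integers(0, len(default_rate_by_bucket), size=n_loans)
    defaults = rng.random(n_loans) < np.array(default_rate_by_bucket)[buckets]
    return buckets, defaults

def fit_bucket_model(buckets, defaults, n_buckets):
    """Crude 'model': estimated default rate per income bucket."""
    return np.array([
        defaults[buckets == b].mean() if (buckets == b).any() else np.nan
        for b in range(n_buckets)
    ])

true_home = [0.30, 0.20, 0.10]          # hypothetical default rates at home
true_abroad_same = [0.30, 0.20, 0.10]   # foreign market, same pattern
true_abroad_diff = [0.10, 0.25, 0.35]   # foreign market, different pattern

# More past loans -> more precise estimates of the domestic pattern.
for n in (1_000, 100_000, 4_000_000):
    b, d = simulate_defaults(n, true_home)
    err = np.nanmax(np.abs(fit_bucket_model(b, d, 3) - true_home))
    print(f"{n:>9,} past loans -> worst-case estimation error {err:.3f}")

# A model estimated on a large foreign sample is accurate at home only
# when the foreign pattern matches the home pattern.
b, d = simulate_defaults(4_000_000, true_abroad_same)
print("export error, same pattern:     ",
      np.nanmax(np.abs(fit_bucket_model(b, d, 3) - true_home)).round(3))
b, d = simulate_defaults(4_000_000, true_abroad_diff)
print("export error, different pattern:",
      np.nanmax(np.abs(fit_bucket_model(b, d, 3) - true_home)).round(3))
```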
But law could go a step further and encourage the development of broadly useful predictive models, especially in the context of national machine-learning-based infrastructure investments. This can be done with substantive data collection requirements in exchange for government funding or tax incentives, or with the development and announcement of process-based standards for data collection. Imposing substantive data collection requirements in exchange for funding is efficient inasmuch as the project is beneficial and the data collected under the additional requirements can be profitably used in other contexts. The imposition of process-based standards entails social cost, but the provision of guidelines may be enough to reap the rewards of standardization when the private benefits from data independence are small. Efficient standards may fail to emerge, however, even with law's endorsement, in the presence of severe collective action problems. Of course, the danger of endorsement is that the standard itself is inefficient. Competition among jurisdictions, in particular a national desire to win the global AI race, may be expected to bring about efficient results, but only if big data is big enough within jurisdictions or across the jurisdictions of federated partners.
All of this is consistent with the familiar problem (and general mystery) of choosing between the benefits of competition and those of economies of scale. Technical data collection standards present the added complexity that lawmakers may be unable to distinguish between efficient and inefficient leapfrogging. In other words, do the presumed economies enabled by standards today outweigh the drag on the potentially beneficial standards of tomorrow? And will mandating standards today eliminate the possibility that future, superior standards will arise? The answers to these questions remain, at this point, largely empirical, and in any case are left for future work. For now, however, the benefits of standardization must surely be discounted by an uncertain future. Standards may generate economies of scale, but they simultaneously inhibit competition and its benefits. This is the danger of centralized standards, whether imposed or merely announced. Good arguments for economies of scale can easily be made, but they are difficult to believe upon further scrutiny. In other settings, auctions can serve as scrutinizers, though perhaps here, instead of firms bidding for the right to be the sole data collector or something similar, piecemeal subsidies for collecting general variables can generate yet more data that works well across time and space, and anything short of qualified standards can continue to be left in the hands of innovators.
Read more here.