by Matt Manning
What happens when you have more metadata for a database record than there are fields of data in the original record itself? If you’re like almost every other company in the world, you don’t have to imagine this scenario. It’s a reality you’ve lived with for quite some time.
Every customer record in a CRM system, for instance, links in some form or fashion to data on every single interaction the customer has had with your company’s web site, Twitter feed, etc. That metadata about the customer’s behavior is massively larger than the “real” record in your database, which might already include every phone call and email interaction with the customer. In this case, a “cookie” on a web site or an email address in a direct-mail campaign is the link to the vast reservoirs of behavioral data in associated metadata silos.
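To make that concrete, here is a minimal sketch in Python of how a single CRM record connects to its much larger behavioral silos. Every field name and value is a hypothetical stand-in; the point is that the cookie ID and email address are the only bridges between the “real” record and its metadata.

```python
# Minimal sketch (hypothetical field names): joining one CRM record to the
# behavioral metadata silos keyed on its cookie ID and email address.
crm_record = {
    "customer_id": 1042,
    "email": "jane@example.com",
    "cookie_id": "a81f-33c0",
    "phone_calls": 12,   # the "real" data held in the CRM itself
    "emails": 48,
}

# Each silo holds far more rows per customer than the CRM record has fields.
web_events = [
    {"cookie_id": "a81f-33c0", "event": "page_view", "url": "/pricing"},
    {"cookie_id": "a81f-33c0", "event": "click", "url": "/signup"},
    # ... typically thousands more per customer
]
campaign_events = [
    {"email": "jane@example.com", "campaign": "spring-dm", "opened": True},
]

# The cookie and email are the links between the record and its metadata.
behavior = [e for e in web_events if e["cookie_id"] == crm_record["cookie_id"]]
behavior += [e for e in campaign_events if e["email"] == crm_record["email"]]
print(f"{len(crm_record)} CRM fields vs. {len(behavior)} behavioral events")
```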
Behavioral action is not the only place we see the potential for generating vast amounts of metadata. Image files can also have all kinds of contextual information appended: data on the people, products, and objects in the image; thematic information; microsecond timestamps; source or author information; texture, color, and more. Several image search tools on the market, including an Austin start-up called Clarify, are diving into this thorny arena.
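For illustration, here is the kind of contextual record such a tool might append to a single image. Every field name and value below is a hypothetical stand-in, not any vendor’s actual schema; the point is that search over an image library reduces to matching on this appended metadata rather than on the pixels themselves.

```python
# Illustrative only: the kind of contextual metadata an image search tool
# might append to a single image (all field names here are hypothetical).
image_metadata = {
    "file": "product_shot_0147.jpg",
    "people": ["barista"],
    "products": ["espresso machine"],
    "objects": ["countertop", "mug"],
    "themes": ["kitchen", "morning"],
    "timestamp_us": 1431561600000042,   # microsecond-level timestamp
    "source": {"author": "studio-3", "license": "internal"},
    "texture": "matte",
    "dominant_colors": ["#2b2b2b", "#c0a062"],
}

# Searching by theme means matching on metadata, not analyzing pixels.
def matches(record, theme):
    return theme in record["themes"]

print(matches(image_metadata, "kitchen"))  # True
```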
If that isn’t intriguing enough, another recent example from MIT highlights how a short digital video file of an unmoving object can carry valuable, deep metadata. MIT researchers used high-speed digital photography to document virtually static objects. (Sound waves and subtle air currents cause micromovements in objects that appear motionless to humans.) The video clip (the data object) was appended with enough micrometer-level positional data on these imperceptible micromovements to allow three-dimensional rendering of the object and accurate projections of how it would react to outside forces of varying strengths, intensities, and directions. This kind of metadata makes possible all kinds of predictive modeling about the behavior of inanimate objects while enabling holographic rendering and 3D printing.
So, for all the talk of central data repositories, it seems that the sheer, unfathomable volume of metadata generated means we’re actually tending toward more and more silos of specific types of metadata that can be retrieved, unified, and analyzed quickly to inform future actions. What insights will we all see when we have our fingers on the pulse of all our customers and can use “lakes” of metadata to model the future behavior of people, products, and organizations? That is hard to predict. We can say that we are likely entering a period where the “connected dots” between big data (the first two dimensions), micro-level positional data (the third dimension), and predictive analytics (the fourth dimension) will lead us to some unexpected places in the not-too-distant future.
posted by Shyamali Ghosh on May 14, 2015
At IEI, we’re intimately familiar with the “demand side” of public information. It’s rare, though, that we get a glimpse of the issues facing public sector managers on the front lines of supplying that information. That’s just what we got when we were asked to participate in the City of Austin’s Open Data Initiative last week.
The catalyst for the launch of Austin’s city data portal was the 2008 Federal Data Initiative. Its mandate: incentivize cities to make at least 25 datasets open to the public. The intent was to free up the estimated $3 trillion in economic activity nationwide thought to be “lost” because government data was hard to access. The datasets first made available through this initiative laid the foundation for the current, more ambitious stage, in which relevant departments must inventory all their available data and then offer the most potentially interesting and valuable of their datasets online in an easily accessible format.
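As a rough sketch of what “easily accessible” means in practice: the portal at data.austintexas.gov is backed by the Socrata Open Data API, so any published dataset can be pulled down in a few lines of Python. The dataset ID below is a placeholder, not a real one; actual IDs are listed on the portal itself.

```python
# A minimal sketch of pulling one of the City's open datasets through the
# Socrata API behind data.austintexas.gov.
import requests

DATASET_ID = "xxxx-xxxx"  # placeholder; look up a real dataset ID on the portal
url = f"https://data.austintexas.gov/resource/{DATASET_ID}.json"

resp = requests.get(url, params={"$limit": 100})  # SODA paging parameter
resp.raise_for_status()
rows = resp.json()  # a list of dicts, one per record
print(f"Fetched {len(rows)} records; fields: {sorted(rows[0]) if rows else []}")
```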
This more difficult stage means a lot more work, but it holds several tantalizing prospects.
- Shared Responsibility. Turn passive complaints into opportunities for citizens to help solve municipal issues. Increased analysis will uncover some embarrassing metrics, but these data are also invaluable in finding the areas that need attention.
- Internal Efficiency. As time-consuming backlogs of public information requests evaporate, departments free up resources for other, more important tasks. They can also preemptively create detailed analyses highlighting their notable successes to emphasize the value of their efforts.
- Better Decisions. Only fast, easy access to the data behind complex municipal decisions can enable the move toward true “data-driven” decision-making.
Other opportunities include:
- Using the crowd. The City’s new “311” app allows people to submit text complaints and upload pictures (of potholes, lost dogs, downed power lines, etc.). Location data helps route the complaint to the appropriate city agency for resolution. The most popular topics?
- Loose dogs (Animal Control)
- Broken street lights (Austin Energy)
- Loud music (Police)
- Analyzing aggregate data. Both City staff and citizen activists can use the data for all kinds of analyses. For instance:
- Plotting the home zip codes of City police officers (one way to gauge community “engagement” of the department–do they live where they work?)
- Calculating the average wages and headcounts of municipal employees over time (to compare Austin metrics to other similar cities)
- Analyzing how long it takes a department to resolve a problem after receiving a complaint (e.g., fixing a pothole or issuing a building permit; a sketch of this analysis appears after this list)
- Making it easier to find current City employees. Online departmental staff listings are updated intermittently. The HR department spends a lot of effort fulfilling public information requests for simple things like:
- Employment dates (for background checks)
- Salaries (for lawyers of divorcing spouses)
- Employees by areas of responsibility (for companies marketing services to them)
- Allowing app developers to integrate diverse public data into the services they offer.
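As promised above, here is a sketch of the resolution-time analysis. It assumes a CSV export of complaint records from the portal; the file name and column names (created_date, closed_date, department) are assumptions for illustration, not the portal’s actual schema.

```python
# A sketch of the resolution-time analysis: average days from complaint to
# resolution, by department. All file and column names here are assumed.
import pandas as pd

df = pd.read_csv("311_requests.csv")  # hypothetical export from the portal
df["created"] = pd.to_datetime(df["created_date"])
df["closed"] = pd.to_datetime(df["closed_date"])

# Only closed complaints have a resolution time; open ones are dropped.
df["days_to_resolve"] = (df["closed"] - df["created"]).dt.days
by_dept = (
    df.dropna(subset=["days_to_resolve"])
      .groupby("department")["days_to_resolve"]
      .agg(["mean", "count"])
      .sort_values("mean")
)
print(by_dept)
```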
On a final note: IBM is supporting the City of Austin’s Open Data Initiative, and we at IEI will be very interested in seeing IBM’s Watson Data Quality Analytics in action on active municipal data clean-up projects. This new Watson-driven service is designed to highlight and resolve anomalous data issues quickly, and from what we’ve seen already, it could be a real game-changer.
We look forward to being a part of the City’s exciting initiative and seeing how this all plays out!
posted by Shyamali Ghosh on May 4, 2015