
by Matt Manning

Microsoft didn’t expect the web scrapers to fight back. It seemed so obvious to the massive multinational firm: We own the LinkedIn content that we paid billions of dollars to acquire. What Microsoft didn’t realize (and not for the first time*) was that the people who created the content were its actual owners.

The implications of the recent hiQ v. LinkedIn decision are hugely important, rivaling the impact of the 1991 Feist decision that confirmed facts can’t be copyrighted. Microsoft was ordered to remove “any technology preventing hiQ from accessing public profiles,” thus reversing a string of legal and technical actions the firm had taken to build a fortress around its résumé database.

These actions displayed a stunning misreading both of copyright law and of the business opportunities that have been sitting at LinkedIn’s door for years. Instead of spending time and resources shoring up a one-dimensional recruitment database product, LinkedIn should have been building a series of easy-to-use APIs that would let LinkedIn data be embedded into thousands of applications for free, with modest fees for certain specialized tools.

Microsoft has certainly pivoted before, so it may still be able to adjust its LinkedIn business model to capture the billions in potential revenue it is now forgoing. In the meantime, the web scrapers of the world will be the ones helping others monetize LinkedIn data.

* In 1995 I attended Microsoft’s MSN launch party in Redmond, where the country’s largest content providers were told how they were going to give Microsoft their content for free because of all the exposure the content owners would get. Every single person who attended that confab walked away disliking Microsoft’s arrogance and wondering how the company could so thoroughly misunderstand the way the information business actually worked.


posted by Shyamali Ghosh on August 21, 2017

by Matt Manning

Information services have been steadily raising the bar on the functionality of the software tools bundled with their data, making it easier for customers to get value from their products. As these tools grow more robust, and as customer expectations for software performance rise, they also shine a bright light on any gaps and anomalies within data sets. Without urgent efforts to fill them, these data “dry holes” can torpedo even the best information product.

How does this happen? Well, it’s a corollary of the “garbage in, garbage out” rule. Extending the functionality of a data-based service means that any new data, even a simple ratio of existing datapoints, needs to be as close to 100% accurate as possible. So when adding rankings, ratings, maps, or data visualizations built on ratios of underlying datapoints, that data must be fully populated and, in an ideal world, accurate. A null value in either the numerator or the denominator of a ratio yields a useless result, one that is likely to float bad records to the top of queries run by customers. This can negate the value that an investment in better software would usually bring, so much so that it erodes the user’s perceived value of the service and, eventually, renewal rates.
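To make the failure mode concrete, here is a minimal sketch in Python. The company records, field names, and placeholder ranking are all hypothetical; the point is simply how null inputs to a computed ratio can float incomplete records to the top of a ranked query, and how excluding them avoids that.

```python
# Hypothetical records; two of them have a null input for the ratio.
records = [
    {"company": "Acme Co",   "revenue": 12_000_000, "employees": 150},
    {"company": "Beta LLC",  "revenue": 9_000_000,  "employees": None},  # missing headcount
    {"company": "Gamma Inc", "revenue": None,       "employees": 40},    # missing revenue
]

def revenue_per_employee(rec):
    """Return the ratio, or None when either input is missing or zero."""
    if not rec["revenue"] or not rec["employees"]:
        return None
    return rec["revenue"] / rec["employees"]

# Naive ranking: substituting a placeholder (here, infinity) for missing ratios
# pushes the incomplete records above the legitimate ones.
naive = sorted(records,
               key=lambda r: revenue_per_employee(r) or float("inf"),
               reverse=True)
print([r["company"] for r in naive])   # ['Beta LLC', 'Gamma Inc', 'Acme Co']

# Safer: keep records with null inputs out of the ranked view entirely.
ranked = sorted((r for r in records if revenue_per_employee(r) is not None),
                key=revenue_per_employee, reverse=True)
print([r["company"] for r in ranked])  # ['Acme Co']
```

Excluding the incomplete records is only a stopgap, of course; the real fix, as the rest of the post argues, is to go fill in the blanks.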

On the positive side, this vulnerability is easy to fix by filling in those blanks. Simply identify the missing data points and go gather that data. Of course, there’s usually a good reason for data being absent, so any effort to fill in the blanks demands an approach beyond a simple data acquisition effort.

Here are some of the steps that firms typically take to address “dry holes” in their databases (a rough sketch in code follows this list):

  • Ensure data entry interfaces prevent submission until all required fields are populated.
  • Include field-level validation in those interfaces to flag values that are anomalous (too high, too low, unexpected value types).
  • Resolve the thorniest anomalous values with robust exception handling:
    • Enter estimates based on comparables (e.g., a divergence of more than X% from a ratio for a similar firm triggers an escalation).
    • Add placeholder values, assigned by a subject matter expert, to reflect your firm’s best guess at an estimated value (like the annual revenues of a closely-held private company or the physical location of a company that uses a PO box address).
    • Structure the underlying database to include metadata on the provenance of data at the field level for required fields. With this approach, the end user can drill down to examine the source and age of any single critical data point, allowing for better interpretation of the data.
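The Python sketch below illustrates these ideas under stated assumptions: the field names, the 50% escalation threshold, and the provenance layout are all hypothetical stand-ins, not a prescription for any particular product.

```python
from datetime import date

REQUIRED_FIELDS = {"company", "revenue", "employees"}
ESCALATION_THRESHOLD = 0.50  # hypothetical "X%": escalate if >50% off the comparable

def validate_record(record, comparable_ratio):
    """Return a list of issues found in a single submitted record."""
    issues = []

    # Required fields must be populated before the record can be accepted.
    for field_name in REQUIRED_FIELDS:
        if record.get(field_name) in (None, ""):
            issues.append(f"missing required field: {field_name}")

    # Field-level sanity check for anomalous values.
    employees = record.get("employees")
    if isinstance(employees, (int, float)) and (employees <= 0 or employees > 2_000_000):
        issues.append(f"employees value out of plausible range: {employees}")

    # Exception handling: escalate ratios that diverge too far from a comparable firm.
    revenue, emp = record.get("revenue"), record.get("employees")
    if revenue and emp and comparable_ratio:
        ratio = revenue / emp
        if abs(ratio - comparable_ratio) / comparable_ratio > ESCALATION_THRESHOLD:
            issues.append(f"ratio {ratio:,.0f} diverges from comparable; escalate for review")

    return issues

# Field-level provenance metadata, so an end user can drill into source and age.
record = {
    "company": "Acme Co",
    "revenue": 12_000_000,
    "employees": 150,
    "_provenance": {
        "revenue":   {"source": "audited filing",    "as_of": date(2016, 12, 31).isoformat(), "estimated": False},
        "employees": {"source": "analyst estimate",  "as_of": date(2017, 6, 1).isoformat(),  "estimated": True},
    },
}
print(validate_record(record, comparable_ratio=75_000))  # [] — record passes
```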

Other app-related challenges crop up when information services expand to include media such as images, maps, or charts, which require high-quality image, audio, and video source files. While this is more a display issue than a functionality issue, it does affect the user experience. Information services need to ensure that multimedia content is acquired for all records, is accurate (the correct subject is displayed for a given record), consistent (images are all shot from the same angle), and high quality (clear). This often requires an additional “fill in the blanks” effort to acquire and improve imagery.

Finally, adding a “see something, say something” prompt to your app asking end users to report errors, omissions, and anomalies is another simple way to convey a firm’s commitment to stamping out missing and anomalous data. Don’t forget to do more than just display a thank you message when an end user takes the time to assist you. Drop them a line after you’ve corrected the issue and perhaps reward them with a free special report or other sign of appreciation. This may be the single cheapest and most effective way to both improve retention rates and make your product the best that it can be.
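As one possible shape for that feedback loop, here is a brief Python sketch; the record fields and the follow-up step are hypothetical, just enough to show the report being captured and the reporter being thanked once the fix lands.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical "see something, say something" report: capture who reported the
# issue so the team can follow up once the record is actually corrected.
@dataclass
class DataIssueReport:
    record_id: str
    reported_by: str                 # e-mail or user id for the follow-up note
    description: str
    reported_at: datetime = field(default_factory=datetime.now)
    status: str = "open"             # open -> corrected (and reporter thanked)

def close_report(report: DataIssueReport) -> None:
    """Mark the report resolved and queue the thank-you follow-up."""
    report.status = "corrected"
    print(f"Record {report.record_id} corrected; follow up with "
          f"{report.reported_by} and consider sending a free special report.")

report = DataIssueReport("acme-co-001", "analyst@example.com",
                         "Employee count looks ten times too high.")
close_report(report)
```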


posted by Shyamali Ghosh on August 8, 2017