
by Kevin Dodds

Recently, CrowdFlower blogged about de-duplicating records in a merged CRM database using crowdsourcing. The article suggested that this kind of data is often too wildly divergent to be de-duped automatically.

At IEI, we’ve found:

  • Most CRM data is similar (first name, last name, email, phone, etc.) and easy to standardize automatically.
  • Exact matches are simple, but they are only the first of potentially dozens of matching scenarios, the rest of which resolve “fuzzy” matches automatically.
  • The crowd is best used for de-duplication after all automated de-duping processes have been exhausted.

Identical matches can be marked as resolved right away, with no further work required. The dataset then needs to be standardized and normalized, again through automation, which reveals even more identical matches that can be resolved automatically.

A few examples of standardization and normalization results:
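To give a flavor of what such results can look like, here is a minimal Python sketch. It is an illustration only, not IEI's actual process: the field names (`first`, `last`, `email`, `phone`), the sample records, and the helpers `normalize_record` and `dedupe_exact` are all assumptions, and the phone handling assumes US numbers.

```python
import re


def normalize_record(record):
    """Return a standardized copy of a CRM record.

    Assumed keys (hypothetical): first, last, email, phone.
    """
    # Collapse stray whitespace and use consistent capitalization for names.
    first = " ".join(record.get("first", "").split()).title()
    last = " ".join(record.get("last", "").split()).title()
    # Email addresses are compared case-insensitively here.
    email = record.get("email", "").strip().lower()
    # Keep digits only; drop a leading US country code (assumption: US numbers).
    digits = re.sub(r"\D", "", record.get("phone", ""))
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return {"first": first, "last": last, "email": email, "phone": digits}


def dedupe_exact(records):
    """Split records into (unique normalized records, duplicates of earlier ones)."""
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        norm = normalize_record(rec)
        key = (norm["first"], norm["last"], norm["email"], norm["phone"])
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            unique.append(norm)
    return unique, duplicates


# Made-up sample data for illustration only.
records = [
    {"first": "jane", "last": "DOE ", "email": "Jane.Doe@Example.com", "phone": "(555) 010-1234"},
    {"first": "Jane", "last": "Doe", "email": "jane.doe@example.com ", "phone": "+1 555 010 1234"},
    {"first": "John", "last": "Smith", "email": "jsmith@example.com", "phone": "555.010.9999"},
]
unique, duplicates = dedupe_exact(records)
print(len(unique), len(duplicates))  # prints: 2 1
```

The first two records are not identical as entered, but they differ only in case, spacing, and phone formatting, so they become an exact match once normalized.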

Sending non-exact matches to the crowd without performing thorough fuzzy matching comes at a high cost and slows down the entire operation. Compare the automated process to the crowdsourcing process:

The task looping required to get accurate de-duping via crowdsourcing can quickly add up to three or more times what an automated process would cost. Limiting crowdsourcing de-duping efforts saves money that can instead go toward crowd research to provide up-to-date and reliable datapoints, the goal of an effective CRM database.
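One way to picture the "automation first, crowd last" ordering is a triage step that settles clear cases automatically and queues only the ambiguous pairs for people. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for fuzzy matching; the thresholds, the record fields, and the `triage` function are hypothetical choices for illustration, not a description of any particular production system.

```python
from difflib import SequenceMatcher

# Assumed thresholds; real values would be tuned against known duplicates.
AUTO_MATCH = 0.90
NEEDS_REVIEW = 0.75


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(rec_a, rec_b):
    """Classify a pair of already-normalized records.

    Returns "auto_match", "crowd_review", or "distinct".
    """
    # A shared, non-empty email is treated as decisive in this sketch.
    if rec_a["email"] and rec_a["email"].lower() == rec_b["email"].lower():
        return "auto_match"
    name_a = f'{rec_a["first"]} {rec_a["last"]}'
    name_b = f'{rec_b["first"]} {rec_b["last"]}'
    score = similarity(name_a, name_b)
    same_phone = bool(rec_a["phone"]) and rec_a["phone"] == rec_b["phone"]
    # Very similar names plus a second agreeing field: resolve automatically.
    if score >= AUTO_MATCH and same_phone:
        return "auto_match"
    # Similar names without corroboration: the only pairs sent to the crowd.
    if score >= NEEDS_REVIEW:
        return "crowd_review"
    return "distinct"


# Made-up records for illustration only.
a = {"first": "Katherine", "last": "Jones", "email": "kjones@example.com", "phone": "5550101234"}
b = {"first": "Catherine", "last": "Jones", "email": "cjones@example.org", "phone": "5550101234"}
c = {"first": "Catherine", "last": "Jones", "email": "cjones@example.org", "phone": "5550107777"}
d = {"first": "Robert", "last": "Diaz", "email": "rdiaz@example.com", "phone": "5550102222"}

for other in (b, c, d):
    print(triage(a, other))
```

In this example only the middle pair (similar names, different phone numbers) would reach the crowd; the other two are settled without paying for any human tasks.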


posted by Shyamali Ghosh on December 16, 2013

Heat maps, which display data graphically with individual values shown as colors, are an increasingly popular way to quickly convey a lot of information. To demonstrate this, we at IEI recently used a set of data on private investments made in the U.S. thus far in 2013 (data courtesy of our friends at Lead411) to come up with this fascinating representation of “where the action is” in American start-ups. As you can see, a great deal of information leaps off the page, evoking questions and encouraging further exploration of the source data.



Some points that really stand out:

Is Houston really that big of a hotbed of private funding? Apparently fracking and pipelines don’t fund themselves.

New York City is the true center of American innovation! Not just a source of money anymore, NYC startups surpassed the Bay Area’s in 2013. (Relationship Science, which raised a total of just under $90 million in funding, almost did that single-handedly.)

What’s up with Indiana? It’s burning with new investment in a diverse array of industries. Indiana has received $260 billion in capital during the past decade, spread over life sciences, business services, energy, technology, industry, automotive, agriculture, and other fields.

What is important to remember here is how the data underlying the map was processed to make this visualization possible. The under-the-hood heavy lifting involved:

  • harvesting thousands of press releases
  • parsing the data into relevant fields
  • normalizing the data (geographic place data, numerical values)
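The normalization step in the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tooling actually used for the map: the lookup table covers only a handful of state spellings, the `parse_amount`, `normalize_state`, and `total_by_state` helpers are hypothetical, and the sample deals are made up.

```python
import re
from collections import defaultdict

# Tiny illustrative lookup; a real pipeline would cover every state and many variants.
STATE_ABBREVIATIONS = {
    "texas": "TX", "tx": "TX",
    "new york": "NY", "ny": "NY", "n.y.": "NY",
    "indiana": "IN", "ind.": "IN",
    "california": "CA", "ca": "CA", "calif.": "CA",
}

MULTIPLIERS = {
    "k": 1_000, "thousand": 1_000,
    "m": 1_000_000, "million": 1_000_000,
    "b": 1_000_000_000, "billion": 1_000_000_000,
}

# A number (commas and decimals allowed) followed by an optional unit word.
AMOUNT_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*([a-z]+)?", re.IGNORECASE)


def parse_amount(text):
    """Turn strings like "$12 million" or "$2,500,000" into a dollar figure."""
    match = AMOUNT_RE.search(text)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    # Unknown trailing words (e.g. "in") leave the number unscaled.
    return number * MULTIPLIERS.get(unit, 1)


def normalize_state(text):
    """Map a free-text state name to its two-letter code, or None if unknown."""
    return STATE_ABBREVIATIONS.get(text.strip().lower())


def total_by_state(deals):
    """Sum normalized amounts per state; skip rows that cannot be normalized."""
    totals = defaultdict(float)
    for deal in deals:
        state = normalize_state(deal["state"])
        amount = parse_amount(deal["amount"])
        if state is None or amount is None:
            continue  # in practice these rows would be set aside for review
        totals[state] += amount
    return dict(totals)


# Made-up deals for illustration only.
deals = [
    {"state": "New York", "amount": "$12 million"},
    {"state": "N.Y.", "amount": "$2,500,000"},
    {"state": "texas ", "amount": "$1.5M"},
    {"state": "Atlantis", "amount": "unknown"},
]
print(total_by_state(deals))
```

Only once every row shares the same state codes and plain numeric amounts can the per-state totals be handed to whatever mapping software draws the colors.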

Everyone loves a heat map, but creating one is a lot less about the flavor of data visualization software than it is about the hard work and advanced tools that allow the data to be “visualized” in the first place.

Can you imagine the potential your data has to be visualized and quickly analyzed? Contact Information Evolution today and have us evaluate what is involved, completely free of charge.


posted by Shyamali Ghosh on December 10, 2013