Top Ways to De-dupe Your Data


Most systems contain duplicate records. A record is a duplicate if the same person, company or other entity appears more than once in your system. Duplicate records are typically created by:

Human error: adding records without checking whether they already exist.

Uploading a list of records without checking it for duplicates.

Integrating different systems that were not implemented correctly, so the same entity ends up in more than one place.

Companies typically choose to de-dupe their data when they are migrating to a new system or planning a big marketing campaign, or because they recognize the importance of clean data and de-dupe on a yearly basis. For everyone else, a quick way to check whether it is time to de-dupe is to work out what percentage of your records are duplicates. You can use Excel's conditional formatting to highlight duplicate records: for example, highlight duplicate values in the e-mail field of your contact data or in the website field of your account records. If duplicates exceed 5% of the total number of records, it is time to de-dupe. Companies that have never de-duped their data should not be surprised to see a duplicate rate greater than 15%.
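If you would rather script the duplicate-rate check than eyeball highlighted cells, here is a minimal Python sketch. It assumes your records are loaded as a list of dictionaries (for example with `csv.DictReader`), and the normalization (trimming spaces and lower-casing) is an assumption, not a rule from any particular tool. Like highlighting, it counts every record in a duplicated group, including the first occurrence.

```python
from collections import Counter


def duplicate_rate(rows, field):
    """Return the share of rows whose `field` value also appears in another row.

    Every row in a duplicated group is counted, including the first
    occurrence, similar to highlighting duplicates in a spreadsheet.
    Values are trimmed and lower-cased first (an assumption; adjust to
    your own matching rules). Blank values are never treated as duplicates.
    """
    if not rows:
        return 0.0
    values = [row.get(field, "").strip().lower() for row in rows]
    counts = Counter(v for v in values if v)
    duplicated = sum(1 for v in values if v and counts[v] > 1)
    return duplicated / len(rows)


def needs_dedupe(rate, threshold=0.05):
    # 5% rule of thumb: de-dupe once duplicates exceed this share of records.
    return rate > threshold


contacts = [
    {"email": "ann@example.com"},
    {"email": "ANN@example.com "},
    {"email": "bob@example.com"},
    {"email": "cy@example.com"},
]
rate = duplicate_rate(contacts, "email")
print(f"{rate:.0%} duplicated, de-dupe now: {needs_dedupe(rate)}")
```

Here two of the four contacts share an e-mail address once case and spacing are ignored, so the rate is 50%, well past the 5% threshold.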

Once you have decided to de-duplicate your records, you have three options: use an automated de-duping tool to make your life easier, do it yourself in Excel, or hire a data cleaning company that specializes in de-duping.

De-duplication tools can apply your business rules automatically and de-dupe many of your records without manual work. This is much quicker than doing it yourself or hiring a data cleaning company, and the cost is usually lower as well. Here are the top de-duplication tools on the market today.

StrategicDB's De-duping Tool


This tool is unique in that it lets you define your own rules for identifying duplicates and for choosing which records survive. All you need is a CSV file containing the records you want to de-dupe, plus your rules. After the tool runs, you get back a CSV file in which every record carries a duplicate group ID and each group's master record, with the merged data, is clearly marked. You can then use that file to de-dupe any platform or data set.
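To make "duplicate group IDs" and "masters" concrete, here is a minimal Python sketch of the idea. It is not StrategicDB's implementation; the match rule (bare website domain), the survivorship rule (most complete record, then most recent `last_modified` date) and all function names are hypothetical choices for illustration.

```python
from itertools import count


def normalize_domain(url):
    # Hypothetical match key: the bare domain of the website field.
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.split("/")[0]


def completeness(row):
    # Hypothetical survivorship rule: the most complete record wins, and ties
    # go to the latest ISO-format (YYYY-MM-DD) last_modified date.
    filled = sum(1 for value in row.values() if value)
    return (filled, row.get("last_modified", ""))


def assign_groups(rows, match_key, survivor_key):
    """Tag every row with a duplicate group ID and flag one master per group."""
    groups = {}
    for row in rows:
        key = match_key(row)
        if not key:
            key = object()  # rows without a match key never group together
        groups.setdefault(key, []).append(row)

    tagged = []
    group_ids = count(1)
    for members in groups.values():
        group_id = next(group_ids)
        master = max(members, key=survivor_key)
        for row in members:
            tagged.append({**row, "group_id": group_id, "is_master": row is master})
    return tagged


accounts = [
    {"name": "Acme Inc", "website": "https://www.acme.com", "phone": "", "last_modified": "2023-01-05"},
    {"name": "Acme", "website": "acme.com/about", "phone": "555-0100", "last_modified": "2022-11-30"},
    {"name": "Globex", "website": "globex.com", "phone": "555-0199", "last_modified": "2023-02-01"},
]
tagged_accounts = assign_groups(accounts, lambda r: normalize_domain(r["website"]), completeness)
for row in tagged_accounts:
    print(row["group_id"], row["is_master"], row["name"])
```

Both Acme rows land in group 1, and the second one becomes the master because it has a phone number. Reading the input with `csv.DictReader` and writing `tagged_accounts` back out with `csv.DictWriter` would give you a file shaped like the output described above.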

CRM Fusion's DemandTools

This tool is designed specifically for Salesforce. Its greatest advantage is that it lets you merge duplicate records inside Salesforce, without having to export your data and then merge or delete the records afterwards.

Data Ladder

Data Ladder is similar to StrategicDB's de-duping tool in that it also works with CSV files. However, instead of processing your lists in the cloud, you have to download and install the product. This solution is better suited to B2C data sets.

If you have a simple list to de-dupe, you may want to try doing it yourself. In Excel, you can use conditional formatting to identify duplicate values in your list, such as duplicate e-mail addresses or duplicate websites. However, it is difficult to apply more complex matching rules, for example flagging records where the name partially matches and the website and address are the same.
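A rule like "partial name match plus same website and address" is where a short script can pick up from Excel. The Python sketch below is one way to express it, using the standard library's `difflib.SequenceMatcher` for name similarity. The field names, the punctuation cleanup and the 0.7 similarity threshold are all assumptions you would tune to your own data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def clean(text):
    # Lower-case, drop simple punctuation and collapse whitespace.
    for ch in ".,":
        text = text.replace(ch, "")
    return " ".join(text.lower().split())


def name_similarity(a, b):
    # Similarity score between 0.0 and 1.0 from Python's difflib.
    return SequenceMatcher(None, clean(a), clean(b)).ratio()


def is_duplicate(a, b, threshold=0.7):
    # Hypothetical rule: same website AND same address AND similar names.
    return (
        a["website"].strip().lower() == b["website"].strip().lower()
        and clean(a["address"]) == clean(b["address"])
        and name_similarity(a["name"], b["name"]) >= threshold
    )


def find_duplicate_pairs(rows, threshold=0.7):
    # Compares every pair of rows, so cost grows quadratically with list size.
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rows), 2)
        if is_duplicate(a, b, threshold)
    ]


companies = [
    {"name": "Acme Corporation", "website": "acme.com", "address": "1 Main St."},
    {"name": "Acme Corp", "website": "ACME.com", "address": "1 Main St"},
    {"name": "Apex Corp", "website": "apex.com", "address": "1 Main St"},
]
print(find_duplicate_pairs(companies))  # [(0, 1)]
```

"Acme Corporation" and "Acme Corp" score 0.72, so they match; "Apex Corp" shares the address but not the website, so it is left alone. Because every pair is compared, large lists are usually split into smaller blocks first (for example, only comparing records that share a website).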

If you do decide to do it yourself, keep in mind that identifying duplicates and selecting which record will survive is only the first part of the de-duping process. Part two is implementing the de-duping in your system. Before making any system changes, make sure your system is backed up. Next, move any data you need from the records that are going to be deleted into the surviving record. The last step is to delete the duplicate records.
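The three implementation steps (back up, move data, delete) can be sketched in Python as follows. This is only an illustration: an in-memory list of dictionaries stands in for your real system, a deep copy stands in for a proper system backup, and the merge rule (fill the master's blank fields from the duplicates, never overwrite existing values) is a hypothetical choice.

```python
import copy


def merge_into_master(master, duplicates):
    # Hypothetical merge rule: fill the master's blank fields with values
    # from the duplicates; the master's existing values always win.
    merged = dict(master)
    for dup in duplicates:
        for field, value in dup.items():
            if field != "id" and value and not merged.get(field):
                merged[field] = value
    return merged


def dedupe(records, master_id, duplicate_ids):
    # Step 1: back up before changing anything (a deep copy stands in for
    # a real system backup here).
    backup = copy.deepcopy(records)
    by_id = {r["id"]: r for r in records}
    # Step 2: move the data you need from the records being deleted.
    merged = merge_into_master(by_id[master_id], [by_id[i] for i in duplicate_ids])
    # Step 3: delete the duplicates, keeping the merged master in place.
    doomed = set(duplicate_ids)
    remaining = [
        merged if r["id"] == master_id else r
        for r in records
        if r["id"] not in doomed
    ]
    return remaining, backup


records = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "phone": ""},
    {"id": 2, "name": "Ann Lee", "email": "ann@example.com", "phone": "555-0101"},
    {"id": 3, "name": "Bob Roy", "email": "bob@example.com", "phone": ""},
]
remaining, backup = dedupe(records, master_id=1, duplicate_ids=[2])
print(remaining)
```

Record 1 survives and picks up the phone number from record 2 before record 2 is removed, while the original list is left untouched so you can roll back if something goes wrong.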

Finally, if you have complex business rules or large data sets, you may want to hire a professional data cleansing company, which can save you time and money and ensure the highest quality of de-duping.
