Overcoming the Challenges of Cleansing Data to Import into a LIMS

Imagine you need to import a data set into your Laboratory Information Management System (LIMS). Perhaps you’re taking on a new study and need the historic data or are merging labs with another team. Whatever the cause the result is the same. You’ve got a spreadsheet of data of unknown quality that needs to be “knocked into shape” before import. What do you do? Where do you even start when it comes to cleansing data to import into a LIMS?

1. Understand what each column is supposed to represent

You need to properly understand what the data means. Whether you’ve got Yes/No, Y/N or 1/0, you need to know what this meant to the team entering the data originally. If the fields are, say, consent options then does “Yes” mean the donor opted in or opted out? It’s very important to understand the meaning. And what do you do with an empty value?

2. Agree your terminology

Choose the list of valid options. The technical term is ontology. These are often already set up in your LIMS, although they’re not always complete. You may be expanding the scope of your data with, say, a new study so new picklist options need to be agreed. Ensure there aren’t any ambiguous values. “Whole blood”, “Blood, not centrifuged” or “Blood” may mean the same thing but you only want one term for each item/concept if you’re to make any sense of validation and reporting. Then you need to carefully check that every value in that column complies with the allowed values list. This is sometimes called “vertical checks”.

3. Find and cleanse the gaps

A spreadsheet can be misleading with empty values. Not all blanks are equal. An empty text field is not the same as a null value or a zero so standardising is a good idea. If any default meanings are implied in the previous system then you’ll need to substitute those for the gaps to be consistent with your new system e.g. Does an empty Date of Death value mean that the donor was alive at donation or simply that their status was unknown to the previous system? This is very important for processing as well as legal compliance!

4. Sanity check

Talk to the previous data owners to understand how the fields relate to each other. You can then do spot-checks to make sure that everything makes sense. If it doesn’t then this might highlight a fundamental misunderstanding. This is sometimes called “horizontal checks”. I’ve seen data sets that appeared valid (vertical checks) but that had samples that the data said were taken years before the donors were even born!

5. Document what you do

The cleansing you carry out will almost certainly need to be done more than once and so you need to carefully annotate all changes you carry out (not the individual rows but what the tasks were). It’s also a very useful reference if there’s a question about the data after import.

A final thought about overcoming the challenges of cleansing data for importing into a LIMS

There’s a lot of scope for misunderstanding and invalid assumptions to ruin the value of an imported data set. Taking some simple steps can maximise the quality of your data and therefore its value before importing.

How to Guides and Brochures

Discover more with our How to guides, white papers and brochures sharing practical advice on some of the hot topics and key challenges in life-sciences and scientific research software including our 7 Step Guide to Data Migration - Helping you tackle the challenges of data transfer

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Overcoming the Challenges of Cleansing Data to Import into a LIMS

1. Understand what each column is supposed to represent

2. Agree your terminology

3. Find and cleanse the gaps

4. Sanity check

5. Document what you do

A final thought about overcoming the challenges of cleansing data for importing into a LIMS

Related posts:

‘Hello Audit!’ - A proactive approach to improving lab and sample quality

Unlocking the Benefits of Achiever LIMS

Not all LIMS data migration services are created equal

About The Author