5 Tips on Cleansing External Data Before Importing into a LIMS
Mar 18

5 Tips on Cleansing External Data Before Importing into a LIMS

You have a data set and you know what your target LIMS system can import. So, how do you turn one into the other? It can be a real challenge to manipulate data files and half the problem is understanding the assumptions and implicit rules in the original. We briefly covered those in an earlier blog (Overcoming the Challenges of Cleansing Data to Import into a LIMS) so let’s think about the specifics.

1. Understand fully what is required

Understand the required resulting format fully so that you have a clear target towards which you can work. And don’t try to use one tool for every task. Excel is great for many things but not all. It’s not good, for example, at differentiating and processing empty text values mixed with null values. A free utility such as Notepad++ is much better for that but can’t do pivot tables. Use the right tool for the job.

2. Data types

Be clear about whether a particular field in your new system is expecting just a date or a date and time. SQL databases tend to use the same underlying data type and reformat the output. Input, however, is much fussier. Be consistent. If one column needs a time element on a field then make sure all entries for that column have times, even if they’re mostly “00:00”. And watch out for the lengths of text fields. For each column check the value with the maximum length (an Excel macro is good for this) and compare that with the maximum allowed by your new system.

And don’t get me started on importing free-text notes. They need a whole blog series on their own!

3. What’s mandatory?

If a target field is mandatory in the system then it’s probably mandatory for your import. Check that every row contains a value in that column and check whether those values need to be unique. In my experience you will find a duplicate Unique ID on both row 4 and row 10,206. Again, an Excel macro can count how many times each value appears in a column and report the duplicates. Google it!

4. A sideways approach

Sometimes you may find a data file with its data on the wrong axis. Fields listed vertically and allowed values listed horizontally. That’s horrible to import. However, all is not lost. Use your good friend Google and from there a pivot table you can soon rotate the data into a more importable format.

5. Tops and tails

Finally, after all the hard work is done, ensure that you don’t drop your guard and import your header row as data and also ensure that your last row has a blank row below it (imported CSV files sometimes miss the last row for this reason).

A final thought about cleansing data for LIMS import

There are a few pointers to try to stay sane when formatting data import files. Look out for our other blogs on data imports. With a clear aim and a little experienced guidance, you can avoid a lot of potholes and end with quality data in your LIMS and happy users

About The Author

Gary Rooksby has over 25 years’ experience implementing and evolving corporate systems including manufacturing and quality systems for a range of major clients such as the MOD. For the last 18 years Gary has specialised in Sample Management Software with emphasis on process optimisation and data management. Gary works in partnership with clients and draws on his wealth of experience to help institutes and their teams to maximise the benefits from new and upgraded systems. Business needs are constantly evolving, and Gary loves the changing challenges. Gary always focuses on delivering value to the users, whether that is financial, scientific or simply easing workloads. He believes that the system is never an end in itself; it is a tool to help the users achieve their goals and that principle is always at the heart of any system or data designs.