You have a data set and you know what your target LIMS system can import. So, how do you turn one into the other? It can be a real challenge to manipulate data files and half the problem is understanding the assumptions and implicit rules in the original. We briefly covered those in an earlier blog (Overcoming the Challenges of Cleansing Data to Import into a LIMS) so let’s think about the specifics.
1. Understand fully what is required
Understand the required resulting format fully so that you have a clear target towards which you can work. And don’t try to use one tool for every task. Excel is great for many things but not all. It’s not good, for example, at differentiating and processing empty text values mixed with null values. A free utility such as Notepad++ is much better for that but can’t do pivot tables. Use the right tool for the job.
2. Data types
Be clear about whether a particular field in your new system is expecting just a date or a date and time. SQL databases tend to use the same underlying data type and reformat the output. Input, however, is much fussier. Be consistent. If one column needs a time element on a field then make sure all entries for that column have times, even if they’re mostly “00:00”. And watch out for the lengths of text fields. For each column check the value with the maximum length (an Excel macro is good for this) and compare that with the maximum allowed by your new system.
And don’t get me started on importing free-text notes. They need a whole blog series on their own!
3. What’s mandatory?
If a target field is mandatory in the system then it’s probably mandatory for your import. Check that every row contains a value in that column and check whether those values need to be unique. In my experience you will find a duplicate Unique ID on both row 4 and row 10,206. Again, an Excel macro can count how many times each value appears in a column and report the duplicates. Google it!
4. A sideways approach
Sometimes you may find a data file with its data on the wrong axis. Fields listed vertically and allowed values listed horizontally. That’s horrible to import. However, all is not lost. Use your good friend Google and from there a pivot table you can soon rotate the data into a more importable format.
5. Tops and tails
Finally, after all the hard work is done, ensure that you don’t drop your guard and import your header row as data and also ensure that your last row has a blank row below it (imported CSV files sometimes miss the last row for this reason).
A final thought about cleansing data for LIMS import
There are a few pointers to try to stay sane when formatting data import files. Look out for our other blogs on data imports. With a clear aim and a little experienced guidance, you can avoid a lot of potholes and end with quality data in your LIMS and happy users