INTERACTIVE SOFTWARE

Who Has Time to Cleanse Date/Time Data?

Data Cleansing Date Time

Data Cleansing Date Time

Text data that needs to be matched with pre-defined data sets is often seen as most challenging. But what about innocuous date and time fields – surely they can’t be difficult? Oh yes they can! Here are four tips to avoid many potholes and maintain data accuracy.

Tip 1 – Be consistent

There are two distinct types of time-related data – points in time and durations. Points in time, such as dates of birth may simply require the date. However, sometimes a time element is very important e.g. when a sample was taken or frozen. Always use the same data type for storage – “Datetime” is best.

Durations are more challenging. You need to be clear about the accuracy you need. Do you need a number of hours, minutes or seconds or even higher accuracy? Once you have your definition you must stick strictly to it for that field. You can then record the number of units – the lowest unit e.g. seconds, as an integer.

Tip 2 – How are empty values interpreted?

The rules around what “empty” means can be vague for date fields and this is the most common issue. However, making such a field mandatory doesn’t always work. If the user doesn’t know the correct value at the time the record is entered, what should they enter? 01-01-1900? Today’s date? Whatever they choose is guaranteed to be wrong. Only make entry mandatory if the user at that point will always know the correct value.

When you decide how to interpret empty values for a particular field document that rule. Users must be consistent.

Tip 3 – Regional settings

If you’ve inherited a set of data entered by users in a different country then look out for regional layouts and interpretations. When exchanging data between the UK and USA, is “01/04/2021” April 1st or January 4th? Be clear!

Tip 4 – Perform relative checks

This kind of issue can be the most difficult to spot. Sometimes known as “sanity checks” – validate date fields against each other on the same record. If you’re looking at tissue sample data then compare the Date Taken with the Frozen Date and other related dates to ensure they make sense. I’ve seen a data set indicating the samples were taken several years before the donors were born!

A final thought about cleansing date/time data

Avoiding easy-to-miss systemic issues can smooth the way to basing clinical and business decisions on reliable date/time data and viewing related data on accurate timelines. And then you can make time for that much-needed cuppa!

Related Topics