
Five times when data is not your friend

Digital data has become an increasingly important part of our lives and is widely used to improve organisational systems and decisions. Yet there are times when data is not your friend. Read on to learn about five such instances, how to potentially fix them, and some general disadvantages of data.

Data explained

Simply put, data is information. In the modern context, it is usually digital information stored online, in cloud computing, or in a database. YourDictionary offers a few definitions, including:

“Data is defined as facts or figures, or information that’s stored in or used by a computer.”

“Facts that can be analysed or used in an effort to gain knowledge or make decisions.”

“Statistics or other information represented in a form suitable for processing by computer.”

Data is becoming increasingly important because of the world wide web and the programs and applications people use. These allow us to capture information (or data) online, and to process, store, and retrieve it efficiently and often automatically. Organisations around the world, including governments, now use this easily accessible information to learn more about people and to draw conclusions or make decisions about them. While there are legitimate security concerns, this data is generally used to improve processes and make lives easier.

*Application = a computer program that utilises web browsers and web technology to perform tasks over the internet

Bad data explained

As useful as data is, ‘bad data’ can be equally useless (or even harmful). IT Chronicles defines bad data as:

“any data that is unstructured and suffers from quality issues such as inaccurate, incomplete, inconsistent, and duplicated information.”

These types of errors can lead to painstaking recapturing and/or reprocessing of the information, which is both costly and inefficient. If such issues go undetected, however, the repercussions can be much worse: misinterpretations, flawed insights, and inaccurate conclusions. Basing decisions on these, and applying ‘blanket approach’ solutions, can be highly problematic and even dangerous.

Five examples of ‘bad data’

False, incomplete, and non-standardised information, as well as irrelevant, duplicated, and misinterpreted data, can lead to ‘blanket approach’ solutions and negative organisational outcomes. These are outlined below:

1. False or incomplete information

False or incomplete information is probably the most common cause of ‘bad data’.

What is it?

It is when a user provides inaccurate or untrue answers. This may be done unknowingly, or deliberately to deceive the party or organisation collecting the data. Sometimes users also record incomplete information. This could be due to a lack of understanding of the question, a lack of knowledge (e.g. not knowing their postcode), or purposeful omission. The latter may be done to hide something and/or to influence a specific outcome or conclusion drawn from the data.

Why is it bad?

False or incomplete information may render the data unusable. In this case, people may need to reach out to users and manually re-record the information. If left untouched, inaccurate data can lead to flawed insights and misguided or poor decision-making, as it does not reflect the truth. In some situations, users may even be granted allowances based on false information, which could be unlawful; for example, copyright permissions may be given to a person who falsely claims to be someone else.

How to fix it

As mentioned above, parties may choose to reach out to users manually and re-record their information. However, this is time-consuming, costly, and only possible if inaccuracies and omissions are actually detected. A more efficient method is to accept a submission only once all required fields are filled out. To counter deception, data should be supported by evidence or technology where feasible, such as using identification documents or IP addresses to verify identity or location.
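The "only accept a submission if all required fields are filled out" idea can be sketched in a few lines of Python. The field names here are hypothetical examples, not from any particular system:

```python
# Minimal sketch of server-side validation: reject a submission unless
# every required field is present and non-empty. Field names are
# illustrative assumptions.
REQUIRED_FIELDS = ["name", "email", "postcode"]

def validate_submission(record):
    """Return a list of required fields that are missing or empty."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field, "")
        if not str(value).strip():  # blank or whitespace-only counts as missing
            missing.append(field)
    return missing

# Usage: an incomplete record is flagged before it enters the database.
record = {"name": "A. Smith", "email": "", "postcode": "SW1A 1AA"}
print(validate_submission(record))  # ['email']
```

A real form would pair this with evidence-based checks (e.g. verifying an address against an IP-derived location), but even this simple gate prevents incomplete records from being stored in the first place.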

2. Non-standardised information / irrelevant data

Standardised and specific information not only makes the data easier to record (more streamlined for the user), but it also makes it easier to process.

What is it?

Non-standardised information can apply to both how the data is sourced and how it is recorded. This may be a situation where questions are asked or answered in an inconsistent way. For example, units of measurement may change from kilometres and metres in one section to miles, feet, inches, furlongs, and yards in another. Questions may also yield irrelevant — or unimportant — information which is not needed for the core purpose of the data collection task.

Why is it bad?

Asking questions in a non-standardised way may make the task confusing for users, resulting in frustration, inaccurate answers, and even an unwillingness to continue. Similarly, irrelevant questions just unnecessarily prolong the process. Data collection should be as quick, easy, and streamlined as possible for the user. These issues also make the data much harder to process, because answers must first be converted into consistent formats (for direct comparison) and irrelevant or unimportant information must be identified and discarded.

How to fix it

Questions should be asked in a consistent way to eliminate confusion and frustration for the user. They should also follow a logical flow and be as easy as possible to understand, in plain, simple English (or the user’s first language). Data collection should be streamlined, with little to no irrelevant or redundant information. Where possible, fields should be filled in automatically, e.g. a residential address or country detected via geolocation, to make the experience quick, accurate, and pleasant for the user.
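The "convert answers into consistent formats" step mentioned above can be sketched as a simple unit-normalisation pass, using the kilometres-versus-miles example from earlier. The conversion factors are standard; the function and table names are illustrative:

```python
# Sketch of normalising mixed units of measurement to one standard
# unit (metres) before records can be compared directly.
TO_METRES = {
    "m": 1.0,
    "km": 1000.0,
    "mi": 1609.344,  # international mile
    "ft": 0.3048,
    "yd": 0.9144,
}

def to_metres(value, unit):
    """Convert a (value, unit) pair into metres."""
    if unit not in TO_METRES:
        raise ValueError(f"unknown unit: {unit}")
    return value * TO_METRES[unit]

# Usage: answers recorded in different units become directly comparable.
distances = [(5, "km"), (3, "mi"), (100, "yd")]
print([round(to_metres(v, u), 1) for v, u in distances])
# [5000.0, 4828.0, 91.4]
```

Rejecting unknown units outright (rather than guessing) mirrors the standardisation advice: it is better to force a consistent format at capture time than to reinterpret ambiguous answers later.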

3. Duplicated data

By using multiple data collection methods, it is possible that data is duplicated, leading to skewed results.

What is it?

Data duplication occurs when sets of data (sometimes parts, sometimes all) are copied, resulting in multiple identical sets. This is bound to happen when organisations use several different systems, applications, and programs to collect and store information. Many of these programs can also combine multiple sources of data without being able to differentiate between them. The problem is made worse when departments work too independently, in ‘data silos’.

Why is it bad?

When data is duplicated, it is often very hard to detect; doing so usually requires time-consuming and costly manual checks. Assumptions and conclusions are drawn from cumulative (or collected) information, so counting certain sources more than once is likely to skew the insights towards a specific person’s view.

For example, five people are interviewed: three say they prefer chocolate ice-cream and two prefer vanilla. If one of the ‘vanilla’ responses is counted twice, the (false) conclusion would be that an equal number of people like chocolate and vanilla. In this case it may be easy to recount, as the number of subjects is low, but imagine doing this at a large scale.

How to fix it

As manual checks are extremely laborious and costly, it is best to identify and eliminate duplication at the source. This can be done by using a single system, application, or program to collect information, and by encouraging interdepartmental collaboration. Where this is not practical, IDs or tags can be attached to particular sources of data. Programs also exist that flag pieces of information that are too similar; these can then be manually checked and verified, combined, or discarded.
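A program that "flags pieces of information that are too similar" can be sketched with Python’s standard-library `difflib`. The record fields and the 0.9 similarity threshold are illustrative assumptions, not a recommended production setting:

```python
import difflib

def flag_duplicates(records, threshold=0.9):
    """Return index pairs of records that look like near-duplicates,
    based on string similarity of a combined name+email key."""
    keys = [f"{r['name']} {r['email']}".lower() for r in records]
    pairs = []
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            # ratio() is 1.0 for identical strings, lower for less similar ones
            ratio = difflib.SequenceMatcher(None, keys[i], keys[j]).ratio()
            if ratio >= threshold:
                pairs.append((i, j))
    return pairs

# Usage: the third record differs from the first only by extra whitespace.
records = [
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "John Roe", "email": "john@example.com"},
    {"name": "Jane  Doe", "email": "jane@example.com"},  # likely duplicate
]
print(flag_duplicates(records))  # [(0, 2)]
```

As the article suggests, flagged pairs would then be manually verified, merged, or discarded; the pairwise comparison here is O(n²), so real deduplication tools use blocking or hashing to scale.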

*Data Silo = an insular management system in which one data system or subsystem is incapable of reciprocal operation with others that are, or should be, related

4. Misinterpreted data (no context)

This is truly a consequence of losing sight of the bigger picture, the people behind the data and working in a ‘data silo’.

What is it?

Misinterpreted data is when the data itself is not necessarily incorrect, but the conclusions drawn from it are. It happens when answers are seen as numbers, words, and statistics, rather than actual real-life people, problems, and issues. Those who read the information often work in silos and are unable to gauge the context behind the data. Furthermore, their own information biases may sway certain insights and conclusions.

Why is it bad?

It is bad simply because the conclusions are false. When numbers alone (and sometimes biases) guide decision-making, the proposed solutions cannot adequately address the question at hand. This can result in costly and potentially embarrassing mistakes, as well as angry users or customers. In the worst cases, it may even lead to reputational damage, with organisations accused of prejudice and of ignoring the ‘human factor’.

How to fix it

Information can be more accurately interpreted if some background is given (even if only through technological means), and analysts learn to read ‘more than just the numbers’. When sourcing data, users can be presented with a mix of quantitative and qualitative questions in order to give context to their answers. Furthermore, those who read the information/make the conclusions can be taught certain techniques to reduce the impact of information bias.

*Information Bias = any systematic difference from the truth that arises in the collection, recall, recording and handling of information in a study, including how missing data is dealt with

*Quantitative Data = the value of data in the form of counts or numbers where each dataset has a unique numerical value associated with it

*Qualitative Data = data that is descriptive and conceptual. It can be categorised based on traits and characteristics

5. Blanket approach solutions

Sometimes misinterpreted data can lead to ‘blanket approach’ responses.

What is it?

Blanket approach solutions are where a single solution is applied to every case, i.e. a ‘one size fits all’ philosophy. This normally stems from misinterpreted data, compromise, or a lack of context, understanding, and/or effort. For example, a web designer might use exactly the same template for every website, despite each having very different requirements, uses, and functionality.

Why is it bad?

When a single, uniform response is given to ALL users, no one is happy. Of course, monetary or resource constraints may sometimes dictate this. For large-scale projects, it may be impossible to customise a solution for each user, and a compromise must be reached: one that simply pleases the most people. However, it is very easy for organisations to fall into the ‘one size fits all’ approach, which leads to inadequacy, dissatisfaction, and abandonment by users.

How to fix it

Depending on the scale of the project, solutions should be as customised as possible. This is achieved by truly understanding the data, its context, and the people behind it (via the means mentioned above). Once initial information is captured and assumptions are made, these should be verified by returning to the source. Users should also be more actively involved in creating the technologies, systems, and applications that address their problems.

Other data disadvantages

In addition to the above, there are a few other, overlapping and more generic cases where data is ‘not your friend’ or could be considered a disadvantage. These include:

  1. Cost to Company — Data collection, sorting, storage, and security can be extremely costly. While software may be expensive, the real cost lies in human interpretation of the data and solution implementation, as well as manual checks that may be required if irregularities are detected.
  2. Bottlenecks and Inefficiency — Sometimes collecting, sorting and analysing information is not worth the potential solutions that may arise from it. Waiting for these various phases may also cause project/organisational bottlenecks. Furthermore, people can become so consumed with the data that they lose sight of the bigger picture.
  3. Security Concerns (data mismanagement) — With the power of ‘owning’ data comes the responsibility of looking after it. Not only can this be costly, but data breaches can occur relatively easily, both internally and externally. Customers are warier than ever about organisations obtaining and storing their information.
  4. Data Deletion — While destroying all physical records is a mammoth task, it is fairly easy to delete all digital data at the click of a button. If you rely on certain software providers, your organisation may also not have full ownership of the information and/or may be subject to their demands.

Conclusion: so, what should you do?

While collecting and utilising data can be dangerous, its benefits generally outweigh its disadvantages. Thus, depending on the nature of your business or organisation, you need to decide whether you want to actively use data, to what degree, and with what outcomes in mind.

Everyone needs a website. My ideas developed along with the business, and this shaped the approach with clients and people. Visit us at www.webstudiolab.co.uk
