What is data integrity?

Consider that one of your colleagues accidentally deletes the record of your customer, Mark Doe. You not only lose information about a valuable customer, but you also risk losing your relationship with Jodie Doe and Sally Doe, his family members. In another case, after migrating data to the cloud, you find a large volume of duplicated data, jeopardizing your operations for two whole days while data engineers work to resolve the issues.

Broken data attributes and relationships, data theft in a virus attack, or a server crash that wipes out data are nightmares for data-driven organizations that rely on data for real-time analytics and decision-making. They need undamaged data that represents real-world entities correctly and consistently; compromised or poor-quality data can never build trust in their decisions. An assurance of data integrity lets them accept that the data powering those decisions is trustworthy.

What does data integrity mean?

Data integrity refers to the completeness, accuracy, consistency, and security of data throughout its entire lifecycle. When any two instances of the same data match, the data is intact. Data integrity relies on a collection of processes, rules, and standards designed to keep data undamaged and accurate over its life, wherever it may move.

You will find that integrity is not just about data but also about its relationships. For example, if a customer's address changes, every record that references that address must be updated. As data moves and gets transformed across enterprise systems, integrity ensures that it remains intact and correctly connected.

Data integrity is traditionally considered a dimension of data quality. Operationally, however, it aligns more closely with data governance: it implements rules and processes that assure data quality while data is entered, stored, moved, and used across systems.

In other words, data integrity uses rules and processes to protect your data from damage during enterprise operations, and it leverages data security to defend your data from external threats.

What are the different types of data integrity? 

You can maintain data integrity at two levels: physical and logical.

  • Physical integrity protects the wholeness and accuracy of data as it is physically stored, moved, and retrieved. Power outages, hardware failures, hacking, malicious attacks, and natural disasters can all compromise it. Physical data integrity is essential for maintaining business continuity.
  • Logical integrity ensures that your data remains correct and consistent as it is used in different ways across databases. It is enforced in database models during design and data use. The factors affecting logical integrity are diverse, including human errors and hacking. Continuous error checking and validation methods help maintain logical data integrity.

Logical data integrity is essential for trusted decisions and regulatory compliance. Largely a feature of relational databases, logical data integrity comes in four types.

Entity integrity

Entity integrity ensures that each row in a table is unique and identifiable: no rows are duplicated, and no primary key fields are null. It uses the concept of primary keys, the unique values that identify rows of data. It is a feature of relational systems, which store data in tables that can be linked and used in a variety of ways.
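As a concrete sketch, a relational database enforces entity integrity through primary key constraints. The example below uses Python's built-in sqlite3 module; the table and column names are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database for illustration; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- unique, non-null identifier
        email TEXT NOT NULL UNIQUE        -- no duplicate emails allowed
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'mark.doe@example.com')")

# A second row with the same primary key violates entity integrity,
# so the database rejects it rather than storing a duplicate.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO customers VALUES (1, 'other@example.com')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because the constraint lives in the table definition, every application writing to the table gets the same protection without extra code.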

Referential integrity

Referential integrity ensures that data is stored and used consistently across related tables. It uses the concept of foreign keys, which either refer to a primary key value of another table or are null. A null foreign key indicates either no relationship or an unknown one. Rules about foreign keys are embedded into the database structure; they can define constraints that prevent orphaned or inconsistent records and guarantee data accuracy.
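A minimal sketch of a foreign key constraint, again using Python's sqlite3 module with hypothetical table names (note that SQLite requires foreign key enforcement to be switched on explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled per connection
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)
    )
""")
conn.execute("INSERT INTO customers VALUES (1)")
conn.execute("INSERT INTO orders VALUES (100, 1)")     # valid reference
conn.execute("INSERT INTO orders VALUES (101, NULL)")  # null = no/unknown relationship

# An order pointing at a customer that does not exist violates
# referential integrity and is rejected.
orphan_rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (102, 99)")
except sqlite3.IntegrityError:
    orphan_rejected = True
```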

Domain integrity

Domain integrity ensures the accuracy of each piece of data within its domain, the set of acceptable values a column can contain. Defined constraints limit the format, type, and range of data entered. For example, a constraint can prevent users from entering an invalid value in the birth date field.

User-defined integrity

User-defined integrity provides additional rules and constraints to align with the specific user requirements. It is typically used when entity, referential, and domain integrity are not sufficient to safeguard data. Business rules are sometimes part of the user-defined integrity constraints.
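Business rules like these are often enforced with triggers when built-in constraints are not expressive enough. The sketch below assumes a hypothetical rule (no discount above 30%) purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, discount REAL)")

# A trigger encodes a user-defined business rule that entity, referential,
# and domain constraints cannot express on their own.
conn.execute("""
    CREATE TRIGGER enforce_discount_cap
    BEFORE INSERT ON orders
    WHEN NEW.discount > 0.30
    BEGIN
        SELECT RAISE(ABORT, 'discount exceeds the business limit');
    END
""")

conn.execute("INSERT INTO orders VALUES (1, 0.10)")  # within the rule

oversized_discount_rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (2, 0.50)")  # violates the rule
except sqlite3.IntegrityError:
    oversized_discount_rejected = True
```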

Benefits of data integrity

  • Make trusted decisions
  • Maintain business continuity
  • Achieve regulatory compliance
  • Improve system stability and performance
  • Prevent data loss

How can you achieve high data integrity?

As the volume, variety, and speed of arriving data increase, managing data becomes challenging and exposes it to more risks affecting its integrity. Common risk factors include:

  • Human errors: Individuals can enter information incorrectly, make mistakes when following data integrity procedures, or lapse in following protocols. They can also delete or duplicate data by mistake.
  • Data transfer errors: A piece of data present in the destination table but not in the source table indicates errors during a data transfer.
  • Security errors: Spyware, malware, and viruses can alter, delete, or steal data. Poor access or password management can also make data vulnerable. 
  • Hardware errors: Sudden failures of computers, malfunctions in devices, or performance issues of servers affect data integrity to a great extent. Such faults can impact data accuracy, completeness, and access. 
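One common way to catch transfer errors like the one described above is to compare fingerprints of the source and target row sets. The sketch below is a simple, assumption-laden illustration (hypothetical row data, SHA-256 over sorted row representations), not a production reconciliation tool:

```python
import hashlib

def table_fingerprint(rows):
    """Order-independent fingerprint of a row set, for comparing source and target."""
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):  # sort so row order does not matter
        digest.update(row.encode("utf-8"))
    return digest.hexdigest()

# Hypothetical rows copied from a source system to a target system.
source_rows = [(1, "Mark Doe"), (2, "Jodie Doe")]
target_rows = [(1, "Mark Doe"), (2, "Jodie Doe")]

# Matching fingerprints suggest the transfer preserved the data intact.
transfer_ok = table_fingerprint(source_rows) == table_fingerprint(target_rows)

# A duplicated row in the target changes the fingerprint, flagging an error.
corrupted = target_rows + [(2, "Jodie Doe")]
duplicate_detected = table_fingerprint(source_rows) != table_fingerprint(corrupted)
```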

You can minimize the impact of these factors by limiting data access, using error detection software, and validating data. Regular data backups and data audits also help to improve data integrity.

What are the best practices for data integrity?

Best practices for achieving high data integrity are focused on the processes of data handling. They also take into account the related practices of data quality, governance, and security.

  1. Validate input data: At the point of entry of data into your systems, validate it. Data from different sources can contain human, security, and data transfer errors. Validation ensures that data is correct, relevant, and secure.
  2. Make data backups: Data can get lost due to software bugs or viruses, hardware failures, or human errors. Backing up enterprise data regularly ensures that an alternate copy is available for recovery. Raw data arriving from different sources is irreplaceable, and a secure backup copy lets you recover it after a failure.
  3. Implement access controls: Access to data is critical in any organization. But it should be controlled to ensure that data is not misused. Proper access control makes data available to persons who need it to perform defined tasks. Restricting unauthorized data access and securing sensitive data helps reduce misuse.  
  4. Establish data quality practice: A data quality practice helps streamline procedures to resolve issues and improve trust in data.
  5. Operationalize data governance: Data governance enables enterprise-wide policies for access control. It can ensure regulatory compliance and mitigate the risks of handling enterprise data.
  6. Adopt security best practices: In addition to electronic access, physical access to data needs to be controlled. Following security best practices ensures that the integrity of data is not compromised. 
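The first practice, validating input at the point of entry, can be sketched as a small check that rejects malformed records before they reach storage. The field names and rules below are hypothetical examples, not a prescribed schema:

```python
from datetime import date

def validate_customer(record):
    """Return a list of validation errors for a hypothetical customer record.

    An empty list means the record passed all checks.
    """
    errors = []
    if not record.get("name", "").strip():
        errors.append("name is required")
    if "@" not in record.get("email", ""):
        errors.append("email looks invalid")
    try:
        born = date.fromisoformat(record.get("birth_date", ""))
        if born > date.today():
            errors.append("birth_date is in the future")
    except ValueError:
        errors.append("birth_date must be YYYY-MM-DD")
    return errors

good = validate_customer(
    {"name": "Mark Doe", "email": "mark@example.com", "birth_date": "1980-05-17"}
)
bad = validate_customer({"name": "Mark Doe", "email": "not-an-email", "birth_date": "soon"})
```

Running validation like this at every entry point keeps human, security, and transfer errors from propagating downstream.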

Getting your business data integrity ready

Data integrity relies on a set of rules and procedures to improve system stability and performance. High data integrity requires that data remain unaltered and be used correctly. You can use a few questions to assess how your organization scores.

  • How is the data entered?
  • How likely is it that wrong data gets entered?
  • Are rules and checks in place to ensure the correct data type, format, range, and amount?
  • How is the data being transferred?
  • What are the risks in the data transfer processes?
  • Is the data safeguarded against corruption during the transfer?
  • Are data security measures in place?
  • Is the data access limited to the right persons?
  • Is the sensitive data protected?
  • Does the data remain accurate and consistent during updates?

Once you assess what your data processes look like, you can use a simple 5-step process to get your business data integrity-ready.

  1. Begin by identifying and controlling the factors affecting data integrity in your organization. They could be human, data transfer, security, or hardware errors.
  2. Keep data integrity, quality, and security at the heart of your data strategy.
  3. Create awareness and train your teams to monitor data for integrity failures.
  4. Invest in the right tool that leverages AI and ML technology to observe data constantly and look for potential errors proactively. Collibra Data Quality & Observability validates data integrity between source and target systems. With the integrated Collibra Platform, you can leverage data catalog, governance, and lineage to deliver scalable data quality and integrity.
  5. Follow best practices. 

Data integrity improves the overall accuracy, completeness, and reliability of your data sets. With Collibra, you are always assured of data integrity that meets the stringent requirements of regulatory compliance, and you can take the pain out of your data movements and migrations. The comprehensive platform scales as your business grows, ensuring that your data integrity is always safeguarded.

In summary

Good business decisions are founded on the integrity of data. A thorough understanding of data integrity at different levels prepares you to achieve it in your organization. The best practices for data integrity go hand in hand with data quality, governance, and security.

Choosing the right comprehensive platform helps you with trusted data that can power all your business initiatives.

Want to learn more about data quality & observability?

Unlock the Gartner report “The State of Data Quality Solutions”
