Database Keys and Keys for Data Warehouse

6 min readJan 26, 2025

A database is a structure that stores information in an organized, consistent, reliable, and searchable way. And there are different kinds of databases intended for different use cases. The kind we’ll focus on here is the relational database.

Keys are one of the basic requirements of a relational database model. We require keys in a database management system (DBMS) to ensure that data is organized, accurate, and easily accessible.

Keys help to uniquely identify records in a table, which prevents duplication and ensures data integrity. Keys also establish relationships between different tables, allowing for efficient querying and management of data [1, 2].

Different Types of Database Keys

1. Primary Key

Primary Key in each table ensures that each record is unique within that table.

Unique values allow us to unlock the power that databases provide. In a database, a unique value is a value that doesn’t show up in any other row in a given column. So there’s one and only one of any particular value for that particular field. [1]

A primary key is a unique identifier for each record in a table
It must contain unique values and cannot contain NULL
Primary key doesn’t change over time

Example: In a table of customers, an CustomerID column could serve as the primary key since each customer has a unique ID

2. Foreign Key

The relationship between primary and foreign key allows for the creation of relational databases, where data across multiple tables can be interconnected and queried in meaningful ways. Foreign keys are essential in maintaining consistency and coherence in database systems.

A foreign key is a field (or collection of fields) in one table that refers to the primary key in another table.
It establishes a link between two tables and is used to maintain referential integrity.
A foreign key can have duplicate values and NULL values (if not explicitly restricted).

Example: In a Sales table, an CustomerID column could serve as a foreign key referring to the CustomerID in the Customers table, establishing a relationship between orders and customers.

3.Composite Key

In some situations, we might need to use two or more fields in the data to act as a key. This is called a composite key. The combination of values in these columns must be unique for each row in the table.

A composite key is a primary or unique key that consists of more than one column to uniquely identify a record.
It’s used when no single column can uniquely identify a row.

Example: Imagine that you have no CustomerID column in your Customers table. The combination of Name and SocialSecurityNumber can be used as a composite key to create CustomerID column.

Composite keys can increase complexity when managing relationships with foreign keys in other tables. For example, any table referencing the Customers table would need to include both Name and SocialSecurityNumber as foreign keys.

However, a composite key is useful when no single column can uniquely identify a record, and the combination of columns provides the necessary uniqueness.

4. Candidate Key

A candidate key represents a unique combination of one or more columns within a table that can uniquely identify each row and can potentially become the primary key. Each candidate key ensures that no two rows in the table can have the same combination of values for the candidate key columns. One candidate key is chosen to be the primary key, and the others are then alternate keys.

A candidate key is a field or combination of fields that could serve as a primary key, meaning it can uniquely identify records in a table.
A table can have multiple candidate keys, but one of them is chosen as the primary key.

Example: In a Customer table, both CustomerID and SocialSecurityNumber could be candidate keys since each uniquely identifies a customer. (but here, we assume that SocialSecurityNumber in customers table can not be NULL).

CustomerID is best suited for the primary key, so selected as primary key. The rest candidate keys are called alternate keys.

5. Alternate Key

A unique identifier for a record that is not the primary key. It can be used as a secondary way to look up data in table.

(An alternate key is a candidate key that is not chosen to be the primary key.)

An alternate key is any candidate key not chosen as the primary key.
It provides an alternative way to uniquely identify records in a table.

Example: If CustomerID is chosen as the primary key in a Customers table, then SocialSecurityNumber could be an alternate key.

It meets the criteria of uniqueness and identification.

Each of these keys plays an important role in defining the structure and relationships within a database, ensuring data consistency, uniqueness, and referential integrity. [3,4,5]

Keys for Data Warehouse

In this part of the article, we will mention about the keys used in the data warehouse.

1. Natural Key

In a data warehouse, a natural key is a key that is directly from the data in the source system and has inherent business meaning. It is also known as a business key or domain key. This key is typically a unique identifier used in the operational source system, like a Customer ID.

A natural key has business meaning and is used in the business world to identify an entity
Natural keys originate from the source system and are often used to track records.
In some cases, natural keys can change in the source system due to business reasons (e.g., a reorganization of codes). This can cause issues in maintaining relationships in a data warehouse.
Data from multiple source systems can result in duplicates if the same natural key exists in more than one system (e.g., two systems with different customers using the same CustomerID).

Natural keys aren’t generated. They come from the source systems.

Example: In a Customer table, an CustomerID is the natural key from the source system.

2. Surrogate Key

A surrogate key is a system generated (could be GUID, sequence, unique identifier, etc.) value with no business meaning (unlike natural keys such as Social Security Numbers or Customer IDs) that is used to uniquely identify a record in a table. The key itself could be made up of one or multiple columns (i.e. Composite Key).

Surrogate keys are widely used in data warehousing because they improve query performance, simplify joins, and enable effective data tracking, even when source system keys change.

Example: In a Customer table, an CustomerSurrogateKey is the system generated surrogate key.

3. Version Key

A version key is used where historical data is preserved by creating a new version of a row each time a significant attribute changes. Each version of a record has a unique version key, which helps track different versions of the same business entity over time.

Example: In the Customer table:

If we want to keep track of each customer’s historical data (like address changes) in the customer table, we would add a version key that increments with each new version of the customer data. [3, 6, 7, 8]