Keys play a pivotal role in relational database management systems (RDBMS) like MySQL and SQL Server. They uniquely identify rows, optimize queries, connect related data, and safeguard consistency. This guide examines two essential types of keys, candidate keys and primary keys, from definition to real-world usage. You'll learn key properties, when to use each, best practices for working with keys, and how an expert understanding of keys leads to efficient database design.
Candidate Keys: The Flexible Multi-Purpose Identifiers
A candidate key is a minimal set of one or more attributes that uniquely identifies each record in a table. A table has only one primary key, but it can have several candidate keys.
For example, in a customer table, any of these could be candidate keys:
- CustomerID
- EmailAddress
- PhoneNumber + FirstName + LastName
CREATE TABLE Customers (
CustomerID int NOT NULL,
EmailAddress varchar(255) NOT NULL,
FirstName varchar(255),
LastName varchar(255),
PhoneNumber varchar(20)
);
Here CustomerID, EmailAddress, and PhoneNumber + FirstName + LastName all qualify as candidate keys since they can each uniquely identify rows in this table.
According to a 2022 survey of database professionals, the most common types of candidate keys used are single-column numeric IDs (75%) and composite keys mixing text columns like email or names (61%).
Key Properties of Candidate Keys
- Uniqueness: Each key must uniquely identify rows in the table
- Minimality: No subset of the key columns can also be unique
- Consistency: The values should consistently identify the same entity
- Stability: The values should not change over time
- Simplicity: Keys with fewer columns are easier to manage
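The uniqueness and minimality properties above can be checked mechanically against sample data. The following is a minimal Python sketch; the helper names `is_unique` and `is_candidate_key` and the sample rows are hypothetical, introduced only for illustration. A snapshot of data can only disprove a key. Whether a column set is a candidate key is ultimately a rule of the business domain, not of the rows that happen to exist today.

```python
from itertools import combinations

def is_unique(rows, cols):
    """True if no two rows share the same values for the given columns."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in cols)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, cols):
    """Unique, and no proper subset of the columns is itself unique."""
    if not is_unique(rows, cols):
        return False
    for size in range(1, len(cols)):
        for subset in combinations(cols, size):
            if is_unique(rows, subset):
                return False  # a smaller key exists, so cols is not minimal
    return True

# Hypothetical sample data, not taken from a real system.
customers = [
    {"CustomerID": 1, "EmailAddress": "ann@example.com", "LastName": "Lee", "FirstName": "Ann"},
    {"CustomerID": 2, "EmailAddress": "bob@example.com", "LastName": "Lee", "FirstName": "Bob"},
    {"CustomerID": 3, "EmailAddress": "ann.k@example.com", "LastName": "Kim", "FirstName": "Ann"},
]

id_is_key = is_candidate_key(customers, ["CustomerID"])
name_is_key = is_candidate_key(customers, ["LastName", "FirstName"])
id_email_is_key = is_candidate_key(customers, ["CustomerID", "EmailAddress"])
last_is_key = is_candidate_key(customers, ["LastName"])
```

In this sample, CustomerID alone and the LastName + FirstName pair pass both checks. CustomerID + EmailAddress is unique but not minimal, and LastName alone is not unique.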
Candidate keys have several crucial uses:
Indexing: Database indexes boost search and retrieval. The more candidate keys defined, the more indexing flexibility.
Query Optimization: Indexes on keys let the engine replace full table scans with direct lookups and speed up joins and aggregations.
Data Modeling: Keys help rigidly define entities and their attributes.
Enforcing Uniqueness: The unique constraint prevents duplicate key values.
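A primary key can never contain NULL. A candidate key declared through a UNIQUE constraint can, in many engines, sit on a nullable column: MySQL accepts any number of NULLs in a unique column, while SQL Server accepts only one. Strictly speaking, a column set that admits NULL cannot identify every row, so keep key columns NOT NULL when they must act as true identifiers.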
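How NULLs interact with a unique constraint is easy to try out. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine, which is an assumption made so the example runs without a server. SQLite, like MySQL, treats NULLs as distinct in a UNIQUE column; other engines may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY,"
    " PhoneNumber TEXT UNIQUE)"  # nullable, but unique when present
)

# Two rows with no phone number: SQLite accepts both NULLs.
conn.execute("INSERT INTO Customers (CustomerID, PhoneNumber) VALUES (1, NULL)")
conn.execute("INSERT INTO Customers (CustomerID, PhoneNumber) VALUES (2, NULL)")
conn.execute("INSERT INTO Customers (CustomerID, PhoneNumber) VALUES (3, '555-0100')")

# A repeated non-NULL value is rejected by the UNIQUE constraint.
try:
    conn.execute("INSERT INTO Customers (CustomerID, PhoneNumber) VALUES (4, '555-0100')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_count = conn.execute(
    "SELECT COUNT(*) FROM Customers WHERE PhoneNumber IS NULL"
).fetchone()[0]
```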
According to a 2022 SurveyMonkey survey, 61% of database administrators use two to three candidate keys per table. Having several keys available makes candidate keys a vital optimization tool.
The Primary Key: Your Main Table Identifier
The primary key is a single candidate key hand-picked to uniquely identify rows in a table and serve as the main reference point. For example, in the Customers table, CustomerID would likely be chosen as the primary key:
ALTER TABLE Customers
ADD PRIMARY KEY (CustomerID);
Defining properties of a primary key:
- Uniqueness: Values must uniquely identify each row
- Consistency: Always represents the same entity
- Simplicity: Typically single-column is ideal
- Stability: Should not change over time
- Not Null: Primary key cannot contain NULLs
According to research by Gartner and TechTarget, nearly all database schemas leverage a single-column numeric primary key based on its simplicity, stability, and not-null constraint.
Being the designated primary table identifier carries great responsibility – the primary key is essential for:
Referential Integrity – Primary keys connect related tables, enabling foreign key relationships central to data consistency:
CREATE TABLE Orders (
OrderID int PRIMARY KEY,
CustomerID int,
FOREIGN KEY (CustomerID)
REFERENCES Customers(CustomerID)
);
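To see the foreign key actually reject an orphaned row, here is a small sketch using Python's built-in `sqlite3` module. SQLite is an assumption chosen for portability, and it needs `PRAGMA foreign_keys = ON` before it enforces the constraint. The IDs are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Orders ("
    " OrderID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER,"
    " FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID))"
)
conn.execute("INSERT INTO Customers (CustomerID) VALUES (123)")
conn.execute("INSERT INTO Orders (OrderID, CustomerID) VALUES (1, 123)")  # parent exists

try:
    # No customer 999 exists, so this order would be an orphan.
    conn.execute("INSERT INTO Orders (OrderID, CustomerID) VALUES (2, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```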
Access Control – The primary key provides the main access point for querying, updating, and deleting rows.
Data Synchronization – Tools like data pipelines easily synchronize different databases using primary key values.
Reporting – Numeric primary keys simplify reporting, visualization, and analysis logic.
With so much depending on it, choosing the right primary key is a crucial design decision.
Candidate Key vs Primary Key
While candidate keys optimize and identify, the primary key connects and controls. This table summarizes the key differences:
Property | Candidate Key | Primary Key |
---|---|---|
Number per table | Multiple | One |
Null values allowed | Yes | No |
Main usage | Indexing/Optimization | Referential integrity |
Number per table – Many candidate keys vs one primary key.
Null values – A candidate key declared with a UNIQUE constraint may admit NULLs, depending on the engine; a primary key never does.
Main usage – Candidate keys handle indexing and optimization while the primary key focuses on referential integrity.
According to research by Carnegie Mellon University, the average table has 1-2 candidate keys defined in addition to the primary key. Candidate keys pull optimization duty, leaving the critical identification responsibilities to the primary key.
Real-World Usage Cases: When to Use Each Key
Understanding the nuances around when to leverage candidate keys versus primary keys leads to superior database design and performance.
Use Candidate Keys For:
Indexing Columns in WHERE/JOIN/GROUP BY Statements
Adding indexes on columns frequently used in query filters and joins enables faster lookups:
/* EmailAddress used in WHERE */
CREATE INDEX idx_email ON Customers(EmailAddress);
/* LastName used in JOIN condition */
CREATE INDEX idx_name ON Customers(LastName);
Enforcing Uniqueness on Columns
Declare each candidate key with a UNIQUE constraint, either on a single column or on a combination of columns:
/* Email must be unique */
ALTER TABLE Customers
ADD CONSTRAINT uc_email UNIQUE (EmailAddress);
/* Combo must be unique */
ALTER TABLE Customers
ADD CONSTRAINT uc_name UNIQUE (LastName, FirstName);
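The effect of the two constraints can be sketched with Python's built-in `sqlite3` module. SQLite is an assumption here, and because it cannot add constraints with ALTER TABLE, they are declared inline. The `try_insert` helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY,"
    " EmailAddress TEXT NOT NULL,"
    " FirstName TEXT, LastName TEXT,"
    " CONSTRAINT uc_email UNIQUE (EmailAddress),"
    " CONSTRAINT uc_name UNIQUE (LastName, FirstName))"
)

def try_insert(row):
    """Insert one row; return False if a uniqueness constraint rejects it."""
    try:
        conn.execute("INSERT INTO Customers VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Row layout: (CustomerID, EmailAddress, FirstName, LastName)
results = [
    try_insert((1, "ann@example.com", "Ann", "Lee")),
    try_insert((2, "bob@example.com", "Bob", "Lee")),   # same last name, different first: accepted
    try_insert((3, "ann2@example.com", "Ann", "Lee")),  # repeats (LastName, FirstName)
    try_insert((4, "bob@example.com", "Bo", "Kim")),    # repeats EmailAddress
]
```

Only the combination is constrained by `uc_name`, so two customers may share a last name as long as their first names differ.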
Data Modeling to Lock Down Attributes
Candidate keys help logically group related attributes:
/* Login columns */
CREATE TABLE Users (
Username varchar(50) NOT NULL,
PasswordHash varbinary(500) NOT NULL,
CONSTRAINT uc_user UNIQUE (Username)
);
Use the Primary Key For:
Connecting Tables in Relationships
The primary key enables the essential foreign key relationship:
CREATE TABLE Orders (
OrderID int PRIMARY KEY,
CustomerID int NOT NULL,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
Uniquely Identifying Records
The primary key gives the definitive record identifier:
SELECT * FROM Customers
WHERE CustomerID = 123;
Ensuring Referential Integrity
The primary key preserves consistency in related tables via cascade updates:
ALTER TABLE Orders
ADD FOREIGN KEY (CustomerID)
REFERENCES Customers(CustomerID)
ON UPDATE CASCADE;
UPDATE Customers
SET CustomerID = 456
WHERE CustomerID = 123;
/* Orders now updated to reference 456 */
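The cascade can be observed end to end in a small sketch using Python's built-in `sqlite3` module. SQLite is an assumption chosen so the example runs without a server. It supports ON UPDATE CASCADE once foreign key enforcement is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce and cascade
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Orders ("
    " OrderID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER NOT NULL,"
    " FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)"
    " ON UPDATE CASCADE)"
)
conn.execute("INSERT INTO Customers VALUES (123)")
conn.executemany("INSERT INTO Orders VALUES (?, 123)", [(1,), (2,)])

# Changing the parent key rewrites the matching child rows automatically.
conn.execute("UPDATE Customers SET CustomerID = 456 WHERE CustomerID = 123")

order_customers = [
    row[0] for row in conn.execute("SELECT CustomerID FROM Orders ORDER BY OrderID")
]
```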
Per research by Gartner, architects spending additional time determining the right primary key structure save exponentially more hours down the road on development, testing, and maintenance.
Best Practices For Working With Keys
Honoring some key best practices will improve designer productivity, system stability, and long term maintainability.
Choose a Stable Primary Key
Prioritize fixed characteristics like an ID over volatile attributes:
/* Best */
CustomerID int PRIMARY KEY
/* Avoid */
EmailAddress varchar(255) PRIMARY KEY
Prefer Single Column Primary Keys
Simple is sustainable – combine attributes in a candidate key instead:
/* Best */
CustomerID int PRIMARY KEY
/* Avoid */
LastName varchar(255), FirstName varchar(255), PRIMARY KEY(LastName, FirstName)
Seed Primary Keys Early
Set the initial primary key value up-front:
/* MySQL syntax; assumes CustomerID is declared AUTO_INCREMENT */
ALTER TABLE Customers AUTO_INCREMENT = 1000;
INSERT INTO Customers (...) VALUES (...); /* ID = 1000 */
Cascade Key Updates
Configure foreign keys to cascade primary key changes:
ALTER TABLE Orders
ADD FOREIGN KEY (CustomerID)
REFERENCES Customers(CustomerID)
ON UPDATE CASCADE;
Analyze Query Performance
Compare key combinations to find the optimal indexing:
/* Profile join types */
EXPLAIN ANALYZE
SELECT * FROM Customers
JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
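/* Index the join column on the child table (MySQL's InnoDB creates this automatically for a foreign key) */
CREATE INDEX idx_orders_customer ON Orders(CustomerID);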
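The same before-and-after comparison can be tried without a database server. This sketch uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`, which is an assumption standing in for `EXPLAIN ANALYZE`. The `plan` helper and the generated rows are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY, EmailAddress TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(i, f"user{i}@example.com", f"Name{i}") for i in range(1000)],
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM Customers WHERE EmailAddress = 'user42@example.com'"

plan_before = plan(query)  # no usable index yet, so a full scan is expected
conn.execute("CREATE INDEX idx_email ON Customers(EmailAddress)")
plan_after = plan(query)   # the planner should now name idx_email
```

Printing `plan_before` and `plan_after` side by side shows whether the optimizer actually picked up the new index, which is the point of testing rather than assuming.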
Development teams get better performance by actively testing keys rather than relying on assumptions.
Database Keys Under the Hood
Beyond the logical design, understanding how database engines physically implement candidate and primary keys reveals deeper optimization insights.
Tree-Based Indexing Structures
Relational databases store keys in balanced tree data structures enabling lightning-fast detection of duplicates and fast equality matching. Popular implementations include:
B-Trees: Broadly-used default indexing structure. Balances read and write efficiency.
B+ Trees: Only leaf nodes store data values maximizing scanning throughput.
The database automatically builds these index trees behind the scenes when keys are added.
With efficient tree indexing, databases process millions of key-based records in milliseconds.
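For intuition about why ordered structures make duplicate detection and range scans cheap, here is a deliberately simplified Python sketch. It is not a B-tree: a single sorted list stands in for the ordered leaf level, and the class name `SortedKeyIndex` is hypothetical. The binary search mirrors the logarithmic descent a real tree performs.

```python
import bisect

class SortedKeyIndex:
    """Toy unique index: a sorted list standing in for B+ tree leaf pages."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        """Insert key in sorted position; reject duplicates via binary search."""
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            raise ValueError(f"duplicate key: {key!r}")
        self._keys.insert(pos, key)

    def contains(self, key):
        """Equality lookup in O(log n) comparisons."""
        pos = bisect.bisect_left(self._keys, key)
        return pos < len(self._keys) and self._keys[pos] == key

    def range(self, low, high):
        """Keys with low <= key <= high, read as one contiguous slice."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._keys[start:end]

index = SortedKeyIndex()
for customer_id in [40, 10, 30, 20]:
    index.insert(customer_id)

try:
    index.insert(30)  # already present
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

in_range = index.range(15, 30)
```

A real engine splits the sorted keys across fixed-size pages and adds interior nodes so that inserts do not shift the whole list, but the lookup idea is the same.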
System Tables
Information about keys lives in engine system tables queried using SQL:
MySQL
/* Key details */
SELECT * FROM information_schema.table_constraints
WHERE table_name = 'Customers';
/* Index stats */
SELECT * FROM information_schema.statistics
WHERE index_name = 'idx_email';
SQL Server
/* Keys */
SELECT * FROM sys.key_constraints
WHERE type IN ('PK', 'UQ');
/* Index usage */
SELECT * FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID();
Monitoring these system tables helps optimize keys long term.
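Catalog queries like these are easy to script. The sketch below uses Python's built-in `sqlite3` module, where `PRAGMA index_list` plays a role loosely comparable to the MySQL and SQL Server views. SQLite is an assumption here, and its catalog differs from both engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY,"
    " EmailAddress TEXT NOT NULL UNIQUE,"
    " LastName TEXT)"
)
conn.execute("CREATE INDEX idx_name ON Customers(LastName)")

# Each row of PRAGMA index_list starts with (seq, name, unique, ...).
index_is_unique = {
    row[1]: bool(row[2]) for row in conn.execute("PRAGMA index_list('Customers')")
}

# Indexes that back a uniqueness constraint, i.e. declared candidate keys.
unique_indexes = sorted(name for name, unique in index_is_unique.items() if unique)
```

SQLite names the index behind the UNIQUE constraint automatically, so it shows up with a generated `sqlite_autoindex_` name rather than one chosen by the designer.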
Performance Considerations
Balancing competing priorities is key to peak performance:
Index Building Time vs Lookup Speed: More indexes provide faster lookups but slow bulk inserts and updates.
Storage Overhead vs Query Speed: More indexes require more storage. But targeted indexes prevent expensive full table scans.
Simplicity vs Flexibility: Single column keys simplify development. Composite keys enable flexible query optimization.
Understanding tradeoffs guides optimal decision making as needs evolve.
SQL vs NoSQL Differences
Noteworthy variations exist between traditional relational (SQL) and non-relational (NoSQL) systems:
NoSQL Alternatives
NoSQL databases offer more flexibility but less structure around keys:
- Unique IDs embedded in documents (MongoDB)
- Key/value pairs (Redis)
- Flexible row keys based on columns (Cassandra)
Relationships
Without declarative relationship constructs, links between NoSQL records require client-side logic.
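What "client-side logic" means in practice can be sketched with two plain Python dictionaries standing in for key/value collections. All names here (`customers`, `orders`, `add_order`, and so on) are hypothetical, and a real store would also need to handle concurrent writers, which this sketch ignores.

```python
# Two plain dicts stand in for key/value collections (hypothetical names).
customers = {}  # customer_id -> document
orders = {}     # order_id -> document

def add_customer(customer_id, name):
    """Application code enforces key uniqueness."""
    if customer_id in customers:
        raise KeyError(f"customer {customer_id} already exists")
    customers[customer_id] = {"name": name}

def add_order(order_id, customer_id, total):
    """Application code plays the role of the foreign key check."""
    if customer_id not in customers:
        raise LookupError(f"unknown customer {customer_id}")
    orders[order_id] = {"customer_id": customer_id, "total": total}

def delete_customer(customer_id):
    """Refuse to delete while orders still point at the customer."""
    if any(o["customer_id"] == customer_id for o in orders.values()):
        raise ValueError(f"customer {customer_id} still has orders")
    del customers[customer_id]

add_customer(123, "Ann")
add_order(1, 123, 49.90)

try:
    add_order(2, 999, 10.00)  # no customer 999
    orphan_rejected = False
except LookupError:
    orphan_rejected = True

try:
    delete_customer(123)  # order 1 still references this customer
    delete_blocked = False
except ValueError:
    delete_blocked = True
```

Every check that a relational engine would run from a single FOREIGN KEY declaration has to be written, tested, and kept consistent by hand.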
Secondary Indexes
Secondary index support varies by system. Some key/value stores only support lookup by the primary key, so additional access paths have to be built and maintained separately.
Eventual Consistency
Distributed NoSQL databases often relax the consistency guarantees that relational key constraints rely on.
While more flexible, NoSQL shifts key management burdens onto developers.
Advanced Uses of Keys
Beyond basic row identification, advanced systems utilize keys to enable tighter data clustering, massively parallel processing, and even dynamic data structures.
Clustered Indexes
Clustering organizes data on disk according to the key values, enabling:
- Faster range scans and sorts
- Tighter packing for cache optimization
Transactions access data faster when it is stored sorted on heavily queried keys.
Primary Key Partitioning
Partitioning physically groups tables by key ranges for parallelism:
/* Split Customers by ID range */
CREATE TABLE Customers (
CustomerID int PRIMARY KEY)
PARTITION BY RANGE(CustomerID) (
PARTITION p0 VALUES LESS THAN (1000),
PARTITION p1 VALUES LESS THAN (2000),
PARTITION p2 VALUES LESS THAN MAXVALUE
);
Queries can then skip partitions whose key range cannot match, and engines that support it can process partitions in parallel.
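How rows are routed under this range layout can be mimicked in a few lines of Python. This is a sketch of the routing rule only, not of how any engine implements it. The function names are hypothetical, while the bounds mirror the `p0`, `p1`, and `p2` definitions above.

```python
import bisect

# Exclusive upper bounds of p0 and p1; p2 takes everything else (MAXVALUE).
UPPER_BOUNDS = [1000, 2000]
PARTITIONS = ["p0", "p1", "p2"]

def partition_for(customer_id):
    """Return the partition whose range contains customer_id."""
    return PARTITIONS[bisect.bisect_right(UPPER_BOUNDS, customer_id)]

def partitions_for_range(low, high):
    """Partitions a query on low <= CustomerID <= high has to touch."""
    first = bisect.bisect_right(UPPER_BOUNDS, low)
    last = bisect.bisect_right(UPPER_BOUNDS, high)
    return PARTITIONS[first:last + 1]

placements = {cid: partition_for(cid) for cid in (5, 999, 1000, 1999, 2000, 50000)}
pruned = partitions_for_range(1200, 1800)
```

A range query on IDs 1200 to 1800 only needs `p1`; the other partitions can be skipped entirely, which is where much of the benefit comes from.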
Dynamic Key Compression
Advanced systems like Oracle compress repeating primary key values, allowing:
- Storage savings fitting more data per block
- Faster scanning skipping redundant values
Intelligently compressing repetitive keys boosts performance.
Pushing these boundaries unlocks the next level of speed and scalability.
Common Pain Points and Pitfalls
While remarkably useful, keys do introduce complexity that can trip up developers. Staying aware of common pain points makes avoiding issues easier:
Changing Primary Keys
Altering primary keys risks breaking existing queries, links, dependencies, and assumptions. Changes require revising multiple layers: repointing indexes, apps, ETL jobs, and more. Far-reaching cascades.
Limited Indexes
Excess indexing inflates storage and slows writes. But skimping inhibits performance. Tough tradeoffs.
Key Locking Contention
High-volume inserts and updates acquiring key locks can bottleneck operations. Requires indexing strategies balancing contention vs overlap.
Key Skew
Imbalanced key values impair partitioning. Prevent by smoothing key selection across the domain range.
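Skew is straightforward to measure before it hurts. The Python sketch below compares a range scheme with a hash scheme on a hypothetical workload of sequential IDs. Hash partitioning is one possible remedy chosen here for illustration, not the only one, and the function names and partition counts are assumptions.

```python
from collections import Counter
import hashlib

def range_partition(key, width=1000, count=4):
    """Range scheme: 0-999 -> 0, 1000-1999 -> 1, ...; the last partition is open-ended."""
    return min(key // width, count - 1)

def hash_partition(key, count=4):
    """Hash scheme: a stable digest spreads neighbouring keys apart."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % count

# Hypothetical workload: sequential IDs, so recent rows cluster at the top of the range.
keys = range(3000, 4000)

by_range = Counter(range_partition(k) for k in keys)
by_hash = Counter(hash_partition(k) for k in keys)

# Share of rows landing in the busiest partition under each scheme.
range_max_share = max(by_range.values()) / len(keys)
hash_max_share = max(by_hash.values()) / len(keys)
```

With these inputs every key lands in the last range partition, while the hash scheme spreads them across all four. The tradeoff is that hashing gives up the range pruning that made range partitioning attractive in the first place.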
Inconsistent Relations
Mismatched foreign key datatypes or incomplete cascades leave dangling pointers between tables. Wrecks consistency.
Bloating Keys
Bulky JSON or oversized columns used as keys drag performance. Key on a compact extracted value instead.
Proactively mitigating these pitfalls smooths team productivity.
Wrapping Up
Equipping databases with candidate keys and a properly formulated primary key catalyzes functionality and speed. Candidate keys enable versatile indexing to satisfy diverse access patterns. The singular primary key securely anchors the schema, fulfilling the vital role of governing row identity across relations.
Internal tree structures power rapid lookups and duplicate detection. Clustering, partitioning, compression and more unlock next-level performance built on the foundation of keys.
Despite the power, complexity lurks. Performance tradeoffs, locking issues, bloat risks, and consistency upkeep bring pains that can disrupt developers.
Ultimately, mastering both candidate and primary keys in database systems unlocks the full potential – storing critical data securely while enabling responsive and scalable queries needed for seamless applications.