Thursday, September 19, 2024

Introduction to Relational Database Systems

If you collect and use any kind of data you probably have some kind of organizing system. Whether you use index cards, a filing cabinet, Excel spreadsheets, or some kind of database program, your system should let you add and change data, delete data, and retrieve data, and it should work faster and more efficiently than if you had to do it by hand.

When your record keeping needs outgrow a filing cabinet or bundle of index cards, a computerized system can help you manage the growing quantity of data, and let you use your data more effectively. Spreadsheet programs can manage small, simple databases. Desktop programs such as Microsoft Access and FileMaker can handle databases too big or too complicated for a spreadsheet. Larger databases, databases accessed by many people at once, and databases that feed client/server and web applications need real database servers.

The Relational Model

The relational model, invented by IBM researcher Ted Codd in 1970, wasn’t turned into a commercial product until almost 1980. Since then database systems based on the relational model, called relational database management systems or RDBMSs, have come to dominate the database software market. Today few people know about any other kind of database management system.

People usually find relational databases alien and forbidding. The relational model derives from a mathematical conception of data and relationships; it doesn’t model or imitate any organizing system people use in real life. If you sat down with a pile of index cards to organize your CD collection you wouldn’t choose a relational model. If you sat down with a pile of index cards to organize all of the transactions a typical bank handles every day, you’d give up on the index cards and look for a better solution. If you worked long and hard, you’d invent the relational model.

I won’t get into the theory behind the relational model, except to point out that few RDBMS implement the relational model completely. Although commercial RDBMSs have a lot in common, each system has quirks and non-standard extensions. You must understand relational theory to correctly design a database — just learning a particular RDBMS won’t get you all the way there.

A good RDBMS and a well-designed relational database give you some important benefits:

  • Data integrity and consistency maintained and/or enforced by the RDBMS.
  • Redundant data eliminated or kept to a practical minimum.
  • Data retrieved by unique keys.
  • Relationships expressed through matching keys.
  • Physical organization of data managed by RDBMS.
  • Optimization of storage and database operation execution times.
  • Concurrency: database users don’t corrupt each other’s work.
  • Scalable: can spread load across multiple CPUs or servers.

If you don’t use an RDBMS you have to forego some of those benefits, or write code yourself.

Normalization

If you know anything about relational databases, you probably know about normalization. The process of normalization transforms data into forms that conform to the relational model. Normalized data enables the RDBMS to enforce integrity rules, guarantee consistency, and optimize database access. Learning how to normalize data takes significant time and practice. Data modelers spend a lot of time understanding the meaning of data so they can properly normalize it, but programmers frequently downplay normalization, or dismiss it outright as an academic problem. Most databases come from power users and programmers, not data modelers, and most databases suffer from un-normalized data, redundancy, integrity and performance problems. Un-normalized databases usually need a lot of application code to protect the database from corruption.

If you find yourself working with a programmer or database designer who dismisses normalization as academic or theoretical, beware. You wouldn’t get into a plane with a pilot who dismissed physics as unnecessary theory, so don’t trust your data to someone who doesn’t understand database theory and normalization.

The ACID properties

Large database systems that support many simultaneous users must insure that changes to the database can’t corrupt the database or leave it in an inconsistent state. Most database systems can bundle multiple updates (adds, changes, or deletes) together into transactions. Transactions have four important features usually referred to as the ACID properties:

Atomic: Transactions either completely succeed or completely fail. If the system crashes before the transaction completes the database’s state does not change.

Consistent: Transactions preserve database consistency. A transaction transforms the database from a consistent state to another consistent state.

Isolated: A transaction’s updates don’t interfere with other transactions, or other users of the database. Until a transaction completely succeeds the database system conceals the individual updates from other transactions.

Durable: Once a transaction completes (commits), the updates survive in the database.

All of the commercial RDBMSs support transactions. MySQL, a popular open-source system, does not fully support transactions.

SQL

As commercial relational databases entered the market and RDBMSs proliferated, Structured Query Language (SQL) became the most popular language for querying and manipulating relational databases — so popular that people often use the terms SQL and relational database interchangeably. All of the important commercial and open source RDBMSs support most or all of the SQL standard.

As a computer languages SQL has simple syntax and few keywords, because SQL merely expresses relational operations in an English-like syntax. But the relational operations can get complicated, and SQL soon gets in the way. If we used relational algebra and mathematical notation to query databases, we wouldn’t have to live with SQL’s shortcomings, but few people would ever learn how to use the simplest database.

Some sample SQL statements:

INSERT INTO Songs (song, artist) VALUES (‘I Feel Good’, ‘James Brown’)

UPDATE Cupboard SET qty=2 WHERE itemname=’Captain Crunch cereal’

SELECT year,color,price FROM Cars WHERE make=’Ford’ AND model=’Falcon’

You can use SQL without really understanding database design or relational theory. Casual database users can get by with a little SQL, but programmers and database designers should understand relational theory and the principles of database design. Unfortunately employers and recruiters often choose candidates with two years experience with, say, Oracle PL/SQL over candidates who have a deeper understanding of relational theory and database design.

Relational Database Software

On the desktop, Microsoft Access has enough relational qualities to work for small databases, but doesn’t hold up well under load. Access doesn’t support transactions, so it can’t scale to handle many users or multiple servers. FileMaker doesn’t support the relational model or transactions, though it can organize and manipulate small and medium-size databases without needing a lot of programming support.

Commercial RDBMSs:

  • Oracle
  • Microsoft SQL Server
  • IBM’s DB/2
  • Sybase

All of the commercial products cost too much for desktop users or small businesses. Large companies almost always own one (or more) commercial RDBMS. All of the commercial RDBMSs conform to the relational model (more or less), and they all implement the ANSI/ISO SQL language standard (with some exceptions and lots of vendor-specific extensions).

Free/Open Source RDBMSs:

  • MySQL
  • PostgreSQL

MySQL backs a lot of web sites and seems the database of choice for web developers. MySQL has some big holes in its implementation of the relational model, though, and lacks the integrity and scalability features of the commercial databases. PostgreSQL doesn’t have the same shortcomings as MySQL, but it also doesn’t really compare to the commercial products. Both MySQL and PostgreSQL can handle large “real” databases and work fine for a wide range of applications. MySQL and PostgreSQL don’t replace the commercial products (yet); but they do replace ad-hoc data storage systems such as flat files and hash tables.

Books

The Database Relational Model, C. J. Date
Practical Issues in Database Management: A Reference for the Thinking Practitioner,Fabian Pascal
Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design,Michael Hernandez
A Guide to SQL Standard, C. J. Date and Hugh Darwen
SQL Queries for Mere Mortals, Michael Hernandez and John L. Viescas
SQL: Visual QuickStart Guide, Chris Fehily

Web Sites

Free SQL course and lots more: http://www.sqlcourse.com

SQL Tutorial: http://www.w3schools.com/sql/default.asp

All about SQL: http://www.sql.org

WebMonkey database tutorials:hotwired.lycos.com/w ebmonkey/backend/databases/index.html

Yet another intro: http://www.databasejourn al.com/sqletc/article.php/1469521

Database Debunkings: http://www.pgro.uk7.net/index.htm

Products

MySQL: http://www.mysql.com

PostgreSQL: http://www.postgresql.org

Oracle: http://www.oracle.com

Microsoft SQL Server: http://www.microsoft.com/sql/default.asp

Sybase: http://www.sybase.com

IBM DB/2: http://www-3.ibm.com/software/data/db2

Originally published by Computer Bits magazine
Enhancing the computing experience since 1991

Greg Jorgensen develops web sites and databases, and consults on web strategy, usability, and security. For more information see http://www.pdxperts.com. Greg also started Geek School to help kids and teens learn computer programming. Please feel free to email the author with your comments about this article: gregj@pdxperts.com.

Related Articles

1 COMMENT

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles