SQL and the relational algebra

This page describes the relationship between The SQL query language and the "relational algebra", upon which it was built. It is intended as a very quick overview of SQL, giving its motivating principles, so that new users can better see SQL as an integrated tool, rather than a disjoint collection of commands.

Several references that present SQL in an entirely different manner are given at the end of the document.

Basic relational algebra

The relational algebra operates on "relations", which are sets of tuples of the same "arity", which is to say, collections of lists of the same length. Here are two 4-tuples:
   ( 1, 2, 3, 4 )
   ( 8, 7, 9, 4 )

Relations are commonly represented as tables, since we are used to viewing them in that format.

There are 5 primitive operations within the relational algebra:

  • Projection: extract specific columns from a relation

  • Selection: extract specific rows

  • Set union: create a new table composed of all the rows of two other tables

  • Set difference: remove the rows in one relation that appear in another

  • Cartesian product: "multiply" two tables to create a third

The first 4 of these are straight-forward, but the Cartesion Product deserves more attention. Here are 2 relations:

Relation1: arity of 4, length of 3

8791
1234
7623

 

Relation2: arity of 3, length of 2

347
198

and here is the Cartesion product of the two. It has arity of 4 + 3, and length of 3 * 2:

8791347
8791198
1234347
1234198
7623347
7623198

Relations and query languages

Database management systems based on the relational algebra were described by Edward F. Codd working for IBM during the early 1970s.

Codd's formulatin included:

  • Indexes and keys
  • Decomposition into normal forms, and
  • Integrity constraints

Multiple languages and interfaces wer developed to query and modify collections of relations, among them the Structured English Query Language, SEQUEL, devloped by Chamberlain and Boyce.

The most successful language, SQL, was based on SEQUEL. SQL requires that each relation has a "tablename", and each tuple postition has a "fieldname".

Here are 2 tables, Players and Teams:

Players with arity 4, length 3

PlayerInningsHitsTeamnumber
8791
1238
7623

 

Teams with arity 3, length 2

t_numgamesrank
347
198

SQL and the basic relational operators

SQL commands can be used to perform the basic relational operations. The table below lists the syntax for SQL commands to perform each relational operation discussed above. In these SQL commands the character "*" designates all fields in a table, and can be replaced by a comma-separated list of fieldnames, if desired:

Projection select fieldname_list from tablename

Example: select tnum,rank from Teams

Selection select * from tablename where

Example: select * from Players where Teamnumber = 1

Union ( select fieldname_list from tablename1 )
union
( select fieldname_list from tablename2 )

Note: Use the keyword ALL to keep duplicates

Set difference select * from ( tablename1 except tablename2 )
Cartesian product select * from tablename1, tablename2

Note that SQL does not specify how to perform a query; only what the result should be. It is a "declarative", rather than "procedural", language.

The SQL join operation

An SQL "join" is a Cartesian product followed by a selection, as in:
   select 
      * 
   from 
      Players, Teams
   where 
      Players.Teamnumber = Teams.t_num
The comma between "Players" and "Teams" directs the SQL processor to perform a Cartesion product on those two relations (tables) and then perform a selection operation reducing the size of the resulting table. Here is the Cartesian product with the two rows to be selected in red:

PlayerInningsHitsTeamnumbert_numgamesrank
8791347
8791198
1234347
1234198
7623347
7623198

and here is the result of the specified join operation:

8791198
7623347

Note that you can "project" any of the fieldnames (columns) by including them in a list in place of the "*" in the select statement. If necessary to avoid ambiguity, you can provide an alias name for each tablename, and use the alias as part of the fieldname.

For example, the following SQL query aliases Players as "a" and Teams as "b" and uses the aliases to uniquely identify fieldnames (even though the fieldnames in THIS example happen to all be unique):

      select 
         a.Player, b.t_num 
      from 
         Players a, Teams b
      where 
         a.Teamnumber = b.t_num
An alternate syntax for this join operation would look like:

      select 
         a.Player, b.t_num
      from 
         Players a 
      join 
         Teams b
      on 
         a.Teamnumber = b.t_num
Note that these commands can be entered on a single line, but have been placed on multiple lines in these examples for readability.

Additional information

There are many sources of additional information about SQL, including some good online tutorials:

In additon, many examples of using SQL with CLSD data are presented in: Using SQL within CLSD.