SQL and the relational algebra
This page describes the relationship between The SQL query language and the "relational algebra", upon which it was built. It is intended as a very quick overview of SQL, giving its motivating principles, so that new users can better see SQL as an integrated tool, rather than a disjoint collection of commands.Several references that present SQL in an entirely different manner are given at the end of the document.
Basic relational algebra
The relational algebra operates on "relations", which are sets of tuples of the same "arity", which is to say, collections of lists of the same length. Here are two 4-tuples:( 1, 2, 3, 4 ) ( 8, 7, 9, 4 )
Relations are commonly represented as tables, since we are used to viewing them in that format.
There are 5 primitive operations within the relational algebra:
- Projection: extract specific columns from a relation
- Selection: extract specific rows
- Set union: create a new table composed of all the rows of
two other tables
- Set difference: remove the rows in one relation that appear in
another
- Cartesian product: "multiply" two tables to create a third
The first 4 of these are straight-forward, but the Cartesion Product deserves more attention. Here are 2 relations:
| 8 | 7 | 9 | 1 |
| 1 | 2 | 3 | 4 |
| 7 | 6 | 2 | 3 |
Relation2: arity of 3, length of 2
3 4 7 1 9 8
and here is the Cartesion product of the two. It has arity of 4 + 3, and length of 3 * 2:
| 8 | 7 | 9 | 1 | 3 | 4 | 7 |
| 8 | 7 | 9 | 1 | 1 | 9 | 8 |
| 1 | 2 | 3 | 4 | 3 | 4 | 7 |
| 1 | 2 | 3 | 4 | 1 | 9 | 8 |
| 7 | 6 | 2 | 3 | 3 | 4 | 7 |
| 7 | 6 | 2 | 3 | 1 | 9 | 8 |
Relations and query languages
Database management systems based on the relational algebra were described by Edward F. Codd working for IBM during the early 1970s.Codd's formulatin included:
- Indexes and keys
- Decomposition into normal forms, and
- Integrity constraints
Multiple languages and interfaces wer developed to query and modify collections of relations, among them the Structured English Query Language, SEQUEL, devloped by Chamberlain and Boyce.
The most successful language, SQL, was based on SEQUEL. SQL requires that each relation has a "tablename", and each tuple postition has a "fieldname".
Here are 2 tables, Players and Teams:
| Player | Innings | Hits | Teamnumber |
| 8 | 7 | 9 | 1 |
| 1 | 2 | 3 | 8 |
| 7 | 6 | 2 | 3 |
Teams with arity 3, length 2
| t_num | games | rank |
| 3 | 4 | 7 |
| 1 | 9 | 8 |
SQL and the basic relational operators
SQL commands can be used to perform the basic relational operations. The table below lists the syntax for SQL commands to perform each relational operation discussed above. In these SQL commands the character "*" designates all fields in a table, and can be replaced by a comma-separated list of fieldnames, if desired:
| Projection | select fieldname_list from tablename
Example: select tnum,rank from Teams |
| Selection | select * from tablename where Example: select * from Players where Teamnumber = 1 |
| Union | ( select fieldname_list from tablename1 )
union ( select fieldname_list from tablename2 ) Note: Use the keyword ALL to keep duplicates |
| Set difference | select * from ( tablename1 except tablename2 ) |
| Cartesian product | select * from tablename1, tablename2 |
Note that SQL does not specify how to perform a query; only what the result should be. It is a "declarative", rather than "procedural", language.
The SQL join operation
An SQL "join" is a Cartesian product followed by a selection, as in:
select
*
from
Players, Teams
where
Players.Teamnumber = Teams.t_num
The comma between "Players" and "Teams" directs the SQL processor to
perform a Cartesion product on those two relations (tables) and then
perform a selection operation reducing the size of the resulting table.
Here is the Cartesian product with the two rows to be selected
in red:
| Player | Innings | Hits | Teamnumber | t_num | games | rank |
| 8 | 7 | 9 | 1 | 3 | 4 | 7 |
| 8 | 7 | 9 | 1 | 1 | 9 | 8 |
| 1 | 2 | 3 | 4 | 3 | 4 | 7 |
| 1 | 2 | 3 | 4 | 1 | 9 | 8 |
| 7 | 6 | 2 | 3 | 3 | 4 | 7 |
| 7 | 6 | 2 | 3 | 1 | 9 | 8 |
and here is the result of the specified join operation:
| 8 | 7 | 9 | 1 | 1 | 9 | 8 |
| 7 | 6 | 2 | 3 | 3 | 4 | 7 |
Note that you can "project" any of the fieldnames (columns) by including them
in a list in place of the "*" in the select statement.
If necessary to avoid ambiguity, you can provide an alias name for each tablename,
and use the alias as part of the fieldname.
For example, the following SQL query aliases Players as "a" and Teams as "b" and
uses the aliases to uniquely identify fieldnames (even though the fieldnames
in THIS example happen to all be unique):
In additon, many examples of using SQL with CLSD data are presented in:
Using SQL within CLSD.
select
a.Player, b.t_num
from
Players a, Teams b
where
a.Teamnumber = b.t_num
An alternate syntax for this join operation would look like:
select
a.Player, b.t_num
from
Players a
join
Teams b
on
a.Teamnumber = b.t_num
Note that these commands can be entered on a single line, but have been placed
on multiple lines in these examples for readability.
Additional information
There are many sources of additional information about SQL, including some
good online tutorials:



