Queries in SQL

As we've already mentioned, the heart of SQL is indeed the detailed and satisfying queries that one can generate. Here we look at such queries in all (well, quite a lot of) their magnificent complexity.

Simple select queries, where, and distinct

Look at the following table:

DrugDosing

Dosing DoseMg Frequency

D1 30 OD

D2 10 OD

D3 200 TDS

We have already shown a simple query that 'pulls out' all the data in a table:

select * from DrugDosing;

You can see that here the star (*) acts as a wild card, selecting all of the columns in the table. It doesn't take a great leap of imagination to work out how to select individual columns:

select DoseMg, Dosing from DrugDosing;

.. will select the two columns specified, and present the information for all rows but just those two columns. Also see how we've specified the columns in a different order from the one used in the original data table definition. The information that is extracted will be presented in the new column order:

DOSEMG	DOSING

30	D1
10	D2
200	D3

How do we select individual rows? Twice before, we've already come across the where command - when we talked about deletion of rows (remember the catastrophe that results if you leave out where?), and also in our discussion of the update command. Here's another use of where:

select DoseMg, Dosing from DrugDosing where Frequency = 'OD' ;

DOSEMG	DOSING

30	D1
10	D2

See how the condition (Frequency = 'OD') didn't have to involve one of the columns selected.

Distinct rows

What happens if we say ..

select Frequency from DrugDosing;

.. do we get two rows or three? Let's see:

FREQUENCY

TDS

.. three rows as you expected. But let's say we only want all the distinct rows, with no duplication. Then we say:

select distinct Frequency from DrugDosing;

.. and we get:

FREQUENCY

TDS

Very simple arithmetic

SQL allows you to perform basic arithmetic on numeric rows. You can add, subtract, multiply and divide either by the value in another column, or by a constant. Let's say that we wanted to list all the drug doses halved. We could say:

select DoseMg / 2 from DrugDosing;

.. and we would be given a table with dose values divided by two. With more complex expressions, our "usual" precedence rules (BODMAS) apply - when there is any possibility of doubt, use parentheses. There is also a unary minus that gives the negative of a value (e.g. select - DoseMg .. would give the values -30, -10 and -200).

Conditions and Logic

We've already encountered the where statment (in the previous section). Let's look at it in a bit more detail.

First note that when we said

.. where Frequency = 'OD' ;

we used the "=" operator to test for equality (We've previously mentioned why some seasoned (heh) programmers might find this strange). There is a number of other operators you will come across. Here they are:

Operator	Meaning
<>	inequality - the two items are not equal (avoid operators like != and ^= which only work on some systems)
<	Less than
>	Greater than
<= >=	Less than or equal to, and greater than or equal to, respectively
between firstvalue and secondvalue	Is the value between the two stated limits? Note that the limits are NOT included, i.e. 4 is not between four and ten.

In, like and instr

There are other (rather sneaky operators). One of these is in. This looks for an item in a list that you have provided. For example

where Frequency in ('OD', 'BD', 'TDS')

will evaluate as true if the value for Frequency is say 'OD', but will be false if Frequency is 'QID'.

Another extremely useful operator is like. In an analogy of the way we used the star character (*) above to represent any or all columns, like works with wildcard characters. The wildcards used by like in standard SQL are:

%	which is a short way of saying "give me any substring of zero or more characters"; and
_	which is our way of asking for exactly one character of any description.

For example

.. where Exclamation like 'sh_t';

would match up with the words 'shot', 'shut' and even 'shxt', while

.. where Exclamation like 'sh%t';

would match a whole lot more - 'sht', 'shot', 'shoot', 'shxxxxxxxxt' and so on! Note that some obscene flavours of SQL use other wildcards.

Similar to like is something called instr. The most significant difference is that we have become used to the notation like uses:

something like somethingelse

This is called infix notation. But instr doesn't work this way - it is a function that returns a result, and takes two arguments within brackets, thus:

instr ( wheretosearch, whattosearchfor)

So, for example, if we wanted to test whether a string contains the letters 'it', we could say something along the lines of:

.. where instr ( Exclamation, 'it' ) > 0 ;

This would work for Exclamations such as 'Oh quoit', 'it works', and 'dammit'.
(Although by now, the C programmers are again clutching their foreheads, as they have just deduced that, yes, SQL starts counting the position of substrings at 1, and not zero as would any modern computer language).
The important question you have to ask yourself is "Will this work for the Exclamation 'I hate IT!' ?"

Case Sensitivity

The short answer is 'No' {check on how the different SQLs handle things} Most SQL applications are, at least in part, case sensitive. It's always best to assume case sensitivity, and compensate for this. SQL has at least two functions available that allow case conversions:

upper (string) converts string to upper case;
lower (string) is similar, converting to lower case.

{check that these aren't Oracle pty ????????????}

To ensure that we picked up lower and upper case It's, we'd have to say:

.. where instr ( upper(Exclamation), 'IT' ) > 0 ;

or, the equally valid:

.. where instr ( lower(Exclamation), 'it' ) > 0 ;

What about !*&"@# %_special characters_% ?

Most special characters can be put in between quotes, and are handled just fine. The only two you really need to worry about are percent and underscore. SQL has a clumsy convention for representing these. The convention is as follows:

Choose some infrequently used character as an 'escape character';
Insert the escape character in the string just before the special character (_ or %) you want to represent literally, rather than using it as a wildcard;
Tell SQL that you have just used an escape character, using the escape command.

An example will clear things up. The phrase:

.. where Whatever like '%50^%' escape '^'

should match values of Whatever such as 'A 50% increase', and '50% of them came' but not 'I turn 50 today'. The explanation is that we chose '^' as the escape character, so like '%50^%' translates as "give me anything at all, followed by the characters five and then zero and then a percent sign" and not "give me anything followed by five and then zero and then anything again".

Boolean logic in SQL

SQL supports the usual Boolean logic that we associate with computers (although it doesn't have an "exclusive or" function). You can therefore combine multiple conditions in one where statement. In short,

A and B

will succeed (evaluate to true) if and only if both A and B are true;

A or B

will succeed if either A or B is true, or indeed if both are true;

not B

will evaluate to true only if B is not true. Not reverses the sense (truth) of an assertion!

And that's really that. The only other thing you need to know is that you can use parenthesis to group together logical statements into "compound statements" such as:

A and ( C or D )

where (C or D) will be evaluated first, and then the result will be anded together with A. A rather silly example is:

select DoseMg from DrugDosing where Frequency = 'OD' and (DoseMg = 30 or DoseMg = 10);

A note on having

There's another way of selecting things that is superficially similar to where. It's called having and differs in that it's only applied after the select has been performed. We will only consider having later, because it's mainly used with group functions, which we haven't discussed yet!

Selecting between multiple options - the SQL92 case statement

This has nothing whatsoever to do with case sensitivity! The SQL92 case statement is vaguely similar to the switch statements of languages such as C++ and Java - it allows you to make one of many choices, based on a single datum value. (To confuse you further, single options within a switch statement in these languages start with the keyword case. To really confuse you, Visual Basic has a select .. case structure that broadly corresponds to the switch statement)!

{Need to explore this in detail over here}

Ordering and grouping data

Sorting results

One can also sort the output from a select statement. The default is to sort in ascending order - small numbers come before larger ones. What happens with text? Well, as you might expect, the sort is alphabetical (A..Z) but take note that all characters (including special ones like percentage, full stops [periods], and blanks) are actually sorted according to their ASCII values. This may cause confusion, for example a blank (ASCII code 32 decimal) comes before 'A' (ASCII code 65 decimal). {check up on details of SQL sorting algos}.

Practically, we say:

select DoseMg, Dosing from DrugDosing order by DoseMg;

and we would hope to obtain..

DOSEMG	DOSING

10	D2
30	D1
200	D3

One can also sort in descending order, using the desc operator:

select DoseMg, Dosing from DrugDosing order by DoseMg desc;

(The default is asc for an ascending sort). There is something to be strongly disparaged over here. You can actually specify the column number (eg 1) instead of the column name, but there are two good reasons why you shouldn't:

It violates Dr Codd's rule number nine;
With most computer languages, we count starting from zero. In SQL, the counting is from one - column number 1 is the leftmost column!! A sad relic.

You can even sort on several columns (first sort by the first column mentioned, then sort by the next one, and so on). All you need do is list the columns after the order by statement..

select DoseMg, Dosing from DrugDosing order by Frequency, DoseMg ;

A further note..

Remember how we could say select .. where and mention columns in the where statement even though they didn't appear in the final output? The same applies to sort! You can sort on a column even if the result doesn't contain that column.

Grouping Data

We've already encountered the distinct command, which allows us to look at all the rows we've generated, and select out those which are distinct. There are other ways of grouping rows together. Before we learn these, let's find out about a few functions that deal with groups. Here they are:

Function	What it does
avg	Take the average (mean) of a set of numeric data
min	Return the smallest of a set of data
max	Like min, but return the largest value
count	Count the number of rows; note that you can pass any column argument to count, and it still returns the number of rows; here is one case where count(1) is of value, rather than annoying our friend Dr Codd! Much more sexy is the ability to count distinct rows, for example one might say count (distinct whatever ), and the number of distinct items in the whatever column will be returned to you!
sum	Add up the values in a column
(many other functions)	Beware of proprietary functions that lock you into a particular vendor's product!

With this under our belt, let's explore groups. The basic idea is:

select Frequency, avg(DoseMg), min(DoseMg), max(DoseMg) from DrugDosing
group by Frequency ;

If we apply this command to our boring old table:

DrugDosing

Dosing DoseMg Frequency

D1 30 OD

D2 10 OD

D3 200 TDS

Then we'll get something along the lines of:

FREQUENCY	AVG(DOSEMG)	MIN(DOSEMG)	MAX(DOSEMG)

OD	20	10	30
TDS	200	200	200

See how using group by Frequency allows us to break up the table into sub-groups, and then (using our new-found knowledge of group functions) extract summary information about each group!

Take note that if you have an 'single' column (one that doesn't describe a group statistic) in the list you provide to select (Frequency, in the example above) and you don't include this column name in the group by section, then you'll get an error! Our example above is trivial, but the group by function is extremely powerful. You can even specify several columns, to aggregate subsets of subsets..

Having

Similar to where, having is only applied after the select statement has been processed, and is used to ELIMINATE rows from the selection. Generally you should use where, because it's more efficient. When, you ask yourself, should you use having?

Think about the case where we need to make a decision based on an aggregate function (performed on a group). You can only make the decision to select certain records based on the result of the aggregate function once it's been done, so where is useless for this purpose. Enter (taraa!) having. Here's an example:

select Frequency, avg(DoseMg), min(DoseMg), max(DoseMg) from DrugDosing
group by Frequency having count(1) > 1 ;

Although again trivial, this example illustrates how you can use having to eliminate subgroups - if there is only one item in a subgroup, it is now eliminated (presumably because the average, max, and min of a single item is of little interest to us)! Note how we use count(1) - this is convenient and quick, as discussed above.

Joins (multiple tables)

Remember the tables we discussed when we were talking about foreign keys, and even earlier when we talked about common sense in normalising data, (not that we are displaying conspicuous amounts of this with the following tables which are, after all, only for demonstration purposes)! How does one amalgamate the data in the tables? (In dataspeak, we call the relationship between the tables master-detail, or sometimes, parent-child). Well, let's look at the tables..

DrugRegimen
Regimen	Drug	Dosing
R1	Carbimazole	D1
R2	Carbimazole	D2
R3	Carbamazepine	D3

DrugDosing
Dosing	DoseMg	Frequency
D1	30	OD
D2	10	OD
D3	200	TDS

You might think that a natural extension of the good old select statement is the following:

select * from DrugRegimen, DrugDosing;

and you would be perfectly correct, but what does the above statement give us when we actually use it? Here we go..

REGIMEN	DRUG	DOSING	DOSING	DOSEMG	FREQUENCY

R1	CARBIMAZOLE	D1	D1	30	OD
R2	CARBIMAZOLE	D2	D1	30	OD
R3	CARBAMAZEPINE	D3	D1	30	OD
R1	CARBIMAZOLE	D1	D2	10	OD
R2	CARBIMAZOLE	D2	D2	10	OD
R3	CARBAMAZEPINE	D3	D2	10	OD
R1	CARBIMAZOLE	D1	D3	200	TDS
R2	CARBIMAZOLE	D2	D3	200	TDS
R3	CARBAMAZEPINE	D3	D3	200	TDS

Wooops! Every single row of the first table has been joined with each and every row of the second table, not just the rows that we think should correspond! (This is called a Cartesian join or cross join, and can rapidly generate enormous tables - as Ladányi points out, if you perform a cross join on three tables, each with a thousand rows, then - voila - you have 1000 * 1000 * 1000 = one billion rows, enough to bring most databases to their knees).

For our purposes, most of the rows in the above cross join are meaningless, but we can easily reduce the rows to only those we are interested in. We simply use a where statement to join the tables on the Dosing column, thus:

select * from DrugRegimen, DrugDosing
where DrugRegimen.Dosing = DrugDosing.Dosing;

REGIMEN	DRUG	DOSING	DOSING	DOSEMG	FREQUENCY

R1	CARBIMAZOLE	D1	D1	30	OD
R2	CARBIMAZOLE	D2	D2	10	OD
R3	CARBAMAZEPINE	D3	D3	200	TDS

It's so important to always have a where condition with your Cartesian joins, let's make it into a rule:

If the from clause in a select statement has a comma in it, check the where clause.
Then check the where clause again. And again! ... Rule #4.

Also note that the two tables each had a column with the same name - "Dosing". We easily sidestepped this one by simply talking about DrugRegimen.Dosing = DrugDosing.Dosing, rather than, say, Dosing = Dosing, which would have forced an error! Needless to say, you can select individual columns from the cross join, rather than having to say select *.

Inner versus Outer Join

The above is an example of an inner join. What this means is that if, for every value in the DrugRegimen.Dosing column, there's a corresponding value in the DrugDosing.Dosing column, and vice versa, then everything's fine. However, if (due to some silly person not enforcing relational integrity) there is no matching value in the corresponding column, the whole row with its unmatched value will disappear from the final report - it will softly and suddenly vanish away! Apart from being a goad to ensure relational integrity in all of your databases, this should alert you to the possibility that you might trustingly run a query on a database, and get complete garbage out, because you used an inner join! The solution is an outer join.

Needless to say, few vendors have stuck to the SQL-92 standard as regards outer joins. For example, Oracle sneaks three tiny characters into the where statement thus:

select * from DrugRegimen, DrugDosing
where DrugRegimen.Dosing (+) = DrugDosing.Dosing;

The (+) tells SQL to "join in a NULL row if you can't find anything that matches a problem row in DrugDosing.Dosing " - all very convenient, but not standard SQL-92. Also note that the (+) is on the same side of the equals sign as the table that is 'augmented', not the one that's causing the problem. It should be clear why this is called a left outer join, and

select * from DrugRegimen, DrugDosing
where DrugRegimen.Dosing = DrugDosing.Dosing (+);

.. is a right outer join.

{Here explore the SQL-92 standard, and other ways it's not been implemented}

Many vendor SQLs also do not implement the SQL standard for a full outer join, that lists all rows (whether matched or not) from all tables. The SQL-92 syntax for the from clause is:

from table1 FULL OUTER JOIN table2

There are other ways of achieving a full outer join in most SQLs. Remember that the way to avoid all this is to meticulously enforce constraints on integrity! Also note that an outer join will NOT help you if there are duplicate entries in one of the tables you are using for the join (which can only occur in a 'relationally challenged' database).

How to combine queries using set operators

Early on in school you were probably encouraged to draw "Venn diagrams" showing intersecting sets, and then colour in the union or intersection of sets, or even the complement of a set (everything outside the set).

{Could here have pix}

The set operators in SQL are based on the same principles, except they don't have a complement, and can determine the 'difference' between two sets. Here are the operators which we apply to combine two queries:

union - all elements in both queries are returned;
intersect - elements common to both queries are returned;
except - elements in the first query are returned excluding any that were returned by the second query.

These are powerful ways of manipulating information, but take note: you can only apply them if the results of the two queries (that are going to be combined) have the same format - that is, the same number of columns, and identical column types! (Although many SQLs try to be helpful by, for example, coercing one data type into another, an idea which is superficially helpful and fraught with potential for errors). The general format of such queries is illustrated by:

select columns1 from table1
union
select columns2 from table2

Different strokes..

Different vendor implementations of SQL have abused the SQL-92 standard in different ways. For example, Oracle uses minus where SQL-92 uses except. {give other examples}.

An outer join could be used (with modification for NULLs if these little monstrosities are present) to achieve the same result as except.

Similarly, an inner join (with select distinct) can do what intersect does.

Set operators can be combined (as you would expect when playing around with sets) to achieve results that simply cannot be obtained using a single set operator.

Note that there are some restrictions on using order by with set operators - order by may only be used once, no matter how big the compound statement, and the select list must contain the columns being used for the sort.

Pseudoset operators

Not content with implementing set operators, SQL database creators have also introduced what are called "pseudoset operators". These operators don't fit conveniently into set theory, because they allow multiple rows (redundancies) which are forbidden in true sets.

We use the pseudoset operator union all to combine the outputs of two queries (all that is done is that the results of the second query are appended to the results of the first). Union all does exactly what we required from a FULL OUTER JOIN, which as we've already mentioned, is not implemented in many nominally "SQL-92 compliant" databases!

Subqueries

Wouldn't it be nice if you could perform one query, temporarily store the result(s), and then use this result as part of another query? You can, and the trickery used is called a subquery. The basic idea is that instead of a 'static' condition, you can insert a query as part of a where clause! An example is:

select * from tablename
where value
>
( insert select statement here);

Note that in the above query, the inner select statement must return just one value (for example, an average). There are other restrictions - the subquery must be in parenthesis, and it must be on the right side of the conditional operator (here, a greater than sign). You can use such subqueries with =, >, <, >=, <= and <>, but not {to the best of my knowledge?} with between .. and.

Multiple select subqueries can be combined (using logical operators) in the same statement, but avoid complex queries if you possibly can!

Subqueries that return multiple values

In the above, we made sure that our subquery only returned a single value. Can you think of a way you might use a subquery that returns a list of values? (Such a thing is possible)! Yes, you need an operator that works on lists. An example of such an operator is in, which we've encountered before. The query should look something like:

select * from tablename
where value
in
( insert select statement here);

The assumption is that the nested select statement returns a list. The outer shell of the statement can then use in to get cracking on the list, looking for a match of value within the list! There is a surprisingly long list of operators that resemble in, and can be used in a similar fashion. Here it is:

Operator	What it does
not in	There is no match with any value retrieved by the nested select statement.
in	We know how this works. Note that = any is a synonym for in that you'll sometimes encounter!
> any	The value is greater than any value in the list produced by the inner submit statement. This is a clumsy way of saying "Give me the value if it's bigger than the smallest number retrieved"!
>= any < any <= any	Similar to >. Usage should be obvious.
> all	Compare this with > any - it should be clear that the condition will only succeed if the value is bigger than the largest value in the list returned by the inner select statement!
>= all < all <= all	If you understand > all, these should present no problem!
= all	You're not likely to use this one much. It implies that (to succeed) all the values returned by the inner subquery are equal to one another and the value being tested!

(It is even possible in some SQL implementations to retrieve a 'table' (multiple columns) using the inner select statement, and then use in to simultaneously compare multiple values with the rows of the table produced by this inner select. You'll probably never need to use something like this. Several other tricks are possible, including the creation of virtual views by using a subquery [See Ladányi p 409 if you're interested]. Views are discussed in the next section.)

Correlated Subqueries

Whew, we're nearly finished with the subqueries, but there is one more distinct flavour! The correlated subquery is a nested select statement that can (using trickery) refer to the outer select statement containing it. By so doing, we can successively apply the inner select statement to each line generated by the outer statement! The trick that we use is to create an alias for the outer select statement, and then refer to this alias in the inner select statement, thus constraining the inner select to dealing with the relevant row. For an example, we return to our tedious drug table:

DrugDosing

Dosing DoseMg Frequency

D1 30 OD

D2 10 OD

D3 200 TDS

Let's say we wanted (for some obscure reason) all doses that are greater than average dose, for each dosing frequency. [Meaningless, but it serves the purposes of illustration].

select DoseMg, Frequency
from DrugDosing fred
where DoseMg >
(select avg(DoseMg)
from DrugDosing
where Frequency = fred.Frequency) ;

The sense of this statement should be clear - we use the outer select to choose a row (into which we put DoseMg, and Frequency). We then check whether this row is a candidate (or not) using the where statement. What does the where statement check? Well, it makes sure that DoseMg is greater than a magic number. The magic number is the average dose for a particular Frequency, the frequency associated with the current row. The only real trickery is how we use the label fred to refer to the current line from within the inner select statement. This label fred is called an alias. We'll learn a lot more about aliases later (Some will argue that aliases are so important you should have encountered them long before, but we disagree).

Correlated subqueries are not the only way of doing the above. Using a temporary view is often more efficient, but it's worthwhile knowing both techniques. We discuss views next.

Views

As the name suggests, a view gives a particular user access to selected portions of a table. A view is however more than this - it can limit the ability of a user to update parts of a table, and can even amalgamate rows, or throw in additional columns derived from other columns. Even more complex applications of views allow several tables to be combined into a single view!

How do we make a view?

Interestingly enough, you use a select statement to specify the view you wish to create. The syntax is:

create view nameofview as
select here have details of select statement

A variant that you will probably use rather often is:

create or replace view nameofview as
select here have details of select statement

(Otherwise you have to explicitly destroy a view - SQL won't simply overwrite a view without the or replace instruction, but will instead give you an irritating error).

Remember that if you alter the view, you alter the underlying table at the same time!

You cannot use an order by statement (or something else called a for update clause) within a view. There is a whole lot of other convenient things you can do to views. Where you include summary statistics (eg count, sum, etc) in a view it is termed an aggregate view. Likewise, using distinct, you can have a view on the possible values of a column or grouping of columns.

You can even create a view that is derived from several tables (A multi-table view). This is extremely sneaky, as you can largely avoid complex join statements in code which pulls data out of several tables! Ladányi puts things rather well:

"Pre-joined and tested views reduce errors .. that subtly or obviously undermine the accuracy of reports, and thus the credibility and subsequent professional well-being of the people creating them".

How do we limit access to a view?

This is implicit in the way we sneakily use a select statement - to limit access to certain columns, for example, we just select the column-names we want access to, and ignore the rest! There is of course a catch (Isn't there always?) - if you insert a row, then SQL doesn't know what to put into the column entry that's not represented in the view, so it will insert either NULL, or the default value for that column. (Likewise, with delete, the entire row will be deleted, even the column entry that is invisible).

It is also obvious how we limit access to certain rows - we use a where clause that only includes the rows we want in the view. Note that (depending on your selection criterion) it is possible to insert a row into a view (and thus the underlying database) and then not be able to see this row in the view! With (in)appropriate selection criteria for the view, one can also alter the properties of rows visible in the view so that they now become hidden!

More draconic constraints are possible. The option

with read only

.. prevents any modifications to the view (or the underlying database); while with check option prevents you from creating rows in the view that cannot be selected (seen) in the view itself. If you use the "with check option", then you should follow this with a name, otherwise SQL will create an arbitrary and quite meaningless name for the constraint that will only confuse you when an error occurs!

How do we remove a view?

{check this out. presumably same as dropping a table}

More things we can do with views

One trick is to create a "virtual view" and then use this as a "table" which you can update. Instead of specifying a table name, you specify (in parenthesis) the select statement that defines the "virtual view", and all actions are performed on this temporary table! This is particularly useful where you don't have the system authority to create a view, yet need the power of a view.

{Does SQL92/98 support the "comment on" facility for views??}

An under-utilised but rather attractive use of views is to make them based on set (or pseudo-set) operators, for example the union of two sets.

{ Note: have something on exists and not exists; also need to talk about recursive joins and tree queries (a la SQL92 and also eg Oracle) }

Next - SQL frills, bells & whistles

Home

Date of First Draft: 2001/11/11

Date of Last Update: 2001/-/-

Web page author: jo@anaesthetist.com