DrugDosing | ||
Dosing | DoseMg | Frequency |
D1 | 30 | OD |
D2 | 10 | OD |
D3 | 200 | TDS |
We have already shown a simple query that 'pulls out' all the data in a table:
select * from DrugDosing;
You can see that here the star (*) acts as a wild card, selecting all of the columns in the table. It doesn't take a great leap of imagination to work out how to select individual columns:
select DoseMg, Dosing from DrugDosing;
.. will select the two columns specified, and present the information for all rows but just those two columns. Also see how we've specified the columns in a different order from the one used in the original data table definition. The information that is extracted will be presented in the new column order:
|
How do we select individual rows? Twice before, we've already come across the where command - when we talked about deletion of rows (remember the catastrophe that results if you leave out where?), and also in our discussion of the update command. Here's another use of where:
select DoseMg, Dosing from DrugDosing where Frequency = 'OD' ;
|
See how the condition (Frequency = 'OD') didn't have to involve one of the columns selected.
select Frequency from DrugDosing;
.. do we get two rows or three? Let's see:
|
.. three rows as you expected. But let's say we only want all the distinct rows, with no duplication. Then we say:
select distinct Frequency from DrugDosing;
.. and we get:
|
select DoseMg / 2 from DrugDosing;
.. and we would be given a table with dose values divided by two. With more complex expressions, our "usual" precedence rules (BODMAS) apply - when there is any possibility of doubt, use parentheses. There is also a unary minus that gives the negative of a value (e.g. select - DoseMg .. would give the values -30, -10 and -200).
First note that when we said
.. where Frequency = 'OD' ;
we used the "=" operator to test for equality (We've previously mentioned why some seasoned (heh) programmers might find this strange). There is a number of other operators you will come across. Here they are:
Operator | Meaning |
<> | inequality - the two items are not equal (avoid operators like != and ^= which only work on some systems) |
< | Less than |
> | Greater than |
<= >= | Less than or equal to, and greater than or equal to, respectively |
between firstvalue and secondvalue | Is the value between the two stated limits? Note that the limits are NOT included, i.e. 4 is not between four and ten. |
where Frequency in ('OD', 'BD', 'TDS')
will evaluate as true if the value for Frequency is say 'OD', but will be false if Frequency is 'QID'.
Another extremely useful operator is like. In an analogy of the way we used the star character (*) above to represent any or all columns, like works with wildcard characters. The wildcards used by like in standard SQL are:
% | which is a short way of saying "give me any substring of zero or more characters"; and |
_ | which is our way of asking for exactly one character of any description. |
For example
.. where Exclamation like 'sh_t';
would match up with the words 'shot', 'shut' and even 'shxt', while
.. where Exclamation like 'sh%t';
would match a whole lot more - 'sht', 'shot', 'shoot', 'shxxxxxxxxt' and so on! Note that some obscene flavours of SQL use other wildcards.
Similar to like is something called instr. The most significant difference is that we have become used to the notation like uses:
something like somethingelse
This is called infix notation. But instr doesn't work this way - it is a function that returns a result, and takes two arguments within brackets, thus:
instr ( wheretosearch, whattosearchfor)
So, for example, if we wanted to test whether a string contains the letters 'it', we could say something along the lines of:
.. where instr ( Exclamation, 'it' ) > 0 ;
This would work for Exclamations such as 'Oh quoit', 'it works', and
'dammit'.
(Although by now, the C programmers are again clutching their
foreheads, as they have just deduced that, yes, SQL starts counting
the position of substrings at 1, and not zero as would any modern
computer language).
The important question you have to ask yourself is "Will this work for
the Exclamation 'I hate IT!' ?"
Case Sensitivity
The short answer is 'No' {check on how the different SQLs handle things}
Most SQL applications are, at least in part, case sensitive. It's always
best to assume case sensitivity, and compensate for this. SQL has at
least two functions available that allow case conversions:
{check that these aren't Oracle pty ????????????}
To ensure that we picked up lower and upper case It's, we'd have to say:
.. where instr ( upper(Exclamation), 'IT' ) > 0 ;
or, the equally valid:
.. where instr ( lower(Exclamation), 'it' ) > 0 ;
What about !*&"@# %_special characters_% ?
Most special characters can be put in between quotes, and are handled
just fine. The only two you really need to worry about are percent
and underscore. SQL has a clumsy convention for representing these.
The convention is as follows:
An example will clear things up. The phrase:
.. where Whatever like '%50^%' escape '^'
should match values of Whatever such as 'A 50% increase',
and '50% of them came' but not 'I turn 50 today'. The explanation is
that we chose '^' as the escape character, so like '%50^%'
translates as "give me anything at all, followed by the characters five
and then zero and then a percent sign" and not "give me anything
followed by five and then zero and then anything again".
Boolean logic in SQL
SQL supports the usual Boolean logic that we associate with computers
(although it doesn't have an "exclusive or" function). You can therefore
combine multiple conditions in one where statement. In short,
A and B
will succeed (evaluate to true) if and only if both A and B are true;
A or B
will succeed if either A or B is true, or indeed if both are true;
not B
will evaluate to true only if B is not true. Not reverses the sense (truth) of an assertion!
And that's really that. The only other thing you need to know is that you can use parenthesis to group together logical statements into "compound statements" such as:
A and ( C or D )
where (C or D) will be evaluated first, and then the result will be anded together with A. A rather silly example is:
select DoseMg from DrugDosing where Frequency = 'OD' and (DoseMg = 30 or DoseMg = 10);
{Need to explore this in detail over here}
Practically, we say:
select DoseMg, Dosing from DrugDosing order by DoseMg;
and we would hope to obtain..
|
One can also sort in descending order, using the desc operator:
select DoseMg, Dosing from DrugDosing order by DoseMg desc;
(The default is asc for an ascending sort). There is something to be strongly disparaged over here. You can actually specify the column number (eg 1) instead of the column name, but there are two good reasons why you shouldn't:
You can even sort on several columns (first sort by the first column mentioned, then sort by the next one, and so on). All you need do is list the columns after the order by statement..
select DoseMg, Dosing from DrugDosing order by Frequency, DoseMg ;
Function | What it does |
avg | Take the average (mean) of a set of numeric data |
min | Return the smallest of a set of data |
max | Like min, but return the largest value |
count | Count the number of rows; note that you can pass any column argument to count, and it still returns the number of rows; here is one case where count(1) is of value, rather than annoying our friend Dr Codd! Much more sexy is the ability to count distinct rows, for example one might say count (distinct whatever ), and the number of distinct items in the whatever column will be returned to you! |
sum | Add up the values in a column |
(many other functions) | Beware of proprietary functions that lock you into a particular vendor's product! |
With this under our belt, let's explore groups. The basic idea is:
select Frequency, avg(DoseMg), min(DoseMg), max(DoseMg) from DrugDosing
group by Frequency ;
If we apply this command to our boring old table:
DrugDosing | ||
Dosing | DoseMg | Frequency |
D1 | 30 | OD |
D2 | 10 | OD |
D3 | 200 | TDS |
Then we'll get something along the lines of:
|
See how using group by Frequency allows us to break up the table into sub-groups, and then (using our new-found knowledge of group functions) extract summary information about each group!
Take note that if you have an 'single' column (one that doesn't
describe a group statistic) in the list you provide to
select (Frequency, in the example above) and you don't include this
column name in the group by section, then you'll get an error!
Our example above is trivial, but the group by function is
extremely powerful. You can even specify several columns, to aggregate
subsets of subsets..
Having
Similar to where, having is only applied after the select
statement has been processed, and is used to ELIMINATE rows from the selection.
Generally you should use where, because it's more efficient. When,
you ask yourself, should you use having?
Think about the case where we need to make a decision based on an aggregate function (performed on a group). You can only make the decision to select certain records based on the result of the aggregate function once it's been done, so where is useless for this purpose. Enter (taraa!) having. Here's an example:
select Frequency, avg(DoseMg), min(DoseMg), max(DoseMg) from DrugDosing
group by Frequency
having count(1) > 1 ;
Although again trivial, this example illustrates how you can use having to eliminate subgroups - if there is only one item in a subgroup, it is now eliminated (presumably because the average, max, and min of a single item is of little interest to us)! Note how we use count(1) - this is convenient and quick, as discussed above.
|
|
You might think that a natural extension of the good old select statement is the following:
select * from DrugRegimen, DrugDosing;
and you would be perfectly correct, but what does the above statement give us when we actually use it? Here we go..
|
Wooops! Every single row of the first table has been joined with each and every row of the second table, not just the rows that we think should correspond! (This is called a Cartesian join or cross join, and can rapidly generate enormous tables - as Ladányi points out, if you perform a cross join on three tables, each with a thousand rows, then - voila - you have 1000 * 1000 * 1000 = one billion rows, enough to bring most databases to their knees).
For our purposes, most of the rows in the above cross join are meaningless, but we can easily reduce the rows to only those we are interested in. We simply use a where statement to join the tables on the Dosing column, thus:
select * from DrugRegimen, DrugDosing
where DrugRegimen.Dosing = DrugDosing.Dosing;
|
It's so important to always have a where condition with your Cartesian joins, let's make it into a rule:
Also note that the two tables each had a column with the same name - "Dosing". We easily sidestepped this one by simply talking about DrugRegimen.Dosing = DrugDosing.Dosing, rather than, say, Dosing = Dosing, which would have forced an error! Needless to say, you can select individual columns from the cross join, rather than having to say select *.
Needless to say, few vendors have stuck to the SQL-92 standard as regards outer joins. For example, Oracle sneaks three tiny characters into the where statement thus:
select * from DrugRegimen, DrugDosing
where DrugRegimen.Dosing (+) = DrugDosing.Dosing;
The (+) tells SQL to "join in a NULL row if you can't find anything that matches a problem row in DrugDosing.Dosing " - all very convenient, but not standard SQL-92. Also note that the (+) is on the same side of the equals sign as the table that is 'augmented', not the one that's causing the problem. It should be clear why this is called a left outer join, and
select * from DrugRegimen, DrugDosing
where DrugRegimen.Dosing = DrugDosing.Dosing (+);
.. is a right outer join.
{Here explore the SQL-92 standard, and other ways it's not been implemented}
Many vendor SQLs also do not implement the SQL standard for a full outer join, that lists all rows (whether matched or not) from all tables. The SQL-92 syntax for the from clause is:
from table1 FULL OUTER JOIN table2
There are other ways of achieving a full outer join in most SQLs. Remember that the way to avoid all this is to meticulously enforce constraints on integrity! Also note that an outer join will NOT help you if there are duplicate entries in one of the tables you are using for the join (which can only occur in a 'relationally challenged' database).
{Could here have pix}
The set operators in SQL are based on the same principles, except they don't have a complement, and can determine the 'difference' between two sets. Here are the operators which we apply to combine two queries:
These are powerful ways of manipulating information, but take note: you can only apply them if the results of the two queries (that are going to be combined) have the same format - that is, the same number of columns, and identical column types! (Although many SQLs try to be helpful by, for example, coercing one data type into another, an idea which is superficially helpful and fraught with potential for errors). The general format of such queries is illustrated by:
An outer join could be used (with modification for NULLs if these little monstrosities are present) to achieve the same result as except.
Similarly, an inner join (with select distinct) can do what intersect does.
Set operators can be combined (as you would expect when playing around with sets) to achieve results that simply cannot be obtained using a single set operator.
Note that there are some restrictions on using order by with
set operators - order by may only be used once, no matter how
big the compound statement, and the select list must contain the
columns being used for the sort.
Pseudoset operators
Not content with implementing set operators, SQL database creators have
also introduced what are called "pseudoset operators". These operators
don't fit conveniently into set theory, because they allow multiple rows
(redundancies) which are forbidden in true sets.
We use the pseudoset operator union all to combine the outputs of two queries (all that is done is that the results of the second query are appended to the results of the first). Union all does exactly what we required from a FULL OUTER JOIN, which as we've already mentioned, is not implemented in many nominally "SQL-92 compliant" databases!
select * from tablename
where value
>
( insert select statement here);
Note that in the above query, the inner select statement must return just one value (for example, an average). There are other restrictions - the subquery must be in parenthesis, and it must be on the right side of the conditional operator (here, a greater than sign). You can use such subqueries with =, >, <, >=, <= and <>, but not {to the best of my knowledge?} with between .. and.
Multiple select subqueries can be combined (using logical operators) in the same statement, but avoid complex queries if you possibly can!
select * from tablename
where value
in
( insert select statement here);
The assumption is that the nested select statement returns a list. The outer shell of the statement can then use in to get cracking on the list, looking for a match of value within the list! There is a surprisingly long list of operators that resemble in, and can be used in a similar fashion. Here it is:
Operator | What it does |
not in | There is no match with any value retrieved by the nested select statement. |
in | We know how this works. Note that = any is a synonym for in that you'll sometimes encounter! |
> any | The value is greater than any value in the list produced by the inner submit statement. This is a clumsy way of saying "Give me the value if it's bigger than the smallest number retrieved"! |
>= any
< any <= any | Similar to >. Usage should be obvious. |
> all | Compare this with > any - it should be clear that the condition will only succeed if the value is bigger than the largest value in the list returned by the inner select statement! |
>= all
< all <= all | If you understand > all, these should present no problem! |
= all | You're not likely to use this one much. It implies that (to succeed) all the values returned by the inner subquery are equal to one another and the value being tested! |
(It is even possible in some SQL implementations to retrieve a 'table' (multiple columns) using the inner select statement, and then use in to simultaneously compare multiple values with the rows of the table produced by this inner select. You'll probably never need to use something like this. Several other tricks are possible, including the creation of virtual views by using a subquery [See Ladányi p 409 if you're interested]. Views are discussed in the next section.)
DrugDosing | ||
Dosing | DoseMg | Frequency |
D1 | 30 | OD |
D2 | 10 | OD |
D3 | 200 | TDS |
Let's say we wanted (for some obscure reason) all doses that are greater than average dose, for each dosing frequency. [Meaningless, but it serves the purposes of illustration].
select DoseMg, Frequency
from DrugDosing fred
where DoseMg >
(select avg(DoseMg)
from DrugDosing
where Frequency = fred.Frequency) ;
The sense of this statement should be clear - we use the outer select to choose a row (into which we put DoseMg, and Frequency). We then check whether this row is a candidate (or not) using the where statement. What does the where statement check? Well, it makes sure that DoseMg is greater than a magic number. The magic number is the average dose for a particular Frequency, the frequency associated with the current row. The only real trickery is how we use the label fred to refer to the current line from within the inner select statement. This label fred is called an alias. We'll learn a lot more about aliases later (Some will argue that aliases are so important you should have encountered them long before, but we disagree).
Correlated subqueries are not the only way of doing the above. Using a temporary view is often more efficient, but it's worthwhile knowing both techniques. We discuss views next.
create view nameofview as
select here have details of select statement
A variant that you will probably use rather often is:
create or replace view nameofview as
select here have details of select statement
(Otherwise you have to explicitly destroy a view - SQL won't simply overwrite a view without the or replace instruction, but will instead give you an irritating error).
Remember that if you alter the view, you alter the underlying table at the same time!
You cannot use an order by statement (or something else called a for update clause) within a view. There is a whole lot of other convenient things you can do to views. Where you include summary statistics (eg count, sum, etc) in a view it is termed an aggregate view. Likewise, using distinct, you can have a view on the possible values of a column or grouping of columns.
You can even create a view that is derived from several tables (A multi-table view). This is extremely sneaky, as you can largely avoid complex join statements in code which pulls data out of several tables! Ladányi puts things rather well:
"Pre-joined and tested views reduce errors .. that subtly or obviously undermine the accuracy of reports, and thus the credibility and subsequent professional well-being of the people creating them".
It is also obvious how we limit access to certain rows - we use a where clause that only includes the rows we want in the view. Note that (depending on your selection criterion) it is possible to insert a row into a view (and thus the underlying database) and then not be able to see this row in the view! With (in)appropriate selection criteria for the view, one can also alter the properties of rows visible in the view so that they now become hidden!
More draconic constraints are possible. The option
with read only
.. prevents any modifications to the view (or the underlying database); while with check option prevents you from creating rows in the view that cannot be selected (seen) in the view itself. If you use the "with check option", then you should follow this with a name, otherwise SQL will create an arbitrary and quite meaningless name for the constraint that will only confuse you when an error occurs!
{Does SQL92/98 support the "comment on" facility for views??}
An under-utilised but rather attractive use of views is to make them based on set (or pseudo-set) operators, for example the union of two sets.
{ Note: have something on exists and not exists; also need to talk about recursive joins and tree queries (a la SQL92 and also eg Oracle) }
Next - SQL frills, bells & whistles | Home |
Date of First Draft: 2001/11/11 | Date of Last Update: 2001/-/- | Web page author: jo@anaesthetist.com |