SQL - a worked example

Let's try and apply our SQL skills to a real-life scenario.

As we write the SQL code, we'll enter it into a well-formed SQL parser to ensure completeness. We'll actually create and test the database as we go along! Which SQL should you use? We have taken great pains to make all the following code completely compliant with the SQL-92 standard. Any program that complies with this standard should work. (We don't generally get into the fancy frills of SQL-99).

If you're using Windows, we strongly recommend that you get a copy of Ocelot SQL. This is the most compliant parser we've come across, and has numerous other merits, which we mention elsewhere. If you're working under Linux, then PostgreSQL should do the trick! (We might occasionally refer to Ocelot along the way, to ease your passage in certain areas).

We will go about things as follows:

  1. Define Our Task
  2. 'Attack' our problem - look at goals, resources, our mandate, and administrative issues
  3. Analyse the nature of the problem
  4. "Implementation Planning"
  5. Specify a normalised database
  6. Write the SQL
  7. Define intermediate tables, and initialise database
  8. Move data to our tables from the intermediate ones!
  9. Checks
  10. Extract data
  11. A note on Perl Scripts

Our Task

We're going to take the website www.appliance4u.com, which is made up of a hierarchy of pages. We'll look at the data representation in the site, and then turn this data into a normalised relational database. The site is not being held up as a shining example of how a website should be designed - it has many flaws - but it does provide a non-trivial substrate on which we can work.

If you glance through the site, you'll see that there is a lot of denormalised data. There are over three hundred complex pages, with (to make things worse) several hierarchical levels. Let's try and break all this complexity down, and create a normalised database.

Please note that we are not creating active server pages (ASPs) - all we are doing is extracting and re-synthesizing data. The data thus extracted could of course be used in an ASP environment, but this is not our primary goal!


Attacking the Problem

Our problem

We have a hierarchical mess of pages, with no underlying database structure.

Goal statement

We wish to convert all of this information to a normalised SQL database, to allow us to create web-pages at will from data drawn from the database. We will then also be able to update information readily. At present the updates are done through complex Perl scripts.

Resources

Mandate

We have a manadate from the owners of the site and pages to "do whatever it takes to create a normalised SQL database".

Administration

Database administration is by a small core of competent staff !

A Brief Analysis

A brief analysis of the 'business'

Okay, the business is an experimental web-site that links consumers and suppliers. The business relies on data supplied by manufacturers and intermediates, on suppliers who actually provide the goods and services, and consumers who needless to say, flock to the website. Profit at present derives from nowhere. In the long term potential options include advertisements on the site (which will be limited in order to keep bandwidth low, and not irritate consumers), selling the site, or careful negotiation with suppliers and/or manufacturers. Costs are kept to a minimum due to virtual web hosting, and lots of goodwill on all sides.

In more detail:


"Implementation planning" - look long and hard!

1. Look at the data

We now turn to the data currently available. We find to our horror that there are at least seven hierarchies of menus, thus:

After contemplating this horrorshow, we decide that we are (initially at least) going to target Levels 3, 4, and 6, marked with *stars above. This decision is because the other pages are incredibly static, and although it will be fairly easy to extend our data extraction and webpage synthesis to include these pages, there is little point. We target the pages that will change the most - those with product and supplier details.

2. Individual levels

To facilitate our analysis, we'll look at our target levels:

2.1 Level 3: Summary tables

At this level, there is a further complication. Some tables simply allocate one row per item for example microwaves, while others, because the images are taller than they are wide, use a more complex format, (for example Stoves). We will call these formats Formats H (horizontal) and V (vertical) respectively. Apart from this wrinkle, the summary tables are all rather similar. They contain the following data items:

2.2 Level 4: Individual items in detail

These pages are in some ways simpler. They have fewer areas, thus:

But note that each component of the lists of item characteristics is a unique individual property. More of this later!

2.3 Level 6: Local suppliers

Data items are here well-defined and simple:

2.4 Planning for other levels

But wait a bit. Even if we're not going to actually re-create the pages on other levels, we need to plan for them. The important component here is to identify the level of each product within the hierarchy. We will be able to implement this level identification mainly through knowing the category, subcategory and sub-subcategory of each appliance.

3. Extract the data

Now, it's probably possible to read the individual tables into SQL, and, using fancy extended character manipulation within your variant of SQL, extract all the relevant data. Only a madman would try this! (Refer to my rule number six). We will use good old trusty Perl to do the data extraction for us. We will:
  1. Use a Perl script to extract all the relevant details for each individual appliance from both tables relevant to that appliance - the level 3 summary table, and the level 4 item detail. The Perl script will (magically) combine this information into a comma-delimited flat file, containing the information in these (seventy-one) columns:

  2. Use a Perl script to extract all relevant details for each supplier, thus:

4. Scrub the Data

We won't dwell on this. Suffice it to say (at this stage) that a lot of the scrubbing should be done at a Perl level (so you know precisely where the problem is), but we could check, for example, that vital information is defined. This vital information includes most of our data, perhaps apart from more than 1 summary feature, 1 left column detail, and the (optional) bottom section items.

5. Define the Normalised Database

We've allocated a whole separate section to this, below.

6. Define the format of exported Data

In order to re-create our tables from the normalised data, we need three things:
  1. Appropriate data extraction into tables;
  2. A Perl script to create HTML documents from the extracted table data;
  3. To then move the newly-created HTML documents to their correct places in the tree structure;
The last two functions will be performed by customised Perl scripts. {have xref to bottom of page where they are referenced}. The data extraction is done within SQL, as we shall see later.

A Normalised Database

Let's progressively normalise our data. We will start with the columns provided by the Perl script that pulls out appliance information (one row of seventy-one columns for each appliance), and then move on to the supplier information provided by the second script.

1. Appliance Information

Let's look at this again, but with [maximum field sizes in square brackets]. We will generally be generous in our field size allocation, and use varchar data types unless otherwise stated.

  • Category [40]
  • Type [1] A or B (depends on picture dimensions, and category)
  • Brand [20] (Manufacturer)
  • Model [15]
  • Price [decimal max 999 999.99]
  • thumbnail - name [15]
  •       - width [decimal 3]
  •       - height [decimal 3]
  •       - alt tag [30]
  • Big picture - name [15]
  •       - width [decimal 3]
  •       - height [decimal 3]
  •       - alt tag [30]
  • Summary features 1..10 [40 each] (max of 10)
  • Left column item details - 1..30 [255 max each] (max of 30 items)
  • Bottom section item details - 1..15 [255 max each] (max of 15)
  • Unique sequence number

Before we start, let's agree on the colour coding used below:

Colour Coding Conventions
Primary key Foreign key Has default
Only blue fields can take on null values

Without further ado, let's pull out the tables we consider relevant. We won't painstakingly go through the denormalised appliance and dealer information, successively lopping off bits and normalising. We'll simply give you the tables we think are required, followed by some justification of our choices..

MANUFACTURER {Brand}
Name Type Size Description
ManufacturerCode decimal 4,0 Up to 9 999 different manufacturers, database generates code
ManufacturerName varchar 20 The actual brand (manufacturer) name

Think about the BRAND of an item (eg a Sony television set, or whatever). Although at present we don't have any more detail about the manufacturer on our website, this doesn't mean that we won't in the future. It certainly doesn't mean that we won't want to store say, Manufacturer contact numbers in our database! So, right at the start, let's pull out the "Brand", give it the more appropriate name of "Manufacturer", and create a separate table for it. Now, it's clumsy to have a long name (which may even change) as a key, so we'd rather generate an internal code, and use this code when we refer other tables to the Manufacturer!

CATEGORY {includes sub and sub-sub categories}
Name Type Size Description
CatCode decimal 4,0 Database generates code. We will use zero as the code for the "ROOT CATEGORY", that is appliances!
CatParent decimal 4,0 SELF-REFERENTIAL
CatName varchar 40 name to be used for the category/sub..category
CatDesc varchar 64 An optional text description of the category {unused at present}
CatAlign char 1 Options are 'H' and 'V', corresponding to the two different menu formats (horizontal and vertical) in level 3

Categories are such things as TV sets and washing machines, or even say "Fridges with top freezer, 400 to 549 L". They are a bit of a blighter. One category can depend on another, and finally we get a list of appliances within the category. An appliance can be the child of a category, a sub-category, or a sub-sub-category. In making this table, we've been a bit sneaky, and turned things around, by making the CatParent field refer up to another "super-category", and so on. The table actually self-referentially (recursively) talks about itself! We can use this trick to define a whole 'tree' of categories and sub- categories, and pull out the relationships between them when we need to!

One more wrinkle! Remember that there were two different types of level 3 menu - simple ones with the picture and text information about an appliance all in a row (we called these format H, for horizontal), and those with tall pictures, where the pictures were alternately on the left and right of the text (format V). Think about it - each of our categories that contains such a table will therefore fall into one of two groups (H, or V). If we ever come across a category where V or H doesn't apply, then we'll simply conveniently put in an 'X' (for "don't care").

MODEL {of appliance}
Name Type Size Description
ModelCode decimal 8,0 Up to 99 999 999 models [excessive!], database generates code
ModelName varchar 15 External serial number or code of model {is 15 enough?}
ModelDesc varchar 64 {not used at present ???}
ModelParent {AS for CATEGORY.CatCode: the immediate parent}
ManuCode {As for MANUFACTURER.ManufacturerCode}
Price decimal 8,2 Price max of 999 999.99

Model is a biggie. Again, rather than using an unreliable external code (which may even be changed by the manufacturer, although the specifications are for our purposes identical!), we generate a unique internal code. We also include an (optional) field for a model description, as we anticipate its usefulness, despite the lack of such information on the web-page.

A question. How can we signal that a price is INVALID? (For example, we know the price should be updated, but we dont have the datum to hand, and we must update the rest of the database). Setting the value to 0 is too ugly to be contemplated. Our solution will be to store a negative number, (minus the previous valid price), to signal to the database that [INVALID] must be signalled. This would trigger two things (a) Someone would get a gentle reminder to find out the true price, and (b) On the web page we would be more genteel with something like "Price on application"! Should we disallow price values below or above a certain value (have a range check)? We should certainly disallow zeroes!

What would happen if rampant inflation hit our third world country? Unlikely, but even if it did, we would just increase the size of the price field. This would not affect the data.

PICTURE - small and large pictures of appliance, {provisional - see below}
Name Type Size Description
PicCode decimal 8,0 Up to 99 999 999 distinct pictures, database-generated code
PicName varchar 20 actual name of picture, including .jpg or .gif suffix, but not the standard "images/" prefix ie path.
PicAltTag varchar 30 Description of the picture, as in HTML Alt tag
PicWidth decimal 3,0 pixel width of picture, up to 999 [?!]
PicHeight decimal 3,0 pixel height of picture, up to 999 [?!]
ModCode {As for MODEL.ModelCode}
PicUsage char 1 big or thumbnail, coded as B or T.

Note the similarities between big and small pictures. They both have names, widths, heights, and alt tags. So we store them in the same menu. It would seem crazy to do otherwise, but note that there is a teeny normalisation problem, as (if you go back and look at the alt tags on the web-page), the alt tags actually contain serial numbers of appliances! We won't fuss about this. What we will do is smarten up the Perl script that pulls the data out of the HTML pages. It simply ignores appliance serial numbers in alt tags when making our giant denormalised table of data. Then, when we re-generate the tags from the database, we insert these by default. {Needs fleshing out, lots of thought! Note: we might have a check on picture size, with maxima and minima for the two main types of picture. A minor frill}

FEATURE {listed item characteristics}
Name Type Size Description
FeatureCode decimal 8,0 Up to 99 999 999 distinguishable item features; database- generated code
FeatureType varchar 15 Single word best describing feature
FeatureText varchar 255 Text description of feature

The long lists of features of each appliance (contained in the individual appliance HTML pages) are a major headache. They are non-standardised comments about the important features of all of the appliances. Punctuation is idiosyncratic. Clearly many of the comments are similar, but not identical. Automatically capturing this information in a standardised form is a pain. We will aim for at least some degree of standardisation, and in addition incorporate the facility for later trying to identify key words within each piece of blurb - for example, "width", "height", "VHS", "PAL", "NTSC" and so on. It is also possible that features in the left hand column of one page may appear on the bottom of the next, and vice versa, so we won't here distinguish between the locations of the features on individual web pages.

Clearly, summary features will usually be shorter than detailed features, but as we are using varchar for the field type, this shouldn't adversely affect performance or storage space.

{Note that automatically getting intelligible words into the FeatureType field is a non-trivial task. We will not here address this}

MOD_FEATURE {associate model and various features}
Name Type Size Description
FeCode {As for FEATURE.FeatureCode}
MoCode {As for MODEL.ModelCode}
PutMeAt decimal 1 Where do I put model feature? Coding might be L=left, B=bottom, S=summary, but then what about combinations. Best have S=1, L=2, B=4, then can combine!!
MyNumber decimal 8,3 If there is a decimal associated with the model feature, it goes here.
Priority decimal 2,0 Precedence of item. See comment below.

Aha! Here we associate features with particular models. The PutMeAt (put model feature at) field allows us to decide where the feature should go (in the summary text of a menu that describes several appliances, or in the detail menu for a single appliance).

{Note the field MyNumber. We won't here implement use of this field, but the intent is clear. Where a number is associated with a feature (e.g. 'Width = 333 mm'), we intend to eventually extract the number and have a generic feature ('Width = * mm'). Multiple numbers in a single line will clearly present a substantial problem - avoid this by breaking up the lines}

Note that we have another problem. The definition of a relational database doesn't allow row order, a completely alien concept. There is no doubt that we will want certain features to appear prominently (and even more importantly, fairly consistently) in our final HTML code. There is no guarantee that if we put features of a particular appliance into our database in a particular order, we will get them out in that order. So we need to "prioritise" our features. This explains the priority column. {explore consequences!}

{A small but irritating further problem is that summary features and other features may overlap (with redundant information)}

2. Supplier Information

  • Region [15]
  • Subregion [25]
  • Supplier name [30]
  • Supplier address line1 [30]
  •       - line 2 [30] (optional, unless line 3 exists)
  •       - line 3 [30] (most lack line 3, but have it)
  • Supplier email [15] {without "@appliance4u.com" after it}
  • supplier telephone code [6]
  • supplier telephone number [8] (could get by with seven)

You can see that there are a lot of implicit associations within the data. For example, the telephone code is related to the region and subregion in a complex fashion. (The first three digits of the telephone number will also provide detailed information about the location of the supplier). One of the address lines will probably be associated with subregion, and telephone code. There are other problems that may not immediately be apparent, for example a supplier may have more than one telephone, currently indicated by a slash/slashes in the (longer, denormalised) telephone number. Larger suppliers may be able to cover several subregions. And so on.. We do as follows:

PLACE
Name Type Size Description
PlaceCode decimal 7,0 Generated by database. Up to 9 999 999 distinct places.
PlaceName varchar 25 See text below
PlaceIsIn {As for PLACE.PlaceCode. Recursive}
PlaceSize decimal 1,0 Possible values <= 3 minor, 4=subregion, 6=region. See text.

Sneaky. We make PLACE recursive on itself. This allows us to build an implicit hierarchy of places. A supplier may live at The Mews, 33 Mandela Crescent, Little Suburb, Big Town, Gauteng; or simply in Johannesburg, Gauteng. Knowing just the first part of the address, we can reconstruct the whole, and conversely we can answer questions like "How many suppliers are there in Gauteng", applying the recursive argument in reverse. Some caution is advised because (a) The Mews (in our example) could be the name of locations around the country; (b) In this schema, 33 and 34 Mandela Crescent are not regarded as being spatially associated; (c) There may be breaks in the chain of continuity - that is, somebody may live in SmallSuburb, and we may not know where it is in the big scheme of things; (d) Once we start introducing the addresses of people other than suppliers, the logic becomes messy, because now if we work down from the top of the tree (Gauteng) to individual addresses, we get zillions of them, and then have to cross match these with supplier addresses, which becomes time-consuming. All of these are addressable [heh], (a) by confirming, where locations have the same names that they are the same - Perl will supply a chain, and we can compare the whole chain, creating two "The Mews"es if the match fails. This should also take into account the possibility of missing information in a match; (b) Numeric information could be extracted {We won't explore this!}; (c) If the chain doesn't end (upwards) in a Region, then query the administrator about the region to link to; (d) There are several solutions. I would be tempted to denormalise, and flag each terminal address that houses a supplier! Ugly but efficient, and not a great waster of space! We could, for example, have PlaceSize = 1 mean a non-dealer address, and 2 = a dealer.

The main purpose of the PlaceSize field is to be able to identify a place as a region or subregion!

TELNUM {includes area code}
Name Type Size Description
TShort decimal 7,0 {Generated by database. Up to 9 999 999 distinct numbers}
AreaCode decimal 5,0 {check maximum code size ???}
TelNumber decimal 8,0 {? just in case. 7 adequate. No "-" in number}

Rather than getting into a complex mess with relating telephone numbers to addresses, we sidestep the issue, lump the two together, and give each combination a unique code. This solution may be far from optimal, and (like a lot of what we've done) can be criticised in a dozen ways. Oh well. At least we can easily change a (whole) telephone number, and if the area code for a whole region changes, we only do one table update with minimal fuss.

SUPPLIER
Name Type Size Description
SupplierCode decimal 4,0 Up to 9 999 suppliers. Internally generated code.
SupplierName varchar 30 The formal name of the supplier.
SupplierEmail varchar 15 Email of supplier, without "@appliance4u.com" on end.
SupplierAddr {As for PLACE.PlaceCode}

The above table is fairly straightforward, as is the following which relates supplier and (one or more) telephone numbers.

SUPP_TEL
Name Type Size Description
STelShort {As for TELNUM.TShort}
SuppCode {As for SUPPLIER.SupplierCode}

SUPP_AREA
Name Type Size Description
SuppCode {As for SUPPLIER.SupplierCode}
AreaCode {As for PLACE.PlaceCode}
Status decimal 2,0 {Relative weighting of supplier in this area. See text. Default = 49, Range is 0..99}

The relationship between supplier and area is interesting. Here, we will constrain the regions - they must be a place-size that is a SUBREGION (PLACE.PlaceSize = 4). This arrangement allows us to link one supplier to many regions, and rank each supplier in each region (for example, based on customer feedback for that region).

3. What about customers?

There's something that is simply crying out to be done, and that is capturing information about customers. All customer enquiries are CC'd to appliance4u, so why not capture the individual customer enquiries into the database? Come to think of it, this is really the vital information. The information that customers submit is minimal, but usually more than sufficient. Here are the fields:

  • Customer Title [5] (optional)
  • Customer Surname [30]
  • Telephone code [6]
  • Telephone number [8]
  • eMail address [30]
  • Area [25]
  • Appliance code [15]
  • Price [8,2]
  • Supplier contacted [25]
  • (type of appliance = usually redundant)
  • Customer comment [255] may include alternative telephone number, if cellphone number inserted. (Note cell nos begin with 082, 083, OR **NOW** 072. To update). Perl to scan for this.
  • A unique customer sequence number.

CUSTOMER
Name Type Size Description
CustomerCode decimal 7,0 Up to 9 999 999 customers. Internally generated code.
CustomerSurname varchar 30 Self-explanatory
CustomerTitle varchar 5 Mr,Mrs,Miss etc. {check. Look for longer possibilities??}
CustomerEmail varchar 30 full Email of customer
CustomerArea {As for PLACE.PlaceCode; constrained to subregion PlaceSize=4}

The above table is straightforward. See how we use PLACE which we defined previously.

CUST_TEL
Name Type Size Description
cTelShort {As for TELNUM.TShort}
CustCode {As for CUSTOMER.CustomerCode}

CUST_TEL is so similar to SUPP_TEL that one is tempted to amalgamate the two. Hmm?

ENQUIRY {by customer of supplier about appliance}
Name Type Size Description
ModlCode {As for MODEL.ModelCode}
CustCode {As for CUSTOMER.CustomerCode}
SuppCode {As for SUPPLIER.SupplierCode}
Price decimal 8,2 Price max of 999 999.99
Date float (!) {no qualifier} Date stamp of the e-mail! See note below.
CustComment varchar 255 Comment from customer; Perl script will have extracted any telephone number(s)

Note the composite key. The price is the price that obtained when the enquiry was made, need not be identical to current price! One should probably augment the above table with fields for (a)Followed up (b)Sale made and (c)Sale price, with a Supplier Comment field, but we will leave this for future alteration, perhaps when/if we can get consistent supplier responses!

You might think it's a bit strange that we store a Date in a floating point number!! In fact this makes a lot of sense when you realise that every man and his dog (and certainly every SQL database creator) has defined a different standard. We will use a Julian date (simply a floating point number) to unequivocally and succinctly store the date and time.

{We won't here explore the conversion routines required, but will later refer [xref] to our Date and Time web-page, as well as "linking external function into SQL"}

4. Extending the model

It's tempting to now ask the question "How would the above model have to be altered to accommodate menus on other levels?" The other level menus are made up of fairly carefully formatted HTML code, with few items on each page. The front page (Level 0) is unlikely to be changed except through manual tweaking, and likewise for levels 1, 2, 3A, and 5. However, even these levels could be automated, with templates that could be filled in with the relevant category information, and an appropriate picture. The pictures could be stored in the PICTURE table, and a minor modification required there would be to create another option for PICTURE.Usage. The problem is that PICTURE has a ModCode field, while in this circumstance we would want to associate a picture with a CATEGORY! This is probably sufficient reason to convince us to remove the Modcodefrom the PICTURE table, and create a separate MOD_PICTURE table. We can then store all pictures together, and have one or more picture-associating tables, thus:

PICTURE2 {pictures of appliances}
Name Type Size Description
PictureCode decimal 8,0 Up to 99 999 999 distinct pictures, database-generated code
PicName varchar 15 actual name of picture, including .jpg or .gif suffix, but not the standard "images/" prefix ie path.
PicAltTag varchar 30 Description of the picture, as in HTML Alt tag
PicWidth decimal 3,0 pixel width of picture, up to 999 [?!]
PicHeight decimal 3,0 pixel height of picture, up to 999 [?!]
PicUsage char 1 big or thumbnail, coded as B or T. (? Option of C for Category)

MODEL_PIC
Name Type Size Description
ModCode {As for MODEL.ModelCode}
PicCode {As for PICTURE2.PictureCode}

What about Level 1A? At present these menus too are so small (and static) that manual alteration is an acceptable alternative to automation. However automation would be fun! Note that the detail menus for these items are the same as for other items, it's only the preceding menu that's a problem. Here, as for levels 1, 2, 3A and 5, a template could be filled with relevant text and pictures; a warning could be issued if the number of items exceeded the number of slots in the template (This overflow being determined by a Perl script).

Another thought: "What about having the HTML templates as part of the database too?" Although feasible, this seems excessive. Perl is doing the actual HTML generation, as this is what it's best at. My rule six comes to mind. We'll make text templates that Perl can handle, although we could of course generate these text templates from database images.

Yet another: "What about storing pictures in the database?" My personal view is that this is very silly. The images need to be stored on the web server anyway, and keeping them within the database is just going to bloat the database, and add another (vulnerable) step to an already complex process.

{ We won't here look into the trickery required to have, for example, Level 1 and 2 menus cross-referencing one another, for example, having a clickable link to "VCRs" from the main "TV" menu. The concept we need here is 'closely related categories', which is not encoded in our database at present. We could easily create a new table that associates related categories with one another! }

5. Generating unique keys

In the following sections where we actually write the SQL, we're going to do a wicked thing. We are going to use the external unique numbers provided by the Perl script (for the various data sets) to generate unique keys. We'll even explore several different approaches to generating these keys.

Why do we consider this to be wicked? Well, we think that wherever possible keys should be sequential. This means that there are no gaps in the numbers. You will soon appreciate that if we use external keys, no matter how smart the Perl script is that provides them, we have no hope of making one key number follow another, except under very unrealistic circumstances.

Now there are well-defined ways of generating unique sequential keys in most SQL implementations. Many of these break the rules embodied in SQL-92 and SQL-99. We will not break the rules, and (to keep things relatively simple) will generate our keys from the external data. This is not the best way to do things in real life, but remember, this whole big page is simply a teaching exercise. If we were to both stick to the SQL rules and use sequential keys, we'd have to introduce some fairly advanced things which we think are inappropriate at this stage.

{ later on we will have a page on keys using externally defined functions; will then have xref from here!}


Writing the SQL

We now have just sixteen basic tables that we need to create. We then need to write some fairly hefty SQL code to import data into our nice clean tables. Inserting information for a single new appliance will use the same format as the bulk capture of the whole site. Appliance deletion (with cascading) should be trivial, and editing items such as prices and descriptions should be straightforward, in the main.

Here is our table list:

  1. MANUFACTURER
  2. CATEGORY
  3. MODEL
  4. PICTURE2
  5. FEATURE
  6. MOD_FEATURE
  7. MODEL_PIC
  8. PLACE
  9. TELNUM
  10. SUPPLIER
  11. SUPP_TEL
  12. SUPP_AREA
  13. CUSTOMER
  14. CUST_TEL
  15. ENQUIRY

All of the above are clickable, taking you to the above table structures.

Let's write the SQL code for each of these in turn. We won't burden each SQL definition with comments, as the table definitions and minor constraints we have attached should be self-explanatory. {Note that we could have added a whole lot more check constraints. We haven't. Naughty us!}

MANUFACTURER

create table MANUFACTURER (
          ManufacturerCode decimal(4,0) ,
        constraint
manuf_key1 primary key (ManufacturerCode),
          ManufacturerName varchar(20) ,
              constraint manuf_name_null check (ManufacturerName is not null)
          );

CATEGORY

create table CATEGORY (
         CatCode decimal(4,0) ,
        constraint category_key1 primary key (CatCode),
         CatParent decimal(4,0) default 0 ,
              constraint category_parent_null check (CatParent is not null),
              constraint category_fgn_key foreign key (CatParent) references CATEGORY,
         CatName varchar(40) ,
              constraint category_name_null check (CatName is not null),
         CatDesc varchar(64),
         CatAlign char(1) ,
              constraint category_align_null check (CatAlign is not null)
          );

MODEL

create table MODEL (
         ModelCode decimal(8,0) ,
        constraint model_key1 primary key (ModelCode),
         ModelName varchar(15) ,
              constraint model_name_null check (ModelName is not null),
         ModelDesc varchar(64) ,
         ModelParent decimal(4,0) ,
              constraint model_parent_null check (ModelParent is not null),
              constraint model_parent_fgn_key foreign key (ModelParent) references CATEGORY,
         ManuCode decimal(6,0) ,
              constraint model_manucode_null check (ManuCode is not null),
              constraint mocel_manucode_fgn_key foreign key (ManuCode) references MANUFACTURER,
         Price decimal(8,2) ,
              constraint model_price_null check (Price is not null)
          );

PICTURE2

create table PICTURE2 (
         PictureCode decimal(8,0) ,
        constraint picture_key1 primary key (PictureCode),
         PicName varchar(20) ,
              constraint picture_name_null check (PicName is not null),
         PicAltTag varchar(30),
         PicWidth decimal(3,0) ,
              constraint picture_width_null check (PicWidth is not null),
         PicHeight decimal(3,0) ,
              constraint picture_hgt_null check (PicHeight is not null),
         PicUsage char(1) ,
              constraint picture_use_null check (PicUsage is not null)
          );

FEATURE

create table FEATURE (
         FeatureCode decimal(8,0) ,
        constraint feature_key1 primary key (FeatureCode),
         FeatureType varchar(15) ,
              constraint feature_type_null check (FeatureType is not null),
         FeatureText varchar(255) ,
              constraint feature_text_null check (FeatureText is not null)
          );

MOD_FEATURE

create table MOD_FEATURE (
         FeCode decimal(8,0),
              constraint mf_fecode_fgnkey foreign key (FeCode) references FEATURE,
         MoCode decimal(8,0),
              constraint mf_mocode_fgnkey foreign key (MoCode) references MODEL,
         PutMeAt decimal (1,0),
              constraint mf_putmfa_null check (PutMeAt is not null),
         MyNumber decimal(8,3) ,
         Priority decimal(2,0) ,
              constraint mf_priority_null check (Priority is not null),
        constraint mf_key1 primary key (FeCode, MoCode)
          );

MODEL_PIC

create table MODEL_PIC (
         ModCode decimal(8,0),
              constraint mp_modcode_fgnkey foreign key (ModCode) references MODEL,
         PicCode decimal(8,0),
              constraint mp_piccode_fgnkey foreign key (PicCode) references PICTURE2,
        constraint mp_key1 primary key (ModCode, PicCode)
          );

PLACE

create table PLACE (
         PlaceCode decimal(7,0) ,
        constraint place_key1 primary key (PlaceCode),
         PlaceName varchar(25) ,
              constraint place_name_null check (PlaceName is not null),
         PlaceIsIn decimal(7,0) ,
              constraint place_in_null check (PlaceIsIn is not null),
              constraint place_in_fgnkey foreign key (PlaceIsIn) references PLACE,
         PlaceSize decimal(1,0) ,
              constraint place_size_null check (PlaceSize is not null)
          );

TELNUM

create table TELNUM (
         TShort decimal(7,0) ,
        constraint telnum_key1 primary key (TShort),
         AreaCode decimal(5,0) ,
              constraint telnum_area_null check (Areacode is not null),
         TelNumber decimal(8,0) ,
              constraint telnum_num_null check (TelNumber is not null)
          );

SUPPLIER

create table SUPPLIER (
         SupplierCode decimal(4,0) ,
        constraint supp_key1 primary key (SupplierCode),
         SupplierName varchar(30) ,
              constraint supp_name_null check (SupplierName is not null),
         SupplierEmail varchar(15) ,
              constraint supp_email_null check (SupplierEmail is not null),
         SupplierAddr decimal(7,0) ,
              constraint supp_addr_null check (SupplierAddr is not null),
              constraint supp_addr_fgnkey foreign key (SupplierAddr) references PLACE
          );

SUPP_TEL

create table SUPP_TEL (
         STelShort decimal(7,0),
              constraint st_telshort_fgnkey foreign key (STelShort) references TELNUM,
         SuppCode decimal(4,0),
              constraint st_supp_fgnkey foreign key (SuppCode) references SUPPLIER,
        constraint st_key1 primary key (STelShort, SuppCode)
          );

SUPP_AREA

create table SUPP_AREA (
         SuppCode decimal(4,0),
              constraint sa_suppcode_fgnkey foreign key (SuppCode) references SUPPLIER,
         AreaCode decimal(7,0),
              constraint sa_area_fgnkey foreign key (AreaCode) references PLACE,
         Status decimal(2,0) default 49 ,
              constraint sa_area_null check (Status is not null),
        constraint sa_key1 primary key (SuppCode, AreaCode)
          );

CUSTOMER

create table CUSTOMER (
         CustomerCode decimal(7,0) ,
        constraint cust_key1 primary key (CustomerCode),
         CustomerSurname varchar(30) ,
              constraint cust_surname_null check (CustomerSurname is not null),
         CustomerTitle varchar(5) ,
              constraint cust_title_null check (CustomerTitle is not null),
         CustomerEmail varchar(30) ,
              constraint cust_email_null check (CustomerEmail is not null),
         CustomerArea decimal(7,0) ,
              constraint cust_area_null check (CustomerArea is not null),
              constraint cust_area_fgnkey foreign key (CustomerArea) references PLACE
          );

CUST_TEL

create table CUST_TEL (
         cTelShort decimal(7,0),
              constraint ct_telshort_fgnkey foreign key (cTelShort) references TELNUM,
         CustCode decimal(7,0),
              constraint ct_custcode_fgnkey foreign key (CustCode) references CUSTOMER,
        constraint ct_key1 primary key (cTelShort, CustCode)
          );

ENQUIRY

create table ENQUIRY (
         ModlCode decimal(8,0),
              constraint enq_modlcode_fgnkey foreign key (ModlCode) references MODEL,
         CustCode decimal(7,0),
              constraint enq_custcode_fgnkey foreign key (CustCode) references CUSTOMER,
         SuppCode decimal(4,0),
              constraint enq_suppcode_fgnkey foreign key (SuppCode) references SUPPLIER,
         Price decimal(8,2) ,
              constraint enq_price_null check (Price is not null),
         EnqDate float ,
              constraint enq_date_null check (EnqDate is not null),
         CustComment varchar(255),
        constraint enq_key1 primary key (ModlCode, CustCode, SuppCode)
          );

Initialisation and Loading of Data into Intermediate Tables

  1. First, we set up root CATEGORY and and PLACE entries. We will give both of these a key of 0 (zero) which we will not otherwise permit. Ultimately, all other categories and places will point back to our root rows. We do this:

    insert into CATEGORY (
            CatCode, CatParent, CatName, CatDesc, CatAlign )
            values
            ( 0, 0, 'APPLIANCE', 'The root category', 'X');

    and also..

    insert into PLACE (
            PlaceCode, PlaceName, PlaceIsIn, PlaceSize)
            values
            ( 0, 'Southern Africa', 0, 7 );

    Notice how we refer both 'APPLIANCE' and 'Southern Africa' back to themselves. This signals that we are at the 'root of the tree', but we must exercise caution as there is potential for infinite regress (we can look up the 'parent' and get back to the same spot if we write stupid code). The PlaceSize for our entry we set at 7, which is not one of the standard sizes we agreed upon, but follows logically after 6 (for regions) and still leaves space at the top (for eg. 'Africa' and 'World')!

    It is important that CATEGORY is well-defined. It's all very well to just pull out the categories and subcategories from Perl generated data (We'll later come across these Category, .SubCategory, and .SubSubCat fields), but better would be to pre-define these in the CATEGORY table (more work, but more security!). At the same time we can link them vertically through the CATEGORY.CatParent field. Then any non-existent value IS an error. We will take this somewhat more tedious but rigorous approach..


    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 1, 0, 'Microwaves', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 2, 0, 'Ovens & Stoves', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 3, 0, 'Fridges', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 4, 0, 'Freezers', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 5, 0, 'Washing Machines', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 6, 0, 'Tumble Dryers', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 7, 0, 'Dishwashers', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 8, 0, 'Television Sets', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 9, 0, 'Video Recorders', 'all', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 10, 1, 'Small', 'Microwaves', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 11, 1, 'Medium', 'Microwaves', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 12, 1, 'Large', 'Microwaves', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 13, 1, 'Extra Large', 'Microwaves', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 14, 2, 'Extractor Hoods', 'oven and stove', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 15, 2, 'Hobs & Combinations', 'oven and stove', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 16, 2, 'Built in ovens', 'oven and stove', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 17, 2, 'Stoves', 'oven and stove', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 18, 2, 'Gas cooking including hobs and stoves', 'oven and stove', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 19, 3, 'Gas Fridges', 'fridges', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 20, 3, 'Fridges with a top freezer', 'fridges', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 21, 3, 'Side by side fridge and freezer', 'fridges', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 22, 3, 'Fridges: freezer compartment on the bottom', 'fridges', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 23, 3, 'One Door Fridges', 'fridges', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 24, 4, 'Gas Freezers', 'freezers', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 25, 4, 'Smaller Chest Freezers: up to 220 L', 'freezers', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 26, 4, 'Chest freezers 221 to 320 L', 'freezers', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 27, 4, 'Large chest freezers', 'freezers', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 28, 4, 'Upright freezers', 'freezers', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 29, 5, 'Frontloading Washing machines', 'washing machines', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 30, 5, 'Top loading washing machines', 'washing machines', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 31, 5, 'Twin Tubs', 'washing machines', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 32, 8, 'Television sets 36 to 44 cm', 'tv', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 33, 8, 'Television Sets: 51 to 55 cm', 'tv', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 34, 8, 'Television Sets: 63 to 65 cm', 'tv', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 35, 8, 'Television Sets: 70 to 75 cm', 'tv', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 36, 8, 'Extra large Television Sets', 'tv', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 37, 9, 'Double head Video Cassette Recorders', 'vcr', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 38, 9, 'Four head Video Cassette Recorders', 'vcr', 'X');

    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 39, 9, 'Six head VCRs', 'vcr', 'X');
    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 40, 20, 'Fridges with a top freezer: 200 to 299 L', 'top freezer fridge', 'X');
    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 41, 20, 'Fridges with top freezers: 300 to 399 L', 'top freezer fridge', 'X');
    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 42, 20, 'Fridges with top freezer: 400 to 549 L', 'top freezer fridge', 'X');
    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 43, 20, 'Large top freezer fridges', 'top freezer fridge', 'X');
    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 44, 22, 'Bottom freezer fridges under 300 L', 'bottom freezer fridge', 'X');
    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 45, 22, 'Bottom freezer fridges 300 to 349 L', 'bottom freezer fridge', 'X');
    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 46, 22, 'Bottom freezer fridges 350 to 399 L', 'bottom freezer fridge', 'X');
    insert into CATEGORY ( CatCode, CatParent, CatName, CatDesc, CatAlign )
        values ( 47, 22, 'Larger bottom freezer fridges: 400+ L', 'bottom freezer fridge', 'X');

    Whew! That took longer than I thought it would (thank heavens for cut and paste)! Note that as things stand, there is redundant (and denormalised) information due to similar CatDesc fields with similar CatParents, but the CatDesc descriptive fields can contain anything you want!

    Again, it's infinitely preferable to pre-define the Regions, and probably also the Subregions before we start capturing data. This is a manageable task and will diminish errors substantially. Let's do what we did for category. First we will do regions, then subregions.

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (1, 'Eastern Cape', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (2, 'Free State', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (3, 'Gauteng', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (4, 'KwaZulu Natal', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (5, 'Mpumalanga', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (6, 'Namibia', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (7, 'North West', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (8, 'Northern Cape', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (9, 'Northern Province', 0, 6);

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (10, 'Western Cape', 0, 6);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (11, 'western cape strand', 10, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (12, 'western cape paarl', 10, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (13, 'western cape country', 10, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (14, 'western cape cape town', 10, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (15, 'northern province tzaneen', 9, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (16, 'northern province louis trichardt', 9, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (17, 'northern province country', 9, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (18, 'northern cape kimberley', 8, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (19, 'northern cape danielskuil', 8, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (20, 'northern cape country', 8, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (21, 'northern cape colesburg', 8, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (22, 'north west rustenburg', 7, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (23, 'north west mafikeng', 7, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (24, 'north west country', 7, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (25, 'namibia windhoek', 6, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (26, 'mpumalanga witbank', 5, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (27, 'mpumalanga trichardt', 5, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (28, 'mpumalanga nelspruit', 5, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (29, 'mpumalanga middelburg', 5, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (30, 'mpumalanga country', 5, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (31, 'mpumalanga balfour', 5, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (32, 'kwazulu natal umgeni', 4, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (33, 'kwazulu natal stanger', 4, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (34, 'kwazulu natal pinetown', 4, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (35, 'kwazulu natal pietermaritzburg', 4, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (36, 'kwazulu natal durban', 4, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (37, 'kwazulu natal country', 4, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (38, 'gauteng west rand', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (39, 'gauteng vereeniging', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (40, 'gauteng soweto', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (41, 'gauteng pretoria', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (42, 'gauteng johannesburg', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (43, 'gauteng heidelburg', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (44, 'gauteng east rand', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (45, 'gauteng country', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (46, 'gauteng alberton', 3, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (47, 'free state welkom', 2, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (48, 'free state country', 2, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (49, 'free state bloemfontein', 2, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (50, 'free state bethlehem', 2, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (51, 'eastern cape uitenhage', 1, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (52, 'eastern cape port elizabeth', 1, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (53, 'eastern cape middelburg', 1, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (54, 'eastern cape graaff reinet', 1, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (55, 'eastern cape east london', 1, 4);
    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) values (56, 'eastern cape country', 1, 4);

    See how we've linked the subregions to the regions. There is a lot of redundancy, and in fact we could clip the region names out of the subregions without further ado, provided we make allowances in our Perl scripts for this alteration.

    {ALSO WHAT ABOUT CASE SENSITIVITY???}

  2. We now need to create three "flat" tables, to receive the temporary data that we get from our Perl script. There may be other better ways of doing this, but we will simply create the following humungous tables, and then populate them:

    First Table - Appliance Data

    create table BIG_FLAT_APPL (
        Category varchar (30) ,
        PicType varchar (1) ,
        Brand varchar (20) ,
        Model varchar (15) ,
        Price decimal (8,2) ,
        ThumbName varchar (15) ,
        ThumbW decimal (3,0) ,
        ThumbH decimal (3,0) ,
        ThumbAlt varchar (30) ,
        PixName varchar (15) ,
        PixWidth decimal (3,0) ,
        PixHeight decimal (3,0) ,
        PixAlt varchar (30) ,
        Summ1 varchar (40) ,
        Summ2 varchar (40) ,
        Summ3 varchar (40) ,
        Summ4 varchar (40) ,
        Summ5 varchar (40) ,
        Summ6 varchar (40) ,
        Summ7 varchar (40) ,
        Summ8 varchar (40) ,
        Summ9 varchar (40) ,
        Summ10 varchar (40) ,
        Left1 varchar (255) ,
        Left2 varchar (255) ,
        Left3 varchar (255) ,
        Left4 varchar (255) ,
        Left5 varchar (255) ,
        Left6 varchar (255) ,
        Left7 varchar (255) ,
        Left8 varchar (255) ,
        Left9 varchar (255) ,
        Left10 varchar (255) ,
        Left11 varchar (255) ,
        Left12 varchar (255) ,
        Left13 varchar (255) ,
        Left14 varchar (255) ,
        Left15 varchar (255) ,
        Left16 varchar (255) ,
        Left17 varchar (255) ,
        Left18 varchar (255) ,
        Left19 varchar (255) ,
        Left20 varchar (255) ,
        Left21 varchar (255) ,
        Left22 varchar (255) ,
        Left23 varchar (255) ,
        Left24 varchar (255) ,
        Left25 varchar (255) ,
        Left26 varchar (255) ,
        Left27 varchar (255) ,
        Left28 varchar (255) ,
        Left29 varchar (255) ,
        Left30 varchar (255) ,
        Bott1 varchar (255) ,
        Bott2 varchar (255) ,
        Bott3 varchar (255) ,
        Bott4 varchar (255) ,
        Bott5 varchar (255) ,
        Bott6 varchar (255) ,
        Bott7 varchar (255) ,
        Bott8 varchar (255) ,
        Bott9 varchar (255) ,
        Bott10 varchar (255) ,
        Bott11 varchar (255) ,
        Bott12 varchar (255) ,
        Bott13 varchar (255) ,
        Bott14 varchar (255) ,
        Bott15 varchar (255) ,
        Seq decimal(8)
        );

    Second Table - Dealers (Suppliers)

    create table BIG_FLAT_DEAL (
        Region varchar (15),
        Subregion varchar (25),
        SupplierName varchar (30),
        SupplierAddr1 varchar (30),
        SupplierAddr2 varchar (30),
        SupplierAddr3 varchar (30),
        SupplierEmail varchar (15),
        SupplierTelCode decimal (6),
        SupplierTelNum1 decimal (10),
        SupplierTelNum2 decimal (10),
        SupplierTelNum3 decimal (10),
        Seq decimal (8)
        );

    { ??? if supplier gives a cellphone number???}

    Third Table - Customers

    create table BIG_FLAT_CUST (
        CustTitle varchar(5),
        CustSurname varchar(30),
        CTelCodeA decimal (6),
        CTelCodeB decimal (6),
        CTelNumA decimal (10,0),
        CTelNumB decimal (10,0),
        CEMail varchar(30),
        CustArea varchar(25),
        CApplCode varchar(15),
        CPrice decimal (8,2),
        CSupplier varchar(25),
        CustComm varchar(255),
        CDate float,
        Seq decimal(8,0)
        );

    {NOTE: need to fix up scripts iro CtelNum1 etc}

  3. Next, we populate the tables. For this we need to write and invoke our Perl scripts. These scripts are discussed below. We will assume that they automagically generate a SQL command, for each of the above three choices. Let's look at fictitious examples of these, first the customer data..

    insert into BIG_FLAT_CUST (
        CustTitle, CustSurname, CTelCodeA, CTelCodeB, CTelNumA, CTelNumB,
        TelNum2, CEMail, CustArea, CApplCode,
        CPrice, CSupplier, CustComm, CDate, Seq
        )   values (
        'Ms', 'van der Merwe', 083, 01234, 9997654,
        5551234, 'me@somebodyinsa.co.za', 'Gauteng Pretoria', 'Defy 631t',
        2716.00, 'golddollpta', 'I want it and i want it now!', 0000000.0, 1
        );

    Next, dealer information..

    insert into BIG_FLAT_DEAL (
        Region, Subregion, SupplierName, SupplierAddr1,
        SupplierAddr2, SupplierAddr3, SupplierEmail, SupplierTelCode,
        SupplierTelNum1, SupplierTelNum2, SupplierTelNum3
        )   values (
        'Gauteng', 'Pretoria', 'Gold Dollar Furnishers', '211 Du Toit Street',
        'Pretoria', '', 'golddollpta', 012,
        3227766, 3227767, 3227768
        );

    And finally, the most important of the lot..

    insert into BIG_FLAT_APPL (
        Category, PicType,   Brand, Model, Price, ThumbName,
        ThumbW, ThumbH, ThumbAlt, PixName,   PixWidth, PixHeight, PixAlt, Summ1,
        Summ2, Summ3, Summ4, Summ5,   Summ6, Summ7, Summ8, Summ9,
        Summ10, Left1, Left2, Left3,   Left4, Left5, Left6, Left7,
        Left8, Left9, Left10, Left11,   Left12, Left13, Left14, Left15,
        Left16, Left17, Left18, Left19,   Left20, Left21, Left22, Left23,
        Left24, Left25, Left26, Left27,   Left28, Left29, Left30, Bott1,
        Bott2, Bott3, Bott4, Bott5,   Bott6, Bott7, Bott8, Bott9,
        Bott10, Bott11, Bott12, Bott13,   Bott14, Bott15, Seq
          )   values (
        'Fridges with a top freezer: 200 to 299 L', 'V',   'KIC', 'kt5021a', 1730.00, 'kt5021a_s.jpg',
        41, 61, 'KIC 5021a fridge', 'kt5021a.jgp',   119, 251, 'kt5021a - large image', '215 L',
        'all-round magnetic seals', '', '', '',   '', '', '', '',
        '', 'total capacity 215 L', 'fridge 143 L', 'freezer 72L',   'Powder coated galvanized steel inner and outer cabinets', 'Rust-resistant', 'Electrostatic paint finish', 'Modern-looking smooth finish',
        'Locks on both doors', 'Curved door soft-line finish', '100% CFC free and environment friendly', 'Adjustable front feet compensate for uneven floors',   'One piece galvanized steel expoxy-coated inner liners', 'Magnetic gaskets on all four edges of doors seal doors tightly keeping cold air inside', 'Adjustable steel shelves for individual packing needs', 'Adjustable thermostat in fridge section for different climatic needs',
        'Interior lights', 'Cyclomatic defrost in fridge section', 'Large crisper drawer', 'Condiment and butter compartments in fridge',   '', '', '', '',
        '', '', '', '',   '', '', '', 'FREEZER SECTION',
        'Magnetic gaskets for a tight seal, as for fridge', 'GUARANTEE - One year guarantee (misuse excluded)', 'Two-year warranty on working parts', 'DIMENSIONS (mm)',   '1423', '500', '593', '',
        '', '', '', '',   '', '', 1
      );

    The above insert statement is so unwieldy that only automatic generation makes sense. Once the commands are generated, we will run them within SQL to insert the data!

  4. Next, we should scrub our data. But we will relegate this function to the Perl script, also relying on the constraints we have set, and on our privilege of being the only person to update the table, to minimise data errors (naive fools that we are)! Perl data scrubbing should include elimination of leading and terminal blanks, and peculiar punctuation. Ideally however, we should also have spell checks, and comparison of similar fields.. quite a mission! One could even cross correlate areas and telephone codes (!)

  5. Finally, we run commands to extract data from the flat tables into the normalised tables. This is covered in the next section. Rows that have been used will then be deleted.

Moving Data to our Tables

Assuming reasonable data quality in our intermediate tables, we now farm the data out from our monstrous big denormalised tables to our 'normalised' tables. A summary of the data movement is:

Data Movement
Move From .. To
BIG_FLAT_APPLMANUFACTURER
BIG_FLAT_APPLCATEGORY
BIG_FLAT_APPLMODEL
BIG_FLAT_APPLPICTURE2
BIG_FLAT_APPLMODEL_PIC
BIG_FLAT_APPLFEATURE
BIG_FLAT_APPLMOD_FEATURE
BIG_FLAT_DEALPLACE
BIG_FLAT_DEAL, BIG_FLAT_CUSTTELNUM
BIG_FLAT_DEALSUPPLIER
BIG_FLAT_DEALSUPP_TEL
BIG_FLAT_DEALSUPP_AREA
BIG_FLAT_CUSTCUSTOMER
BIG_FLAT_CUSTCUST_TEL
BIG_FLAT_CUSTENQUIRY

Note that there are multiple fields for each of the above 'summary transactions'. We will examine and implement each of these transactions in a separate section below.

NOTE: In the following, to make things more legible, we will refer to BIG_FLAT APPLIANCE, BIG_FLAT_DEAL and BIG_FLAT_CUST as BFA, BFD and BFC respectively! {LOOK UP HOW TO ALIAS INTELLIGENTLY}.

  1. BIG_FLAT_APPL to MANUFACTURER
  2. This transaction seems simple. If BFA.Brand doesn't exist in the MANUFACTURER table, then create a new key (using KEYSOURCES), and insert the new brand into the MANUFACTURER table. (Ideally we should have extensive checking for mis-spelled brands, or perhaps even define the whole MANUFACTURER table upfront, and then ring bells if a new brand mysteriously pops up)!

    We will sneakily get our unique number from the unique Sequence number of the appliance!! Remember that we are doing this to keep things both simple and SQL-standard compliant, as a teaching exercise. Much later on, we'll find out more elegant ways of stamping in a unique (and sequential) number. Here goes..

    insert into MANUFACTURER (ManufacturerName, ManufacturerCode)
          select Brand, min(Seq) from BIG_FLAT_APPL group by BRAND;

    See how we used the insert command to copy from one table to another. We group by Brand to identify individual manufacturers, and then give each a unique but non-sequential number. (Clumsy!)

  3. BIG_FLAT_APPL to CATEGORY
  4. We next perform a check here that there are no erroneous fields in Category (or SubCategory, or SubSubCat). Something along the lines of:


      select Category, Brand, Model from BIG_FLAT_APPL where
             Category not in (select CatName from CATEGORY);

  5. BIG_FLAT_APPL to MODEL
  6. Here we insert the appliances into the MODEL table. There are five main components that need to be inserted into MODEL:

    Our basic approach is as follows:

    insert into MODEL
      (Price, ModelCode, ModelName, ModelParent, ManuCode, ModelDesc)
         select Price, Seq, Model, CatCode, ManufacturerCode, ''
            from BIG_FLAT_APPL, CATEGORY, MANUFACTURER
            where Category = CatName AND Brand = ManufacturerName;

    Note several things about the above statement:

    1. We are doing a cross-join on three tables, and have used two where conditions to constrain things;
    2. We have focussed on SubSubCat, and have ignored BFA.SubCategory and .Category! More of this later.
    3. Although we have no value for ModelDesc, we keep NULL out of things by inserting a null string (which is not the same as NULL)!

  7. BIG_FLAT_APPL to PICTURE2
  8. Here we have two pictures that must be inserted for each row in the BFA table. Each has similar properties (.PicName, .PicAltTag, .PicWidth and .PicHeight) but they derive from two different sets of columns, .ThumbName, .ThumbW, .ThumbH, and .ThumbAlt, versus PixName, PixWidth, PixHeight, .PixAlt. They will only be distinguished by different codes T or B, in the PICTURE2.PicUsage field.

    Note that because {--for the time being, to update--} we have abandoned the use of the KEYSOURCES table to generate keys, and are instead using BFA.Seq, we have a problem with pictures, for each row provides two of them. We will use a very ugly fix.

    insert into PICTURE2
     (PictureCode, PicName, PicAltTag, PicWidth, PicHeight, PicUsage)
        select 2*Seq, ThumbName, ThumbAlt, ThumbW, ThumbH, 'T'
               from BIG_FLAT_APPL;

    insert into PICTURE2
     (PictureCode, PicName, PicAltTag, PicWidth, PicHeight, PicUsage)
        select 1 + 2*Seq, PixName, PixAlt, PixWidth, PixHeight, 'B'
               from BIG_FLAT_APPL;

    We create unique picture key codes for the thumbnails from the sequence number*2, or add 1 to this for the main pictures!

    As described for PICTURE2, the codes 'T' and 'B' refer to thumbnail sized pictures and big pictures, respectively.

  9. BIG_FLAT_APPL to MODEL_PIC
  10. After we update PICTURE2, we should associate the image with the appliance by updating MODEL_PIC, .ModCode for appliance, .PicCode for picture. Note how excessively clumsy the following is:

    insert into MODEL_PIC (ModCode, PicCode) select ModelCode, PictureCode from BIG_FLAT_APPL, MODEL, PICTURE2 where Model = ModelName AND PixName = PicName;

    The idea is that we join the three tables mentioned, extracting lines where the Model matches the model we previously inserted into MODEL, and the picture BFA.PixName matches PICTURE.PicName, inserted in the previous section. We now have to do the same for the thumbnail pictures:

    insert into MODEL_PIC (ModCode, PicCode) select ModelCode, PictureCode from BIG_FLAT_APPL, MODEL, PICTURE2 where Model = ModelName AND ThumbName = PicName;

  11. BIG_FLAT_APPL to FEATURE
  12. This update works hand in hand with the next one. It's a bit of a blighter. For each of the .Summ (.Summ1 to .Summ10), .Left (.Left1 to .Left30), and .Bott (.Bott1 to .Bott15) fields of the BFA table, we need to (a)check if the field is already present in FEATURE, and if not, (b) insert it into FEATURE with a new key from KEYSOURCES. In any event, we must associate the feature with the MODEL (next section). First we

    create table FE1 ( FeCode1 decimal(8,0), FeText1 varchar(255), FeMod1 decimal(8,0), FePos1 char(1), FeStatus1 decimal(2) );

    We will give just three examples of the sort of insert statement we need, but something similar must be done for all 55 columns!!

    insert into FE1 (FeCode1, FeText1, FeMod1, FePos1, FeStatus1) select 0 + 55 * Seq, Summ1, ModelCode, 'S', 0 from BIG_FLAT_APPL, MODEL where BIG_FLAT_APPL.Model = MODEL.ModelName AND Summ1 <> '' ;

    insert into FE1 (FeCode1, FeText1, FeMod1, FePos1, FeStatus1) select 10 + 55 * Seq, Left1, ModelCode, 'L', 0 from BIG_FLAT_APPL, MODEL where BIG_FLAT_APPL.Model = MODEL.ModelName AND Left1 <> '' ;

    insert into FE1 (FeCode1, FeText1, FeMod1, FePos1, FeStatus1) select 40 + 55 * Seq, Bott1, ModelCode, 'B', 0 from BIG_FLAT_APPL, MODEL where BIG_FLAT_APPL.Model = MODEL.ModelName AND Bott1 <> '' ;

    Okay, there's some trickery here:

    1. We create a temporary table FE1 to help us with processing;
    2. We generate a unique code for each feature by multiplying BFA.Seq by 55 and then adding 0 for Summ1, 1 for Summ2, and so on..
    3. We do a cross join of BFA on the MODEL table, and then store the ModelCode in FE1.FeMod1 - this is temporary, but helps us immensely in the following section!
    4. We ensure that the source column is ignored if it contains a null string;
    5. With sequential columns, we will alter the values put into FePos1 and FeStatus1.

    Next, we must pull out all features, and retain a unique code for each. We (again, clumsily) do this by:

    insert into FEATURE (FeatureCode, FeatureType, FeatureText)
          select min(FeCode1), '', FeText1 from FE1 group by FeText1;

    {Note that we've copped out and left the FeatureType field a null string}

  13. BIG_FLAT_APPL to MOD_FEATURE
  14. This would seem tricky. Once we've gone through the tasks in the above section, we need to associate .FeCode and .MoCode, as well as saying whether the feature is a summary, left or bottom feature. In Addition, and this is messy, we need to assign a priority to the feature, depending on the BFA column it occurs in (the leftmost column has highest priority). No mean task! Fortunately, our above intermediate table FE1 makes it a cinch: [NOOO! to fix]

    update FE1 set FePos1 = '1' where FePos1 = 'S';
    update FE1 set FePos1 = '2' where FePos1 = 'L';
    update FE1 set FePos1 = '4' where FePos1 = 'B';

    create table MODX ( modcodeX decimal (8,0), featcodeX decimal (8,0), featposX decimal (1,0), fepriorX decimal(1,0));

    insert into MODX (modcodeX, featcodeX, featposX, fepriorX) select FeMod1, FeatureCode, cast(FePos1 as decimal), FeStatus1 from FE1, FEATURE where FeText1 = FeatureText;

    insert into MOD_FEATURE (FeCode, MoCode, PutMeAt, Priority) select featcodeX, modcodeX, sum(featposX), min(fepriorX) from MODX group by featcodeX, modcodeX;

    drop table FE1;

    Some more trickery here, and a tradeoff. We've chosen to represent PutMeAt as a decimal, allowing us to have one entry in MOD_FEATURE with several meanings. For example, say we have a video recorder with the same feature in both the summary menu and the detail menu. If we were to have two entries in MOD_FEATURE, we would have to include PutMeAt as another component of the key, or create another table. We cop out and allow the PutMeAt field to tell us that the item is represented in two places. The tradeoff is that the Priority field potentially has multiple values (and can you see the knock-on effect of this..?).

    Our implementation above reflects our decision. We group by Appliance code and feature, and then add up all the "PutMeAts" which we conveniently turn into numeric codes (Note the use of cast)! We also find the lowest priority, and attach this priority to the row!

    See that here we DO leave MOD_FEATURE.MyNumber as NULL.
    {This is indeed wicked, but inserting an arbitrary number into the field would be more wicked! }

    Notice how we clean up, removing the temporary tables FE1 and FE2.


  15. BIG_FLAT_DEAL to PLACE
  16. This section is intriguing. We can have multiple levels, as one PLACE references back to a PLACE it is within. The SupplierAddr1 is the terminal 'leaf', and will usually be unique for each supplier. This will then generally point back to a suburb, town or city, generally represented in BFD as SupplierAddr2. (We can generally ignore SupplierAddr3, as this was put in for completeness. AFAIK, there is no third line in any address up on the website at present). But then the SupplierAddr2 will be referred to Subregion and thus to Region.

    We now have to do the BIG_FLAT_DEAL thing! The relevant fields are Region and Subregion (which should already exist, or force an error), and then SupplierAddr1 and SupplierAddr2. Note that (for our purposes) we will assume that SupplierAddr1 is unique.
    {This is a gross over-simplification}.
    It is important that we realise that SupplierAddr2 may be the same as the subregion! Let's make allowances for the possibility that SupplierAddr2 = Subregion in our data insertion!

    A. Check regions and subregions

    select Region, SubRegion, SupplierName from BIG_FLAT_DEAL where (Region not in select PlaceName from PLACE where PlaceSize = 6) OR (SubRegion not in select PlaceName from PLACE where PlaceSize = 4);

    The script is easily understood - if the dealer Region or Subregion isn't already in our tables in the appropriate place, add it to a list of duds. Give us the supplier name too, so we can easily find the erroneous item. If the select statement returns any item at all, it's a dud!

    B. Logically associate SupplierAddr2 with SubRegion

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) select 2 * BIG_FLAT_DEAL.Seq, SupplierAddr2, PlaceCode, 3 from BIG_FLAT_DEAL, PLACE where Subregion = PLACE.PlaceName AND SupplierAddr2 not in (select PlaceName from PLACE where PlaceSize = 4);

    We do a cross join of BIG_FLAT_DEAL on PLACE, and insert the relevant row (with it's associated parent PLACE code). The value of 3 for PlaceSize is a convenience that simply says "it's in between a SubRegion and an address". The final AND condition is simply to prevent the case where SupplierAddr2 is already represented as a SubRegion (PlaceSize = 4)! See how the placecode is created from BFD.Seq. Compare this with the next section - we now do something very similar for SupplierAddr1:

    C. Logically associate SupplierAddr1 with SupplierAddr2

    insert into PLACE (PlaceCode, PlaceName, PlaceIsIn, PlaceSize) select 1 + 2 * BIG_FLAT_DEAL.Seq, SupplierAddr1, PlaceCode, 1 from BIG_FLAT_DEAL, PLACE where SupplierAddr2 = PLACE.PlaceName;

    See how we generate a unique code for PlaceCode from BFD.Seq! We've chosen the value of 1 for PlaceSize - this leaves us leeway to later insert another code of 2 for an intermediate line in the address! { Note that the implicit assumption that the name SupplierAddr2 can never be the same as an item in PLACE with a PlaceSize of six}.

    D. Map supplier to SupplierAddr1

    See Section ten, below.

  17. BIG_FLAT_DEAL, BIG_FLAT_CUST to TELNUM
  18. This should be fairly simple, with a few wrinkles. SupplierTelNums are all likely (but not guaranteed) to be unique, and each require a new key. The .SupplierTelCode must be inserted into the TELNUM table

    We defer the "BIG_FLAT_CUST -> TELNUM" transaction until later.

    insert into TELNUM (TShort, AreaCode, TelNumber) select 3 * Seq, SupplierTelCode, SupplierTelnum1 from BIG_FLAT_DEAL;

    insert into TELNUM (TShort, AreaCode, TelNumber) select 1 + 3 * Seq, SupplierTelCode, SupplierTelnum2 from BIG_FLAT_DEAL where SupplierTelnum2 <> '';

    insert into TELNUM (TShort, AreaCode, TelNumber) select 2 + 3 * Seq, SupplierTelCode, SupplierTelnum3 from BIG_FLAT_DEAL where SupplierTelnum3 <> '';

    See how we exclude the cases where a null string has been provided for the second and third telephone numbers.
    {There is a potential problem of redundancy where say a customer works for a Dealer and provides the same telephone number, which won't be picked up in the above. }.

  19. BIG_FLAT_DEAL to SUPPLIER
  20. This is plain extraction of the BFD.SupplierName and SupplierEmail, to SUPPLIER.SupplierName and .SupplierEmail, with a new key from KEYSOURCES. There's also that tricky PLACE field for SupplierAddr.

    insert into SUPPLIER (SupplierCode, SupplierName, SupplierEmail, SupplierAddr) select Seq, BIG_FLAT_DEAL.SupplierName, BIG_FLAT_DEAL.SupplierEmail, PlaceCode from BIG_FLAT_DEAL, PLACE where SupplierAddr1 = PlaceName;

    We've already done all the setup of SupplierAddr1 and its chain of association. Here we actually create a dealer row. We need the inner join to pick out the Placecode.

  21. BIG_FLAT_DEAL to SUPP_TEL
  22. A clean association between the TELNUM and SUPPLIER tables, using keys defined above. The only hassle is that there are three different dealer telephone numbers.

    insert into SUPP_TEL (STelShort, SuppCode) select TShort, SupplierCode from BIG_FLAT_DEAL, SUPPLIER, TELNUM where BIG_FLAT_DEAL.SupplierName = SUPPLIER.SupplierName AND SupplierTelCode = TELNUM.AreaCode AND (SupplierTelNum1 = TELNUM.TelNumber OR SupplierTelNum2 = TELNUM.TelNumber OR SupplierTelNum3 = TELNUM.TelNumber);

  23. BIG_FLAT_DEAL to SUPP_AREA
  24. Similar to the last association. The only addition is SUPP_AREA.Status, which we do not now alter from the default value of 49.

    insert into SUPP_AREA (SuppCode, AreaCode, Status) select SupplierCode, PlaceCode, 49 from BIG_FLAT_DEAL, SUPPLIER, PLACE where BIG_FLAT_DEAL.SupplierName = SUPPLIER.SupplierName AND BIG_FLAT_DEAL.SubRegion = PLACE.PlaceName;

    Okay, we put in the 49 anyway, more out of pedantry than for any other good reason.

    { What happens ????????? if we have a dealer represented in more than one place. CHECK THIS OUT CAREFULLY = HAVE TO ENSURE THAT DEALER ENTRY DOESN'T EXIST, AND IF IT DOES, DON'T DUPLICATE IT !!! Could be complex with a hierarcy of dealers or a central dealer controlling several stores?}

  25. BIG_FLAT_CUST to CUSTOMER
  26. Note the CustomerArea, which must be checked to be pre-defined! Otherwise, we're just transferring .CustomerSurname, .CustomerTitle, and .EMailAddress to the correct fields in .CUSTOMER, assuming the customer is not already defined.

    insert into CUSTOMER (CustomerCode, CustomerSurname, CustomerTitle, CustomerEmail, CustomerArea) select Seq, CustSurname, CustTitle, CEmail, PlaceCode from BIG_FLAT_CUST, PLACE where CustArea = PLACE.PlaceName AND PLACE.PlaceSize = 4;

    We ensure that the customer area supplied is a SubRegion, and that it exists, upon which we insert the PlaceCode. But NOTE - The above makes no check as to whether the customer exists. Let's try:

    create table CUS1 ( CCode1 decimal(7,0), CSurname1 varchar(30), CTitle1 varchar(5), CEmail1 varchar(30), CArea1 decimal(7,0), CCodeA1 decimal(5), dCodeB1 decimal(5), CTelA1 decimal(8), CTelB1 decimal(8), Price1 decimal(8,2), Date1 float, Model1 varchar(15), Suppl1 varchar(25) );

    insert into CUS1 (CCode1, CSurname1, CTitle1, CEmail1, CCodeA1, CCodeB1, CTelA1, CTelB1, CArea1, Price1, Date1, Model1, Suppl1) select Seq, CustSurname, CustTitle, CEmail, CTelCodeA, CTelCodeB, CTelNumA, CTelNumB, PlaceCode, CPrice, CDate, CApplCode, CSupplier from BIG_FLAT_CUST, PLACE where CustArea = PLACE.PlaceName AND PLACE.PlaceSize = 4;

    create table CUS2 ( OLDcode decimal(7,0), CCode2 decimal(7,0), CSurname2 varchar(30), CTitle2 varchar(5), CEmail2 varchar(30), CArea2 decimal(7,0), CCodeA2 decimal(5), dCodeB2 decimal(5), CTelA2 decimal(8), CTelB2 decimal(8) Price2 decimal(8,2), Date2 float, Model2 varchar(15), Suppl2 varchar(25) );

    insert into CUS2 (CCode2, CSurname2, CTitle2, CEmail2, CCodeA2, CCodeB2, CTelA2, CTelB2, CArea2, OLDcode, Price2, Date2, Model2, Suppl2) select CCode1, CSurname1, CTitle1, CEmail1, CCodeA1, CCodeB1, CTelA1, CTelB1, CArea1, CustomerCode Price1, Date1, Model1, Suppl1 from CUS1 left join CUSTOMER on CUS1.CEmail1 = CUSTOMER.CustomerEmail

    update CUS2 set CCode2 = OLDcode where OLDcode is not NULL;

    We do what we did in section 6. above, so that we now have replaced new customer numbers by old ones if they exist!

    { NB. This is very simplistic. The match to old customer depends on the customer's email being the same. Very clusmy, but we are not going to (YET) go into the complexity of other matching!

    WHAT ABOUT NEW TEL NUMBER FOR OLD CUSTOMER?!}

  27. BIG_FLAT_CUST to CUST_TEL
  28. Here we move the customer telephone number(s) from BIG_FLAT_CUST to TELNUM as promised above, and then associate customer code and telephone number in the relevant (CUST_TEL) table:

    insert into TELNUM (TShort, AreaCode, TelNumber) select 2 * CCode2, CCodeA2, CTelA2 from CUS2 where CCodeA2 || CTelA2 not in (select AreaCode || TelNumber from TELNUM);

    insert into TELNUM (TShort, AreaCode, TelNumber) select 1 + 2 * CCode2, CCodeB2, CTelB2 from CUS2 where CCodeB2 || CtelB2 not in (select AreaCode || TelNumber from TELNUM);

    insert into CUST_TEL (cTelShort, CustCode) select TShort, CCode2 from CUS2, TELNUM where CCodeA2 = AreaCode AND CTelA2 = TelNumber;

    insert into CUST_TEL (cTelShort, CustCode) select TShort, CCode2 from CUS2, TELNUM where CTelB2 is not NULL AND CCodeB2 = AreaCode AND CTelB2 = TelNumber;

    drop table CUS1;

    The first insert statement takes out all the telephone numbers (with their area codes) from CUS2, provided that the codes don't already exist within the TELNUM table. We use the clumsy concatenation because otherwise we would have difficulty if by chance two customers had the same telephone number but different area codes!
    {Note that there is still a problem! If a customer has changed their email address, they will be seen as a new customer. We still need to link them to their own unique telephone number that already (should) exist within the database. }

    The second insert statement does exactly the same for the second telephone code+number combination in CUS2.

    The last two insert statements associate the customer number (either old or new) with one or two telephone numbers. We then drop the temporary table CUS1, but keep CUS2 for later!

  29. BIG_FLAT_CUST to ENQUIRY
  30. The important one. Triple primary key of previously defined model (check), customer, and supplier. Simple insertion from BFC of .APrice, .CustComm, and (check) .SupplierCont. Remember the floating point date (The conversion will again be in Perl).

    insert into ENQUIRY (ModlCode, CustCode, SuppCode, Price, Date, CustComment) select ModelCode, CCode2, SupplierCode, Price2, Date2, CustComm from CUS2, MODEL, SUPPLIER where Suppl2 = SupplierName AND Model2 = ModelName;

    drop table CUS2;

    Note how Price2, Model2, Date2 and Suppl2, previously defined as apparently unnecessary components of CUS2, finally come into their own!

    {We have a potential problem if an appliance has been dropped from the database after the customer has ordered it?! What to do??}


Checks

There are several ways we can check our database. The most important is at design time, using logic, and test cases. We've inserted a few primitive examples into the database, and we have checked these by sequentially entering the SQL and verifying that it runs correctly. This is not sufficient - our best check will be when we extract the data using Perl scripts, and then re-synthesize the web-pages. Gross alterations in the pages will demonstrate conclusively that our database has failed. Even more diligent checking is required to validate processing of dealer information, and customer requests.

Implementation and Extracting Data

Once we've written the Perl scripts to both extract data from the original web-pages and then re-synthesise them, and the above checking has verified the validity of our process, we're finished! Well, that is, until we get around to the onerous task of re-writing the database in a more intelligent fashion. This is left as a task to the reader {??? have ref to our next page on advanced SQL}.

We therefore turn to the task of extracting data tables that will be useful to Perl in recreating HTML pages for the appliance4u website:

{here have scripts!!}


Appendix - The Perl Scripts

{here have them available, and later, once we've written our Perl page, a reference to that}


      Copyright © J van Schalkwyk, 2001