mSQL 2.0 User Guide

Mini SQL 2.0

User Guide

The mSQL Query Language

The mSQL language offers a significant subset of the features provided by ANSI SQL. It allows a program or user to store, manipulate and retrieve data in table structures. It does not support some relational capabilities such as views and nested queries. Although it does not support all the relational operations defined in the ANSI specification, mSQL provides a significant subset of the ANSI SQL standard and is capable of supporting the vast majority of applications.

The definitions and examples below depict mSQL key words in upper case, but no such restriction is placed on the actual queries.

The Create Clause The create clause as supported by mSQL 2 can be used to create tables, indices, and sequences. The three valid constructs of the create clause are shown below:

CREATE TABLE table_name (

col_name col_type [ not null ]

[ , col_name col_type [ not null ] ]**

)

CREATE [ UNIQUE ] INDEX index_name ON table_name (

field_name

[ , field_name ] **

)

CREATE SEQUENCE ON table_name [ STEP step_val ] [ VALUE initial_val ]

An example of the creation of a table is shown below:

CREATE TABLE emp_details (

first_name char(15) not null,

last_name char(15) not null,

comment text(50),

dept char(20),

emp_id int

)

The available types are:-

Type	Description
char (len)	String of characters (or other 8 bit data)
text (len)	Variable length string of characters (or other 8 bit data) The defined length is used to indicate the expected average length of the data. Any data longer than the specified length will be split between the data table and external overflow buffers. Note : text fields are slower to access than char fields and cannot be used in an index nor in LIKE tests.
int	Signed integer values
real	Decimal or Scientific Notation real values
uint	Unsigned integer values
date	Date values in the format of ?DD-Mon-YYYY? such as ?1-Jan-1997?
time	Time values stored in 24 hour notation in the format of ?HH:MM:SS?
money	A numeric value with two fixed decimal places

The table structure shown in the example would benefit greatly from the creation of some indices. It is assumed that the emp_id field would be a unique value that is used to identify an employee. Such a field would normally be defined as the primary key. mSQL 2.0 has removed support for the primary key construct within the table creation syntax and replaced it with the more powerful and flexible indexing scheme. Similarly, a common query may be to access an employee based on the combination of the first and last names. A compound index (i.e. constructed from more than one field) would improve performance. Naturally, such a compound index may have multiple entries with the same value (if more than one person called John Smith works for the same company) so a non-unique index would be required. We could construct these indices using : CREATE UNIQUE INDEX idx1 ON emp_details (emp_id)
CREATE INDEX idx2 ON emp_details (first_name, last_name)

These indices will be used automatically whenever a query is sent to the database engine that uses those fields in its WHERE clause. The user is not required to specify any special values in the query to ensure the indices are used to increase performance.

Sequences provide a mechanism via which a sequence value can be maintained by the mSQL server. Sequences are a numeric value that can be used as serial numbers, staff identifiers, invoice numbers, or any other application that requires a unique numeric value. Having the server maintain the index allows for atomic operations (such as getting the next sequence value) and removes the concerns associated with performing these operations in client applications. A client application would need to send two queries (one to read the current value and one to update the value) which introduces a "race condition" and the potential for the same sequence value to be assigned to multiple items.

A sequence is associated with a table and a table may contain at most one sequence. Once a sequence has been created it can be accessed by SELECTing the _seq system variable from the table in which the sequence is defined. For example

CREATE SEQUENCE ON test STEP 1 VALUE 5
SELECT _seq FROM test

The above CREATE operation would define a sequence on the table called test that had an initial value of 5 and would be incremented each time it is accessed (i.e. have a step of 1). The SELECT statement above would return the value 5. If the SELECT was issued again, a value of 6 would be returned. Each time the _seq field is selected from test the current value is returned to the caller and the sequence value itself is incremented.

Using the STEP and VALUE options a sequence can be created that starts at any specified number and is increased or decreased by any specified value. The value of a sequence would decrease by 5 each time it was accessed if it was defined with a step of -5.

The Drop Clause The Drop clause is used to remove a definition from the database. It is most commonly used to remove a table from a database but can also be used for removing several other constructs. In 2.0 it can be used to remove the definition of an index, a sequence, or a table. It should be noted that dropping a table or an index removes the data associated with that object as well as the definition. Dropping a table removes any indices or sequences defined for the table.

The drop clause cannot be used to remove an entire database. Dropping a database is achieved by using the msqladmin utility program that is included in the software distribution.

The syntax of the drop clause as well as examples of its use are given below.

DROP TABLE table_name
DROP INDEX index_name FROM table_name
DROP SEQUENCE FROM table_name

Examples of the use of the drop clause for removing an entire table, an index and a sequence are shown below.

DROP TABLE emp_details
DROP INDEX idx1 FROM emp_details
DROP SEQUENCE FROM emp_details

The Insert Clause The insert clause is used to insert or add data to the database. When inserting data you may either specify the fields for which you have provided data (if you are not providing data for every field in the data row) or you may omit the field names if you are providing data for every field. If you do not specify the field names they will be used in the order in which they were defined - you must specify a value for every field if you use this form of the insert clause. If you provide the field names then the number of data values provided must match the number of fields specified.

INSERT INTO table_name [ ( column [ , column ] ** ) ]

VALUES ( value [ , value ] ** )

for example INSERT INTO emp_details ( first_name, last_name, dept, salary)

VALUES ( 'David', 'Hughes', 'Development',12345.00)

INSERT INTO emp_details

VALUES ('David', 'Hughes', 'Development',12345.00)

The Select Clause The select clause is used to extract data from the database. It allows you to specify the particular fields you wish to retrieve and also a condition to identify the records or rows that are of interest. The ANSI SQL standard defines two features that are not supported by mSQL. The mSQL implementation of the select clause does not support

Nested selects
Implicit functions (e.g. count(), avg() )

It does however support:

Relational joins between multiple tables
Table aliases
DISTINCT row selection for returning unique values
ORDER BY clauses for sorting
Normal SQL regular expression matching
Enhanced regular expression matching including case insensitive and soundex
Column to Column comparisons in WHERE clauses
Complex conditions

The formal definition of the syntax for mSQL's select clause is

SELECT [table.]column [ , [table.]column ]**

FROM table [ = alias] [ , table [ = alias] ]**

[ WHERE [table.] column OPERATOR VALUE

[ AND | OR [table.]column OPERATOR VALUE]** ]

[ ORDER BY [table.]column [DESC] [, [table.]column [DESC] ]

OPERATOR can be <, >, =, <=, =, <>, LIKE, RLIKE, CLIKE or SLIKE
VALUE can be a literal value or a column name

The condition used in the where statement of a select clause may contain '(' ')' to nest conditions or to focus on parts of the conditional evaluation. e.g. "where (age <20 or age >30) and sex = 'male'" .

A simple select that returns the first and last names of anybody employed in the finance department would be

SELECT first_name, last_name FROM emp_details

WHERE dept = 'finance'

To sort the returned data we would add an ORDER BY statement to the select clause. mSQL supports sorting on multiple values in either ascending or descending order for each value. If a direction is not specified it defaults to ascending order. To sort the data from the previous query in ascending order by last_name and descending order by first_name we could use the query below. Note that the two sorting values are separated by a comma and that the first_name field includes the DESC attribute to indicate we sorting is required in descending order.

SELECT first_name, last_name FROM emp_details

WHERE dept = 'finance'

ORDER BY last_name, first_name DESC

A query such as the one presented above may return multiple of the same value. If for example there were two people named John Smith working in the finance department the name ?John Smith? would be returned twice from the query. You may remove any duplicates from the returned data by providing the DISTINCT attribute with the query. An example of using the DISTINCT attribute to remove duplicates from the above query is given below.

SELECT DISTINCT first_name, last_name FROM emp_details

WHERE dept = 'finance'

ORDER BY last_name, first_name DESC

mSQL provides four regular expression operators for use in where comparisons. These operators may be used to perform "fuzzy" matching on the data if you do not know the exact value for which you are searching. An example of such a search would be if you were looking for any employee with a last_name starting with ?Mc? such as McCormack or McDonald. In such a situation you cannot provide a complete value for the last_name field as you are only interested in part of the value.

The standard SQL syntax provides a very simplistic regular expression capability that does not provide the power nor the flexibility of which UNIX programmers or users will be accustomed. mSQL supports the "standard" SQL regular expression syntax, via the LIKE operator, but also provides further functionality if it is required. The available regular expression operators are:

LIKE - the standard SQL regular expression operator.
CLIKE - a standard LIKE operator that ignores case.
RLIKE - a complete UNIX regular expression operator.
SLIKE - a ?soundex? matching operator (i.e. phonetic matching)

Note : CLIKE, RLIKE, and SLIKE are not features of standard SQL and may not be available in other implementations of the language. If you choose to use them you may have problems porting your application to other database systems. They are, however, very convenient and powerful features of mSQL.

LIKE and CLIKE utilise the regular expression syntax as specified in the ANSI SQL standard. As mentioned above, the ANSI standard regular expression feature provides only a very simplistic implementation of regular expressions. It provides for only single and multiple character wildcards. It does not include enhanced features such as value ranges, value exclusions or value groups. The syntax of the LIKE and CLIKE operators is provided in the following table.

Operator	Description
_	matches any single character
%	matches 0 or more characters of any value
\	escapes special characters (e.g. '\%' matches % and '\\' matches \ ). All other characters match themselves

Two examples of using the LIKE operator are provided below. In the first we are searching for anyone in the finance department whose last name consists of any letter followed by 'ughes', such as Hughes. The second example shows the query for the ?Mc? example mentioned earlier in this section.

SELECT first_name, last_name FROM emp_details

WHERE dept = 'finance' and last_name like '_ughes'

SELECT first_name, last_name FROM emp_details

WHERE dept = ?finance? and last_name like ?Mc%?

The RLIKE operator provides access to the power of the UNIX standard regular expression syntax. The UNIX regular expression syntax provides far greater functionality than SQL's LIKE syntax. The UNIX regex syntax does not use the '_' or '%' characters in the way SQL's regex does (as outlined above) and provides enhanced functionality such as grouping, value ranges, and value exclusion. The syntax available in the RLIKE operator is shown in the table below. Tutorials for using UNIX regular expression matching are available in the manual pages of any UNIX system (such as the manual pages for grep or ed).

Because RLIKE utilises a complete UNIX regex implementation, the evaluation of a condition containing the RLIKE operator is quite complex. The performance of searches using the RLIKE operator will be slower than those using the LIKE or CLIKE operator. You should only use the RLIKE operator if you cannot achieve your desired matching using the more simplistic LIKE or CLIKE operators.

Operator	Description
.	The dot character matches any single character
^	When used as the first character in a regex, the caret character forces the match to start at the first character of the string
$	When used as the last character in a regex, the dollar sign forces the match to end at the last character of the string
[ ]	By enclosing a group of single characters within square brackets, the regex will match a single character from the group of characters. If the ']' character is one of the characters you wish to match you may specify it as the first character in the group without closing the group (e.g. '[]abc]' would match any single character that was either ']', 'a', 'b', or 'c'). Ranges of characters can be specified within the group using the 'first-last' syntax (e.g. '[a-z0-9]' would match any lower case letter or a digit). If the first character of the group is the '^' character the regex will match any single character that is not contained within the group.
*	If any regex element is followed by a '' it will match zero or more* instances of the regular expression. To match any string of characters you would use ?.? and to match any string of digits you would use ?[0-9]?

The SLIKE operator provides soundex matching of values (i.e. one value sounds like another value). It does not use an explicit syntax in the same way as the other LIKE operators. You simply provide the word you wish to match. If you wished to search for any name that sounded like ?Hughes?, such as ?Hues? you could use SLIKE ?Hughes?.

Relational joining is one of the most powerful features of a relational query language. The concept of "joining" relates to "merging" multiple database tables together and extracting fields from the merged result. As an example, if you had two tables defined, one containing employee details and another containing a list of all current, you may wish to extract a list of the projects that each employee was working on. Rather than duplicating the employee details in the projects table you could simply include the employees staff ID number in the projects table and use a join to extract the first and last names.

The query below is an example of such an operation. The logic behind the query is that we want to extract the first and last names of the employee, plus the name of the project on which the employee is working. We can identify which combinations of the merged table we are looking for as they will have a common value for the employee?s staff ID value. Because we are referencing multiple tables in the query, we must include the table name for each field when it is included in the query (e.g. emp_details.first_name rather than just first_name)

SELECT emp_details.first_name, emp_details.last_name, project_details.project

FROM emp_details, project_details

WHERE emp_details.emp_id = project_details.emp_id

ORDER BY emp_details.last_name, emp_details.first_name

It is important to understand the inner workings of a join. If we are joining table A with table B, a merged row will be created for all possible combinations of the rows of both tables. If table A contains only two rows and table B contains 10 rows, then 20 merged rows will be generated and evaluated against the where condition. If no where condition is specified then all 20 rows will be returned. If this example is extended so that table A contained 1,000 rows and table B contained 2,500 rows then the result would be 1,000 * 2,500 merged rows (that?s two and a half million rows!). Whenever a join is used there should normally be a common value in both tables (such as the employee ID in our example) and the condition must include a direct comparison between these two fields to ensure that the result set is limited to only the desired results.

mSQL places no restriction on the number of tables "joined" during a query so if there were 15 tables all containing information related to an employee ID in some manner, and each table included the employee ID field to identify the employee, data from each of those tables could be extracted, by a single query. As mentioned before, a key point to note regarding joins is that you must qualify all field names with a table name. Remember that you must qualify every column name as soon as you access more than one table in a single select.

mSQL also supports table aliases so that you can perform a join of a table onto itself. This may appear to be an unusual thing to do but it is a very powerful feature if the rows within a single table relate to each other in some way. An example of such a table could be a list of people including the names of their parents. In such a table there would be multiple rows with a parent/child relationship. Using a table alias you could find out any grandparents contained in the table using the query below. The logic is to find any person who is the parent of someone?s parent.

SELECT t1.parent, t2.child from parent_data=t1, parent_data=t2

WHERE t1.child = t2.parent

The table aliases t1 and t2 both point to the same table (parent_data in this case) and are treated as two different tables that just happen to contain exactly the same data. Like any other join, the possible result set size is the multiplication of the number of rows in each table. If a table is joined with itself, this equates to N² rows where N is the number of rows in the original table. Care must be taken to ensure that the result set is limited by the condition specified otherwise the query can take a very long time to complete and has the potential to fill your disk drive with temporary data as the query is processed.

The Delete Clause The SQL DELETE clause is used to remove one or more entries from a database table. The selection of rows to be removed from the table is based on the same where statement as used by the SELECT clause. In the SELECT clause, the where condition is used to identify the rows to be extracted from the database. In the DELETE clause, the where condition identifies the rows that are to be deleted from the database. As with all SQL queries, if no where condition is provided, then the query applies to every row in the table and the entire contents of the table will be deleted. The syntax for mSQL's delete clause is shown below

DELETE FROM table_name

WHERE column OPERATOR value

[ AND | OR column OPERATOR value ]**

OPERATOR can be <, >, =, <=, =, <>, LIKE, RLIKE, CLIKE, or SLIKE

An example of deleting a specific employee (identified by the employee ID number of the person) and also deleting every employee within a particular salary range is given below.

DELETE FROM emp_details WHERE emp_id = 12345

DELETE FROM emp_details WHERE salary > 20000 and salary < 30000

The Update Clause The SQL update clause is used to modify data that is already in the database. The operation is carried out on one or more rows as specified by the where construct. If the condition provided in the where construct matches multiple rows in the database table then each matched row will be updated in the same way. The value of any number of fields in the matched rows can be updated. The syntax supported by mSQL is shown below.

UPDATE table_name SET column=value [ , column=value ]**

WHERE column OPERATOR value

[ AND | OR column OPERATOR value ]**

OPERATOR can be <, >, =, <=, =, <>, LIKE, RLIKE, CLIKE or SLIKE

For example UPDATE emp_details SET salary=30000 WHERE emp_id = 1234

UPDATE emp_details SET salary=35000, dept=?Development? where emp_id = 1234