Mini SQL 2.0

User Guide



Appendix A - New Features in mSQL 2.0

Mini SQL 2.0 is the second generation of the mSQL database system. The first generation product, mSQL 1.x, was designed to provide high speed access to small data sets. The original goal was to perform 100 basic operations per second on an average UNIX workstation with small data sets using very few system resources (i.e. memory and CPU cycles). The original design goal was met and the software has proven to be popular because of this functionality.

During mSQL's life, people have used it for applications far beyond the scope of the original design. These high-end applications, containing up to 1 million rows of data, showed a need for better handling of complex queries and large data sets if the package was to be used in this way. The second generation of the mSQL server has been designed to suit these high-end applications while maintaining the original design goals of mSQL 1. It has been designed to meet three main criteria


 
 

Enhanced Indexing

One of the major flaws of mSQL 1.0 when used for larger applications was the simplistic indexing support. The original server supported only a single primary key per table and the key could consist of only one field. The internal use of the key was restricted to queries using a simple equality condition. As such, the vast majority of queries were processed without the aid of the key.

mSQL 2.0 provides much more sophisticated indexing support. Each table can have multiple indices defined for its data, with each index consisting of one to ten fields. Any index can be designated as a unique or non-unique index. The index information is stored in a series of AVL Tree structures that are mapped into the virtual memory address space of the mSQL server process. The use of AVL Trees in this way ensures that access to key data is extremely fast.

Although the Beta 1 release of mSQL 2.0 includes only the AVL indexing scheme, the database engine itself has been written to support multiple indexing formats. The underlying format of the index can be specified in the SQL command used to create the index. Other indexing schemes are under development and will be made available in subsequent releases of mSQL 2.0.

To aid in the use of the indices during query execution, a layer of abstraction know as the "candidate rows" system has been introduced into the server. The concept of the candidate rows abstraction is that during query processing, the module performing the query requests the next row from the data table that is a candidate for the selection criteria specified in the query. The requesting module is not aware of the mechanisms used to determine how that row was chosen or accessed. The "candidate row" routines are responsible for determining the best access method (based on the conditions specified in the where clause) and for retrieving the data rows as they are requested. This ensures that the optimum access method is used whenever a row of data is accessed without replicating the access logic in each module and without any "special case" algorithms.

Because the candidate row abstraction provides a single logic path for the acquisition of data rows, it can also be used to optimise the query. A simple query optimiser has been included in the candidate row abstraction and the functionality of the optimiser will be enhanced over time.
 
 
 
 

Data Types Another of the limiting factors of the performance of mSQL 1.0 was the size to which tables grew. Given an increasing number of rows, the amount of data that needed to be manipulated in memory increased proportionally. Unfortunately, the fixed length field structure of mSQL 1.0 usually forced a lot of white space and field padding to be included in the data.

To overcome this problem, mSQL 2.0 includes support for a variable length char type (text). The text type allows an unrestricted amount of data to be inserted into a field by using an overflow buffer scheme to hold data beyond the specified size of the field. This provides the best of both worlds in that the database designer can specify the average size of a char field ensuring that in most cases, the data will be held in the data table. If a value is inserted that is longer than average, it will be split between that data table and the overflow buffers. This eliminates the need to specify overly large fields (e.g. 255 character) for storage of URLs and filenames.

To provide a more complete SQL environment, future releases of mSQL will include more of the "standard" data types defined by the SQL standard. These will include date/time, currency, and various other types that are provided by the larger database systems.

System Variable / Pseudo Fields The 2.0 engine includes a framework for supporting system variable or pseudo fields. These fields are data elements maintained by the engine itself but are accessed using a normal select call. Some of the data pertains to an entire table, some to a particular row, and some to the current session. The 2.0 engine provides support for the following system variables
 
Name
Description
_rowid An internal value used to identify a row based on its location. The _rowid field can be used in where clauses during updates and deletes to specify a particular row. 
_timestamp An internal value indicating when the row was last modified
_sysdate The current time and date on the machine running the database engine returned in standard UNIX time format (e.g. seconds since the epoch) 
_user The username associated with the session over which the query was submitted 

 

Sequences

To overcome the problem of trying to manage sequences in client application code, mSQL 2.0 provides in-built, atomic operations for accessing and managing sequences. A sequence is a numeric counter that is automatically adjusted to the next value in the sequence each time it is accessed. The sequence is created using a version of the SQL CREATE command and can be created with a user defined initial value and also a user defined step value (i.e. the value added to the sequence after each access.

Any table can have a sequence created for it but a table can only contain a single sequence. Once the sequence has been created, accessing the sequence value is achieved using the _seq system variable, that is, by using a query such as

select _seq from foo The current sequence value is returned and the sequence is updated to the next value in the sequence. Access to and modification of the sequence is atomic.
 
Complex Expressions Unlike version 1.x of mSQL, 2.0 supports the notion of complex expressions in where clauses (i.e. the use of parenthesis and sub conditions in a condition). This removes a major limitation in mSQL as it was impossible to perform queries like SELECT name FROM staff

WHERE (staff_id <100 OR staff_id > 200)

AND dept = 'finance'

mSQL 2.0 supports the nesting of sub-conditions to any depth and there can be any number of sub-conditions in the where clause.
Regular Expressions ANSI SQL defines a very simple regular expression language for use in matching data with the LIKE operator. mSQL 1.x implemented a superset of this functionality by providing access to the complete UNIX regular expression syntax in a LIKE clause. Processing a UNIX regular expression is far more "expensive" than processing the simplistic functionality defined in the ANSI specification and as such LIKE based searching was quite slow in 1.x. To improve this, mSQL 2.0 provides a standard SQL LIKE operator and also offers the extended UNIX syntax via the RLIKE (i.e. regexp LIKE) operator. Most queries will require only the simple functionality offered by the standard SQL LIKE but the extended functionality is retained for compatibility. It should be noted that the new LIKE operator is much faster than the full UNIX version offered by 1.x.

One of the main uses of the UNIX regular expression syntax in 1.x was for performing case insensitive searches. In standard SQL, the way to perform case insensitive searches is to use functions like UPCASE() in the query. 2.0 offers a non-standard operator known as CLIKE which implements a case-insensitive version of the standard SQL LIKE operator which both solves the problem and provides much better performance.

ORDER BY and DISTINCT Most "real-world" applications utilise the sorting and DISTINCT functionality of an SQL server when presenting a list of data returned from the database. The implementation of ORDER BY and DISTINCT in mSQL 1.x proved to be a performance bottleneck for serious applications. To overcome this, 2.0 offers a new sorting implementation (based on the quicksort algorithm) and also has a faster DISTINCT implementation. Client Connections As the popularity of mSQL for use behind web servers increased it became apparent that the limit of 24 simultaneous client connections was a major problem. 2.0 overcomes this by reconfiguring the server's internal client connection tables at run-time to handle the maximum number of connections possible based on the operating system and the way the kernel is configured. On an average OS, mSQL 2.0 will reconfigure itself to handle over 200 simultaneous client connections. Run-time Configuration Configuration details, such as the location of the UNIX and TCP ports etc., was hard-coded into the server and API library in mSQL 1.x. To provide more flexibility, all configuration details are now included in a configuration file that is loaded at run-time by both the server and any client applications that talk to the server. By modifying a value in the config file, all applications will use the new value when they are next executed. All programs also offer a run-time flag to allow the loading of a non-default configuration file to allow for testing of new servers or applications.
 
 

To reduce the risk associated with root-owned daemon processes, mSQL 2.0 can be configured to run as any user (via the config file). By default, the server will run as a user called msql once it has started. If the server is run as root it will call setuid() to change to the desired user once it has initialised itself and performed any startup operations.

Lite & W3-mSQL 2.0 The mSQL 2.0 includes the Lite and W3-mSQL tools to aid in the development of applications. W3-mSQL 2.0, the second generation WWW interface package, is included as a standard tool. The new W3-mSQL code provides a complete scripting language, with full access to the mSQL API, within an HTML tag. This tool can be used to develop sophisticated GUI based applications that are platform independent and available as shared resources on a network. Along with the mSQL API, a library of nearly 60 other functions, including file I/O, strings handling and date/time manipulation are available to the scripts within a W3-mSQL enhanced web page.

To solve another problem associated with delivering "real" applications over the web, W3-mSQL provides an enhanced and flexible authentication system. Any page that is accessed via W3-mSQL is subjected to the new W3-auth access scrutiny. Access can be restricted via a combination of username/passwd and requesting host. Configuration of the security system, including management of user groups, definition of secure areas, and creation of authorised users, is via a graphical interface accessed via a web page.

Access to mSQL from scripting languages has become popular and virtually all major scripting languages provide an interface to the original mSQL server. Support for script based access to mSQL becomes standard in mSQL 2.0 with the inclusion of its own scripting language. The language, called Lite, is a stand-alone version of the language provided by W3-mSQL (i.e. the language that W3-mSQL offers inside the special HTML tags is Lite) and includes access to the mSQL API and the other functions mentioned above. Lite, as its name implies, is a lightweight language yet provides a powerful and flexible programming environment. The syntax of the language will be very familiar to C programmers (and ESL programmers) and provides shell-like ease of use. A future release of Lite will include support for ASCII forms to provide a rapid development environment for non-graphical mSQL-based applications.
 
 

Other tools mSQL 2.0 is bundled with a couple of new tools. To aid migration of data to and from mSQL, two new utilities have been added to the distribution for the 2.0 release. msqlimport and msqlexport provide a mechanism for the import and export of data as formatted text files. Migrating data from other databases into mSQL 2.0 will require just a simple export from the source database and a subsequent import into the new mSQL database. The tools have been developed to be flexible enough to support virtually any text based formatting of the exported data.

Other familiar tools have been modified to reflect the functionality of mSQL 2.0. relshow can provide detailed information about all the table structure elements, including the indices defined on tables. msqladmin has been modified to provide statistical information on a per connection basis (so you can monitor who's doing what and when).

Appendix B - mSQL Error Messages Listed below is a complete set of the error messages generated by the mSQL database engine and the client API library. Accompanying each error message is an indication of the cause of the error and actions that you can take to resolve the problem. The errors relate to user-generated problems, such as incorrect SQL query syntax, and also system related problems, such as running out of disk space. A user will not normally see many of these errors. API Library Error Messages Bad packet received from server

The server process received a data request that was incorrectly formatted. Such requests are ignored by the server and are probably caused by using an incorrect implementation of the client API library.
 
 

Can't find your username. Who are you?

The mSQL client library has attempted to translate the User ID (Unix UID) of the process running the client application into a username. The attempt failed. This will be the result if the UID of the client process is not listed in the system?s password file.
 
 

Can't create UNIX socket

The API library has attempted to communicate with an mSQL server running on the local machine. In doing so, it tried to create a UNIX domain socket over which the client / server communications would pass. The attempt to create this socket failed. If this error message is generated when starting the server process, it implies that the user running the server does not have permission to create the UNIX socket. Check the msql.conf file to determine whether the mSQL_User field and the UNIX_Port fields are set correctly. If the error is generated by a client application it can indicate that the client process has too many files open at the same time.
 
 

Can't connect to local mSQL server

An attempt was made to form a connection with an mSQL server running on the local machine. The attempt failed. The mSQL server process is probably not running on this machine.
 
 

Can't connect to mSQL server on

An attempt to connect to an mSQL server process running on a remote machine failed. This is usually due to not having an mSQL server running on the remote machine
 
 

Can't create IP socket

The API library attempted to create a TCP/IP socket for use in communicating with an mSQL process on a remote machine. The attempt failed. This is commonly caused by having too many open files in the client application.
 
 

mSQL server has gone away

While the API library was communicating with an mSQL server process, the server process closed the client / server connection. This can be caused by the server process being terminated, the machine on which the server is running being rebooted, or a network failure if the server is on a remote machine.

Protocol mismatch. Server Version = x Client Version = y

The version of the mSQL client / server protocol used by the client API library and the server process do not match. Upgrading the server software but not relinking the client applications that are communicating with the server can cause this problem.
 
 

Unknown mSQL error

An error occurred while communicating with the server process but the server was not able to determine the cause of the error.
 
 

Unknown mSQL Server Host

The hostname passed to the msqlConnect( ) function could not be resolved into an IP address. Ensure that the listed machine is in the nameserver or hosts file of the machine running the client applications
 
 
 
 

Server Error Messages Access to database denied

An attempt to access a database was rejected based on the contents of the access control list for that database.
 
 

Bad handshake

The initial handshake during connection establishment between a client application and the mSQL server process was incorrectly formatted.
 
 

Bad order field. Field "x" was not selected

The field x was used in an "order by" statement but it was not included in the list of fields selected from the table. You can only order by a field you have selected.
 
 

Bad type for comparison of 'x?

The value used within the where condition associated with the field x was of an incompatible type.
 
 

Can't perform LIKE on int value

An attempt was made to perform a regular expression match on an integer value. Regex searching can only be performed on character fields.

Can't perform LIKE on real value"

An attempt was made to perform a regular expression match on an real value. Regex searching can only be performed on character fields.
 
 

Can't get hostname for your address

The server process tried to resolve a hostname from the IP address of the machine running the client application. This will occur if the client machine is not listed in the nameserver. You may overcome this problem by setting the Host_Lookup field of msql.conf to false. Note that disabling this option will effect access control for the databases as the hostname of the client is used during ACL tests.
 
 

Can't open directory "x" (y)

An attempt to access the directory listed as x was not successful. The system error message is returned as y.
 
 

Can't use TEXT fields in LIKE comparison

A select query attempted to perform a regular expression match against a TEXT field. TEXT fields cannot be used in regex matches.
 
 

Couldn't create temporary table (y)

An attempt to create the files associated with a temporary table failed during the execution of a query. This is usually caused by having the permission of the msqldb/.tmp directory in the installation directory set incorrectly. The system error message is returned as y.
 
 

Couldn't open data file for x (y)

An attempt to open the data file for a table called x in the currently selected database failed. The system error message is returned as y.
 
 

Data write failed (x)

A write operation failed while attempting to store data into a table or temporary table. The system error message describing the error is returned as x. The usual reason for this problem is lack of disk space during an insert operation or a multi-table join that produced an overly large result set.
 
 

Error creating table file for "x" (y)

Creation of the data file for table x failed. A description of the error is returned in y
 
 

Error reading table "x" definition (y)

An error was encountered while the server attempted to read the table definition for the table called x in the currently selected database. The system error message is returned as y.
 
 

Field "x" cannot be null

An attempt was made during either an insert or an update operation to set the value of a NOT NULL field to NULL.
 
 

Index field "x" cannot be NULL

An attempt to insert a NULL value into an index was rejected by the server. Indices cannot be set to NULL.
 
 

Index condition for "x" cannot be NULL

Because indices cannot contain NULL values, the attempt to use a NULL condition for an index lookup was rejected.
 
 

Invalid date format "x"

The value "x" is not a valid date format and was rejected by the server. This may occur when inserting into a date field, updating the value of a date field, or when using a date value in a where condition
 
 

Invalid time format "x"

The value "x" is not a valid time format and was rejected by the server. This may occur when inserting into a time field, updating the value of a time field, or when using a time value in a where condition
 
 

Literal value for 'x' is wrong type

During either an insert or an update operation, an attempt was made to set the value of a field to a type that was not compatible with the defined type of the field (e.g. setting an integer field to a character value).
 
 

No Database Selected

A query was sent to the server before a database had been selected. The client application must call the msqlSelectDB function before submitting any queries.
 
 

No value specified for field 'x'

An insert operation sent to the database engine did not include a value for field x. Either the field x was specified in the field list and not enough values were provided or the number of values provided did not match the total number of fields in the table.
 
 

Non unique value for unique index

A value that was assigned to a unique index via either an insert or an update clause attempted to insert a duplicate value.
 
 

Out of memory for temporary table

An attempt by the server to malloc a memory segment was refused during the creation of a temporary table (during execution of a query).
 
 

Permission denied

An attempt to access a database in a particular manner was refused due to the ACL configuration of the selected database. Usually, write access (i.e. inserts or updates) are rejected because the database is listed as read-only for that client.
 
 

Reference to un-selected table "x"

A select query referenced a variable from table x when x was not listed as a table in the select clause.
 
 

Table "x" exists"

An attempt to create a table called x failed because the table already exists.
 
 

Too many connections

An attempt to connect to the mSQL server process was refused because the maximum number of simultaneous client connections has been reached.
 
 

Too many fields in query

The query attempted to reference more fields in a single query than the server permits. By default, up to 75 fields can be referenced in the same query.
 
 

Too many fields in condition

The number of conditions included in the where clause exceeded the allowed maximum of 75.
 
 

Unknown command

A unknown client / server protocol command was received by the server and was rejected.
 
 

Unknown database "x"

An attempt was made to access a database called x although no such database is defined on the server

Unknown table "x"

An attempt was made to access a table called x although no such table is defined in the currently selected database.
 
 

Unknown field "x.y"

An attempt was made to reference a field called x in a table called y. Table y of the currently selected database does not contain a field called x.
 
 

Unknown system variable "x"

A query referenced a system variable called x when no such system variable exists.
 
 

Unqualified field in comparison

An unqualified field name was used in the where clause of a join. You must fully qualify all field names if you reference more than one table.
 
 

Unqualified field "x" in join

The field x was not fully qualified in the field list of a join. All fields referenced in a join must be fully qualified.
 
 

Value for "x" is too large

The value provided for field x in either an insert or an update clause was larger than the defined length of the field