ExSite Developers Guide - Database

This document describes the ExSite data model and database schema, and provides important information for developers who are working with ExSite core data at a low level, or extending the database for other purposes.
  • DB Map
  • Extending/Modifying the Database
  • The ExSite Database API
  • Virtual Tables
  • Security
  • Foreign Keys
  • Using Multiple Databases
  • Alternative Database Engines

    Data Model

    ExSite prefers (but does not strictly require) that its data be divided into uniquely addressable tables, records, and columns. Tables and columns are addressed by name, while records are normally addressed by a unique primary key (by default, ExSite uses serial number primary keys under MySQL).

    Since, for 3rd-party databases, not all records necessarily have a unique primary keys, ExSite doesn't strictly require unique primary keys for all records, but some database API features may not be available on those records.

    Data relations are handled as simple references to the foreign records' primary keys. For instance, in the simple case where serial number primary keys are used, the reference is simply the integer value of the foreign key, stored in an integer column in the current record. The referenced record is the "parent record", and the record doing the referencing is the "child record". We sometimes refer to parent tables and child tables, when all records in those tables have these relationships. One-to-many data relations are a simple case of the parent-child record structure. Many-to-many data relations are handled using the conventional trick of using a join table that contains two (or more) foreign key references, to the tables being joined.

    Technical note: an example of the one-to-many data structure can be seen in the content table, which is a child of the page table. An example of the many-to-many data structure can be seen in the member_site_link table, which links the member table and the section table in a many-to-many relationship.

    Most of the above is fairly basic relational database theory, and can be implemented using a wide variety of low-level database engines. ExSite includes an abstraction layer on top of this that extends this basic data storage/retrieval functionality further:

    These features are independent of the underlying database engine, which can be SQL-based or other.

    A simplified set of data accessor functions is defined, for basic queries and data modification tasks: fetch (retrieve a single record), fetch_child (retrieve all child records of a given parent record), fetch_match (retrieve all records that match a set of column->value pairs), insert (add a new record to a table), update (modify an existing record, or set of matching records), delete (remove records permanently), get_ncol (get the number of columns in a table), count (count the records in a table), get_columns (get a list of column names in a table).

    Schema

    This section describes the schema of the core tables in the ExSite database. This schema can be extended with the addition of extra plug-ins, but plug-in tables are not described here. For every column, the ExSite datatype is given. This is an abstraction of the underlying database datatype, as it includes additional web application logic.

    Content Tables

    Authorization Tables

    Meta-Data Tables

    DB Map

    The DB Map is a collection of meta-data describing all tables and columns in the database. ExSite uses this meta-data for various purposes:

    There are two DB Map implementations supported by ExSite: internal and external.

    Internal maps store the meta-data in a special set of tables in the core database. The advantage of this approach is that the DB Map can be configured using the same database management tools that are used to operate on the core database. This makes it easier to change database configurations, especially using ExSite's web-driven database management tools.

    External maps store the meta-data in a separate database. The main advantage of this approach is that it is easy to add a DB Map to a pre-existing database that is unaware of ExSite, without having to modify that database.

    Internal Map Schema

    dbmap_datatype

    This table defines the datatypes recognized by ExSite.

    Columns:

    dbmap_datatype_id - the ID of this datatype record
    type - the name of the datatype
    tag - the HTML input tag that should be used for entering data of this type. This is used in cases where there are several possible choices (eg. select menus vs. radio buttons).
    validation_regexp - a regular expression that is used to validate inputs. In the case of columns that accept values from a predefined list, the acceptable values should be separated by a vertical bar, '|'.
    error - an error message to report in the event of a failed validation.

    dbmap_table

    This table defines every table that is recognized by ExSite. Although the core database may include other tables, ExSite will only operate on tables that are defined here.

    Columns:

    dbmap_table_id - the numeric ID of this table
    name - the name of the table, according to ExSite.
    alias - the name of the table, according to the underlying database engine.
    pkey - the primary key column of the table.
    title - a plain-English label for the table. This is typically given in singular form (eg. "Library").
    plural - the pluralized form on the title (eg. "Libraries")
    report - the default format to generate multi-record reports in. One of table, full, dump, index, list_numbered, list_bulleted, address, or description.
    allow_read - the minimum user access level to view records from this table.
    allow_insert - the minimum user access level to insert records into this table.
    allow_update - the minimum user access level to modify records in this table.
    allow_delete - the minimum user access level to delete records from this table.
    allow_search - the minimum user access level to search for records from this table.

    dbmap_column

    This table defines the meta-data for every column in the database.

    Columns:

    dbmap_column_id - the numeric ID of this column.
    dbmap_table_id - the ID of the table this column belongs to.
    name - the text ID of this column, unique within this table.
    dbmap_datatype_id - the ID of the datatype of this column.
    size - an integer giving the maximum size of the data expected for this column. This is used to (1) determine the size of the HTML input field, and (2) to generate SQL commands to generate the 'CREATE TABLE' commands to set up the tables and columns in a MySQL (or similar) database.
    label - the text string used as input prompts in forms and column headings in reports.
    display_rule - the logic used for deciding when to display this column in reports. One of: key (this column is the human-readable "key" for this table, and should always be displayed), brief (display this column in summary listings of multiple records), full (display only in views of individual records), or none (do not display in user reports).
    allow_read - the user access level needed to view data in this column.
    allow_write - the user access level needed to enter data into this column.
    validate - the logic used for validating data entered into this column. One of: hard (data must be present, according to underlying database engine - ExSite cannot override), soft (data must be present according to ExSite, but not underlying database engine - ExSite can override this), yes (data does not have to be present, but should be validated if it is), and no (data does not have to be present, and need not be validated).
    default - a default data value for this column.
    help - a help string explaining this data column to users of the ExSite database manager tool.

    External Map Schema

    By convention, external maps are stored in a plain text database - ie. tab-delimited text files with the first line consisting of column headings. These are placed in a subdirectory of cgi-bin (usually cgi-bin/dbmap). The following files comprise the DBmap:

    .table
    equivalent to dbmap_table, above
    .datatype
    equivalent to dbmap_datatype, above
    <table>
    other files correspond to tables of the same name as the file; each line in these files corresponds to a column in its respective table.

    The column names in the external map differ a little from those in the internal map. (The reasons for this are partly historical, and partly because some of the external map column names are reserved words in MySQL.)

    .datatype

    datatype - the name of the datatype
    tag - the HTML input tag that should be used for entering data of this type. This is used in cases where there are several possible choices (eg. select menus vs. radio buttons).
    regexp - a regular expression that is used to validate inputs. In the case of columns that accept values from a predefined list, the acceptable values should be separated by a vertical bar, '|'.

    .table

    table - the name of the table, according to ExSite.
    alias - the name of the table, according to the underlying database engine.
    key - the primary key column of the table.
    title - a plain-English label for the table. This is typically given in singular form (eg. "Library").
    plural - the pluralized form on the title (eg. "Libraries")
    report - the default format to generate multi-record reports in. One of table, full, dump, index, list_numbered, list_bulleted, address, or description.
    read - the minimum user access level to view records from this table.
    insert - the minimum user access level to insert records into this table.
    update - the minimum user access level to modify records in this table.
    delete - the minimum user access level to delete records from this table.
    search - the minimum user access level to search for records from this table.

    <table>

    column - the text ID of this column, unique within this table.
    datatype - the text name of the datatype of this column.
    size - an integer giving the maximum size of the data expected for this column. This is used to (1) determine the size of the HTML input column, and (2) to generate SQL commands to generate the 'CREATE TABLE' commands to set up the tables and columns in a MySQL (or similar) database.
    label - the text string used as input prompts in forms and column headings in reports.
    display - the logic used for deciding when to display this column in reports. One of: key (this column is the human-readable "key" for this table, and should always be displayed), brief (display this column in summary listings of multiple records), full (display only in views of individual records), or none (do not display in user reports).
    read - the user access level needed to view data in this column.
    write - the user access level needed to enter data into this column.
    validate - the logic used for validating data entered into this column. Accepted values are one of: hard (data must be present, according to underlying database engine - ExSite cannot override), soft (data must be present and validated according to ExSite, but not underlying database engine - ExSite can override this), yes (data does not have to be present, but should be validated if it is), and no (data does not have to be present, and need not be validated).
    default - a default data value for this column.

    Datatypes

    Every column in the database has an ExSite datatype, which is used for:

    The recognized datatypes are summarized below.

    Numeric Datatypes:

    int - an integer value. The size value is ignored for ints.

    decimal - a floating point value. The size is two comma-separated values giving the total number of digits, and the number of decimal places, eg. 8,2.

    money - a decimal value with 2 decimal places, for dollars/cents or equivalent.

    Text Datatypes:

    string - a brief, single-line string of text. The size is used for setting the size of the input field, and for configuring the size of the varchar() type used to store this data in underlying SQL DBs.

    text - an extended or multi-line block of text. The size is used for setting the size of the textarea used to enter data, but does not affect the amount of storage used to hold that data. In MySQL, these data are stored in a text type (64kb).

    bigtext - like text, but uses a larger storage type in MySQL (mediumtext).

    html - like text, but specifically for HTML. Will invoke the ExSite HTML editor in forms.

    Key Datatypes:

    key# - a serial number primary key (recommended).

    key - some other type of primary key

    key:table - a foreign key reference, pointing to a record in table, which is considered to be the parent of this record. Ie. this is a strong data relation, which should be used in data navigation and determining families of records.

    key=table - a foreign key reference, pointing to a record in table, which is not considered to be the parent of this record. Ie. this is an incidental data relation, which should not be used for data navigation and determining families of records.

    Time Datatypes

    time - a time of day.

    date - a date.

    datetime - a date and time combined.

    timestamp - a date-time value that is represented in an easily comparable or sortable format. For MySQL databases up to v4.0, the format is YYYYMMDDHHMMSS, and from 4.1 on it is YYYY-MM-DD HH:MM::SS. You can flag which of these you are using with the config setting timestamp_format ("num" or "text"). Also note that in MySQL databases, the first timestamp column in a table is automatically updated with the current date/time values when the column is updated. Other fields can be updated by setting them to NULL.

    File Datatypes

    file - an uploaded file. Internally, the file is mime-encoded and stored in a text column.

    bigfile - an uploaded file. Internally, the file is mime-encoded and stored in a mediumtext column.

    filepath - a path to a file on the local server.

    List and Set Datatypes

    List datatypes are named list:name, and consist of a sequence of values, any one of which may be selected. The set of acceptable values is taken from the validation regular expression (which contains all accepted values, delimited by the '|' character).

    Example: the datatype list:yn has a validate regular expression of Y|N, meaning that it will accept one of two values, 'Y' or 'N'.

    Set datatypes are named set:name, and consist of a sequence of values, any number of which may be selected. The set of acceptable values is taken from the validation regular expression (which contains all accepted values, delimited by the '|' character). The selected values are internally stored delimited by semicolon-space ('; ').

    Other Datatypes

    email - an email address. Otherwise, same as string.

    url - a URL. If no protocol is given, it is assumed to be http:. Otherwise, same as string.

    password - uses HTML password input to conceal data that is typed into it. Otherwise, same as string.

    crypt - same as text, except that data is securely encrypted before being stored. Decryption may only be performed by the owner of the record, or the system administrator. Encryption is performed by the ExSite::Crypt package, which uses the 128-bit TEA encryption algorithm.

    fixed - a value that cannot change (the default value for the column is used). Otherwise, same as string. This datatype has some application in virtual tables.

    Extending/Modifying the Database

    Adding New Tables

    It is safe to add new tables to the core database, and it is expected that this will be necessary as web applications with their own database requirements are installed. For the new tables to be manageable using the existing database administration tools, the DB Map will have to be updated to recognize the new tables. Ie. you must add a record to dbmap_table (internal maps) or .table (external maps) for each new table; add records to dbmap_column (internal maps) or <table> (external maps) for each column in these tables. It may also be necessary to add new datatypes (see below).

    Modifying Existing Tables

    It is safe to add new columns to existing core tables. (It is common to do this for the member table, to add user-specific information that is not included in the default ExSite database.)

    Generally it is not safe to remove columns from the core tables, unless you really know what you are doing.

    New Datatypes

    It is safe to add new datatypes. New list:, set:, and key: datatypes should automatically be understood by ExSite if they have been properly defined.

    Other new datatypes may require patching ExSite to understand the display, input, and validation rules for these datatypes. The internal ExSite routines that perform these steps are:

    ExSite::Report::show_data_noauth() - special rules for displaying datatypes. If no special rule is in place for displaying a datatype, the raw data is output.

    ExSite::Form::input_exsite() - special rules for generating HTML inputs for datatypes. If no special rule in place for a datatype, a simple text input is used.

    ExSite::Form::validate() - code for validating input data; in most cases, the validation regular expression is sufficient for validation and no specially-coded rules will need to be invoked.

    Instead of modifying the above kernel routines in order to install your datatype handling code (which would fork the ExSite distro), you can install some handlers in myConfig.pm to keep your custom code separated from the kernel. More information on handlers and kernel programming is available in the ExSite Kernel Developer's Guide.

    Modifying DB Map Logic

    The system administrator is free to reconfigure the preconfigured DB Map settings to vary the business logic of the database management tools. It is not unusual to vary these settings to tweak the rules for particular installations of ExSite.

    However, whether or not these changes are safe depends on the changes being made. For example, changing the access permission levels may inadvertently create security or data integrity holes in the system, or prevent important data from being updated.

    The ExSite Database API

    The ExSite database API consists of the following high-level methods, which are inherited by all objects of the ExSite::DB, ExSite::Report, and ExSite::Form classes. The parameters accepted by these calls are:

    $table
    the text name of a table
    $parent
    the text name of a parent table
    $id
    the primary key of a record (typically, a numeric ID)
    $datahash
    a full or partial representation of a record, consisting of a reference to a hash of column names => column values
    $matchhash
    a reference to a hash of column names => column values that is used as a mask for selecting matching records. When using MySQL, wildcards (eg. '%') may be used in the values.
    $sort
    a column name (or reference to a list of column names) used to sort the returned records.

    Methods that return a single record return a datahash, or reference to a datahash, depending on the context. Methods that return a set of records return an array of datahashes or a reference to an array of datahashes, depending on the context.

    get_tables() - returns an array of table names known to the system

    get_columns($table) - returns an array of column names in the named table

    get_ncol($table) - returns a count of the number of columns in the named table

    count($table) - returns a count of the records in the named table

    fetch_all($table) - return all records in the named table. Be careful about using this if you don't know how many records you are likely to encounter, as the number could be large!

    fetch($table,$id) - returns a particular record in the the named table.

    fetch_child($table,$parent,$id,$sort) - returns a list of records in $table that are children of the given record id in the table $parent. The sort parameter is optional.

    fetch_match($table,$matchhash,$sort) - returns a list of records in the named table that match the match hash. The sort parameter is optional.

    insert($table,$datahash) - adds a new record to the named table. (Implemented using the SQL INSERT statement if no record ID is included in the datahash, REPLACE otherwise.)

    update($table,$datahash,$matchhash) - modifies records in the named table. If a matchhash is given, all matching records are updated with the data in the datahash. Otherwise, only the record whose primary key matches the primary key value in the datahash is updated. (If no primary key is included, the update will fail; ie. unconditional updates are forbidden.)

    trash_key($table,$id) - moves the given record to the trash. delete_key($table,$id) - permanently deletes the given record.

    trash_r($table,$id) - moves the given record, its children, and all decendant records recursively to the trash as a single trash item. delete_r($table,$id) - permanently deletes the given record, its children, and all decendant records recursively.

    undelete($id) - restores the given trash record, if possible. undelete($table,$id) - restores the given record from the named table (and its decendants), if possible.

    In addition, there are more advanced methods also available to these objects and to objects of lower levels (such as ExSite::SQL objects). These are not documented in full here, but perlpod documentation exists in the appropriate kernel files:

    fetch_m2m() - return records related to a certain record through a many-to-many join table.

    dbselect() - flexible 3GL interface to SQL select statement, allowing for joins, selecting particular columns, arbitrary where conditions, sort clauses, and limits.

    dbdelete() - delete records based on arbitrary conditions. No equivalent trash functions exist.

    custom_query() - execute arbitrary SQL queries. While most other calls are portable to other database engines, this call is obviously limited to compatible database engines. WARNING: while ExSite performs its own security checks on all other queries, it does not validate queries via this method; the programmer is reponsible for doing his/her own taint checking and validation.

    Virtual Tables

    ExSite abstracts the idea of a table somewhat, in order to apply its own datatype and access control rules onto the data. This abstraction also makes possible the idea of virtual tables. In the trivial case, a virtual table is just an alias for another table in the database -- in other words, the same table can be addressed by an alternate name.

    This has practical application when you realize that the aliased table name can have its own separate entry in the DB Map, and define its own set of columns that differ from the original table (although they all map onto the same low-level table in the database). Thus you can have one storage table that serves multiple logical purposes. For example, consider the virtual tables student and teacher, which are cases of the real table "person":

    TeacherStudentreal table ("Person")
    firstnamefirstnamefirstname
    lastnamelastnamelastname
    salarysalary
    gradegrade
    anniversarybirthdatedate

    These tables share a number of common columns (firstname, lastname), a couple of columns that are unique to each table (salary, grade), and one column that is logically different in each table, but with shared storage (due to a common underlying data type).

    Since virtual tables share columns with the real tables they correspond to, any changes to the datatypes or other configurations must remain compatible with the underlying representation in the real table. For example, you can't change an int column in the real table to a text column in one of its virtual tables; the text data would not convert to an int and would be lost. However, you could change a text column in the real table to an HTML column in the virtual table, since the HTML data would store just fine in a regular text field.

    It is often useful to have a flag or setting in such tables to distinguish between the different types of data that are stored in that table. For example, records in the page table have a type column, which is set to page, template, or library. In cases where you are dealing with a record in the virtual library table, you would like this type to be automatically set to "library". This can be accomplished using the fixed datatype, which implicitly adds this fixed field into basic queries on virtual tables. For instance, in the real table (page), the type column is defined as having dataype list:page_type, which allows the administrator to select between page, library, and template. In the library virtual table, the type column is defined as having datatype fixed, with a default value of "library".

    Security

    ExSite has some security measures in place to control read and write access to tables, records, and columns.

    Read and write access to tables is controlled by access level settings in the DB Map for each table. Only users of the given access level or higher are granted permission to generate reports from the table or generate forms that write to the table.

    Read and write access to columns is controlled by access level settings in the DB Map for each column. Only users of the given access level are given permission to view the contents of specific columns, or enter data into those columns.

    Access to records is controlled by record ownership. A record is owned by a regular user if the record is a child, grandchild, great-grandchild, etc. of the user's member record. (Except for system administrators, who own all records.) A record is owned by a site administrator if the record is similarly related to either the administrator's user record, or one of his site (ie. section) records. System administrators have ownership access to all records in the database.

    These access controls are not absolute. They function primarily as a set of default rules that are used by automatic report and form generators that interact directly with the database. More specifically, forms and reports generated by high-level methods in the packages ExSite::Form and ExSite::Report generally will obey these security rules. However, specialized code that accesses the ExSite database API (ie. ExSite::DB methods) directly will bypass these security rules. Such code is responsible for its own security.

    Foreign Keys

    In terms of low-level data storage, ExSite foreign keys are just integer IDs. When a form needs to prompt for a foreign key, ExSite can automatically build a selector list of available records to link to. By default, all records in the parent table are shown in this list. Very often, however, this is not the desired logic. For one thing, there may be thousands of records in the parent table, and you don't want to load all of them up just to build a key selector. Also, most of the potential parents may be invalid due to record ownership or other business logic issues.

    The kernel handler called select_foreign_key is helpful in this regard. It allows you to inject logic for selecting which foreign keys to select from, into the routine that builds the selection list, without having to modify the kernel packages. Examples of implementing this handler can be seen in ExSite::Config::CMS_select_foreign_key()

    Note that there is a system-wide security setting called foreign_link_level, which determines what level of user is allowed to link to records they don't own. This security check is performed at the time of form validation, not when the foreign key selector list is built.

    Using Multiple Databases

    Most ExSite CGI programs connect to the core database by default. If there are multiple databases that need to be connected to (for instance if a web application utilizes a different database than the core database), then that's a simple matter of creating additional database objects/handles and using those to interact with the secondaries.

    If you define your connection parameters in %config something like this:

    server.db2.name = dbname
    server.db2.type = sql
    server.db2.user = dbuser
    server.db2.pass = dbpassword
    server.db2.server = localhost
    server.db2.map = map2
    server.map2.name = dbmap2
    server.map2.type = text
    

    then you can create your 2nd database handle like this:

    my $db2 = new ExSite::Form(%{$config{server}{db2}},save=>0);
    

    This passes the db2 connection parameters to the Form class (which ultimately inherits from DB and SQL classes), and does not save the handle in %share{DB}. (The last point is important because %share{DB} is presumed to hold the handle for the core database.)

    Alternative Database Engines

    ExSite uses MySQL for its underlying database engine by default, but there is no hard requirement for MySQL. The ExSite::DB package provides an abstraction layer for interfacing to any number of different database driver layers, including non-SQL database engines that are capable of supporting the API described above. (In fact, ExSite is distributed with an alternative plain text engine that is used for reading external DB maps.)

    Adapting ExSite to another database engine is a matter of writing another database driver package that supports the API calls described above. In the case of an alternate SQL package, this is pretty straight-forward, since the MySQL driver is built on the Perl DBI package, and is easily adapted to any DBI-supported database engine.

    In the case of a data storage technology that is not supported by DBI, the developer should consult the ExSite::SQL and ExSite::Text database drivers for examples on building an database driver that is compatible with the ExSite database API. The ExSite::DB::new() method will also have to be extended to recognize the new database driver.

    If the alternate database is required by a plug-in web application, there there is no requirement that it be ExSite-compatible at all. If the web application handles all of its interactions with its private database on its own, then basically it can do whatever it wants. Of course, it will not benefit from the other ExSite database features such as online database management tools, trashcan and data restore, data ownership and access controls, etc. To get those features, the application's database must be ExSite-compatible. However, it does not have to be integrated right into the core ExSite database, since ExSite can interact with multiple databases independently.