ML - markup language generator

Document Handling

Doc($text)
Append($text)
Prepend($text)
Clear()
Doctype($text)

Basic Element Creation

Element($tag,$data,$attributes)
Comment($text)

HTML tag shortcuts

Cumulative Document Composition

Prepend to the document
Append to the document
Wrap the document in another element

Compound Element Creation

Content($data)
Compound Element Composition Rules

XML and XHTML

XML documents

ML - markup language generator

This is a toolkit of convenience methods for generating tags and elements in an SGML-like markup language. It has many shortcuts for generating HTML markup, but can also be used for XHTML, and even XML.

Most commonly, it will be used to assemble well-formed HTML elements and tags. This reduces the risk of broken HTML, improves portability between HTML and XHTML, and lets you build complex and aggregate structures in a single convenient call.

It can be used to generate snippets of markup, or to assemble numerous snippets into partial or complete documents.

The settings in $config{markup} define basic syntax rules:

xml

Use xml syntax for self-closing tags, ie. <tag />, and force tags to lower case.

minattr

Minimize unset attributes, eg. selected instead of selected="selected"

safe_content

Assume content is HTML-safe. If not true, then content will be HTML-escaped before insertion into the document.

safe_attributes

Assume attributes are HTML-safe. If not true, then attributes will be HTML-escaped before insertion into the tags.

nl

Append newline characters to the end of each element, for formatting purposes.
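
As a rough illustration of how the xml and minattr settings might shape an opening tag, here is a minimal standalone Perl sketch. The open_tag helper and its argument list are hypothetical and are not part of ExSite::ML; the sketch only mimics the rules described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): render an opening tag
# following the xml and minattr rules described above.
sub open_tag {
    my ($opt, $tag, $attr, $is_empty) = @_;
    $tag = lc $tag if $opt->{xml};              # xml mode forces lower case
    my $out = "<$tag";
    for my $name (sort keys %$attr) {
        my $value = $attr->{$name};
        if (defined $value) {
            $out .= qq{ $name="$value"};
        }
        elsif ($opt->{minattr}) {
            $out .= " $name";                   # minimized: selected
        }
        else {
            $out .= qq{ $name="$name"};         # expanded: selected="selected"
        }
    }
    # self-closing syntax is only used for empty tags in xml mode
    $out .= ($is_empty && $opt->{xml}) ? " />" : ">";
    return $out;
}

# prints <OPTION selected>
print open_tag({ xml => 0, minattr => 1 }, "OPTION", { selected => undef }), "\n";
# prints <input checked="checked" type="checkbox" />
print open_tag({ xml => 1, minattr => 0 }, "INPUT",
               { checked => undef, type => "checkbox" }, 1), "\n";
```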

Document Handling

`new(%opt)`

Creates an ML object to work with. %opt contains settings to override the default config settings, noted above.

You can also pass a doc option, which initializes the ML object with a preformatted (already marked-up) document.

Example: create a markup language object with XML syntax rules:

    my $ml = new ExSite::ML(xml=>1);

`Write()`

Returns the current document as a string.

`Print(), PrintWithHeader()`

Prints the current document to stdout. The second form includes a content-type header.

`Doc($text)`

Sets the current document to $text.

`Append($text)`

Appends $text to the end of the current document.

`Prepend($text)`

Prepends $text to the beginning of the current document.

`Clear()`

Blanks or resets the current document.

`Doctype($text)`

Sets a document preamble, which will be prepended to the whole document before writing or printing. This is typically used for doctype or XML declarations such as:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

    <?xml version="1.0"?>

The value of $text should contain the entire preamble string.

Note that the doctype is not considered part of the document contents, so it will always appear at the top, no matter how or when you use Append(), Prepend(), or Wrap().
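
One way to picture this behaviour is a document object that stores the preamble separately from the body and only joins them on output. The TinyDoc class below is a hypothetical stand-in written for this illustration; it borrows the method names from this page but is not the ExSite::ML implementation.

```perl
use strict;
use warnings;

package TinyDoc;    # hypothetical stand-in, not ExSite::ML itself

sub new     { my ($class) = @_; return bless { doc => "", preamble => "" }, $class }
sub Doctype { my ($self, $text) = @_; $self->{preamble} = $text; return }
sub Append  { my ($self, $text) = @_; $self->{doc} .= $text; return }
sub Prepend { my ($self, $text) = @_; $self->{doc} = $text . $self->{doc}; return }
sub Wrap    { my ($self, $tag)  = @_; $self->{doc} = "<$tag>$self->{doc}</$tag>"; return }

# The preamble is joined to the body only at output time,
# so later Prepend() or Wrap() calls cannot displace it.
sub Write   { my ($self) = @_; return $self->{preamble} . $self->{doc} }

package main;

my $doc = TinyDoc->new;
$doc->Append("<p>Hello</p>");
$doc->Doctype(qq{<?xml version="1.0"?>\n});   # set after content already exists
$doc->Wrap("body");
$doc->Prepend("<head></head>");
$doc->Wrap("html");

# prints the XML declaration on the first line, then
# <html><head></head><body><p>Hello</p></body></html>
print $doc->Write, "\n";
```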

Basic Element Creation

A markup element is a tag with a set of attributes and some contents. The tag is essential, and the contents and attributes are optional.

`Element($tag,$data,$attributes)`

Generates a markup entity. We do not validate the element against any DTD or other standard. We simply generate a text string with an SGML-like or XML-like structure, eg.

    $ml->Element("tag");                # outputs <tag> in non-xml mode
                                        # outputs <tag /> in xml mode
                                        # outputs <tag></tag> if element normally holds content

    $ml->Element("tag",
                 "contents");           # outputs <tag>contents</tag>

    $ml->Element("tag",
                 "contents",
                 {attribute=>"value"}); # outputs <tag attribute="value">contents</tag>

The main purpose is to ensure consistent formatting, valid syntax, and easier switching between XML- and non-XML-based formats (specifically, HTML and XHTML).

$tag is a simple string, used as the tag name. This should be a word (no whitespace), but this is not validated.

$attributes is a hashref of key/value pairs; values will be quoted. Attributes with undefined values will be output as either

    name="name"

or

    name

depending on the minattr configuration option.

$data is the contents, which can be a scalar, array or hash. This will be interpreted by the Content() method, below.

The safe_content configuration setting indicates that string content can be safely inlined right into the element; if false, the content will be escaped (using HTML escape values) first. The safe_attributes configuration setting has the same effect for attribute values.
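
The effect of these two flags can be sketched in a few lines of standalone Perl. The escape_html and simple_element helpers below are hypothetical names used only for this illustration, and the exact set of characters the module escapes is an assumption.

```perl
use strict;
use warnings;

# Hypothetical escaping helper; the module's actual escape rules may differ.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # ampersand first, so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical element builder: escapes content and attribute values
# unless the corresponding "safe" flag is set.
sub simple_element {
    my ($opt, $tag, $content, $attr) = @_;
    my $attrs = "";
    for my $name (sort keys %{ $attr || {} }) {
        my $value = $attr->{$name};
        $value = escape_html($value) unless $opt->{safe_attributes};
        $attrs .= qq{ $name="$value"};
    }
    $content = escape_html($content) unless $opt->{safe_content};
    return "<$tag$attrs>$content</$tag>";
}

# prints <p>a <b>bold</b> word</p>
print simple_element({ safe_content => 1 }, "p", "a <b>bold</b> word"), "\n";
# prints <p>a &lt;b&gt;bold&lt;/b&gt; word</p>
print simple_element({ safe_content => 0 }, "p", "a <b>bold</b> word"), "\n";
```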

This method only generates regular elements, not other markup such as comments, document types, CDATA, etc.

`Comment($text)`

Generates an HTML-style comment tag, ie.

    <!-- $text -->

Double-hyphens are removed from the comment text to prevent accidental premature closure of the comment.
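
A minimal sketch of that behaviour, assuming a simple global substitution is enough. The comment function here is a hypothetical stand-in, not the module's own Comment() code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Comment(): strip double hyphens, then wrap.
sub comment {
    my ($text) = @_;
    $text =~ s/--//g;               # remove every "--" pair from the text
    return "<!-- $text -->";
}

# prints <!-- start > oops -->
print comment("start --> oops"), "\n";
```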

HTML tag shortcuts

All HTML 4 strict elements have a shortcut method

    $ml->tag($content,$attributes);

where tag is an element name. This is equivalent to

    $ml->Element($tag,$content,$attributes);

$content and $attributes are optional. $content is run through Content() (below) to resolve data structures.

Tag shortcuts for the following elements are supported: a, abbr, acronym, address, applet, area, b, base, big, blockquote, body, br, button, caption, cite, code, col, colgroup, dd, del, dfn, div, dl, dt, em, fieldset, form, frame, frameset, h1, h2, h3, h4, h5, h6, head, hr, html, i, iframe, img, input, ins, kbd, label, legend, li, link, map, meta, noscript, object, ol, optgroup, option, p, param, pre, q, samp, script, select, small, span, strong, style, sub, sup, table, tbody, td, textarea, tfoot, th, thead, title, tr, tt, ul, var.

Note that if you provide content and attributes, the element will be built accordingly, even if HTML does not support attributes or content for that tag. In other words, we compose a syntactically complete element, not a semantically correct one.

Example:

    my $link = $ml->a( "Google", { href => "http://google.com" } );

Cumulative Document Composition

Elements can be cumulatively aggregated in the ML object. The "current document" is just the current blob of marked-up text that has been accumulated. Text can be accumulated from the top-down, bottom-up, or in layers like an onion.

Prepend to the document

To add marked-up text to the beginning of the current document:

    $ml->Prepend($text);

Note that in this and the Append() method below, the text is not validated, which means you can break your syntax if you stuff your own tags into it carelessly.

Append to the document

To add marked-up text to the end of the current document:

    $ml->Append($text);

Or, you can use the auto-append methods. The methods Element(), Comment(), and the HTML tag shortcuts above all have an auto-append version which automatically appends their output to the current document. The auto-append method begins with an underscore but the rest of the method name is the same.

    # compose a link and return it to the caller
    $ml->a( "Google", { href => "http://google.com" } );

    # compose a link and append it to the current document
    $ml->_a( "Google", { href => "http://google.com" } );

Wrap the document in another element

To wrap the current document in a markup element (ie. create a markup element with the current document as its content):

    $ml->Wrap( $tag, $attributes );

As a convenience you can use the auto-wrap methods. The methods Element(), Comment(), and the HTML tag shortcuts above all have an auto-wrap version which automatically uses the current document as the element contents. The auto-wrap method begins with a double-underscore but the rest of the method name is the same.

    # enclose current document in a body (with optional attributes)
    $ml->__body( $attributes );

    # prepend a head section (containing a title element)
    $ml->Prepend( $ml->head( $ml->title("Document Title") ) );

    # wrap the whole shebang in an html tag
    $ml->__html();

Note that it is easy to create bizarre HTML constructions. The caller is responsible for nesting their elements appropriately. For instance, the following will be processed without complaint, despite not being a legal HTML construction:

    $ml->_p("A paragraph.");  # add a paragraph to the document
    $ml->__style();           # wrap document in style tags (!?)
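
The underscore naming convention described in this section lends itself to dynamic dispatch. The sketch below uses Perl's AUTOLOAD mechanism to show one way the plain, auto-append, and auto-wrap forms of a tag shortcut could be routed. TinyML is a hypothetical class written for this illustration; whether ExSite::ML dispatches this way is an assumption, and attributes, Comment(), and the _Element()/__Element() forms are left out for brevity.

```perl
use strict;
use warnings;

package TinyML;    # hypothetical, for illustration only

our $AUTOLOAD;

sub new   { my ($class) = @_; return bless { doc => "" }, $class }
sub Write { my ($self)  = @_; return $self->{doc} }

sub Element {
    my ($self, $tag, $content) = @_;
    $content = "" unless defined $content;
    return "<$tag>$content</$tag>";
}

# Route any unknown method name to Element(), using the prefix to
# decide what happens to the result.
sub AUTOLOAD {
    my ($self, @args) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                  # strip the package qualifier
    return if $name eq "DESTROY";
    if ($name =~ /^__(\w+)$/) {         # __tag: wrap the current document
        $self->{doc} = $self->Element($1, $self->{doc});
        return;
    }
    if ($name =~ /^_(\w+)$/) {          # _tag: append to the current document
        $self->{doc} .= $self->Element($1, @args);
        return;
    }
    return $self->Element($name, @args);    # tag: return markup to the caller
}

package main;

my $ml = TinyML->new;
print $ml->p("standalone"), "\n";   # prints <p>standalone</p>; document untouched
$ml->_h1("Title");
$ml->_p("Text");
$ml->__body;
$ml->__html;
# prints <html><body><h1>Title</h1><p>Text</p></body></html>
print $ml->Write, "\n";
```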

Compound Element Creation

`Content($data)`

Given a data structure of nested elements, we try to transform it into markup text. The elements of our data structure may refer to text, element parameters, or more data structures that have to be resolved recursively. We do not necessarily know the tag in all of these cases, but we can often infer the tag based on the element we are nesting under (eg. if we are in a <ol>, then a nested element is likely to be a <li>).

If $data is a scalar, it is taken to be explicit text or mark-up.

If $data is an arrayref, it is taken to be an Element description ([tag,content,attributes], or [content,attributes]), a list of explicit markup text, or a list of more data structures.

If $data is a hashref, it is taken to be a set of tag => content pairs.

In cases where we are not given the tag explicitly, we can often determine it from context, as with the <ol> case mentioned above. To get a context, we need to have been called recursively from a parent element, and that parent element must define a default child tag (see the %default_child variable).
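
To make the idea of context-driven tag inference concrete, here is a simplified standalone resolver. The resolve and wrap functions and the contents of %default_child are assumptions made for this sketch; it handles only scalars and arrayrefs of items, and does not cover hashrefs, attributes, or the [tag,content,attributes] descriptor form.

```perl
use strict;
use warnings;

# Assumed defaults for illustration; the module's %default_child may differ.
my %default_child = (ul => "li", ol => "li", table => "tr", tr => "td");

# Hypothetical resolver: convert nested data to markup, using the parent
# tag to guess which element each list item should become.
sub resolve {
    my ($data, $parent) = @_;
    return $data unless ref $data;      # scalar: literal text or markup
    die "this sketch only handles scalars and arrayrefs\n"
        unless ref $data eq "ARRAY";
    my $child = defined $parent ? $default_child{$parent} : undef;
    return join "", map {
        defined $child ? wrap($child, $_) : resolve($_, undef)
    } @$data;
}

# Build an element whose content is resolved in the context of its own tag.
sub wrap {
    my ($tag, $data) = @_;
    return "<$tag>" . resolve($data, $tag) . "</$tag>";
}

# prints <ul><li>one</li><li>two</li></ul>
print wrap("ul", [ "one", "two" ]), "\n";
# prints <table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
print wrap("table", [ [ 1, 2 ], [ 3, 4 ] ]), "\n";
```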

To get a sense for how this works, you can examine the HTML shortcut calls in the following examples. The shortcut call defines the top-level element, which gives a context for determining how the content data structure should be converted into markup.

List Examples: These calls will all generate lists, using various structures to represent the list items.

    # list items are explicit contents
    $ml->ul( [
              "list item 1",
              "list item 2",
              "list item 3",
             ] );

    # list items are element descriptors (tag, content)
    $ml->ol( [
              [ "li", "list item 1" ], 
              [ "li", "list item 2" ], 
              [ "li", "list item 3" ], 
             ], 
             { type=>"i" } );

    # list items are hashes of tag=>content
    $ml->dl( [ 
               { dt => $title1, dd => $description1 },
               { dt => $title2, dd => $description2 },
               { dt => $title3, dd => $description3 },
             ] );

Table Examples: These calls will all generate a 2-column table with numeric data in the cells. Some have header and footer rows, others do not.

    # simple table, no headers or footers
    $ml->table( [ 
                  [ 123, 456 ], 
                  [ 789, 123 ], 
                  [ 456, 789 ], 
                ],
                {class=>"Report"},
              );

    # table with head, body, foot, and caption
    $ml->table( { caption => "Sample Table",
                  thead => [ 
                             [ "head1", "head2" ] 
                           ],
                  tbody => [ 
                             [ 123, 456 ], 
                             [ 789, 123 ], 
                             [ 456, 789 ], 
                           ],
                  tfoot => [ 
                             [ 1368, 1368 ] 
                           ],
                },
                {class=>"Report"},
              );

Compound Element Composition Rules

When creating markup, there are a few parameters that we use for defining some basic nesting and formatting rules.

alltags

This references a list of all standard tags. This is not used to validate tags, so you can create tags not in this list. However, it is used to help identify items that look like tag names in data structures.

emptytags

This references a list of tags that are not supposed to contain content. If these tags are created with undefined content, they will result in a single (self-closing) tag; otherwise, an open and close tag will be created.

no_nl

We normally terminate all closing tags with a newline character for tidier formatting. In inline elements, newlines are treated as whitespace, and can cause minor formatting defects in some cases. Tags in this list will not receive any terminating newline.

default_child

This is a hashref of tag => child-tag, which helps us guess what element nests underneath a parent tag, if it has not been explicitly defined in a data structure.

default_order

This is a hashref of tag => list of tags, which helps us figure out which order to output tags when they have been provided to us in an unordered hash.
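
A short sketch of how such an ordering rule might be applied to an unordered hash. The ordered_tags function and the contents of %default_order are assumptions for this illustration; the table order shown follows HTML 4's requirement that tfoot precede tbody.

```perl
use strict;
use warnings;

# Assumed rule for illustration; the module's default_order may differ.
my %default_order = (table => [qw(caption thead tfoot tbody)]);

# Hypothetical helper: return the keys of $content in the preferred order
# for $parent, followed by any unlisted keys in alphabetical order.
sub ordered_tags {
    my ($parent, $content) = @_;
    my @order = @{ $default_order{$parent} || [] };
    my %known = map { $_ => 1 } @order;
    my @first = grep { exists $content->{$_} } @order;
    my @rest  = sort { $a cmp $b } grep { !$known{$_} } keys %$content;
    return (@first, @rest);
}

my %table = (tbody => [], caption => "Sample Table", thead => []);
# prints caption, thead, tbody
print join(", ", ordered_tags("table", \%table)), "\n";
```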

The ML class includes default rules for all of the above, which are sufficient for HTML 4 or XHTML composition. If you are building a markup document of a different type and want to use the data structure feature to build complex markup in one call, you will need to provide a set of rules to replace the default HTML rules. You can set these rules by providing alternate definitions for the above parameters, like this:

    $ml->set("emptytags",["foo", "bar"]);

XML and XHTML

To make your output XHTML-compatible, set the xml option when creating your ML document, or set markup.xml=1 in your configuration file to make this the default. This forces tags to be lower case, and changes the format of self-closing tags. For example, the call

    $ml->Element("BR");

will produce <BR> if xml is off, and <br /> if xml is on. Note that

    $ml->br();

will produce a lower-case br in all cases.

This effectively changes the syntax to xml, but it still does not validate against a DTD. It also does not manage the syntax of explicitly-coded markup that may have been passed in as content. It only affects the syntax of elements it itself has generated.

For instance, the following will generate correct output all of the time:

    $ml->br();

However, if the safe_content flag is on, then the following will not produce correct XML, since the content contains explicitly-coded markup that is not XML-compatible:

    $ml->p("Linebreak<br>");

(If safe_content is off, then the br tag will be escaped and will be presented as regular content, which keeps it XML-compatible, but may not be what the author intended.)

If you can avoid the latter situation, then it is possible to switch quickly from HTML to XHTML with a single configuration setting.

XML documents

This class has a lot of convenience functions for HTML markup, but it actually doesn't care about the tags you use. That means you can use it to generate XML documents that have no relation to HTML. For example, here is a recipe to generate an XML RSS file:


    # make an RSS feed
    my $rss = new ExSite::ML(xml=>1);
    $rss->Doctype('<?xml version="1.0"?>');
    # note auto-append calls
    $rss->_Element("title","My Feed");
    $rss->_Element("description","About My Feed");
    $rss->_Element("link","http://myurl.com");
    # make an item - do not use auto-append methods
    my $item = $rss->Element("title","1st Item");
    $item .= $rss->Element("description","1st description");
    $item .= $rss->Element("link","http://link.com");
    # now append this item to the document
    $rss->_Element("item",$item);
    # repeat for as many items as necessary
    # wrap the document up - note wrap calls
    $rss->__Element("channel");
    $rss->__Element("rss",{ version => "2.0" });
    $rss->Print;

ML.pm