This is a toolkit of convenience methods for generating tags and elements in an SGML-like markup language. It has many shortcuts for generating HTML markup, but can also be used for XHTML, and even XML.
Most commonly, it will be used to assemble well-formed HTML elements and tags. This reduces the risk of broken HTML, improves portability between HTML and XHTML, and lets you build complex and aggregate structures in a single convenient call.
It can be used to generate snippets of markup, or to assemble numerous snippets into partial or complete documents.
The settings in $config{markup} define basic syntax rules, such as whether a valueless attribute is minimized (selected instead of selected="selected").
new(%opt)
Creates an ML object to work with. %opt contains settings to override the default config settings, noted above.
You can also pass a doc option, which initializes the ML object with a preformatted (already marked-up) document.
Example: create a markup language object with XML syntax rules:
my $ml = ExSite::ML->new(xml=>1);
Write()
Returns the current document as a string.
Print(), PrintWithHeader()
Prints the current document to stdout. The second form includes a content-type header.
Doc($text)
Sets the current document to $text.
Append($text)
Appends $text to the end of the current document.
Prepend($text)
Prepends $text to the beginning of the current document.
Clear()
Blanks or resets the current document.
Doctype($text)
Sets a document preamble, which will be prepended to the whole document when it is output with Write() or Print(). This is typically used for doctype declarations such as:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
or
<?xml version="1.0"?>
The value of $text should contain the entire preamble string.
Note that the doctype is not considered part of the document contents, so it will always appear at the top, no matter how or when you use Append(), Prepend(), or Wrap().
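The document methods above can be pictured as operations on a text buffer, with the doctype held apart from it. The following is a minimal standalone sketch of that behaviour, not the module's implementation: the TinyDoc package is hypothetical, and details such as return values and whether Clear() also resets the doctype are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the document-handling part of an ML object.
package TinyDoc;

sub new {
    my ($class, %opt) = @_;
    # "doc" preloads the document; the doctype is stored separately.
    return bless { doc => $opt{doc} // "", doctype => "" }, $class;
}
sub Doc     { my ($self, $text) = @_; $self->{doc} = $text; return; }
sub Append  { my ($self, $text) = @_; $self->{doc} .= $text; return; }
sub Prepend { my ($self, $text) = @_; $self->{doc} = $text . $self->{doc}; return; }
sub Clear   { my ($self) = @_; $self->{doc} = ""; return; }   # assumption: doctype is kept
sub Doctype { my ($self, $text) = @_; $self->{doctype} = $text; return; }

# The preamble is joined to the contents only at output time,
# so it stays on top regardless of later Append/Prepend calls.
sub Write {
    my ($self) = @_;
    return $self->{doctype} . $self->{doc};
}

package main;

my $doc = TinyDoc->new(doc => "<p>Hello</p>");
$doc->Doctype(qq{<?xml version="1.0"?>\n});
$doc->Prepend("<h1>Title</h1>");
$doc->Append("<p>Bye</p>");
print $doc->Write(), "\n";
```

Because the preamble lives outside the buffer, the Prepend() call in the usage lines cannot push content above it.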
A markup element is a tag with a set of attributes and some contents. The tag is essential, and the contents and attributes are optional.
Element($tag,$data,$attributes)
Generates a markup element. We do not validate the element against any DTD or other standard. We simply generate a text string with an SGML-like or XML-like structure, e.g.
$ml->Element("tag");
# outputs <tag> in non-xml mode
# outputs <tag /> in xml mode
# outputs <tag></tag> if element normally holds content
$ml->Element("tag", "contents");
# outputs <tag>contents</tag>
$ml->Element("tag", "contents", {attribute=>"value"});
# outputs <tag attribute="value">contents</tag>
The main purpose is to ensure consistent formatting, valid syntax, and easier switching between XML- and non-XML-based formats (specifically, HTML and XHTML).
$tag is a simple string, used as the tag name. This should be a word (no whitespace), but this is not validated.
$attributes is a hashref of key/value pairs; values will be quoted. Attributes with undefined values will be output as either name="name" or name, depending on the minattr configuration option.
$data is the contents, which can be a scalar, arrayref, or hashref. This will be interpreted by the Content() method, below.
The safe_content configuration setting indicates that string content can be safely inlined right into the element; if false, the content will be escaped (using HTML escape values) first. The safe_attributes configuration setting has the same effect for attribute values.
This method only generates regular elements, not other markup such as comments, document types, CDATA, etc.
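To make the attribute and escaping rules concrete, here is a small standalone sketch. The helper names element() and escape_html() are hypothetical, and the way the minattr, safe_content, and safe_attributes flags are passed is an assumption made for illustration; the module itself reads them from its configuration.

```perl
use strict;
use warnings;

# Escape the characters that are unsafe in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build <tag attrs>content</tag> from its parts.
# The %$conf keys mirror the settings named in the text.
sub element {
    my ($tag, $content, $attr, $conf) = @_;
    $attr ||= {};
    $conf ||= {};
    my $out = "<$tag";
    for my $name (sort keys %$attr) {          # sorted for stable output
        my $value = $attr->{$name};
        if (!defined $value) {
            # valueless attribute: name or name="name"
            $out .= $conf->{minattr} ? " $name" : qq{ $name="$name"};
            next;
        }
        $value = escape_html($value) unless $conf->{safe_attributes};
        $out .= qq{ $name="$value"};
    }
    $content = "" unless defined $content;
    $content = escape_html($content) unless $conf->{safe_content};
    return "$out>$content</$tag>";
}

# prints <option selected="selected" value="1">A &amp; B</option>
print element("option", "A & B", { value => 1, selected => undef }), "\n";
# prints <option selected value="1">A &amp; B</option>
print element("option", "A & B", { value => 1, selected => undef },
              { minattr => 1 }), "\n";
```

The sketch always emits a closing tag; empty elements and xml mode are left out to keep the attribute handling in focus.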
Comment($text)
Generates an HTML-style comment tag, i.e.
<!-- $text -->
Double-hyphens are removed from the comment text to prevent accidental premature closure of the comment.
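As a rough illustration of why the hyphens are stripped, the sketch below uses a hypothetical comment() helper. It removes any run of two or more hyphens, which is an assumption about the exact rule.

```perl
use strict;
use warnings;

# Hypothetical helper: strip runs of two or more hyphens, then wrap the text.
sub comment {
    my ($text) = @_;
    $text =~ s/-{2,}//g;
    return "<!-- $text -->";
}

# prints <!-- end of section > oops -->
print comment("end of section --> oops"), "\n";
```

Without the substitution, the embedded --> would close the comment early and leave " oops -->" as visible text.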
All HTML 4 strict elements have a shortcut method
$ml->tag($content,$attributes);
where tag is an element name. This is equivalent to
$ml->Element($tag,$content,$attributes);
$content and $attributes are optional. $content is run through Content() (below) to resolve data structures.
Tag shortcuts for the following elements are supported: a, abbr, acronym, address, applet, area, b, base, big, blockquote, body, br, button, caption, cite, code, col, colgroup, dd, del, dfn, div, dl, dt, em, fieldset, form, frame, frameset, h1, h2, h3, h4, h5, h6, head, hr, html, i, iframe, img, input, ins, kbd, label, legend, li, link, map, meta, noscript, object, ol, optgroup, option, p, param, pre, q, samp, script, select, small, span, strong, style, sub, sup, table, tbody, td, textarea, tfoot, th, thead, title, tr, tt, ul, var.
Note that if you provide content and attributes, the element will be built accordingly, even if HTML does not support attributes or content for that tag. In other words, we compose a syntactically complete element, not a semantically correct one.
Example:
my $link = $ml->a( "Google", { href => "http://google.com" } );
Elements can be cumulatively aggregated in the ML object. The "current document" is just the current blob of marked-up text that has been accumulated. Text can be accumulated from the top-down, bottom-up, or in layers like an onion.
To add marked-up text to the beginning of the current document:
$ml->Prepend($text);
Note that in this and the Append() method below, the text is not validated, which means you can break your syntax if you stuff your own tags into it carelessly.
To add marked-up text to the end of the current document:
$ml->Append($text);
Or, you can use the auto-append methods. The methods Element(), Comment(), and the HTML tag shortcuts above all have an auto-append version which automatically appends their output to the current document. The auto-append method begins with an underscore but the rest of the method name is the same.
# compose a link and return it to the caller
$ml->a( "Google", { href => "http://google.com" } );
# compose a link and append it to the current document
$ml->_a( "Google", { href => "http://google.com" } );
To wrap the current document in a markup element (i.e. create a markup element with the current document as its content):
$ml->Wrap( $tag, $attributes );
As a convenience you can use the auto-wrap methods. The methods Element(), Comment(), and the HTML tag shortcuts above all have an auto-wrap version which automatically uses the current document as the element contents. The auto-wrap method begins with a double-underscore but the rest of the method name is the same.
# enclose current document in a body (with optional attributes)
$ml->__body( $attributes );
# prepend a head section (containing a title element)
$ml->Prepend( $ml->head( $ml->title("Document Title") ) );
# wrap the whole shebang in an html tag
$ml->__html();
Note that it is easy to create bizarre HTML constructions. The caller is responsible for nesting their elements appropriately. For instance, the following will be processed without complaint, despite not being a legal HTML construction:
$ml->_p("A paragraph."); # add a paragraph to the document
$ml->__style(); # wrap document in style tags (!?)
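The append, prepend, and wrap operations amount to string manipulation on the accumulated document. The following standalone sketch walks through the same onion-style assembly with plain strings; wrap() and build_page() are hypothetical helpers, and nothing here checks that the nesting is legal HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap $content in <$tag>...</$tag>.
sub wrap {
    my ($tag, $content) = @_;
    return "<$tag>$content</$tag>";
}

# Build a page from the inside out, mirroring the calls described above.
sub build_page {
    my $doc = "";
    $doc .= wrap("p", "First paragraph.");      # append, like _p()
    $doc .= wrap("p", "Second paragraph.");     # append, like _p()
    $doc  = wrap("body", $doc);                 # wrap, like __body()
    $doc  = wrap("head", wrap("title", "Document Title")) . $doc;  # prepend
    $doc  = wrap("html", $doc);                 # wrap, like __html()
    return $doc;
}

print build_page(), "\n";
```

The order of the calls decides the nesting: wrapping in body before prepending the head keeps the head outside the body.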
Content($data)
Given a data structure of nested elements, we try to transform it into markup text. The elements of our data structure may refer to text, element parameters, or more data structures that have to be resolved recursively. We do not necessarily know the tag in all of these cases, but we can often infer the tag based on the element we are nesting under (eg. if we are in a <ol>, then a nested element is likely to be a <li>).
If $data is a scalar, it is taken to be explicit text or mark-up.
If $data is an arrayref, it is taken to be an Element description ([tag,content,attributes], or [content,attributes]), a list of explicit markup text, or a list of more data structures.
If $data is a hashref, it is taken to be a set of tag => content pairs.
In cases where we are not given the tag explicitly, we can often determine it from context. (Eg. if we are in a <ol>, then a nested element is likely to be a <li>.) To get a context, we need to have been called recursively from a parent element, and that parent element must define a default child tag (see the %default_child variable).
To get a sense for how this works, you can examine the HTML shortcut calls in the following examples. The shortcut call defines the top-level element, which gives a context for determining how the content data structure should be converted into markup.
List Examples: These calls will all generate lists, using various structures to represent the list items.
# list items are explicit contents
$ml->ul( [
    "list item 1",
    "list item 2",
    "list item 3",
] );
# list items are element descriptors (tag, content)
$ml->ol( [
    [ "li", "list item 1" ],
    [ "li", "list item 2" ],
    [ "li", "list item 3" ],
], { type=>"i" } );
# list items are hashes of tag=>content
$ml->dl( [
    { dt => $title1, dd => $description1 },
    { dt => $title2, dd => $description2 },
    { dt => $title3, dd => $description3 },
] );
Table Examples: These calls will all generate a 2-column table with numeric data in the cells. Some have header and footer rows, others do not.
# simple table, no headers or footers
$ml->table(
    [
        [ 123, 456 ],
        [ 789, 123 ],
        [ 456, 789 ],
    ],
    {class=>"Report"},
);
# table with head, body, foot, and caption
$ml->table(
    {
        caption => "Sample Table",
        thead => [ [ "head1", "head2" ] ],
        tbody => [
            [ 123, 456 ],
            [ 789, 123 ],
            [ 456, 789 ],
        ],
        tfoot => [ [ 1368, 1368 ] ],
    },
    {class=>"Report"},
);
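The context-based inference used in these examples can be sketched as a short recursive routine. This is a simplified illustration, not the module's algorithm: content() and element() are hypothetical helpers, the contents of the %default_child table are assumed, and only scalars and lists of untagged items are handled (explicit element descriptors, hashrefs, and attributes are left out).

```perl
use strict;
use warnings;

# Assumed default-child table; the real %default_child may differ.
my %default_child = (
    ul    => "li",
    ol    => "li",
    table => "tr",
    tr    => "td",
);

# Hypothetical resolver: turn nested data into markup, using the parent
# tag ($context) to pick a tag for untagged array items.
sub content {
    my ($data, $context) = @_;
    return "" unless defined $data;
    return $data unless ref $data;             # scalar: literal text or markup
    die "unsupported structure\n" unless ref $data eq "ARRAY";
    my $child = defined $context ? $default_child{$context} : undef;
    my $out = "";
    for my $item (@$data) {
        $out .= defined $child
            ? element($child, $item)           # infer the tag from context
            : content($item);                  # no default: just concatenate
    }
    return $out;
}

# Wrap resolved content in a tag, passing the tag down as the context.
sub element {
    my ($tag, $data) = @_;
    return "<$tag>" . content($data, $tag) . "</$tag>";
}

# prints <ul><li>item 1</li><li>item 2</li></ul>
print element("ul", [ "item 1", "item 2" ]), "\n";
# prints <table><tr><td>123</td><td>456</td></tr><tr><td>789</td><td>123</td></tr></table>
print element("table", [ [ 123, 456 ], [ 789, 123 ] ]), "\n";
```

In the table call, each inner array becomes a tr because table is the context, and each number becomes a td because tr is the context one level down.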
When creating markup, there are a few parameters that we use for defining some basic nesting and formatting rules.
The ML class includes default rules for all of the above, which are sufficient for HTML 4 or XHTML composition. If you are building a markup document of a different type and want to use the data structure feature to build complex markup in one call, you will need to provide a set of rules to replace the default HTML rules. You can set these rules by providing alternate definitions for the above parameters, like this:
$ml->set("emptytags",["foo", "bar"]);
To make your output XHTML-compatible, set the xml option when creating your ML document, or set markup.xml=1 in your configuration file to make this the default. This forces tags to be lower case, and changes the format of self-closing tags. For example, the call
$ml->Element("BR");
will produce <BR> if xml is off, and <br /> if xml is on. Note that
$ml->br();
will produce a lower-case br in all cases.
This effectively changes the syntax to xml, but it still does not validate against a DTD. It also does not manage the syntax of explicitly-coded markup that may have been passed in as content. It only affects the syntax of elements it itself has generated.
For instance, the following will generate correct output all of the time:
$ml->br();
However, if the safe_content flag is on, then the following will not produce correct XML, since the content contains explicitly-coded markup that is not XML-compatible:
$ml->p("Linebreak<br>");
(If safe_content is off, then the br tag will be escaped and will be presented as regular content, which keeps it XML-compatible, but may not be what the author intended.)
If you can avoid the latter situation, then it is possible to switch quickly from HTML to XHTML with a single configuration setting.
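The effect of the xml setting on a content-less tag can be shown with a few lines of standalone Perl. The empty_tag() helper is hypothetical and covers only the two behaviours described above (lower-casing and the self-closing form).

```perl
use strict;
use warnings;

# Hypothetical helper: format a content-less tag under either syntax.
sub empty_tag {
    my ($tag, $xml) = @_;
    return $xml ? "<" . lc($tag) . " />" : "<$tag>";
}

print empty_tag("BR", 0), "\n";   # prints <BR>
print empty_tag("BR", 1), "\n";   # prints <br />
```

Only the tag that the helper generates is adjusted; any markup already present in content strings would pass through unchanged, which is the caveat raised above.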
This class has a lot of convenience functions for HTML markup, but it actually doesn't care about the tags you use. That means you can use it to generate XML documents that have no relation to HTML. For example, here is a recipe to generate an XML RSS file:
# make an RSS feed
my $rss = ExSite::ML->new(xml=>1);
$rss->Doctype('<?xml version="1.0"?>');
# note auto-append calls
$rss->_Element("title","My Feed");
$rss->_Element("description","About My Feed");
$rss->_Element("link","http://myurl.com");
# make an item - do not use auto-append methods
my $item = $rss->Element("title","1st Item");
$item .= $rss->Element("description","1st description");
$item .= $rss->Element("link","http://link.com");
# now append this item to the document
$rss->_Element("item",$item);
# repeat for as many items as necessary
# wrap the document up - note wrap calls
$rss->__Element("channel");
$rss->__Element("rss",{ version => "2.0" });
$rss->Print;