Howdy

My name is Rowan Lewis. I'm a web developer and standards enthusiast currently living on the Gold Coast, Australia, where I work for a brilliant design and development firm known as R&B Creative Communication.

· functional programming · text formatting

Text formatting, an experiment with functional languages

Ever since I started writing Bitter, I've been wondering how to apply functional programming concepts to building a better text formatter.

Before continuing, make sure you have read my introduction to the Bitter syntax highlighter as it contains some important ideas on which I will be elaborating in this article.

For now I'm going to call this idea for a language Terror, as it's a text formatter and probably going to be full of errors.

What would it look like?

Obviously the syntax cannot be identical to Bitter's, as all Bitter needs to do is wrap pieces of text with tags. Here, we actually need a way to define HTML. So I've introduced three new functions:

element(name)
Define a new element
attribute(name, value)
Define a new attribute
text(value)
Define a new text node

Each name or value accepts a string, possibly containing references to matched data: attribute('class', '$1').
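
As a rough sketch of how those references might be resolved, the snippet below swaps each $n in a template string for the matching regular expression capture. The expand_references() helper is hypothetical and not part of Bitter or Terror; it only assumes that captures arrive as the array produced by preg_match().

```php
<?php

// Hypothetical helper: replace $0, $1, ... in a template with regex captures.
function expand_references(string $template, array $captures): string
{
    return preg_replace_callback(
        '/\$(\d+)/',
        // Unknown capture indexes expand to an empty string.
        fn (array $m): string => $captures[(int) $m[1]] ?? '',
        $template
    );
}

preg_match('/^(h[1-6])\.\s*(.*)$/', 'h2. Heading two', $captures);

echo expand_references('$1', $captures), "\n";        // h2
echo expand_references('title-$1', $captures), "\n";  // title-h2
```

With something like this in place, attribute('class', '$1') would only need to run its value through the helper before building the attribute.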


<?php
    
    Terror::rule(
        Terror::id('text'),
        
        Terror::id('text-headers')
    );
    
    Terror::rule(
        Terror::id('text-headers'),
        Terror::match('^(h[1-6])\.\s*[^\r\n]+'),
        Terror::element('$1'),
        
        Terror::rule(
            Terror::id('text-header-text'),
            Terror::match('[^\.]+\.\s*(.*)'),
            Terror::text('$1')
        )
    );
    
?>

So, as you can see, this example looks for any Textile-style headers and creates elements to represent them.

If you were to pass it this text:

h1. Heading one

h2. Heading two

Then you'd end up with this output:

<h1>Heading one</h1>

<h2>Heading two</h2>

Pretty straightforward, huh? All you do is tell it which syntax to match and what to output it as; everything else is automatic.
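
To make the "automatic" part a little more concrete, here is a minimal sketch of what an engine might do with such a rule. It is only an illustration under a few assumptions: the rule is a plain array rather than Terror::rule() calls, the header text comes from a second capture instead of a nested rule, and apply_rule() is a made-up name.

```php
<?php

// Hypothetical stand-in for a compiled Terror rule: a pattern plus
// templates for the element name and its text content.
$rule = [
    'match'   => '/^(h[1-6])\.\s*(.*)$/m',
    'element' => '$1',
    'text'    => '$2',
];

function apply_rule(array $rule, string $input): string
{
    return preg_replace_callback(
        $rule['match'],
        function (array $m) use ($rule): string {
            // Swap $n references for the matching capture.
            $expand = fn (string $t): string => preg_replace_callback(
                '/\$(\d+)/',
                fn (array $r): string => $m[(int) $r[1]] ?? '',
                $t
            );
            $name = $expand($rule['element']);
            // Escape the text so captured input cannot inject markup.
            $text = htmlspecialchars($expand($rule['text']));
            return "<$name>$text</$name>";
        },
        $input
    );
}

echo apply_rule($rule, "h1. Heading one\n\nh2. Heading two"), "\n";
```

Run against the two headers above, this prints the same h1 and h2 elements shown in the output.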

It's not all easy

There's one obvious issue with this method: not enough logic. The example above assumes you want your headers to be exactly as they were written in the text; there's no shortcut for offsetting the header level by one: h1 + 1 = h2.

Instead you'd have to write an individual rule for each of the possible levels. Not an ideal solution in the least:


<?php
    
    Terror::rule(
        Terror::id('textile-headers'),
        Terror::match('^h[1-6]\.\s*[^\r\n]+'),
        
        Terror::rule(
            Terror::id('textile-header-one'),
            Terror::match('^h1\.\s*[^\r\n]+'),
            Terror::element('h2'),
            
            Terror::id('text-header-text')
        )
    );
    
?>

It may be possible to instead use XPath expressions to reference captures and do the arithmetic:


<?php
    
    Terror::rule(
        Terror::id('text-headers'),
        Terror::match('^h([1-6])\.\s*[^\r\n]+'),
        Terror::element('h{capture-1 + 1}'),
        
        Terror::rule(
            Terror::id('text-header-text'),
            Terror::match('[^\.]+\.\s*(.*)'),
            Terror::text('{capture-1}')
        )
    );
    
?>

This poses its own problems, however, since a considerable amount of processing would be needed to convert an array of matches into XML that XPath expressions can then be executed against.

But then again, it'd be better than introducing yet another nonstandard language.
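
For what it's worth, the conversion could be sketched with PHP's DOM extension along the following lines. This assumes each capture becomes a child element named capture-N under a made-up match root, and that anything between braces in a template is an XPath expression; captures_to_xml() and expand_xpath() are hypothetical names, and nothing here measures the processing cost.

```php
<?php

// Hypothetical helper: expose regex captures as <capture-N> child elements.
function captures_to_xml(array $captures): DOMElement
{
    $doc  = new DOMDocument();
    $root = $doc->appendChild($doc->createElement('match'));
    foreach ($captures as $i => $value) {
        $node = $root->appendChild($doc->createElement("capture-$i"));
        // A text node escapes markup characters in the captured value.
        $node->appendChild($doc->createTextNode($value));
    }
    return $root;
}

// Hypothetical helper: evaluate each {expression} in a template as XPath.
function expand_xpath(string $template, array $captures): string
{
    $root  = captures_to_xml($captures);
    $xpath = new DOMXPath($root->ownerDocument);
    return preg_replace_callback(
        '/\{([^}]+)\}/',
        fn (array $m): string => $xpath->evaluate("string({$m[1]})", $root),
        $template
    );
}

preg_match('/^h([1-6])\.\s*(.*)$/', 'h1. Heading one', $captures);

echo expand_xpath('h{capture-1 + 1}', $captures), "\n"; // h2
echo expand_xpath('{capture-2}', $captures), "\n";      // Heading one
```

Because XML names may contain hyphens, XPath reads capture-1 as an element name rather than a subtraction, which is what lets capture-1 + 1 work as written.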

Bite me

So, that's what I'd like to create, when I have the time. My ideas are not perfectly brilliant, nor really perfect or brilliant, so please, share your thoughts.

Have your say


Nils Hörrmann said

Rowan, are you still working on this?  Is this something that could be turned into a Symphony master textformatter offering a new, customizable parser for Markdown or Textile?