Beaten to death by lightweight markup
Over the past few years lightweight markup languages like Markdown and Textile have started to appear everywhere, often replacing some archaic terror from the blue, like BBCode. I have, for the most part, seen this happening in a positive light, but recently I’ve started to question the sense of using these languages:
- They are often ambiguous,
- and limited compared to HTML
In this article I’ll be picking on Textile, not because I think it’s worse than Markdown, but because it is in my opinion better. I’m also only going to give one example of each problem, not because I don’t have more, but because more examples wouldn’t add to or detract from the point I’m trying to make.
Ambiguous?
Perhaps the best example of how Textile can be ambiguous is adding emphasis to a link. Here are the correct ways to do it:
*"Example":http://example.com/*
or:
"*Example*":http://example.com/
And here’s what I often find myself fixing:
*"Example"*:http://example.com/
Who’s going to say they were wrong? It seems perfectly reasonable that this should work, except that it doesn’t.
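Textile’s link syntax expects the closing quote to be followed immediately by the colon, which is why the last form fails: the asterisk sits between them, so under a rule like that no link is recognised at all. The sketch below makes the rule concrete with a toy pattern in Python; `LINK` and `find_link` are hypothetical names, and the regex is a deliberate simplification, not the one Textile actually uses.

```python
import re

# A toy pattern for Textile-style links ("text":url). This is a hypothetical
# simplification, not Textile's real regex; it keeps only the rule that the
# closing quote must be followed immediately by the colon.
LINK = re.compile(r'"([^"]+)":(https?://\S+)')


def find_link(source: str):
    """Return (link text, url) if the toy pattern matches, else None."""
    match = LINK.search(source)
    return match.groups() if match else None


print(find_link('"*Example*":http://example.com/'))  # ('*Example*', 'http://example.com/')
print(find_link('*"Example"*:http://example.com/'))  # None
```

The first correct form is left out on purpose: it relies on how Textile decides where a URL ends, which this sketch doesn’t attempt to model.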
Limited?
Indeed, a lot of lightweight markup languages make a lot of sense for basic things: paragraphs, images, links and emphasis. You can even create a table:
|a|table|row|
|a|table|row|
That’s pretty simple, and easy, but what if you need a table header? Well, you can’t.
Solution?
There hasn’t really been an ideal solution to the problem of providing power and ease of use at the same time. You could keep the lightweight markup features and add support for HTML to them, but then you’re faced with another problem: you need to pay greater attention to sanitising user input, or you’ll end up with people killing your layout with a well-placed </div>, which is part of the reason why lightweight markup languages exist in the first place.
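To show what a tidying pass does about that stray </div>, here is a minimal tag-balancing sketch built on Python’s standard `html.parser` module. It is not HTML Tidy and not a full sanitiser: there is no whitelist of tags or attributes, comments are simply discarded, end tags that close nothing are dropped, and anything the user left open is closed at the end. `TagBalancer`, `balance` and the `VOID` list are hypothetical names chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class TagBalancer(HTMLParser):
    """Re-emit HTML so every element is properly nested and closed.

    A minimal sketch, not HTML Tidy: end tags that close nothing are
    dropped, comments are discarded, and there is no tag whitelist.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the rebuilt document
        self.stack = []  # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            f' {name}="{escape(value or "", quote=True)}"' for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{rendered} />")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # a stray end tag, like a lone </div>: drop it
        # Close anything opened inside this element, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def balance(html: str) -> str:
    parser = TagBalancer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    # Close whatever the user left open, innermost first.
    parser.out.extend(f"</{tag}>" for tag in reversed(parser.stack))
    return "".join(parser.out)


print(balance("<p>Hello</div> <em>world</p>"))  # <p>Hello <em>world</em></p>
```

The design choice here is to repair rather than reject: the user’s stray </div> simply disappears instead of closing one of the page’s own containers.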
In the process of building this site I created an HTML Formatter extension for Symphony that does just that: it keeps the barest essentials that you get with a lightweight markup language, but allows you to use whatever HTML you need. You can give it basically anything as input, and it’ll give you back valid XML as output.
Essentially, it’s a two-step process:
- Run HTML Tidy over the user input,
- Apply those lightweight markup features
Of course, this is grossly simplified: since we’re being careful to generate valid XML output, the entire thing must be done with DOM manipulation in PHP.
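The second step can be sketched as follows, assuming the first step has already produced well-formed markup wrapped in a single root element. Python’s `xml.etree.ElementTree` stands in here for PHP’s DOM, and the `*strong*` rule, `STRONG`, `split_strong` and `add_strong` are hypothetical stand-ins for the extension’s real features. Because the rule only ever looks at text nodes, an asterisk inside an attribute such as `href` can’t be matched.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: *text* becomes <strong>text</strong>, with no spaces
# just inside the asterisks (so "a * b * c" is left alone).
STRONG = re.compile(r"\*([^*\s](?:[^*]*[^*\s])?)\*")


def split_strong(text):
    """Split a text node into plain strings and new <strong> elements."""
    parts, pos = [], 0
    for match in STRONG.finditer(text):
        parts.append(text[pos:match.start()])
        strong = ET.Element("strong")
        strong.text = match.group(1)
        parts.append(strong)
        pos = match.end()
    parts.append(text[pos:])
    return parts


def add_strong(element):
    """Apply the *strong* rule to every text node under element, never to attributes."""
    for child in element:
        add_strong(child)

    # Flatten this element's content into ordered strings and child elements.
    # ElementTree stores text before the first child in .text and text after
    # each child in that child's .tail.
    sequence = split_strong(element.text or "")
    for child in list(element):
        tail, child.tail = child.tail or "", None
        sequence.append(child)
        sequence.extend(split_strong(tail))

    # Rebuild .text, the children and their tails from the sequence.
    element.text = None
    for child in list(element):
        element.remove(child)
    last = None
    for item in sequence:
        if isinstance(item, str):
            if last is None:
                element.text = (element.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            element.append(item)
            last = item


root = ET.fromstring(
    '<root><p>Some *strong* text, '
    '<a href="http://example.com/*">a *link*</a>.</p></root>'
)
add_strong(root)
print(ET.tostring(root, encoding="unicode"))
```

One consequence of working on text nodes is that emphasis can’t span elements: in `*a <em>b</em> c*` the two asterisks live in different text nodes, so this sketch leaves them alone.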
Anyhow, I’d really like to hear other people’s thoughts on this matter: is this a good solution? Too hard, or not enough?


Thoughts and feedback
Always hard to decide the best way to deal with user input. Short of creating a “perfect” WYSIWYG editor, this seems like a good solution.
Perhaps the problem of ambiguity can be mitigated by allowing people to easily preview the output of their input. JavaScript live preview perhaps?
Max, the editor used on the Symphony site really does help, but only for the standard formats; if you want to do something special, you really need to use HTML. If you add a live preview to that, it’d probably work pretty well, and it would also help encourage people to type valid HTML, because the preview wouldn’t work if it was very broken.
I also meant to say that I don’t like the idea of mixing Markdown or Textile with HTML, because to do it correctly you have to do a lot of crazy processing. Consider the following user input:
When you have something written like that, you can’t just use a simple regular expression to add emphasis between the two asterisks. You don’t want to stop right in the middle of the @href, because that’d cause invalid HTML to be created. So instead you have to treat the document as an XML tree from the very beginning, which makes tracking the asterisks a whole lot trickier. It wouldn’t be impossible, but it’d probably make the terrible Textile code look like a wonderful oasis.
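The original example input didn’t survive, but a hypothetical one is enough to show the failure. Running a naive emphasis regex over the raw string lets an asterisk inside an `href` pair up with one in the body text, and the result no longer parses. `NAIVE_STRONG`, `is_well_formed` and `user_input` are names made up for this sketch.

```python
import re
import xml.etree.ElementTree as ET

# The naive approach: an emphasis regex run over the raw HTML string.
NAIVE_STRONG = re.compile(r"\*(.+?)\*")


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


# Hypothetical input: one asterisk in an attribute, two in the body text.
user_input = '<a href="http://example.com/*">link</a> is *great*'
naive = NAIVE_STRONG.sub(r"<strong>\1</strong>", user_input)

print(naive)  # <a href="http://example.com/<strong>">link</a> is </strong>great*
print(is_well_formed(user_input), is_well_formed(naive))  # True False
```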
Sorry for going off-topic, but Symphony CMS is very impressive! I’m going to grab my own copy :)