Avoin notation

Avoin notation is a language for marking structural elements of hypertext.

Reference

Glossary

Note the choice of words in this section: the use of affirmative vs. modal verbs highlights Avoin notation's system of imposition.

Environment
A system that uses Avoin notation (for example a CMS).
Document
A full presentation of data composed of one or more files (for example a web page or a readme file).
File
A chunk of information marked with Avoin notation (for example a regular file or a database entry). A file can contain a whole document or just be a part of one, but must be sensible in itself, as Avoin notation itself does not recognise a document as an element (this creates some requirements for markup).
Line
A chunk of content (usually literally a line separated by line breaks).
Content
Raw data that can have parts marked with wrap elements.

Element types

Surrounding element
Directly contains line elements. It does not directly contain content or wrap elements: content is always inside a line element.
Line element
Directly contains content and can directly contain wrap elements.
Wrap element
Directly contains a special kind of content.

Reserved strings

The following strings have special meaning in Avoin notation and should be escaped by repeating the first character of the string where appropriate.

Reserved strings reference
String               Function                         Notes
{{                   Section                          Only at the beginning of a line
}}                   Quote
=1, =2, =3, ..., =n  Headings                         Only at the beginning of a line
¤                    List item in an unordered list   Only at the beginning of a line
#                    List item in an ordered list     Only at the beginning of a line
%                    Definition                       Only at the beginning of a line
||                   Table row/cell
\\                   Link
**                   Emphasis
&&                   Optional parameters              Only within an element definition
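
The escaping rule can be sketched in Python as below. This is only one reading of "repeating the first character of the string": the function names are made up for illustration, and the sketch covers the writing side only, since how an environment reads escaped strings back is not specified here.

```python
import re

# Reserved strings taken from the reference table. The split into "inline"
# and "line start" groups follows the Notes column; it is an assumption
# of this sketch, not part of the notation.
INLINE_RESERVED = ("}}", "||", "\\\\", "**", "&&")
LINE_START_RESERVED = re.compile(r"\{\{|=\d|¤|#|%")


def escape_inline(text):
    """Escape each inline reserved string by repeating its first character."""
    pattern = "|".join(re.escape(s) for s in INLINE_RESERVED)
    return re.sub(pattern, lambda m: m.group(0)[0] + m.group(0), text)


def escape_line_start(line):
    """Escape a reserved string at the beginning of a line, if there is one."""
    if LINE_START_RESERVED.match(line) is None:
        return line
    return line[0] + line


print(escape_inline("2 ** 3"))           # 2 *** 3
print(escape_line_start("#1 priority"))  # ##1 priority
```

Strings that are already ambiguous (for example three asterisks in a row) are not handled in any special way.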

Elements

Avoin notation supports the following set of elements, sorted from the highest level to the lowest.

Element reference
Section
Type: Surrounding
Description: A section of data. This is the highest level element in an Avoin document. Nested sections are not possible.
Syntax: {{id
Closed with: End of parent; Another section
Examples:
  • {{Main column
  • {{Footnotes
Notes: In an environment where only one type of section is possible, this element should not be used. When sections are used, the ones that are allowed should be named by the environment. Sections in a file are unique.

Quote
Type: Surrounding/Wrap
Description: A part of data that is quoted. A quotation can have any fully closed element inside.
Syntax: }}&&source&&link&&destination}}content}}
Closed with: End of parent; Closing tag
Examples:
  • =3}}&&Benjamin Franklin&&quotations.pdf&&document}}Lighthouses are more helpful than churches.
  • ¤}}Lighthouses are more helpful than churches.
Notes: The end of a line only closes a quote when it is used as a wrap element.

Heading
Type: Line
Description: A heading of the specified depth.
Syntax: =1content; =2content; =3content; ...; =ncontent
Closed with: End of parent
Examples:
  • =1This is a heading
  • =2This is a subheading
Notes: An environment should define the maximum number of levels for headings and the role of each level. When the maximum is greater than nine, leading zeroes are used.

Paragraph
Type: Line
Description: Text paragraph.
Syntax: content
Closed with: End of parent
Examples:
  • This is a paragraph of text.

Unordered list item
Type: Line
Description: List item in an unordered list.
Syntax: ¤content
Closed with: End of parent
Examples:
  • ¤This is a list item

Ordered list item
Type: Line
Description: List item in an ordered list.
Syntax: #content
Closed with: End of parent
Examples:
  • #This is the first list item

Definition
Type: Line
Description: Definition or description for the preceding item.
Syntax: %content
Closed with: End of parent
Examples:
  • %This is a definition

Table cell
Type: Line/Wrap
Description: Table cell.
Syntax: ||content
Closed with: End of parent; Another cell
Examples:
  • ||This is content within a cell||And this is another cell
Notes: A table cell closes a line element.

Link
Type: Wrap
Description: A link to another document, object or location.
Syntax: \\&&link&&destination\\content\\
Closed with: End of parent; Closing tag
Examples:
  • \\&&http://eiskis.net/\\eiskis.net\\
  • \\&&doc/2009documentation.odf\\Avoin notation documentation\\
  • ¤\\We supposedly know where this is going

Emphasis
Type: Wrap
Description: A section of content that is of special importance or noteworthy.
Syntax: **content**
Closed with: End of parent; Closing tag
Examples:
  • In a paragraph, **this is a part of text** that we want to emphasize

Note that a list as a whole cannot be marked. It is a virtual element. Consecutive list items of the same type are considered to be a part of the same list. Tables behave in a similar manner, first composing table rows which then compose full tables. Consecutive table cells are considered to be a part of the same table row. Consecutive rows with the same number of cells are considered to be a part of the same table.
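
As a rough illustration of how an environment might rebuild these virtual lists, here is a minimal Python sketch. The function name and the output shape are assumptions made for this example. Wrap elements inside the items are left untouched.

```python
from itertools import groupby

# Markers from the reserved strings table.
LIST_MARKERS = {"¤": "unordered", "#": "ordered"}


def group_lists(lines):
    """Group consecutive list items of the same type into virtual lists."""

    def kind(line):
        # None for any line that is not a list item.
        return LIST_MARKERS.get(line[:1])

    result = []
    for list_type, run in groupby(lines, key=kind):
        run = list(run)
        if list_type is None:
            # Not list items: pass the lines through one by one.
            result.extend(("line", line) for line in run)
        else:
            # One virtual list per run of same-type items, markers removed.
            result.append((list_type, [line[1:] for line in run]))
    return result


lines = ["¤apples", "¤pears", "#first", "plain text", "¤again"]
for entry in group_lists(lines):
    print(entry)
```

The last item starts a new unordered list, because the paragraph in between breaks the run.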

Possible values for optional parameters are not predefined.

Syntax

Opening tags without optional parameters are shortened. Defining parameters is optional, but to define a parameter that follows another, the preceding ones must be defined as well (parameters cannot be skipped).
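
A small Python sketch of a link writer shows one way to respect this rule. The helper name `link` is hypothetical, and the shortened form `\\content\\` is my reading of the link examples in the element reference. Parameters are positional, so a later one can only be given when the earlier ones are present, and empty values are rejected instead of being written out as `&&&&`.

```python
def link(content, *params):
    """Build a link element; parameters are positional and cannot be skipped."""
    for position, value in enumerate(params, start=1):
        if not value:
            raise ValueError(
                f"parameter {position} is empty; parameters cannot be skipped"
            )
    # "\\\\" is the two-character reserved string \\ in Python source.
    opening = "\\\\" + "".join("&&" + value for value in params)
    if params:
        # Only an opening tag with parameters needs its own closing \\.
        opening += "\\\\"
    return opening + content + "\\\\"


print(link("eiskis.net", "http://eiskis.net/"))
print(link("We supposedly know where this is going"))
```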

All elements must and will be closed (see above how to close different tags). An element with open children cannot be closed. No element can continue unclosed after an end of file. Closing tags of line and wrap elements at the end of a line or file can be skipped (although this naturally impedes human-readability) as the end of line will close the element anyway.

Each line in a file not beginning with an appropriate reserved string (as described above) is regarded as a paragraph of text. Double and trailing spaces (spaces around wrap elements are not considered surplus) and line breaks themselves are all ignored. Empty elements are ignored as well. As a result, any line with only a line break of some kind or spacing characters is regarded as an empty paragraph element and ignored.
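
The line rules can be sketched as a small Python classifier. This is an illustration under assumptions, not a reference parser: the names are invented, quotes at the start of a line and escaped reserved strings are left out, and `heading_digits` stands in for whatever maximum heading depth the environment defines (2 when leading zeroes are in use).

```python
import re

# Line-start reserved strings other than headings.
PREFIXES = {
    "{{": "section",
    "¤": "unordered list item",
    "#": "ordered list item",
    "%": "definition",
    "||": "table row",
}


def normalise(text):
    """Collapse runs of spaces and drop surrounding whitespace and line breaks."""
    return re.sub(r" {2,}", " ", text).strip()


def classify(line, heading_digits=1):
    """Return (element, content) for one line, or None if the line is ignored."""
    text = normalise(line)
    heading = re.match(rf"=(\d{{{heading_digits}}})(.*)", text, re.ASCII)
    if heading:
        element, content = f"heading {int(heading.group(1))}", heading.group(2)
    else:
        for prefix, name in PREFIXES.items():
            if text.startswith(prefix):
                element, content = name, text[len(prefix):]
                break
        else:
            # No reserved string at the start: a paragraph of text.
            element, content = "paragraph", text
    content = content.strip()
    # Empty elements are ignored.
    return (element, content) if content else None


print(classify("=2This is a subheading"))
print(classify("This  is a   paragraph of text.  \n"))
print(classify("   \n"))
```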

Examples
Example 1
Source: }}&&&&eiskis.net}}eiskis.net}}
Result (equivalent in HTML): &&&eiskis.neteiskis.net
Notes: Source cannot be omitted. This is legal but not the result the user intended.

Example 2
Source: #Quotation: }}Lighthouses are \\&&http://en.wikiquote.org/wiki/Benjamin_Franklin\\more **helpful\\ than churches
Result (equivalent in HTML):
  1. Quotation: Lighthouses are more helpful than churches
Notes: This is legal and prints as intended.

Example 3
Source:
  ||Source||Result (equivalent in HTML)||Notes
  ||eiskis.net||eiskis.neteiskis.net||Source cannot be omitted.
  ||Sauce||**Sauce
Result (equivalent in HTML):
  Source      Result (equivalent in HTML)  Notes
  eiskis.net  eiskis.neteiskis.net         Source cannot be omitted.
  Sauce       Sauce
Notes: This is legal and prints as intended.

Links

Avoin notation in an environment

Nothing that is ignored in Avoin notation should ever be stored or converted from or to Avoin notation.

Note that Avoin notation does not give special treatment to e-mail addresses. They can be separated from regular text without special markup due to their distinctive structure (username@domainname.tld) when necessary, but this is not a job for Avoin notation, which only distinguishes between external links and different kinds of internal links. Avoin notation doesn't give special treatment to different kinds of external links or different kinds of contact information.

Relation to HTML

Even though I do want to emphasize the unrelatedness of Avoin notation to HTML and even the web in general, I think it serves a purpose to take a brief look at how and where Avoin notation differs from HTML and why. In general it's safe to say that Avoin notation is stripped of all unnecessary elements (too much so, some might argue) and knows what it's supposed to do (i.e. mark the structure of human-produced information in hypertext), unlike the poorly designed and obsolete HTML. Also note that I'm using the word "HTML" in a broad sense (including e.g. XHTML). Even though we can counter the age of HTML by using it carefully, it's still at the very least quite tiresome to use.

Avoin notation takes into account how we structure text in our documents, and have structured it since kindergarten. In hypertext we have more elements we have to separate and mark, but the basics remain the same. Now, we need to pay close attention to the fact that we don't want to mark just anything: we have a certain, predefined set of elements (some are the same as in regular text documents, like the ones you wrote on paper in school, and some are new).

The main difference, to me, is the fact that the generic Avoin notation cannot be used independently. It needs an environment to define the context it's used in. Also HTML (and especially the way it has been used by incompetent people) is to some extent concerned about how things appear. Avoin notation is not.

Commentary on Avoin notation and HTML
Paragraph
Avoin notation's view: Paragraphs are the most basic and most important structural elements of our files. Just like with the papers you wrote in school, you organise information and text into paragraphs. That's how Avoin notation acts. Content can only be stored inside a paragraph, list or a table.
HTML's view: Loose and poorly designed HTML doesn't mind if you never use <p>. Many users of HTML have taken this as a sign that you can just throw text into a document without defining what kind of information it actually is and how it relates to other data in the document. To the user agent this data is just an indefinite chunk of information, and it can only guess what it's supposed to do with it. They use line breaks to make text more readable, but <br> does not carry semantic meaning. This is probably the most visible and fundamental (after all the ideological babble), yet minor (HTML can be written in the same way as Avoin notation in this regard), difference between Avoin notation and HTML. Avoin notation does not know the meaning of a "line break"; in fact, the word "line" has a different meaning in Avoin notation's terminology.

Section
Avoin notation's view: Sections are probably the most "un-Avoin-like" elements in Avoin notation: they can be omitted entirely, and the actual nature of their use is left to the environment to decide. However, it is necessary to separate different types of sections in files, and predefining them would be impractical and would restrict the use of Avoin notation to specific environments. Sections are probably where the designer of an environment will go wrong with Avoin notation, if anywhere. It should be borne in mind that even though Avoin notation itself does not define which sections are allowed, defining them is a mandatory requirement for the environment.
HTML's view: HTML does not have a single direct counterpart for Avoin notation's section. HTML uses <head> and <body> for the top-level separation of document meta data and content. HTML also has separate elements for separating meta(ish?) data and content inside forms and tables. HTML has <frame>, which goes beyond a single document but has other uses besides separating sections as well (linking several documents together, which Avoin notation cannot do and in general does not do). HTML has <div>, which is very vague and carries very little predefined semantic relevance. <div> is perhaps the closest equivalent of a section, but unlike in Avoin notation, it is hard to assign semantic value to it (HTML lacks an environment to define classes and, partly because of that, the definition of a <div> is very vague). Note that in HTML there can be nested divisions, whereas sections in Avoin notation are restricted to a single level and naturally take a certain kind of role in a file.

Quote
Avoin notation's view: A quote in an Avoin notation file can be used either as a wrap element or as a surrounding element.
HTML's view: HTML distinguishes between <blockquote>, <q> and <cite>.

Headings
Avoin notation's view: In Avoin notation headings partly take the role of HTML's separate tags intended to separate different kinds of data. Headings are used to separate and describe content within sections, and have a role in defining the succeeding data. They should be thought of more as containers and less as separators.
HTML's view: In HTML headings tend to be used very loosely, and are often omitted, and as a result often feel like only containers for important text data.

List item
Avoin notation's view: Avoin notation separates regular lists from ordered lists. However, Avoin notation doesn't mark the actual lists, only list items.
HTML's view: HTML makes the same distinction between different types of lists (although <ul> and <ol> aren't the only lists in HTML). However, the distinction is made by using a different container for <li>, which is only used with these two list elements.

Definition
Avoin notation's view: In Avoin notation a definition for the preceding element can be included with the definition element. More specifically, a definition applies to the last line element closed or surrounding element opened. Definitions are very effective and practical: they can be used to deliver meta data for the whole file, headings, list items, quotes, paragraphs and table rows (which is what a definition applies to when used between table row/cell elements; similarly, when used after a table, a definition applies to the whole table). It should be noted that even though definitions are very versatile, they cannot be used to assign semantic meanings to elements, only to include additional information and definitions.
HTML's view: HTML has some elements for offering definitions for data. <dl> is an entirely separate type of list with specific child elements, all unseen in Avoin notation (which would use a regular or an ordered list with definitions between list items), which is used for data that requires defining. Tables have <caption>. <label> and <legend> are used inside HTML forms. Avoin notation definitions also serve the same purpose as some of the elements in HTML's <head>.

Table row/cell
Avoin notation's view: Similarly to lists, only the lowest level of table elements is marked. Consecutive table cells, which are wrap elements, are interpreted as part of the same table row, which is a (virtual) line element and in turn makes up a table with its colleagues.
HTML's view: HTML has a wide array of elements that are meant to be used with tables. The result is a fairly competent, if tiresome, way of defining table data. Avoin notation is nowhere near either the complexity or the semantically pleasing potential of HTML in this regard, but, as usual, the potential is rarely fully utilised, as many of the features remain unsupported even by browsers, let alone by those who actually use the language.

Link
Avoin notation's view: Avoin notation can be used to link to other documents. Links behave similarly to their HTML counterparts. If the environment knows what it's doing, they need no attributes. They can, however, be given information about the type of the destination and a specific address. The environment plays a large part here, and Avoin notation relies on it to define how exactly links' parameters are used.
HTML's view: <a> is not that different from link, but it does serve a wider purpose as it is also used to mark bookmarks and keywords inside a document (anchors are not only links). In addition to attributes that define the "shape" and "coordinates" of an anchor, HTML does give the possibility of defining the language, document type and even character encoding of a target document. Avoin notation is not limited to the web and needs additional information from the environment on how to treat the optional attributes of links.

Emphasis
Avoin notation's view: Emphasis is a wrap element used to highlight the importance of a certain part of content. This is not styling.
HTML's view: HTML has several elements for separating sections in text, so many in fact that I'm not going to list them all here. Some have more to do with appearance, some with semantic differences, and some are very loosely defined (<span> shares much of its features with <div>). Most are useless.

Limitations

I'm sure it's quite apparent by now that Avoin notation is pretty limited and quite intentionally so. However there are still some annoyances in the system and some limitations I'd rather address.

Wrap elements

Having only a single wrap element (emphasis) for separating regular text elements, in addition to links and quotes, may prove too limited. It would be possible to offer additional information with optional parameters, but at this point this is not part of the Avoin notation standard.

Escaping reserved strings

Certain things cannot be marked. ** cannot be emphasised, || cannot be the only content in a table cell and so forth. If this is the price that has to be paid for practicality, simplicity and the imposed validness (so far, I haven't come up with a way of writing invalid Avoin notation code), I'm happy to pay it, but I'd be amazed if there wasn't a better way to escape reserved strings.

Headings

The reserved string system for headings is clumsy and results in hard-to-escape reserved strings.

Media

So far I have not mentioned a single word about including images or other rich media objects in an Avoin notation document. It's supposed to be done with links, but this needs to be thought out still.

Tables and column headings

Columns in tables, elements for storing data that is categorised, cannot have headings: Avoin notation cannot reliably store information on what kind of data is stored in a table. We do have definitions, which would be ideal if we could be more specific on how to determine the type of data in specific columns (definitions can only refer to the table itself or a row inside it, not individual cells or columns).

Avoin notation also fails to distinguish between table rows of two consecutive tables that store unrelated data and just happen to have the same number of columns.

With lists we only have one level of virtual elements. In tables we have two, which causes these problems.
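
The second problem is easy to reproduce. The Python sketch below groups rows into tables purely by cell count, as the rule for virtual tables says. The function names and the sample rows are made up for this illustration, and rows are assumed to be written without a closing ||.

```python
from itertools import groupby


def split_cells(row):
    """Split one '||cell||cell' row into its cells."""
    return row.split("||")[1:]


def group_tables(rows):
    """Group consecutive rows with the same number of cells into tables."""
    parsed = [split_cells(row) for row in rows]
    return [list(run) for _, run in groupby(parsed, key=len)]


rows = [
    "||Name||Age",
    "||Ada||36",
    "||City||Country",   # meant as the start of a second, unrelated table
    "||Turku||Finland",
    "||a||b||c",
]
tables = group_tables(rows)
print(len(tables))     # 2
print(len(tables[0]))  # 4
```

The two two-column tables come out as a single table of four rows. Only the three-column row is recognised as a separate table.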

Why Avoin notation?

Structure is often marked with HTML (whose XHTML variant is a derivative of XML) or with XML itself. XML, however, is more of a syntax for syntaxes. I find it obstructive and bloated for marking structure in hypertext and similar documents. It gives too many possibilities and not enough tools. It doesn't give a useful syntax for marking the kind of data that Avoin notation is designed to mark: even in HTML, which has predefined elements and attributes, there are multiple ways to structure data and much is left for interpretation (just think of the number of different ways this document could have been marked). In my experience this leads to much unnecessary confusion and work. Of course we can structure data in multiple ways, but ideally we should only have a single way of marking a certain kind of structure.

HTML is easy to abuse, and it often is abused. Many web developers do not know how to use HTML properly. In Avoin notation this is countered with imposed validness and unambiguity.

Avoin notation is outwardly similar to the various wiki syntaxes (at first glance, anyway), but it is even simpler and more unobtrusive. Avoin notation strives to do only what is necessary and desirable, do it well and prevent all else. In Avoin notation it is hard to do anything illegal. It is intended to be used in such a way that everything not required is always dropped. Many simple syntaxes are designed to do everything that is often done with HTML, but Avoin notation is designed to be what HTML should have always been.

This draft of Avoin notation is licensed under the BY-NC-SA Creative Commons licence. Designed by Eiskis lol.