Dev:attributeSyntax

This page is mirrored from page Dev:attributeSyntax in The railML® 3 wiki.

XML Syntax issues

Correct Syntax for XML attributes (Delimiters)

There is sometimes confusion about the correct XML syntax for delimiting attribute values. Which of the following two versions is valid?

  1. <element attribute="value"/>
  2. <element attribute='value'/>

The W3C XML specification gives the answer: both are correct.

Definitions in the W3C XML specification

The following definitions use "a simple Extended Backus-Naur Form (EBNF) notation".

Definition 41

Attribute ::= Name Eq AttValue
"... the content of the AttValue (the text between the ' or " delimiters) as the attribute value."

Definition 10

AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"

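This "simple EBNF" is somewhat hard to read, but it clearly allows both quotation marks as attribute delimiters. The use must be consistent within each attribute: a value has to be closed with the same kind of quotation mark that opened it, so you may not mix both for one attribute. A value delimited by double quotes may contain literal single quotes, and vice versa.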
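
The AttValue production can be approximated with a regular expression. The following is a minimal Python sketch, not a conforming XML parser: the names REFERENCE, ATT_VALUE and is_att_value are chosen for illustration only, and the entity name pattern is deliberately simplified to ASCII, whereas the real Name production allows many more characters.

```python
import re

# Simplified Reference: decimal or hexadecimal character references, plus
# entity references with an ASCII-only name (an assumption of this sketch).
REFERENCE = r"&(?:#[0-9]+|#x[0-9a-fA-F]+|[A-Za-z_:][A-Za-z0-9_:.\-]*);"

# AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
ATT_VALUE = re.compile(
    rf'"(?:[^<&"]|{REFERENCE})*"'   # double-quoted alternative
    rf"|'(?:[^<&']|{REFERENCE})*'"  # single-quoted alternative
)


def is_att_value(text: str) -> bool:
    """Return True if text, including its delimiters, matches the sketch."""
    return ATT_VALUE.fullmatch(text) is not None
```

With this sketch, is_att_value('"value"') and is_att_value("'value'") are both accepted, while a value that opens with one quotation mark and closes with the other is rejected.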

Usage in XML parsers and XML processors

Thus every conforming XML parser has to accept both versions (according to the W3C standard).

However, you generally cannot require your XML processor to output a particular delimiter style. The processor's implementation decides which one is written.
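
As an illustration, the following Python sketch parses both versions with the standard library module xml.etree.ElementTree and compares the results. The element and attribute names are the placeholders from the list above. The choice of ElementTree is only an example; other processors may serialize differently.

```python
import xml.etree.ElementTree as ET

# The same element, once with double and once with single quotes.
double_quoted = ET.fromstring('<element attribute="value"/>')
single_quoted = ET.fromstring("<element attribute='value'/>")

# The delimiter is not part of the attribute value, so both parse alike.
same_value = double_quoted.get("attribute") == single_quoted.get("attribute")

# On output, the serializer chooses the delimiter itself; the original
# single quotes of the input are not remembered.
serialized = ET.tostring(single_quoted, encoding="unicode")
```

Here same_value is True, and the serialized string contains attribute="value" although the input used single quotes.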

Experience shows that XML files using double quotes as attribute delimiters are much more common than XML files using single quotes.

Workaround in case of problems with single quotation marks

If you have trouble with the "single quotes" version, then pass your XML file through the following simple tool:

tr '\047' '\042' <input-file.xml >output-file.xml
  • It "translates" the character ' (octal 047) into " (octal 042).
  • "input-file.xml" is your XML file with single quotes.
  • "output-file.xml" becomes your XML file with double quotes.

This tool is very fast and has no trouble with very large uncompressed XML files.

But be aware that every occurrence of the character is changed, not only the attribute delimiters:

before
<name>King's Cross St. Pancras</name>
thereafter
<name>King"s Cross St. Pancras</name>

Regardless of this, diacritics stay unchanged:

before
<name>Kings Cross St. Pancras</name>
thereafter
<name>Kings Cross St. Pancras</name>
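
If the side effect on element content matters, the delimiters can be rewritten inside tags only. The following Python sketch is one possible approach under stated assumptions, not a full XML processor: it rewrites start tags and empty-element tags with regular expressions, gives no special treatment to comments, CDATA sections or the XML declaration, and loads the whole file into memory. All names (TAG, QUOTED, to_double_quotes, the station sample) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DQ = r'"[^"]*"'                    # a double-quoted attribute value
SQ = r"'[^']*'"                    # a single-quoted attribute value
NAME_START = r"""[^\s<>/!?"'=]"""  # excludes end tags, comments, PIs
OTHER = r"""[^<>"']"""             # any tag character outside a value

# A start tag or empty-element tag, including its quoted values.
TAG = re.compile("<" + NAME_START + "(?:" + OTHER + "|" + DQ + "|" + SQ + ")*>")
# One quoted value inside a tag, in either delimiter style.
QUOTED = re.compile(DQ + "|" + SQ)


def _requote(match):
    token = match.group(0)
    if token.startswith('"'):
        return token  # already double-quoted, keep as is
    # Literal double quotes inside the value must be escaped first.
    return '"' + token[1:-1].replace('"', "&quot;") + '"'


def to_double_quotes(xml_text):
    """Rewrite single-quoted attribute delimiters; leave text content alone."""
    return TAG.sub(lambda tag: QUOTED.sub(_requote, tag.group(0)), xml_text)


source = (
    "<station name='King&apos;s Cross' note='say \"hi\"'>"
    "<name>King's Cross</name></station>"
)
converted = to_double_quotes(source)
naive = source.replace("'", '"')  # same effect as the tr command
```

In this sample the apostrophe in the name element survives in converted, and both source and converted parse to the same attribute values. The naive variant is no longer well-formed, because the double quotes inside the note attribute now end the value prematurely.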

Character references

Like any other XML file, a railML file may contain character references, e.g. &#252; or &#xFC; for the ü character.

W3C Definition Character Reference

A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.
[66] CharRef  ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
...
If the character reference begins with " &#x ", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with " &# ", the digits up to the terminating ; provide a decimal representation of the character's code point.

That means character references may occur anywhere in element content and attribute values, independent of the encoding declaration at the top of the XML file. As railML files are attribute-centric, they mainly occur in attribute values. Please keep this in mind when consuming railML files. For further information, you may start at the following Wikipedia site (external link).
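
A conforming parser resolves character references before the application sees the value. The following Python sketch shows this with xml.etree.ElementTree; the station element and its attribute names are made up for illustration and are not taken from the railML schema.

```python
import xml.etree.ElementTree as ET

# The document is declared as US-ASCII, yet the attribute values still
# carry the non-ASCII character U+00FC via character references.
document = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b'<station decimal="M&#252;nchen" hexadecimal="M&#xFC;nchen"/>'
)

station = ET.fromstring(document)

# Decimal 252 and hexadecimal FC name the same code point.
decimal_name = station.get("decimal")
hexadecimal_name = station.get("hexadecimal")
```

Both decimal_name and hexadecimal_name hold the same string, with the reference already replaced by the character ü, so consuming software should compare parsed values rather than the raw text of the file.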
