Dev:attributeSyntax: Difference between revisions

From railML 2 Wiki
m (Ferri Leberl moved page CO:attributeSyntax to Dev:attributeSyntax: standardization)
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
= XML Syntax issues =
= XML Syntax issues =


==  Correct Syntax for XML attributes (Delimiters) ==
==  Correct quoting of XML attribute values ==


Sometimes there is a little confusion about the correct XML syntax in attribute definitions. Which version is valid, which not?<br>
Sometimes there is a little confusion about the correct XML syntax in attribute definitions. Which version is valid, which not?
{{deu|Manchmal besteht Uneinigkeit über die richtige XML-Syntax für Attributdefinitionen. Welche der folgenden Varianten ist korrekt? Welche nicht?}}


# <element attribute='''"'''value'''"'''/>
# <element attribute='''"'''value'''"'''/>
# <element attribute='''<nowiki>'</nowiki>'''value'''<nowiki>'</nowiki>'''/>
# <element attribute='''<nowiki>'</nowiki>'''value'''<nowiki>'</nowiki>'''/>


{{external|http://www.w3.org/TR/REC-xml|W3C}} gives us the answer: both are correct!<br>
{{external|https://www.w3.org/TR/REC-xml|The W3C XML specification}} gives us the answer: both are correct!
{{deu|{{external|http://www.edition-w3.de/TR/REC-xml|W3C}} gibt uns die Antwort: Beide Varianten sind korrekt!}}
 
=== Definitions in the W3C XML specification ===
 
The following definitions use {{external|http://www.w3.org/TR/REC-xml/#sec-notation|"a simple Extended Backus-Naur Form (EBNF) notation"}}.<br>
{{deu|Die folgenden Definitionen nutzen die {{external|http://www.edition-w3.de/TR/REC-xml/#sec-notation|"einfache erweiterte Backus-Naur Form (EBNF)}}".}}
 
; {{external|http://www.w3.org/TR/REC-xml/#NT-Attribute|Definition 41}} {{external|http://www.edition-w3.de/TR/REC-xml/#NT-Attribute|(de)}}:
<blockquote>
Attribute ::= Name Eq AttValue<br>
"... the content of the AttValue (the text between the ' or " delimiters) as the attribute value."<br>
{{deu|"... der Inhalt des AttValue (der Text zwischen den '- oder "-Zeichen) der Attribut-Wert ist."}}
</blockquote>
 
; {{external|http://www.w3.org/TR/REC-xml/#NT-AttValue|Definition 10}} {{external|http://www.edition-w3.de/TR/REC-xml/#NT-AttValue|(de)}}:
<blockquote>
    AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
</blockquote>
 
This "simple EBNF" is somewhat difficult to read. Regardless, it allows both kinds of quotation marks as attribute delimiters. The delimiters must match within each attribute value, though: you may not open a value with one kind and close it with the other.<br>
{{deu|Diese "simple EBNF" ist schon ein bißchen schwierig zu verstehen, unabhängig davon erlaubt sie beide Arten von Anführungszeichen für Attributdefinitionen zu verwenden. Man muss sie aber konsistent einsetzen und darf nicht bei einer Attributdefinition beide verschieden für Anfang und Ende verwenden.}}


=== Usage in XML parsers and XML processors ===
=== Usage in XML parsers and XML processors ===


Thus each XML parser has to "accept" both versions (according to the W3C standard).<br>
Thus each XML parser has to accept both versions (according to the W3C standard), and you may choose to use either one in your output.
{{deu|Somit muss jeder XML-Parser beide Varianten akzeptieren und verarbeiten (wenn er standard-konform ist).}}


Nevertheless, you cannot require your XML processor to output a particular delimiter style. That is decided by the XML processor's implementation.<br>
In practice, XML files with "double quotes" as attribute delimiters are much more common than XML files with 'single quotes'.
{{deu|Demgegenüber kann man von einem XML-Prozessor keine bevorzugte Variante abverlangen. Die Ausgabesyntax in der XML-Datei ist im Programmcode des XML-Prozessors festgelegt.}}
 
In practice, XML files with "double quotes" as attribute delimiters are much more common than XML files with "single quotes".<br>
{{deu|Aus der Erfahrung sollte angemerkt werden, dass XML-Dateien mit "doppelten Anführungszeichen" in Attributdefinitionen sehr viel häufiger vorkommen als XML-Dateien mit "einfachen Anführungszeichen".}}
 
=== Workaround in case of problems with single quotation marks ===
 
If you have trouble with the "single quotes" version, pipe your XML file through the following simple tool:<br>
{{deu|Falls es Probleme mit den "einfachen Anführungszeichen" im gewünschten XML-Parser gibt, kann man die XML-Datei einfach durch folgendes Programm schicken:}}
 
<pre>tr '\047' '\042' <input-file.xml >output-file.xml</pre>
<blockquote>
* It "translates" the character ' (octal 047) into " (octal 042).
: {{deu|Es "übersetzt" alle Zeichen ' (047) nach " (042).}}
* "input-file.xml" is your XML file with single quotes.
: {{deu|"input-file.xml" ist dabei die XML-Datei mit "einfachen Anführungszeichen".}}
* "output-file.xml" becomes your XML file with double quotes.
: {{deu|"output-file.xml" ist die zu erzeugende XML-Datei mit "doppelten Anführungszeichen".}}
</blockquote>
 
This tool is very fast and has no trouble with very large uncompressed XML files.<br>
{{deu|Dieses Programm ist sehr schnell und hat keine Schwierigkeiten mit richtig großen unkomprimierten XML-Dateien.}}
 
But be aware that ''every'' occurrence of the character will be changed, including apostrophes in text content:<br>
{{deu|Aber man sollte im Kopf behalten, dass wirklich _alle_ Zeichen, die in das Muster passen, ersetzt werden:}}
<blockquote>
; before {{deu|vorher}} : <name>King'''<nowiki>'</nowiki>'''s Cross St. Pancras</name>
; thereafter {{deu|nachher}} : <name>King'''"'''s Cross St. Pancras</name>
</blockquote>
 
Typographic apostrophes (’, U+2019), on the other hand, stay unchanged:<br>
{{deu|Demgegenüber bleiben "Diakritische Zeichen" unverändert:}}
<blockquote>
; before {{deu|vorher}} : <name>King'''’'''s Cross St. Pancras</name>
; thereafter  {{deu|nachher}}: <name>King'''’'''s Cross St. Pancras</name>
</blockquote>
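
If changing apostrophes in text content is not acceptable, a parser-based round trip is one possible alternative. The following is only a minimal sketch using Python's standard <code>xml.etree.ElementTree</code> module; the element and attribute names (<code>station</code>, <code>code</code>) are made up for illustration. Unlike <code>tr</code>, this approach loads the whole document into memory, and by default it drops comments and the XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: single-quoted attribute plus an apostrophe in text content.
single_quoted = "<station code='KGX'><name>King's Cross St. Pancras</name></station>"

# Parsing accepts the single-quoted attribute value.
root = ET.fromstring(single_quoted)

# ElementTree's serializer writes attribute values with double quotes
# and leaves the apostrophe in the text content alone.
double_quoted = ET.tostring(root, encoding="unicode")
print(double_quoted)
```

The trade-off is speed and memory: for very large files the byte-level <code>tr</code> approach remains much cheaper.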


== Character references ==
== Character references ==


A railML file may contain character references like any other XML file, e.g. '''&amp;#252;''' or '''&amp;#xFC;''' for the '''ü''' character.
A {{rml}} file may contain character references like any other XML file, e.g. '''&amp;#252;''' or '''&amp;#xFC;''' for the '''ü''' character.


; {{external|http://www.w3.org/TR/REC-xml/#dt-charref|W3C Definition Character Reference}}:
; {{external|https://www.w3.org/TR/REC-xml/#dt-charref|W3C Definition Character Reference}}:
<blockquote>
<blockquote>
A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.<br>
A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.<br>
Line 87: Line 29:
</blockquote>
</blockquote>


That means character references may occur anywhere, regardless of the '''encoding''' declaration at the top of the XML file. In attribute-styled railML files they occur mainly in attribute values. Please keep this in mind when consuming railML files. For further information, you may start at the following {{external|https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references|Wikipedia site}}.
That means character references may occur anywhere, regardless of the '''encoding''' declaration at the top of the XML file. In attribute-styled {{rml}} files they occur mainly in attribute values. Please keep this in mind when consuming {{rml}} files. For further information, a good entry point might be the {{wiki|List of XML and HTML character entity references}}.
 
Back to [[Hauptseite#Common_.28CO.29| Common overview]]


[[Category:GeneralDescription]]
[[Category:GeneralDescription]]
[[category:Common Concept]]
[[category:Common Concept]]

Latest revision as of 17:45, 22 January 2024

XML Syntax issues

Correct quoting of XML attribute values

Sometimes there is a little confusion about the correct XML syntax in attribute definitions. Which version is valid, which not?

  1. <element attribute="value"/>
  2. <element attribute='value'/>

The W3C XML specification gives us the answer: both are correct!

Usage in XML parsers and XML processors

Thus each XML parser has to accept both versions (according to the W3C standard), and you may choose to use either one in your output.

In practice, XML files with "double quotes" as attribute delimiters are much more common than XML files with 'single quotes'.
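
The equivalence of the two delimiter styles can be checked with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree module as one example; the variable names are illustrative only. It also shows that a value may contain the other kind of quotation mark, and that opening with one delimiter and closing with the other is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# The two variants from the list above.
double_quoted = ET.fromstring('<element attribute="value"/>')
single_quoted = ET.fromstring("<element attribute='value'/>")

# Both parse to the same attribute dictionary.
attributes_match = double_quoted.attrib == single_quoted.attrib

# A double-quoted value may contain an apostrophe without escaping.
nested = ET.fromstring("""<element attribute="King's Cross"/>""")

# Mixing delimiters within one attribute value is not well-formed XML.
try:
    ET.fromstring("""<element attribute="value'/>""")
    mixed_is_rejected = False
except ET.ParseError:
    mixed_is_rejected = True

print(attributes_match, nested.get("attribute"), mixed_is_rejected)
```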

Character references

A railML® file may contain character references like any other XML file, e.g. &#252; or &#xFC; for the ü character.

W3C Definition Character Reference

A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.
[66] CharRef  ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
...
If the character reference begins with " &#x ", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with " &# ", the digits up to the terminating ; provide a decimal representation of the character's code point.

That means character references may occur anywhere, regardless of the encoding declaration at the top of the XML file. In attribute-styled railML® files they occur mainly in attribute values. Please keep this in mind when consuming railML® files. For further information, a good entry point might be the List of XML and HTML character entity references.
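
As a rough illustration of what this means for a consumer, the sketch below parses the same attribute value written three ways: as a decimal reference, as a hexadecimal reference, and as a literal character. It uses Python's standard xml.etree.ElementTree module; the element and attribute names are placeholders, not railML® names. The parser resolves the references itself, so application code sees the same string in all three cases, even when the declared encoding (here US-ASCII) cannot represent ü directly.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal character references in a US-ASCII encoded document.
decimal = ET.fromstring(
    b'<?xml version="1.0" encoding="US-ASCII"?><element attribute="Z&#252;rich"/>'
)
hexadecimal = ET.fromstring(
    b'<?xml version="1.0" encoding="US-ASCII"?><element attribute="Z&#xFC;rich"/>'
)

# The same value as a literal character in a UTF-8 encoded document.
literal = ET.fromstring(
    '<?xml version="1.0" encoding="UTF-8"?><element attribute="Zürich"/>'.encode("utf-8")
)

# After parsing, all three attribute values are the same Python string.
values = [e.get("attribute") for e in (decimal, hexadecimal, literal)]
print(values)

# Serializing with ElementTree's default (US-ASCII) output encoding writes
# the non-ASCII character back as a decimal character reference.
ascii_output = ET.tostring(literal)
print(ascii_output)
```

A consumer that reads attribute values through a conforming parser therefore needs no special handling; problems typically arise only when XML is processed with plain string or regular-expression matching.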