CO:dc:language

From railML 2 Wiki

Latest revision as of 17:01, 26 February 2017


dc:language
 


Scheme description / Schemenbeschreibung

Position of dc:language in the XML-Tree / Position von dc:language im XML-Baum

Multiplicity / Anzahl

[1..1]

Semantics / Bedeutung

Dublin Core definition
"A language of the resource."
Dublin Core description
"Recommended best practice is to use a controlled vocabulary such as RFC 4646."
railML® semantics
This may be used to identify the default language and/or script of all name and description attributes in the child elements of <railml>.

If an element within the <railml> element carries its own xml:lang attribute, that local redefinition overrides the default language/script setting from this high-level element.
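The default-and-override rule above can be sketched as follows. This is only an illustration under stated assumptions: the sample document is hypothetical and heavily simplified (no railML namespace, invented element ids), and `effective_languages` is a helper name made up for this example, not part of any railML tooling.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"               # Dublin Core elements namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

# Hypothetical, simplified document; real railML files declare a railML
# namespace and contain many more elements.
SAMPLE = """
<railml xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:language>de-CH</dc:language>
  </metadata>
  <infrastructure>
    <ocp id="ocp1" name="Bern"/>
    <ocp id="ocp2" name="Berne" xml:lang="fr-CH"/>
  </infrastructure>
</railml>
"""


def effective_languages(root):
    """Map each element id to the language tag that applies to that element."""
    # The file-wide default comes from <dc:language> (None if it is absent).
    default = root.findtext(f".//{{{DC_NS}}}language")
    result = {}

    def walk(element, inherited):
        # A local xml:lang overrides whatever was inherited from above.
        lang = element.get(XML_LANG, inherited)
        if "id" in element.attrib:
            result[element.get("id")] = lang
        for child in element:
            walk(child, lang)

    walk(root, default)
    return result


langs = effective_languages(ET.fromstring(SAMPLE))
print(langs)
```

The walk passes the inherited value downwards because ElementTree elements have no parent pointers; a local xml:lang therefore also applies to the descendants of the element that carries it.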

The <dc:language> element is intended to identify the character set (script) the data belongs to. This value matters when the reading software has to convert the Unicode names contained in the file (station names and so on) into non-Unicode strings.

This value must not be confused with <?xml ... encoding='UTF-8' ?>, which defines the encoding of the railML® file (and all of its characters). Since a railML® file is normally encoded in UTF-8, the value of <dc:language> is not needed for merely reading the file. It is needed only if the names have to be converted into non-Unicode strings for the final target software. A reading algorithm shall not have to 'scan' the names for special characters, which would amount to a more empirical solution.

For example, given <dc:language>und-Grek</dc:language>, a reading programme knows that the names are Greek. Without this statement it could only infer that from finding high Unicode (UTF-8) values in the names.
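A minimal sketch of such a conversion, assuming a reading programme that targets legacy 8-bit code pages. The mapping from script codes to Python codec names is an assumption made for illustration; the railML® description does not prescribe any particular code page, and `to_legacy_bytes` is a hypothetical helper name.

```python
# Assumed mapping from ISO 15924 script codes to Python codec names.
# These code pages are examples only; a real target system may need others.
SCRIPT_TO_CODEC = {
    "Grek": "iso8859_7",
    "Cyrl": "iso8859_5",
    "Arab": "iso8859_6",
    "Hebr": "iso8859_8",
}


def to_legacy_bytes(name, language_tag):
    """Encode a Unicode name for a non-Unicode target, guided by <dc:language>."""
    # Simplification: treat the second subtag as the script, e.g. "Grek" in "und-Grek".
    parts = language_tag.split("-")
    script = parts[1] if len(parts) > 1 else None
    codec = SCRIPT_TO_CODEC.get(script)
    if codec is None:
        raise ValueError(f"no code page known for {language_tag!r}")
    return name.encode(codec)


encoded = to_legacy_bytes("Αθήνα", "und-Grek")
print(len(encoded))  # one byte per character in an 8-bit code page
```

The code page is chosen from the declared tag alone, so the names themselves never have to be inspected.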

Please note that for the main purpose of this element, defining the script (character set) of the names in the railML® file, it is neither necessary nor usual to know a language. Most names (station names) are international and cannot be assigned to a particular language. Therefore, the language subtag of the BCP 47 tag is often set to "und" (undetermined) and only the script subtag is used, such as und-Latn, und-Grek, und-Arab, und-Hebr or und-Cyrl. If a writing programme does not know the script of the names, it shall omit the <dc:language> element.
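How a reading programme might separate the language and script subtags can be sketched as below. This is a deliberately simplified reading of the BCP 47 syntax (it ignores extended language subtags and variants, and stops at extensions and private-use subtags), and `split_tag` is a hypothetical helper name.

```python
def split_tag(tag):
    """Split a language tag into (language, script, region); missing parts are None."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = None
    region = None
    for sub in subtags[1:]:
        if len(sub) == 1:
            break  # a singleton starts an extension or private-use part
        is_letters = sub.isascii() and sub.isalpha()
        if script is None and region is None and len(sub) == 4 and is_letters:
            script = sub.title()   # four letters: ISO 15924 script code
        elif region is None and (
            (len(sub) == 2 and is_letters) or (len(sub) == 3 and sub.isdigit())
        ):
            region = sub.upper()   # two letters or three digits: region
    return language, script, region


for example in ("und-Grek", "de-CH", "el-Latn"):
    print(example, split_tag(example))
```

For a tag such as und-Grek only the script part carries information; the language part "und" states that no language is claimed.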

A problem is that BCP 47 has no tags for the individual European character sets of the Latin script. The single tag und-Latn has to serve for several code pages, such as Western European, Central European, Baltic and so on. Therefore, additionally naming the code-page number may be necessary, but unfortunately there is no attribute for that so far.
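The ambiguity of und-Latn can be illustrated with a short sketch. The three Windows code pages and the station names are assumptions chosen for the example, and `fitting_code_pages` is a hypothetical helper; it only demonstrates that names in Latin script may need different code pages, not a recommended way of choosing one.

```python
# Candidate Windows code pages for Latin script; the selection is an assumption.
LATIN_CODE_PAGES = ["cp1252", "cp1250", "cp1257"]  # Western, Central European, Baltic


def fitting_code_pages(name, candidates=LATIN_CODE_PAGES):
    """Return the candidate code pages that can represent every character of name."""
    fitting = []
    for codec in candidates:
        try:
            name.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this code page
        fitting.append(codec)
    return fitting


# Both names are written in Latin script, yet neither fits the Western code page.
results = {station: fitting_code_pages(station) for station in ("Łódź", "Ķemeri")}
```

Because the tag und-Latn is the same in both cases, the code page would have to be communicated by other means.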

Whenever possible, the writing programme should state the character set in which its data is held, indicating the origin of the names or their affiliation with a language or region. A reading programme then does not have to scan all proper names for umlauts or other special characters in order to settle on a character set empirically.

Attributes of dc:language / Attribute von dc:language

  • xml:lang (introduced with version 2.1): This is a unique identifier of a language. It is essentially based on the language standard IETF BCP 47, which may differ from ISO 639-1 or ISO 639-2. For mapping hints see relation to other standards.
    Using xml:lang does not make sense in the context of a language identifier.

Syntactic Constraints / Syntaktische Beschränkungen

The element <dc:language> may occur once.
The controlled vocabulary for the element <dc:language> is the same as for the xml:lang attributes: IETF BCP 47. If a language can be written in more than one script, or if a particular script is to be addressed, add the script identifier (see ISO 15924 for the four-letter code).

Best practice & Examples / Empfohlene Anwendung & Beispiele

  <dc:language>de-CH</dc:language>
  <dc:language>de-DE</dc:language>
  <dc:language>el-Grek</dc:language>
  <dc:language>el-Latn</dc:language>

Notes / Anmerkungen

None.

Open issues / Offene Punkte/Pendenzen

Not yet described. / Noch nicht beschrieben.