This document proposes a minimal set of datatypes for validating the character data strings that form text nodes within XML elements, the contents of CDATA sections, and attribute values within XML documents.
[Issue M1: Should datatypes be applicable to CDATA sections, or should these require the specification of a notation to process their contents?]
Before the datatypes of a document's contents can be validated, the document must already have been processed according to the rules of XML, which a) convert the input to UTF-8 or UTF-16, applying any relevant rules for combining character sequences (such as Unicode Normalization Form C), and b) parse the markup to identify the text nodes and attribute values that require datatype validation. Because the datatype validation rules are themselves expressed in XML, they must be processed in the same way: the rules used to parse datatype declarations must apply the same rules for combining character sequences, etc., as are applied to the document instance.
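The normalization requirement can be sketched in Python with the standard unicodedata module. This is an illustration only: it assumes the XML parser has already decoded the input from UTF-8 or UTF-16 into a string, and the function name prepare_for_validation is hypothetical. Applying the same function to document text and to literals from the datatype definitions file means that a decomposed sequence and its precomposed equivalent compare as equal.

```python
import unicodedata


def prepare_for_validation(text: str) -> str:
    """Normalize decoded character data to Unicode Normalization Form C.

    Applied identically to document-instance text and to literals taken
    from datatype declarations, so both sides use the same composed form.
    """
    return unicodedata.normalize("NFC", text)


# "e" + U+0301 COMBINING ACUTE ACCENT composes to the single code point U+00E9.
instance_value = prepare_for_validation("cafe\u0301")
declared_value = prepare_for_validation("caf\u00e9")
print(instance_value == declared_value)  # True
```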
Each DSDL datatype definition is assigned an identifier that is unique within the
datatype definitions file. The type of a particular element in a DTD or schema is
determined by assigning a dsdl-5:type attribute to the element
(normally as a default value within the DTD or schema). The value of this
attribute must be the identifier of one of the datatypes defined in the
datatype definitions file used to validate the document instance; that file is
associated with the DTD or schema by a dsdl-5:datatypeLibrary
attribute.
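A minimal Python sketch of how a validator might resolve dsdl-5:type attributes, using the standard xml.etree.ElementTree module. Several details are assumptions rather than part of this proposal: the namespace URI urn:example:dsdl-5 is a placeholder (no DSDL namespace has yet been fixed, see Issue M2), the identifiers integer and boolean and their checks stand in for a real datatype definitions file, and for simplicity the attribute is written on each element rather than defaulted from a DTD or schema.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

DSDL5_NS = "urn:example:dsdl-5"        # hypothetical namespace URI
TYPE_ATTR = "{" + DSDL5_NS + "}type"    # Clark notation used by ElementTree

# Hypothetical datatype library: identifier -> predicate over the text.
LIBRARY: Dict[str, Callable[[str], bool]] = {
    "integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    "boolean": lambda s: s in ("true", "1", "false", "0"),
}


def invalid_elements(root: ET.Element) -> List[str]:
    """Return the tags of elements whose text does not match their dsdl-5:type."""
    failures = []
    for elem in root.iter():
        type_id = elem.get(TYPE_ATTR)
        if type_id is None:
            continue  # no datatype assigned to this element
        if type_id not in LIBRARY:
            raise KeyError(f"unknown datatype identifier: {type_id!r}")
        if not LIBRARY[type_id](elem.text or ""):
            failures.append(elem.tag)
    return failures


doc = ET.fromstring(
    '<order xmlns:d="urn:example:dsdl-5">'
    '<qty d:type="integer">12</qty>'
    '<paid d:type="boolean">yes</paid>'
    "</order>"
)
print(invalid_elements(doc))  # ['paid']
```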
[Issue M2: How do we relate the Part 5 properties to the <data
datatypeLibrary="u" type="ln"> construct already
defined in Part 2? (Part 2 should not constrain other parts. There
should be a common way of identifying datatypes within datatype libraries
within the DSDL namespace. This raises the question of whether there is a single DSDL
namespace or a set of them. If the latter, then Part 5 needs its own namespace, and
Part 2 needs to refer to it.)]
Note: Terminals in italic (those without a dsdl-5: prefix, such as CharData and S) are defined in the W3C XML 1.0 specification.
dsdl-5:datatype ::= dsdl-5:stringLiteral
| dsdl-5:numberLiteral | dsdl-5:booleanLiteral |
dsdl-5:calendarLiteral | dsdl-5:periodLiteral | dsdl-5:quantityLiteral |
dsdl-5:currency | dsdl-5:ratioLiteral | dsdl-5:resourceIdentifier |
dsdl-5:listLiteral
dsdl-5:stringLiteral ::= CharData
Note: Data that is not character data should be identified as having a specific notation. This includes data coded as hexadecimal or base64 binary data, or any other data whose meaning requires an interpretation other than as basic character data.
dsdl-5:numberLiteral ::= dsdl-5:integerLiteral | dsdl-5:decimalLiteral | dsdl-5:exponentLiteral
dsdl-5:integerLiteral ::= ('+'|'-')? [0-9]+
[Issue M3: Should leading zeros be allowed for integers?]
dsdl-5:decimalLiteral ::= ('+'|'-')? [0-9]+ ('.' [0-9]+)?
[Issue M4: Should the decimal point and at least one following digit be compulsory for decimals?]
dsdl-5:exponentLiteral ::= ('+'|'-')? [0-9]+ ('.' [0-9]+)? ('e'|'E') ('+'|'-')? [0-9]+
[Issue M5: Should the literal pattern prevent numbers such as 0.0e0 from being defined?]
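The three number productions translate directly into regular expressions. The Python sketch below (the function name number_kind is illustrative) reports the narrowest production that a whole literal satisfies, and can be used to try out the cases raised in Issues M3 to M5.

```python
import re
from typing import Optional

# Transcriptions of the three numberLiteral productions above.
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")
EXPONENT = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?[eE][+-]?[0-9]+")


def number_kind(literal: str) -> Optional[str]:
    """Name the narrowest production that matches the whole literal, or None."""
    if INTEGER.fullmatch(literal):
        return "integerLiteral"
    if DECIMAL.fullmatch(literal):
        return "decimalLiteral"
    if EXPONENT.fullmatch(literal):
        return "exponentLiteral"
    return None


for s in ["007", "-3.25", "6.02E+23", "0.0e0", "1.", ".5"]:
    print(s, number_kind(s))
```

With these patterns 007 is an integerLiteral (Issue M3), 0.0e0 is an exponentLiteral (Issue M5), and 1. and .5 are rejected. Because the fractional part of a decimalLiteral is optional, every integer literal also matches the decimal pattern, which is the situation Issue M4 asks about.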
dsdl-5:booleanLiteral ::= ('true' | '1') | ('false' | '0')
[Issue M6: Should T and F on their own be
recognized as valid boolean literals? What about definitions for use with
languages other than English?]
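Because the production pairs 'true' with '1' and 'false' with '0', a valid literal maps directly to a truth value. A small Python sketch follows (the function name parse_boolean is hypothetical). Quoted strings in the grammar are matched exactly, so alternatives such as True or T, the subject of Issue M6, are rejected.

```python
from typing import Optional

TRUE_LITERALS = frozenset({"true", "1"})
FALSE_LITERALS = frozenset({"false", "0"})


def parse_boolean(literal: str) -> Optional[bool]:
    """Map a booleanLiteral to a truth value; None if the literal is not valid.

    Matching is exact and case-sensitive, as the production lists literal
    strings only.
    """
    if literal in TRUE_LITERALS:
        return True
    if literal in FALSE_LITERALS:
        return False
    return None
```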
dsdl-5:calendarLiteral ::= CharData
Note: A compulsory dsdl-5:dtf attribute is used to
constrain the data type format (dtf) of an element or
attribute whose content conforms to the dsdl-5:calendarLiteral datatype.
The following codes can be used within a dsdl-5:dtf
pattern to identify the subcomponents of entered data:
CC      Century (must immediately precede a YY string; may be preceded by a
        hyphen to indicate dates before the start of a calendar)
YY      Year (00-99)
M       Month (1-12)
MM      Month (01-12)
MMM     Month (as a text string in the language identified by the compulsory
        xml:lang attribute for the element)
D       Day of month (1-31)
DD      Day of month (01-31)
h       Hour (0-23)
hh      Hour (00-23)
mm      Minutes (00-59)
ss.sss  Second and, optionally, fraction of a second (00-59.999), or
ss,sss  Second and, optionally, fraction of a second (00-59,999)
Z       Coordinated Universal Time (UTC) applies
[+|-]   Timezone offset to apply (followed by hh:mm, whose values must be
        in the range 00:15 to 14:00)
For example, an ISO 8601 extended date/time could be defined as dsdl-5:dtf="CCYY-MM-DDThh:mm:ss[+|-]hh:mm".
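One way a validator might apply a dsdl-5:dtf pattern is to translate each code into a regular-expression fragment and treat every other character (such as -, T and :) as a literal. The Python sketch below makes several assumptions that this draft does not state: ss on its own is accepted as two-digit seconds (as in the example pattern above), a fraction of a second has one to three digits, and MMM accepts any run of letters rather than checking month names for the xml:lang language. It does not check that CC precedes YY, the 00:15 to 14:00 offset range, or the number of days in each month. The name dtf_to_regex is hypothetical.

```python
import re

# Regex fragment for each dsdl-5:dtf code, longest codes first so that,
# for example, "MMM" is not read as "MM" followed by "M".
DTF_CODES = [
    ("ss.sss", r"[0-5][0-9](?:\.[0-9]{1,3})?"),
    ("ss,sss", r"[0-5][0-9](?:,[0-9]{1,3})?"),
    ("[+|-]", r"[+-]"),
    ("MMM", r"[^\W\d_]+"),            # assumption: any run of letters
    ("CC", r"-?[0-9]{2}"),
    ("YY", r"[0-9]{2}"),
    ("MM", r"(?:0[1-9]|1[0-2])"),
    ("DD", r"(?:0[1-9]|[12][0-9]|3[01])"),
    ("hh", r"(?:[01][0-9]|2[0-3])"),
    ("mm", r"[0-5][0-9]"),
    ("ss", r"[0-5][0-9]"),            # assumption: seconds without a fraction
    ("M", r"(?:1[0-2]|[1-9])"),
    ("D", r"(?:[12][0-9]|3[01]|[1-9])"),
    ("h", r"(?:1[0-9]|2[0-3]|[0-9])"),
    ("Z", r"Z"),
]


def dtf_to_regex(dtf: str) -> "re.Pattern[str]":
    """Compile a dsdl-5:dtf pattern; characters that are not codes match literally."""
    parts = []
    i = 0
    while i < len(dtf):
        for code, fragment in DTF_CODES:
            if dtf.startswith(code, i):
                parts.append(fragment)
                i += len(code)
                break
        else:  # no code starts here: treat the character as a literal
            parts.append(re.escape(dtf[i]))
            i += 1
    return re.compile("".join(parts))


iso = dtf_to_regex("CCYY-MM-DDThh:mm:ss[+|-]hh:mm")
print(bool(iso.fullmatch("2002-05-23T14:30:00+01:00")))  # True
print(bool(iso.fullmatch("2002-13-23T14:30:00+01:00")))  # False: month 13
```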
[Issue M7: Should we allow the ISO 8601 ordinal date (Day in Year, e.g. 2002-365)
and week date (Week in Year and Day in Week, e.g. 2002W156) formats to be defined as
well?]
[Issue M8: Should we allow the fractional part of hours and minutes to be entered? Should we allow for commas as delimiters for fractional parts as well as periods, so that all ISO 8601 formats are allowed for? Do we need to allow for leap seconds that can add 60 as a valid number for seconds on specified dates in certain years?]
[Issue M9: Do we need to be able to recognize strings such as 23rd May 2002 as valid dates? Do other languages use qualified day numbers?]
dsdl-5:periodLiteral ::= '-'? 'P' [0-9]+
('.' [0-9]+)? 'Y'
([0-9]+ ('.' [0-9]+)? 'M'
([0-9]+ ('.' [0-9]+)? 'D'
('T' [0-9]+ ('.' [0-9]+)? 'H'
([0-9]+ ('.' [0-9]+)? 'M'
([0-9]+ ('.' [0-9]+)? 'S')?)?)?)?)?
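The nested production can be transcribed into a single regular expression. In this Python sketch (names are illustrative) NUM stands for the repeated [0-9]+ ('.' [0-9]+)? fragment.

```python
import re

NUM = r"[0-9]+(?:\.[0-9]+)?"   # digits with an optional fractional part

# Direct transcription of the nested production: each later component is
# allowed only if every earlier one, starting with the compulsory years, is present.
PERIOD = re.compile(
    rf"-?P{NUM}Y(?:{NUM}M(?:{NUM}D(?:T{NUM}H(?:{NUM}M(?:{NUM}S)?)?)?)?)?"
)


def is_period_literal(s: str) -> bool:
    return PERIOD.fullmatch(s) is not None
```

As written, the production requires a year component and does not allow intermediate components to be skipped, so P1Y2M3DT4H5M6.5S and -P0.5Y are accepted while ISO 8601 durations such as P2M or P1YT4H are rejected.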
dsdl-5:quantityLiteral ::= [0-9]+ ('.' [0-9]+)? S? dsdl-5:quantifier
dsdl-5:quantifier ::= [^0-9.] CharData
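A Python sketch that splits a quantity literal into its value and its quantifier. It assumes that S is the XML white space production and that values are best held as decimal.Decimal to avoid binary rounding. It also assumes, beyond the grammar, that the quantifier does not start with white space, since the class [^0-9.] on its own would accept a lone trailing space as a quantifier. The function name parse_quantity is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Number, optional XML white space (S), then the quantifier. Assumption: the
# quantifier's first character is also non-white-space, so that a trailing
# space alone is not read as a unit.
QUANTITY = re.compile(r"([0-9]+(?:\.[0-9]+)?)[ \t\r\n]*([^0-9.\s].*)", re.DOTALL)


def parse_quantity(s: str) -> Optional[Tuple[Decimal, str]]:
    """Split a quantity literal into its numeric value and its quantifier."""
    m = QUANTITY.fullmatch(s)
    if m is None:
        return None
    return Decimal(m.group(1)), m.group(2)
```

For example, 12.5 kg yields the value 12.5 and the quantifier kg, and 3m/s yields 3 and m/s. A bare number is rejected because the quantifier is compulsory, and a signed value such as -3 kg is rejected because the production has no sign.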
dsdl-5:currency ::= (dsdl-5:ISO4217currency dsdl-5:decimalLiteral) | (dsdl-5:decimalLiteral dsdl-5:currencyIndicator)
dsdl-5:ISO4217currency ::= [A-Za-z$£¥€¢]
[Issue M9: What other currency indicators does ISO 4217 recognize?]
dsdl-5:ratioLiteral ::= dsdl-5:decimalLiteral '/' dsdl-5:decimalLiteral (S dsdl-5:quantifier)?
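A Python sketch of ratio parsing. It reads the production as two decimal literals separated by '/', optionally followed by white space and a quantifier; both the use of decimalLiteral for the two terms and the placement of the white space inside the optional part are assumptions about the intended grammar. The function name parse_ratio is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

DEC = r"[+-]?[0-9]+(?:\.[0-9]+)?"   # decimalLiteral as defined above
# Assumption: white space (S) is required only when a quantifier follows.
RATIO = re.compile(rf"({DEC})/({DEC})(?:[ \t\r\n]+([^0-9.].*))?", re.DOTALL)


def parse_ratio(s: str) -> Optional[Tuple[Decimal, Decimal, Optional[str]]]:
    """Return (numerator, denominator, quantifier or None), or None if invalid."""
    m = RATIO.fullmatch(s)
    if m is None:
        return None
    return Decimal(m.group(1)), Decimal(m.group(2)), m.group(3)
```

With this reading, 3/4 cup gives a numerator of 3, a denominator of 4 and the quantifier cup, while 16/9 gives 16 and 9 with no quantifier.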
dsdl-5:resourceIdentifier ::= [^&{}|^\`"<> ]+
Note: The resource identifier must be a valid IETF resource identifier, as defined in IETF RFC2396 and any documents that extend or replace this.
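A Python sketch of the character-level check that the production expresses. It assumes that the character class repeats (one or more characters), and it does not attempt full RFC 2396 syntax checking. The name is_resource_identifier is hypothetical.

```python
import re

# Excluded characters from the production (including space), repeated one or
# more times. The backslash is doubled so that the regex sees a literal "\".
RESOURCE_ID = re.compile(r'[^&{}|^\\`"<> ]+')


def is_resource_identifier(s: str) -> bool:
    """Check only the character restriction, not the full URI syntax."""
    return RESOURCE_ID.fullmatch(s) is not None
```

Because & is in the excluded set, a URI whose query string contains & (written as &amp; in the XML source and expanded by the parser) fails this check as the production is currently written.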
dsdl-5:listLiteral ::= dsdl-5:datatype (S dsdl-5:datatype)+
[Issue M10: Should all items in a list be confined to a single datatype?]
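A Python sketch of list validation: split on the XML S production, require at least two items and no leading or trailing white space, and find a datatype for each item. The CHECKERS table and the function name list_item_types are hypothetical; a real datatype library would cover every dsdl-5 datatype.

```python
import re
from typing import Callable, Dict, List, Optional

S = re.compile(r"[\x20\x09\x0D\x0A]+")   # the XML 1.0 S production

# Hypothetical item checkers, tried in order.
CHECKERS: Dict[str, Callable[[str], bool]] = {
    "integerLiteral": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    "booleanLiteral": lambda s: s in ("true", "1", "false", "0"),
}


def list_item_types(s: str) -> Optional[List[str]]:
    """Name the first matching datatype for each item, or None if s is not a list."""
    items = S.split(s)
    # Fewer than two items, or leading/trailing white space (an empty item).
    if len(items) < 2 or "" in items:
        return None
    kinds = []
    for item in items:
        kind = next((name for name, ok in CHECKERS.items() if ok(item)), None)
        if kind is None:
            return None
        kinds.append(kind)
    return kinds
```

Under the production as written, a list such as 12 true -3 mixes datatypes freely, which is the point raised by Issue M10. The sketch also shows that an item such as 1 matches more than one datatype, so the datatype of each item is ambiguous unless the list declares it.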
The following properties can be used to constrain datatypes that are derived from
dsdl-5:stringLiterals:
  dsdl-5:pattern
and either
  dsdl-5:length
or
  dsdl-5:maxLength and/or dsdl-5:minLength
The following properties can be used to constrain datatypes that are derived from
dsdl-5:numberLiterals:
  dsdl-5:maxInclusive
  dsdl-5:maxExclusive
  dsdl-5:minInclusive
  dsdl-5:minExclusive
The following additional properties can be used to constrain dsdl-5:decimalLiterals:
  dsdl-5:totalDigits
  dsdl-5:fractionDigits
The following additional property can be used to constrain dsdl-5:integerLiterals:
  dsdl-5:totalDigits
[Issue M11: Do we need any additional constraints for dates/times/durations?]
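The draft does not yet say exactly how these properties are evaluated. The Python sketch below assumes semantics broadly modelled on the facets of the same names in W3C XML Schema Part 2: lengths count characters, length is used instead of (not together with) minLength/maxLength, totalDigits and fractionDigits ignore the sign, leading zeros and trailing fractional zeros, and bounds are supplied as strings so that decimal.Decimal compares them exactly. All function names are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple


def check_string_facets(value: str, pattern: Optional[str] = None,
                        length: Optional[int] = None,
                        min_length: Optional[int] = None,
                        max_length: Optional[int] = None) -> bool:
    """Apply pattern plus either length or minLength/maxLength.

    Lengths are counted in characters (Unicode code points)."""
    if length is not None and (min_length is not None or max_length is not None):
        raise ValueError("use either length or minLength/maxLength, not both")
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return False
    n = len(value)
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True


def digit_counts(literal: str) -> Tuple[int, int]:
    """(totalDigits, fractionDigits) of an integer or decimal literal, ignoring
    the sign, leading zeros and trailing zeros of the fractional part."""
    int_part, _, frac_part = literal.lstrip("+-").partition(".")
    int_part, frac_part = int_part.lstrip("0"), frac_part.rstrip("0")
    return max(1, len(int_part) + len(frac_part)), len(frac_part)


def check_number_facets(literal: str, min_inclusive=None, max_inclusive=None,
                        min_exclusive=None, max_exclusive=None,
                        total_digits: Optional[int] = None,
                        fraction_digits: Optional[int] = None) -> bool:
    """Range checks for any numberLiteral; digit checks for integers/decimals only."""
    value = Decimal(literal)  # Decimal accepts all three numberLiteral forms
    if min_inclusive is not None and value < Decimal(min_inclusive):
        return False
    if max_inclusive is not None and value > Decimal(max_inclusive):
        return False
    if min_exclusive is not None and value <= Decimal(min_exclusive):
        return False
    if max_exclusive is not None and value >= Decimal(max_exclusive):
        return False
    if total_digits is not None or fraction_digits is not None:
        if "e" in literal.lower():
            raise ValueError("digit properties apply only to integers and decimals")
        total, fraction = digit_counts(literal)
        if total_digits is not None and total > total_digits:
            return False
        if fraction_digits is not None and fraction > fraction_digits:
            return False
    return True
```

For example, 12.50 has three significant digits and one fraction digit under these assumptions, so it satisfies totalDigits 4 and fractionDigits 2.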
Where users require application-specific datatypes ....