Rick
>> What about if we do it another way: what if we make it necessary for
>> permitted datatype values to be declared locally but provide a mechanism
>> that allows the values themselves to be validated at some separate point in
>> the lifecycle, e.g.
>>
>> <validEntries>
>> <permittedValue source="http...." date="...." value="KOR KO">KO</permittedValue>
>> ....
>> </validEntries>
> Can you explain more? Do you mean that we have two classes:
> 1) (the subset of interest of) the controlled vocabulary at the time of
> document creation or schema creation, and
> 2) the current value of the controlled vocabulary.
> So these validate
> 1) Is this document consistent with the world of the schema creator?
> 2) Is this document still consistent?
> That is an interesting approach.
That was (note the tense, for reasons I will explain in a minute) the intent.
The reasoning went along the lines:
a) users need to use codes from internationally recognized code sets
b) users need to be able to subset large code sets
c) for validation to be guaranteed, valid codes need to be part of the schema
(the point you raised)
d) codes in the schema need some form of validation checks against the
original source.
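
A rough sketch of how points c) and d) might fit together, assuming the element and attribute names from the fragment quoted above. The source URL, dates and codes are placeholders, and the function names are hypothetical; nothing here is part of any agreed proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment: the URL, dates and codes are placeholders.
SCHEMA_FRAGMENT = """
<validEntries>
  <permittedValue source="http://example.org/codes" date="2002-01-01">KO</permittedValue>
  <permittedValue source="http://example.org/codes" date="2002-01-01">JA</permittedValue>
</validEntries>
"""

def permitted_values(fragment):
    """Return the values declared locally in a validEntries fragment."""
    root = ET.fromstring(fragment)
    return [el.text.strip() for el in root.findall("permittedValue")]

def is_locally_valid(token, fragment):
    """Check made at document validation time: is the token declared in the schema?"""
    return token in permitted_values(fragment)

def stale_values(fragment, current_source_codes):
    """Check made at some later point in the lifecycle: which declared
    values no longer appear in the code set obtained from the source?"""
    return [v for v in permitted_values(fragment) if v not in current_source_codes]
```

The first check needs only the schema, so validation stays guaranteed; the second needs a current copy of the source code set and can be run whenever the schema is reviewed.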
However, this does not work correctly in the following two scenarios:
A) Users want to validate that a product's EAN identification code is correct
B) Users want to identify customers by their DUNS or EAN identifiers.
The problem in both cases is that the list of potential entries is too large
to include in the schema, and is updated too frequently for good schema
maintenance. It turns out there is a subtle difference between A and B. For
A, any product number must be validated against the supplier's own list of
product numbers that can be supplied. The full list maintained by EAN is
irrelevant: only the supplier's catalogue can identify valid numbers for
use in an order sent to that supplier. For B, any member of a centrally held
list might apply, or the requestor or the supplier might restrict the set to
an "approved list" of recognized numbers. We could, however, extend the
approach suggested above to cover both the A and B scenarios by allowing an
empty validEntries element of the following form:
<validEntries source="....." query="....."/>
where source points to the resolver (either a local database or an external
database) and query contains the form of query required by the source to be
able to return a boolean confirmation that the value can be used. (The query
could be in SQL, XQL, etc, or just an XPath statement.)
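
To make the idea concrete, here is a minimal sketch that assumes the query is SQL with a single placeholder for the value and that the source resolves to a local database. The source name "local-catalogue", the table layout and the function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical declaration: the source name and SQL text are placeholders.
DECLARATION = (
    '<validEntries source="local-catalogue" '
    'query="SELECT 1 FROM catalogue WHERE product_code = ?"/>'
)

def make_catalogue(codes):
    """Build an in-memory stand-in for the supplier's catalogue."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalogue (product_code TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO catalogue VALUES (?)", [(c,) for c in codes])
    return conn

def is_valid(token, declaration, resolvers):
    """Run the declared query against the declared source and
    reduce the result to a boolean confirmation."""
    el = ET.fromstring(declaration)
    conn = resolvers[el.get("source")]          # look up the resolver by name
    row = conn.execute(el.get("query"), (token,)).fetchone()
    return row is not None
```

For example, with resolvers = {"local-catalogue": make_catalogue(["5012345678900"])}, a call to is_valid("5012345678900", DECLARATION, resolvers) confirms the number, while a number missing from the catalogue is rejected. Scenario A and scenario B then differ only in which resolver the source attribute names.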
>> Haven't had any feedback yet on the idea of using a really minimalist string
>> matching approach rather than formally defining sets of datatypes. Is this a
>> total non-starter?
> We certainly need string matching using regex somewhere. I see RELAX NG
> has had discussion on adding REGEX into it, and that might be best.
> So now we are thinking about:
> 1) matching token against regex
> 2) matching token against local list of permitted values and
> (probably at some other time) against an external list
> 3) doing some kind of integrity check, treating the enumeration as a link or
> key
> Is that right?
Not quite. My minimal datatypes proposal (which I attach) was based on the
STX approach, which said that there should be some predefined datatypes
based on strings, but these should be defined using regex, so that a single
mechanism can be used for both the primitives and the extensions. I hadn't
got as far as defining lists of permitted values, or working out how they
might be validated. However 2) and 3) can be handled using the mechanisms
mentioned above.
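
As an illustration of the single-mechanism idea, here is a small sketch in which both the predefined datatypes and a user extension are nothing more than named regular expressions. The datatype names and patterns are hypothetical and are not taken from the attached proposal or from STX.

```python
import re

# Hypothetical predefined datatypes, each defined only by a regular expression.
DATATYPES = {
    "integer": r"[+-]?[0-9]+",
    "decimal": r"[+-]?[0-9]+(\.[0-9]+)?",
    "date": r"[0-9]{4}-[0-9]{2}-[0-9]{2}",
}

def define(name, pattern):
    """Add a user-defined datatype using the same mechanism as the primitives."""
    re.compile(pattern)  # raises re.error if the pattern is malformed
    DATATYPES[name] = pattern

def matches(name, token):
    """True if the whole token matches the named datatype's pattern."""
    return re.fullmatch(DATATYPES[name], token) is not None

# An extension defined exactly like a primitive: 13 digits, lexical form only.
define("ean13", r"[0-9]{13}")
```

Note that a pattern like this checks only the lexical form; it says nothing about whether the number is in anyone's catalogue, which is where the mechanisms mentioned above would come in.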
Martin