Using the MSV™ XML Datatypes Library

Java™ Technology Implementation of XML Schema Part 2

The MSV XML Datatypes Library, a Java™ technology implementation of XML Schema Part 2, is intended for use with applications that incorporate XML Schema Part 2.

Note: This distribution of the XML Datatypes Library includes a sample class file, src/com/sun/tranquilo/datatype/CommandLineTester.java, which is provided as a guide for implementing your own Java classes with the Datatypes Library.

Introduction: Validating Strings

The following example validates a string against the integer datatype. The getTypeByName method returns a reference to a built-in datatype.


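import com.sun.msv.datatype.xsd.DatatypeFactory;
import com.sun.msv.datatype.xsd.XSDatatype;
import org.relaxng.datatype.DatatypeException;

void f( String v ) throws DatatypeException {
  // obtain the type object for the built-in "integer" type
  XSDatatype dt = DatatypeFactory.getTypeByName("integer");
  // validate the string; null is passed because no context is supplied
  if( dt.isValid(v,null) )
    ; // v is a valid integer
  else
    ; // v is not a valid integer
}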

Some datatypes require context information to validate. For example, the QName type, which validates something like prefix:localPart, needs to know that the given prefix is properly declared. The caller must supply this information to the Datatype object. To do this, the caller provides an object that implements the ValidationContext interface and passes it as the second parameter of the isValid method.


import org.relaxng.datatype.ValidationContext;
class MyContext implements ValidationContext
{
  // methods that implement an interface must be declared public
  public String resolveNamespacePrefix( String prefix ) {
    // resolves a namespace prefix to a namespace URI.
  }

  public boolean isUnparsedEntity( String entityName ) {
    // checks whether the given name is a declared unparsed entity.
  }
  ...
}

void f( String v, MyContext context ) throws DatatypeException {
  // obtain the type object for the built-in "QName" type
  XSDatatype dt = DatatypeFactory.getTypeByName("QName");
  // validate the string, using the context to resolve the prefix
  if( dt.isValid(v,context) )
    ; // v is a valid QName
  else
    ; // v is not a valid QName
}

When the datatype is "context-dependent", the caller must provide a valid ValidationContext, as in the second example. For other datatypes, the caller can pass null, as in the first example. Use the isContextDependent method to check whether a datatype is context-dependent.
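To make the role of the context object concrete, here is a minimal sketch that uses only the JDK. PrefixContext is a hypothetical local stand-in for the two ValidationContext methods named above, and MapContext, QNameSketch, and prefixIsDeclared are illustrative names that are not part of the library. The sketch also assumes that an undeclared prefix is reported as null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two ValidationContext methods named above,
// declared locally so that the sketch compiles with the JDK alone.
interface PrefixContext {
    String resolveNamespacePrefix(String prefix);
    boolean isUnparsedEntity(String entityName);
}

// A context backed by an in-memory table of in-scope namespace declarations.
class MapContext implements PrefixContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    MapContext declare(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String resolveNamespacePrefix(String prefix) {
        // Assumption of this sketch: null means "prefix not declared".
        return prefixToUri.get(prefix);
    }

    @Override
    public boolean isUnparsedEntity(String entityName) {
        return false; // this sketch declares no unparsed entities
    }
}

class QNameSketch {
    // Simplified check: only verifies that the prefix is declared.
    // It does not validate the syntax of the prefix or the local part.
    static boolean prefixIsDeclared(String qname, PrefixContext context) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            return true; // no prefix, so nothing to resolve
        }
        String prefix = qname.substring(0, colon);
        return context.resolveNamespacePrefix(prefix) != null;
    }
}
```

With a context that declares only the xsd prefix, prefixIsDeclared accepts "xsd:string" and rejects "foo:bar". The same string can therefore be valid in one context and invalid in another, which is why a context-dependent type cannot be validated with null.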


Deriving A New Type

By List

The following example derives a new type from an existing XSDatatype object by list.


XSDatatype deriveByList( XSDatatype itemType ) throws DatatypeException {
  return DatatypeFactory.deriveByList("","myType",itemType);
}

The first two parameters specify the namespace URI and the local name of the newly created type.

When an error is found during derivation, an org.relaxng.datatype.DatatypeException is thrown. For example, deriving a type by list from another list type throws an exception.

By Union

The following example derives a new type from existing XSDatatype objects by union.


XSDatatype deriveByUnion( XSDatatype[] memberTypes ) throws DatatypeException {
  return DatatypeFactory.deriveByUnion("","myType",memberTypes);
}
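As a rough mental model of what the list and union derivations accept, the following JDK-only sketch treats a datatype as a Predicate<String>. DerivationSketch, byList, byUnion, INTEGER, and BOOLEAN are hypothetical names, and the two toy types are much looser than the real lexical rules. This is not how the library is implemented.

```java
import java.util.List;
import java.util.function.Predicate;

class DerivationSketch {
    // List: every whitespace-separated token must be accepted by the item type.
    static Predicate<String> byList(Predicate<String> itemType) {
        return value -> {
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                return true; // an empty list has no items to reject
            }
            for (String token : trimmed.split("\\s+")) {
                if (!itemType.test(token)) {
                    return false;
                }
            }
            return true;
        };
    }

    // Union: at least one member type must accept the value.
    static Predicate<String> byUnion(List<Predicate<String>> memberTypes) {
        return value -> memberTypes.stream().anyMatch(member -> member.test(value));
    }

    // Toy stand-ins for built-in types (assumptions of this sketch).
    static final Predicate<String> INTEGER = s -> s.matches("[+-]?[0-9]+");
    static final Predicate<String> BOOLEAN = s -> s.matches("true|false|0|1");
}
```

In this model, byList(INTEGER) accepts "1 2 3" and rejects "1 two 3", while a union of INTEGER and BOOLEAN accepts both "42" and "true".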

By Restriction

The following example derives a new type by adding facets.


XSDatatype f() throws DatatypeException {
  XSDatatype baseType = DatatypeFactory.getTypeByName("string");

  // create a type incubator with the base type
  TypeIncubator incubator = new TypeIncubator(baseType);

  // add facets
  incubator.addFacet( "minLength", "5", false, null );
  incubator.addFacet( "maxLength", "20", false, null );

  // derive a new type by those facets
  XSDatatype derived = incubator.derive("","newTypeName");

  return derived;
}

The third parameter to the addFacet method specifies whether that facet should be "fixed" or not. Once a facet is fixed, a derived type can no longer restrict that facet.

The fourth parameter is again a ValidationContext, which is sometimes necessary (imagine adding an enumeration facet to QName). The above example does not supply one because the base type is known to be a context-independent datatype, but in general the caller should supply an object that implements ValidationContext.

DatatypeException can be thrown when you add a facet, or when you call the derive method.
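The effect of the "fixed" flag can be sketched with a small stand-alone model. FacetSketch is a hypothetical class written for this illustration; it is not the library's TypeIncubator, it throws IllegalStateException rather than DatatypeException, and it only understands minLength and maxLength. It also does not check that a new value is actually tighter than the base value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of facet bookkeeping, for illustration only.
class FacetSketch {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Boolean> fixed = new HashMap<>();

    FacetSketch() {
    }

    // Derivation step: start from a copy of the base type's facets.
    FacetSketch(FacetSketch base) {
        values.putAll(base.values);
        fixed.putAll(base.fixed);
    }

    // Adds or overrides a facet unless a base type has fixed it.
    void addFacet(String name, int value, boolean isFixed) {
        if (Boolean.TRUE.equals(fixed.get(name))) {
            throw new IllegalStateException("facet '" + name + "' is fixed by a base type");
        }
        values.put(name, value);
        fixed.put(name, isFixed);
    }

    // Checks the length of the value, counted in Unicode code points.
    boolean isValid(String v) {
        int length = v.codePointCount(0, v.length());
        Integer min = values.get("minLength");
        Integer max = values.get("maxLength");
        return (min == null || length >= min) && (max == null || length <= max);
    }
}
```

If a base model fixes minLength at 5 and leaves maxLength at 20 unfixed, a model derived from it can still lower maxLength to 10, but an attempt to add minLength again is rejected.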


Diagnosing Errors

The following example provides a diagnostic message to users about what is wrong with their value.


void test( XSDatatype dt, String v, ValidationContext context ) {
  try {
    dt.checkValid(v,context);
    System.out.println("valid");
  } catch( DatatypeException e ) {
    // the caught exception is named e
    if( e.getMessage()==null )
      System.out.println("invalid: diagnosis not supported");
    else
      System.out.println("invalid: "+e.getMessage());
  }
}

In this way, the user gets informative error messages. If the Datatype object does not support diagnosis, the getMessage method returns null. It is the caller's responsibility to handle this situation correctly.


Known Limitations

  1. Types float and double: the spec says a lexical value must be mapped to the closest value in the value space. However, this library cannot accept values that are larger than the maximum value for that type or smaller than the minimum value for that type. This should not be a problem for most users.

  2. ID, IDREF, and IDREFS types are not implemented. Although uniqueness constraints are removed from Part 2, these types are still intended to be used with uniqueness constraints. These constraints are so special that it is impossible to provide a generic implementation. com.sun.msv.datatype.xsd.DatatypeFactory does not recognize these three types.

  3. NOTATION type is not implemented. com.sun.msv.datatype.xsd.DatatypeFactory does not recognize this type.

  4. length, minLength, and maxLength facets are effectively limited to the value 2147483647. Values above this limit are recognized, but will be treated as this limit. Items larger than this limit will not be validated correctly. This limitation has no practical impact.

  5. Regarding length and min/maxLength facets of anyURI: the spec does not define the unit of length. This version implements the length facet in units of XML characters in the lexical space.

  6. Regarding length and min/maxLength facets of QName: again, the specification does not define the unit of length. This version implements the length facet in units of XML characters in the value space (the number of characters in the namespace URI plus the local part). Users are strongly discouraged from applying length-related facets to the QName type.

  7. anyURI (formerly uriReference) accepts several IPv6 addresses, such as ::192.168.0.1, that the original BNF specified in RFC 2373 does not accept. This modification should be considered a "bug fix." The BNF specified in RFC 2373 has several other problems, and those are not fixed. For example, the current release accepts 1:2:3:4:5:6:7:8:9, which is not a valid IPv6 address.

  8. language type is implemented in accordance with RFC 1766, and language identifiers are treated in a case-insensitive way. XML Schema Part 2 says that the lexical space of the language type will be as defined in the XML 1.0 Recommendation, 2nd edition, but that production was thrown away in the 2nd edition. Furthermore, the derivation shown in XML Schema Part 2 does not correctly implement the definition given in RFC 1766, so apparently there is a problem in the definition of the language type. As a consequence of making language case-insensitive, it is no longer a type derived from the token type.

  9. Regarding base64Binary type, RFC 2045 states that "any characters outside of the base64 alphabet are to be ignored in base64-encoded data." This makes "validation" of base64Binary meaningless. For example, <picture>))))</picture> is considered valid base64Binary. Developers should keep this in mind.

  10. minInclusive, maxInclusive, minExclusive, and maxExclusive facets of date/time related types do not work properly. XML Schema Part 2 is broken as regards the order relation of these types. This also affects the behavior of the "enumeration" facet (since equality is a part of order-relation). See Kawaguchi's comments to www-xml-schema-comments@w3.org for details ( [1] [2] [3] and [4] ).
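Items 4 and 9 above can be illustrated with the JDK alone. The sketch below is an analogy, not the library's code: LimitationSketch and its method names are hypothetical, the clamping logic is an assumption about what "treated as this limit" means, and the JDK MIME decoder is used only because its documentation says it ignores characters outside the base64 alphabet, which matches the RFC 2045 rule quoted in item 9.

```java
import java.math.BigInteger;
import java.util.Base64;

class LimitationSketch {
    // Item 4: a length facet value above Integer.MAX_VALUE is treated as Integer.MAX_VALUE.
    static int clampLengthFacet(String lexicalValue) {
        BigInteger limit = BigInteger.valueOf(Integer.MAX_VALUE);
        return new BigInteger(lexicalValue).min(limit).intValue();
    }

    // Item 9: a decoder that skips non-alphabet characters rejects very little.
    static boolean lenientBase64Accepts(String text) {
        try {
            Base64.getMimeDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // For contrast: the JDK basic decoder rejects characters outside the alphabet.
    static boolean strictBase64Accepts(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Here clampLengthFacet("4294967296") yields 2147483647, and the string "))))" passes the lenient check while failing the strict one. An application that needs strict base64 checking would have to add its own test on top of base64Binary validation.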