Towards the industrialization of XML processing

Last updated Thursday, December 27, 2001

Outstanding Problems

Some problems that need addressing (in no particular order).

Problem: Naming of XTLs and XComponents

Each XComponent has an associated Type that specifies what XML Transformation Language (XTL) it has been written in.

XPipe/XRig executives and other XPipe tools use this type information for a variety of purposes, but principally to ensure that the execution environment contains the programming language environment(s) needed to execute XComponents.

For the time being, an XComponent Type is simply one of the following strings:

  • "Java" - An XComponent implemented in Java (the programming language - not the platform)
  • "Jython" - An XComponent implemented in Jython (The Java Platform version of Python)
  • "XSLT" - An XComponent implemented in XSLT
  • "Exec" - An XComponent written in something else. The CmdStr attribute contains a template that the XPipe executive can use to invoke an external "shell" to execute the component.
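
As a rough illustration of the check described above, the sketch below shows how an executive might compare XComponent Types against the environments it has available. Only the four type strings come from the list above; the function names, the (name, Type) pair representation, and the `$placeholder` syntax used for CmdStr are assumptions made for the example, not part of XPipe.

```python
from string import Template

# The four Type strings listed above.
KNOWN_TYPES = ("Java", "Jython", "XSLT", "Exec")


def missing_environments(components, supported):
    """Return (name, reason) pairs for components the executive cannot run.

    components: iterable of (name, type_string) pairs (hypothetical shape).
    supported:  set of Type strings this executive can execute.
    """
    problems = []
    for name, ctype in components:
        if ctype not in KNOWN_TYPES:
            problems.append((name, "unknown type " + repr(ctype)))
        elif ctype not in supported:
            problems.append((name, "no " + ctype + " environment"))
    return problems


def expand_cmd_str(cmd_str, **values):
    """Fill in a CmdStr template for an "Exec" component.

    The $name placeholder syntax is an assumption; the real CmdStr
    template format is not described here.
    """
    return Template(cmd_str).substitute(values)
```

For example, `missing_environments([("clean", "Java"), ("style", "XSLT")], {"Java"})` would report that `style` cannot run because no XSLT environment is present.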

The problem is that "Java" does not tell you all you need to know. What version of Java? What platform (in case there are platform-specific dependencies)? Similar problems arise for Jython, XSLT, etc.

This is a good old taxonomy problem. Is there a taxonomy of programming languages we could use? Preferably one that uses semantic identifiers?

XML namespaces spring to mind, but there are problems. Namespaces are opaque strings; they do not have well-defined component pieces from which to build up a naming system that would allow us to say "Java version 2 or higher" or "XSLT version 2.1 only".
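
To make the requirement concrete, here is a minimal sketch of what a structured (non-opaque) type identifier might allow. The `Name>=version` and `Name==version` notation is invented purely for this example, as are the function names; it is one possible shape for such a naming system, not a proposal that XPipe has adopted.

```python
import re

# Hypothetical requirement syntax: "<Name>>=<version>" or "<Name>==<version>".
_SPEC = re.compile(r"^\s*([A-Za-z]+)\s*(>=|==)\s*(\d+(?:\.\d+)*)\s*$")


def _parse_version(text):
    """Turn "2.1" into (2, 1)."""
    return tuple(int(part) for part in text.split("."))


def _pad(a, b):
    """Pad the shorter tuple with zeros so that "2" and "2.0" compare equal."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(requirement, language, version):
    """Check whether an available language/version meets a requirement."""
    match = _SPEC.match(requirement)
    if match is None:
        raise ValueError("unrecognised requirement: %r" % requirement)
    name, op, wanted_text = match.groups()
    if name != language:
        return False
    have, wanted = _pad(_parse_version(version), _parse_version(wanted_text))
    return have >= wanted if op == ">=" else have == wanted
```

With this notation, "Java version 2 or higher" becomes `satisfies("Java>=2", "Java", "2.0")` and "XSLT version 2.1 only" becomes `satisfies("XSLT==2.1", "XSLT", "2.1")`. It deliberately ignores the platform question raised above, which would need a further component in the identifier.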

Would a UDDI registry help?

Should we set up our own taxonomy and allow XPipe developers to add entries? 

Problem: Multi-process synchronization in XPipe executives

Single-threaded (and by implication single-processor) XPipe executives have the luxury of not worrying about synchronization of XML instance file IO. When writing an XML instance, there is no fear of another part of the XPipe/XRig executive trying to read the same file, as there is only one thread of execution.

As you move up the ladder to more and more powerful executives for XPipes and XRigs, no such luxury exists. In the presence of multiple threads of control, there needs to be some way to synchronize IO. Simple thread synchronization mechanisms may work, but given the network-centric view of the world espoused by the XGrid execution environment in particular, would it not make sense to address the IO synchronization issue as part of a generalized distributed scheduling architecture?

Some candidates include JMS, TupleSpaces (in particular http://jxtaspaces.jxta.org/), and WebDAV.
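
For comparison, the "simple thread synchronization" option mentioned above might look something like the sketch below: one lock per XML instance file, shared by readers and writers inside a single process. The class and function names are hypothetical. Note that this only protects threads within one executive process; it does nothing for the distributed case, which is why the candidates listed above are of interest.

```python
import os
import threading


class InstanceFileLocks:
    """Hands out one lock per XML instance file path (in-process only)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dictionary itself
        self._locks = {}

    def lock_for(self, path):
        key = os.path.abspath(path)
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


locks = InstanceFileLocks()


def write_instance(path, xml_text):
    """Write an XML instance while holding that file's lock."""
    with locks.lock_for(path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(xml_text)


def read_instance(path):
    """Read an XML instance; blocks while a writer holds the file's lock."""
    with locks.lock_for(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
```

A reader calling `read_instance` therefore never sees a half-written instance produced by `write_instance` in another thread of the same process.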