XPipe Presentation

XPipe - An XML Processing Methodology

XML 2001 Florida, USA
December 13 2001

Sean McGrath
CTO
Propylon
http://www.propylon.com
sean.mcgrath@propylon.com

What is XPipe?

It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems.
Based on proven mechanical manufacturing techniques. Specifically:

The Assembly Line Principle
Component assembly and component re-use

An open source project hosted on Sourceforge (http://xpipe.sourceforge.net)
A contribution to the blossoming meme of using pipeline based processing to tame the burgeoning complexity of XML transformations

(If you do not find XML transformation complicated, you are not sufficiently well informed.)

(And no, XSLT does not solve all your problems!)

Contents of this talk

The XPipe philosophy
Major functional elements
Some examples
Relationship to other technologies
The XGrid
Some anticipated objections (and answers)
Current status
Current problems
Future plans

The XPipe Philosophy

Cars are complex, hierarchical structures
Henry Ford’s Model T Ford Assembly Line – 1914

Lunch is a complex, hierarchical structure
Lunch under construction at a Subway store

We are complex, hierarchical structures created on assembly lines.

Human tendon showing complex hierarchical structure

What have these scenes got in common?

Complex construction of cars, tuna melts and tendons made possible and efficient through
assembly line manufacturing
re-usable component processes and component materials
Why not apply this approach to XML “manufacturing”?

Why does the assembly line approach work?

Transformation task decomposition
Re-usable transformation components
Transformation decomposition is the key to complexity management. Just ask:

Henry Ford
Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
George Miller (7+/-2)
Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
Any electrical or chemical engineer.

Component re-use is the key to productivity

Ask any form of engineer (electrical, chemical etc.) apart from software engineers…
Component re-use remains a holy grail in software engineering
XPipe is yet another attempt…

XPipe philosophy

A lot of data processing will consist of XML to XML transformation
A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations:-
XML to XML transformation with possible non-XML start and end-points

Mantra

Get data into XML as quickly as possible
Keep it in XML until the last possible minute
Bring all your XML tools to bear on solving the data processing problem

The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together:-

Any complex XML transformation is a series of smaller, less complex transformations chained together
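
The chaining idea can be sketched in a few lines of Python. This is an illustration only, not XPipe code: the stage names `uppercase_text` and `stamp` and the helper `chain` are hypothetical, and each stage is assumed to take an ElementTree element and return one.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def uppercase_text(root):
    """Small stage: upper-case the leading text of every element."""
    for element in root.iter():
        if element.text:
            element.text = element.text.upper()
    return root

def stamp(root):
    """Small stage: mark the document element as processed."""
    root.set("processed", "yes")
    return root

def chain(*stages):
    """Compose stages left to right into one larger transformation."""
    return lambda root: reduce(lambda tree, stage: stage(tree), stages, root)

pipeline = chain(uppercase_text, stamp)
result = ET.tostring(pipeline(ET.fromstring("<doc><foo>baz</foo></doc>")),
                     encoding="unicode")
print(result)  # <doc processed="yes"><foo>BAZ</foo></doc>
```

Each stage can be built and tested on its own; the composite is just the stages applied in order.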

There are only so many ways to re-arrange an XML tree structure. Consider Rubik's Cube: a complex transformation to solve, but there are only a certain number of fundamental transformations involved.

A complex transformation made up of a finite number of fundamental transformations

A finite number of fundamental transformations, from which all higher order transformations can be derived
Transformation Decomposition leads to:

a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”.
Can build, test, use and then re-use these transformation components
Very team development friendly
High cohesion, loose coupling – just like the professor advised

More XPipe philosophy

Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem

Lexical
SAX
DOM
XSLT
XDuce, Pyxie, Haskell…

Stages in an XPipe can use whatever paradigm best suits the problem at hand

Assertion : developers would use a component based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves

“Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all…”

“Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth
I disagree. It is at least 60%.
XPipe aims to look after the plumbing letting developers concentrate on the interesting stuff

Major Functional Elements – XComponents

Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.)

All XComponents are standalone programs of the form

[Name] [InputXML] [OutputXML] [ErrorXML]
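
A minimal sketch of a standalone program with that shape, written in Python for illustration. Assumptions not taken from the slides: the three arguments are file paths, the error report is a single `<error>` element, and the names `run_component`, `identity` and `main` are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

def identity(root):
    """Placeholder transformation: hand the tree back unchanged."""
    return root

def run_component(transform, input_path, output_path, error_path):
    """Apply transform to the input document; return 0 on success, 1 on failure."""
    try:
        root = transform(ET.parse(input_path).getroot())
        ET.ElementTree(root).write(output_path, encoding="utf-8",
                                   xml_declaration=True)
        return 0
    except Exception as exc:
        # Report the failure as XML so the next tool in line can read it.
        # The <error> format is an assumption made for this sketch.
        error = ET.Element("error", {"type": type(exc).__name__})
        error.text = str(exc)
        ET.ElementTree(error).write(error_path, encoding="utf-8",
                                    xml_declaration=True)
        return 1

def main(argv):
    """argv mirrors the form above: [Name, InputXML, OutputXML, ErrorXML]."""
    if len(argv) != 4:
        print("usage: [Name] [InputXML] [OutputXML] [ErrorXML]", file=sys.stderr)
        return 2
    return run_component(identity, argv[1], argv[2], argv[3])
```

A script entry point would call `sys.exit(main(sys.argv))`. Swapping `identity` for a real transformation is the only change a component author makes; the plumbing stays the same.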

XComponents are described in XML form. An XComponent consists of:

Documentation
Unit Tests (input,output XML stream pairs)
Metadata for retrieval
Input and Output predicates – declarative (DTD/RelaxNG/Schema) or procedural (code)

Major Functional Elements – XComponent Unit Tester

Standalone program analogous to JUnit or PyUnit but for XML transformation component testing

Very outsource-friendly and “inbetweenable” approach (specify everything but the code == spec+doc+test harness all in one)
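
A rough sketch of what such a tester does with input/output pairs. The function name `run_unit_tests` and the sample component are hypothetical; comparison uses the standard library's C14N support (`ET.canonicalize`, Python 3.8 or later) so that insignificant serialization differences do not cause false failures.

```python
import xml.etree.ElementTree as ET

def run_unit_tests(transform, cases):
    """Return (input, expected, actual) for every pair the component gets wrong."""
    failures = []
    for input_xml, expected_xml in cases:
        actual = ET.tostring(transform(ET.fromstring(input_xml)),
                             encoding="unicode")
        if ET.canonicalize(actual) != ET.canonicalize(expected_xml):
            failures.append((input_xml, expected_xml, actual))
    return failures

def foo_to_bar(root):
    """Sample component under test (hypothetical): rename the root foo to bar."""
    if root.tag == "foo":
        root.tag = "bar"
    return root

cases = [("<foo>baz</foo>", "<bar>baz</bar>")]
print(run_unit_tests(foo_to_bar, cases))  # [] means every pair passed
```

The list of pairs is the specification: hand it over with the documentation, and whoever writes the code knows when they are done.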

Major Functional Elements – XPipes

Described in XML

They consist of

Documentation
Input/Output Predicates (Schemas/Code)
Test Suite
References to XComponents which are resolved when the XPipe is compiled

Major Functional Elements – XPipe Executive

Uniprocessor: XPipe executed on 1 machine, possibly with separate threads for each XComponent task
Multiprocessor: XML based protocol to implement “Job Shop” work distribution over a P2P network
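
The uniprocessor case with one thread per stage might be sketched as follows. This is not the XPipe executive: `run_pipeline` is a hypothetical name, stages are plain callables, and a failed document travels down the line as an exception object, an error-handling choice made only for this sketch.

```python
import queue
import threading

_DONE = object()  # sentinel that tells each stage the input is exhausted

def _worker(stage, inbox, outbox):
    """Run one stage: read documents, transform them, pass them on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        if isinstance(item, Exception):
            outbox.put(item)      # an earlier stage failed; pass the error along
            continue
        try:
            outbox.put(stage(item))
        except Exception as exc:
            outbox.put(exc)

def run_pipeline(stages, documents):
    """Push documents through the stages, one thread per stage, order preserved."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_worker,
                                args=(stage, queues[i], queues[i + 1]))
               for i, stage in enumerate(stages)]
    for thread in threads:
        thread.start()
    for document in documents:
        queues[0].put(document)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results

print(run_pipeline([str.strip, str.upper], [" a ", "b "]))  # ['A', 'B']
```

While stage two works on the first document, stage one is already busy with the second, which is the assembly line in miniature.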

Major Functional Elements – XPipe Monitor

Analogous to monitoring systems for fluid flow systems.
SCADA based systems have a lot of potential here

Some related open technologies

| - Unix Pipes
SAX Filters
TRAX
XBeans
Cocoon
AxKit
JXTA
Translets
TupleSpaces

Simple XComponent examples

Fundamental Operation – Rename Element

Rename
Input : <foo>baz</foo>
Output: <bar>baz</bar>

Rename of foo element to bar element
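
A sketch of Rename using Python's standard ElementTree module. The function name and signature are illustrative, not taken from the XPipe sources.

```python
import xml.etree.ElementTree as ET

def rename(root, old, new):
    """Rename every <old> element (the root included) to <new>."""
    for element in list(root.iter(old)):
        element.tag = new
    return root

renamed = ET.tostring(rename(ET.fromstring("<foo>baz</foo>"), "foo", "bar"),
                      encoding="unicode")
print(renamed)  # <bar>baz</bar>
```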

Fundamental Operation - Peel
Input : <foo><bar>baz</bar></foo>
Output: <foo>baz</foo>

Peeling a bar element
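
Peel is slightly trickier than it looks, because the peeled element's text and children have to be spliced into its parent in the right order. A sketch with ElementTree follows; the helper names are hypothetical, and the root element itself is never peeled.

```python
import xml.etree.ElementTree as ET

def _attach_text(parent, index, text):
    """Append text so that it sits just before parent's child at position index."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text

def _splice(parent, index):
    """Replace the child at index with that child's own text and children."""
    child = parent[index]
    grandchildren = list(child)
    del parent[index]
    _attach_text(parent, index, child.text)
    for offset, grandchild in enumerate(grandchildren):
        parent.insert(index + offset, grandchild)
    _attach_text(parent, index + len(grandchildren), child.tail)

def peel(parent, tag):
    """Remove every <tag> element below parent, keeping its content in place."""
    i = 0
    while i < len(parent):
        if parent[i].tag == tag:
            _splice(parent, i)    # stay at i: spliced-in children are examined next
        else:
            peel(parent[i], tag)
            i += 1
    return parent

peeled = ET.tostring(peel(ET.fromstring("<foo><bar>baz</bar></foo>"), "bar"),
                     encoding="unicode")
print(peeled)  # <foo>baz</foo>
```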

Compound Operation - Matryoshka
Input:
<foo><bar>baz</bar></foo>
Output:
<foo></foo><bar></bar>baz

Unravelling elements like Russian Dolls

KlingonCloak

Input:
<foo><bar>baz</bar></foo>
Output:
<tag name="foo"><tag name="bar">baz</tag></tag>

Making elements invisible but retaining the element type names
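
KlingonCloak as an ElementTree sketch. The slide does not say what happens to attributes already on the element; this version keeps them and overwrites any existing `name` attribute, which is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

def cloak(root):
    """Turn every element into <tag name="original-element-type">."""
    for element in list(root.iter()):
        element.set("name", element.tag)  # assumption: clobbers an existing name
        element.tag = "tag"
    return root

cloaked = ET.tostring(cloak(ET.fromstring("<foo><bar>baz</bar></foo>")),
                      encoding="unicode")
print(cloaked)  # <tag name="foo"><tag name="bar">baz</tag></tag>
```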

XComponents

Once you start thinking in terms of Pipes – components appear everywhere:

Regular fragmentations
Doctype changer
Namespace normalizer
Character set transcoder
Hash generator
RelaxNG/Schematron etc
A validator can be thought of as a component in an XPipe that mirrors its input on its output

Validation as an XComponent
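
A validator written as a pass-through stage might look like this. The sketch uses a procedural predicate because Python's standard library has no DTD, RelaxNG or Schematron validator; `validator`, `ValidationError` and `has_title` are hypothetical names.

```python
import xml.etree.ElementTree as ET

class ValidationError(Exception):
    """Raised when a document fails a stage's predicate."""

def validator(predicate, message):
    """Build a stage that mirrors its input on its output, or raises."""
    def stage(root):
        if not predicate(root):
            raise ValidationError(message)
        return root               # the same tree, untouched
    return stage

has_title = validator(lambda root: root.find("title") is not None,
                      "document has no <title> child")

document = ET.fromstring("<doc><title>XPipe</title></doc>")
print(has_title(document) is document)  # True
```

Because the stage has the same shape as any other, it can be dropped between two transformations to check the contract between them.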

The XGrid

Grid Technologies – computational power “on tap” (http://www.gridforum.org)
The XGrid – computational power “on tap” to execute XPipes

XGrid - massively parallel XML processing with grid technology

Some objections (with some answers)

It will be slow

No it won’t - Premature optimization is the root of all evil!
Speed is a three headed monster.

I’m old enough to have left the X axis and currently heading for Y through Z

Besides, massive parallelism will kill all von Neumann throughput arguments
"Documents per second" is the important metric - not seconds per document
A myriad of “compile time” optimizations on XPipes possible
Keep the architecture simple – and speed will sort itself out

Pipes are not rich enough, real data flows require graphs

Inside every graph is a collection of straight segments
Do the smallest thing that can possibly work
XComponents can conditionally flow data in different directions – graph
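
Conditional flow can be sketched as a stage that chooses between two downstream branches. The names `route`, `mark` and `by_size`, and the `route` attribute, are all hypothetical.

```python
import xml.etree.ElementTree as ET

def route(predicate, when_true, when_false):
    """Build a stage that sends each document down one of two branches."""
    def stage(root):
        branch = when_true if predicate(root) else when_false
        return branch(root)
    return stage

def mark(value):
    """Stand-in branch: record on the root which way the document went."""
    def stage(root):
        root.set("route", value)
        return root
    return stage

by_size = route(lambda root: len(root) > 1, mark("bulk"), mark("single"))

order = by_size(ET.fromstring("<order><item/><item/></order>"))
print(order.get("route"))  # bulk
```

Each branch is itself a straight pipe, so the graph is still assembled from straight segments.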

Component based software? Harumph! We have heard that one before…

XPipe is data flow based not API based (COM, VBX, CORBA).
The payload is what is important – not the plumbing
Information integration (needed on the server side) – not application integration (needed on the client side)

Current Status

Schemas for XPipes and XComponents on xpipe.sourceforge.net

Sample components (Java/XSLT/Jython) and some documentation

Simple, illustrative XPipe uniprocessor executives

Draft of XJCL – XGrid Job Control Language

Uniprocessor XPipe used to develop

80-C pipe from Hub notation for a complex document type to a legacy mainframe display notation. 120 page spe

20-C pipe for semantic validation of legislation documents

XPipe and XComponent validators

Current Problems

Everybody agrees that an XML document is a tree but:

The content and structure of the tree depends on the parser

The content and structure of re-generated XML (The round-tripping problem)

Naming things

Taxonomy of XTLs (XML Transformation Languages)

Taxonomy of re-usable XComponents and XPipes

Flexible transformation scheduling is hard
Optimal transformation scheduling is very hard

Packaging

Future Plans

Evangelize the idea that DTD validated XML 1.0 is just Well Formed XML that has been through a pipe consisting of:

A transclusion component (entity expansion)
A macro pre-processor (conditional marked sections)
An attribute decorator (implied/fixed attributes)
A grammar checker
Valid XML

Valid XML as a pipeline transformation of well-formed XML
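
One of those stages, the attribute decorator, sketched in Python. In a real pipe the defaults would be read from the DTD's attribute-list declarations; here they come from a hand-written table, and the element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults, standing in for what a DTD's ATTLIST would declare.
DEFAULTS = {"para": {"align": "left"}}

def decorate_attributes(root, defaults):
    """Add each declared default attribute where the document omits it."""
    for element in root.iter():
        for name, value in defaults.get(element.tag, {}).items():
            if name not in element.attrib:
                element.set(name, value)
    return root

source = '<doc><para>x</para><para align="right">y</para></doc>'
decorated = ET.tostring(decorate_attributes(ET.fromstring(source), DEFAULTS),
                        encoding="unicode")
print(decorated)  # <doc><para align="left">x</para><para align="right">y</para></doc>
```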

XPipes and XComponents as web services (SOAP/XML-RPC, UDDI etc.)
Getting the P2P and Grid Technology communities' input into XGrid.
Getting help to develop the XPipe reference implementation on Sourceforge
Development of commercial implementations of XPipe integrated with leading EAI systems (Ongoing by Propylon)
Use of SCADA tools to develop XPipe process control and monitoring systems
Use of Animation Engineering techniques for CAXTE tools (Computer Aided XML Transformation Engineering)
Digging around hierarchy theory, self-assembly, bio-informatics and nanofabrication for concepts and tools applicable to XML transformations

In conclusion

XPipe is simple

Simplicity works!

Plenty of evidence outside of XML engineering that this approach will work

Plenty of lore and tools from other fields of science can be brought to bear to build systems using the XPipe approach

Thank you

Sean McGrath
http://xpipe.sourceforge.net
