Processing XML Documents with XProc

XProc, the XML Pipeline Language, is a W3C specification for defining pipelines that process XML documents. It lets you chain multiple steps in sequence to transform, validate, and manipulate XML data, and its declarative approach simplifies the automation of tasks such as combining, filtering, and transforming XML files. XProc 1.0 is a W3C Recommendation; the examples in this article use XProc 3.0 syntax.

What is XProc?

XProc is a language for processing XML documents through a series of steps. A connected series of steps is called a pipeline. Pipelines can be used to perform tasks like:

  • Transforming XML with XSLT or XQuery.
  • Validating XML with XML Schema or Schematron.
  • Combining, splitting, or filtering XML documents.
  • Converting data formats.

It’s particularly useful for large-scale XML data processing tasks that need to be automated or repeated frequently.

Key Features of XProc

  • Declarative Workflows: Define complex XML processing pipelines in a structured, step-by-step format.
  • Modularity: Pipelines can be reused, making workflows more efficient.
  • Support for XML Standards: XProc natively supports XSLT, XQuery, XPath, and other W3C standards.
  • Error Handling: Provides built-in mechanisms for handling errors that arise during processing.
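
To illustrate the modularity point, here is a minimal sketch of a reusable step, assuming XProc 3.0 syntax. The step type ex:books-to-html, its namespace URI, and the file names books-to-html.xpl and transform.xsl are hypothetical names chosen for illustration, not part of any standard library.

```xml
<!-- books-to-html.xpl: a reusable step (hypothetical file and type names) -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns/steps"
                type="ex:books-to-html" version="3.0">
    <p:input port="source"/>
    <p:output port="result"/>

    <!-- The source input is read implicitly; only the stylesheet is bound -->
    <p:xslt>
        <p:with-input port="stylesheet" href="transform.xsl"/>
    </p:xslt>
</p:declare-step>
```

Another pipeline could then import that file and call the step by its type name:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns/steps"
                version="3.0">
    <!-- Makes ex:books-to-html available in this pipeline -->
    <p:import href="books-to-html.xpl"/>
    <p:input port="source"/>
    <p:output port="result"/>

    <ex:books-to-html/>
</p:declare-step>
```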

Basic Structure of XProc

An XProc pipeline consists of steps, which represent individual processing tasks. These steps can be chained together to form a pipeline. The basic structure of an XProc file is as follows:

XProc Declaration

Each XProc pipeline starts with the namespace declaration for XProc:

				
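<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
    <!-- Steps go here -->
</p:declare-step>

  • The root element is <p:declare-step>, the xmlns:p attribute declares the XProc namespace, and the version attribute states which XProc version the pipeline targets (3.0 here, because the examples in this article use <p:with-input>, which XProc 1.0 does not have).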

XProc Pipeline Example: Transforming XML with XSLT

Let’s start with a simple example that shows how to transform an XML document using XSLT within an XProc pipeline.

Input XML (books.xml)

				
					<library>
    <book>
        <title>Learning XML</title>
        <author>John Doe</author>
    </book>
    <book>
        <title>Advanced XSLT</title>
        <author>Jane Smith</author>
    </book>
</library>

				
			

XSLT File (transform.xsl)

				
					<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <html>
            <body>
                <h1>Book List</h1>
                <ul>
                    <xsl:for-each select="//book">
                        <li><xsl:value-of select="title"/> by <xsl:value-of select="author"/></li>
                    </xsl:for-each>
                </ul>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

				
			

XProc Pipeline (pipeline.xpl)

				
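<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="main" version="3.0">
    <!-- Pipeline inputs; each p:document is the default used when nothing else is bound -->
    <p:input port="source" primary="true">
        <p:document href="books.xml"/>
    </p:input>
    <p:input port="stylesheet">
        <p:document href="transform.xsl"/>
    </p:input>
    <p:output port="result"/>

    <!-- Run the stylesheet over the source document; the step's result
         port is built in and feeds the pipeline's result port -->
    <p:xslt name="transform">
        <p:with-input port="source" pipe="source@main"/>
        <p:with-input port="stylesheet" pipe="stylesheet@main"/>
    </p:xslt>
</p:declare-step>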

				
			

Explanation:

  • <p:declare-step>: The root element of the pipeline; it declares the pipeline's ports and contains its steps.
  • <p:input>: Declares the pipeline's input ports, here one for the source XML document (books.xml) and one for the XSLT stylesheet (transform.xsl).
  • <p:output>: Declares the port on which the pipeline delivers its result.
  • <p:xslt>: Transforms the document on its source port with the stylesheet on its stylesheet port.
  • <p:with-input>: Connects a document to an input port of the step it appears in.

Output:

The pipeline produces an HTML document on its result port that lists each book with its author, as defined by the XSLT stylesheet.
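
For the books.xml and transform.xsl files shown above, the result document should contain the markup below. Indentation and line breaks depend on the processor and its serialization settings, so treat the layout as approximate.

```xml
<html>
    <body>
        <h1>Book List</h1>
        <ul>
            <li>Learning XML by John Doe</li>
            <li>Advanced XSLT by Jane Smith</li>
        </ul>
    </body>
</html>
```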

Core Components of XProc

XProc pipelines are built from several fundamental components, each representing different steps in the workflow. Below are the core components you’ll use frequently when creating pipelines.

Steps

Steps are the building blocks of XProc pipelines. There are different types of steps, including:

  • Atomic Steps: Perform a single, specific task. Examples are p:xslt, p:validate-with-xml-schema, and p:load.
  • Compound Steps: Group multiple steps together. The p:group step is an example of a compound step.

Example: Using Steps in XProc

				
<p:xslt>
    <!-- The step's result port is built in, so it is not declared here -->
    <p:with-input port="source" href="input.xml"/>
    <p:with-input port="stylesheet" href="style.xsl"/>
</p:xslt>

				
			

Ports

Ports define how input and output are passed between steps in the pipeline.

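  • Input Ports: Receive the documents that a step or pipeline reads. p:input declares an input port on a pipeline, and p:with-input connects documents to an input port of a step.
  • Output Ports: Deliver the documents that a step or pipeline produces. p:output declares an output port on a pipeline.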

Pipelines

A pipeline is a sequence of steps defined by the <p:declare-step> element. You can combine multiple steps to form complex workflows.
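
As a sketch of how ports connect the steps of a pipeline, the example below validates a document and then transforms it, assuming XProc 3.0 syntax. The step names validate and transform and the file names schema.xsd and transform.xsl are illustrative assumptions.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
    <p:input port="source"/>
    <p:output port="result"/>

    <!-- Step 1: reads the pipeline's source port implicitly and, by default,
         fails if the document is not valid -->
    <p:validate-with-xml-schema name="validate">
        <p:with-input port="schema" href="schema.xsd"/>
    </p:validate-with-xml-schema>

    <!-- Step 2: reads the result port of step 1. The explicit pipe could be
         omitted, because a step reads the preceding step's primary output
         by default -->
    <p:xslt name="transform">
        <p:with-input port="source" pipe="result@validate"/>
        <p:with-input port="stylesheet" href="transform.xsl"/>
    </p:xslt>
</p:declare-step>
```

The output of the last step is delivered on the pipeline's own result port without any further wiring.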

Advanced Concepts in XProc

XProc offers several advanced features that enable you to build complex and robust XML processing pipelines. Below, we explore key advanced topics.

Conditionals and Loops

XProc supports conditional processing with the p:choose element, which runs different steps depending on which condition holds. Loops are expressed with p:for-each, which runs its steps once for each document in a sequence.

Example: Conditional Processing

				
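<p:choose>
    <!-- $format must be declared as a pipeline option or variable -->
    <p:when test="$format = 'xml'">
        <p:xslt>
            <!-- Placeholder stylesheet; the source is read implicitly -->
            <p:with-input port="stylesheet" href="style.xsl"/>
        </p:xslt>
    </p:when>
    <p:otherwise>
        <!-- Placeholder: pass the document through unchanged -->
        <p:identity/>
    </p:otherwise>
</p:choose>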
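
The heading above also mentions loops. A minimal p:for-each sketch, assuming XProc 3.0 syntax and the books.xml structure from earlier in this article, might look like the following. The attribute name processed and its value are arbitrary choices for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
    <p:input port="source"/>
    <!-- The loop yields one document per book, so the output is a sequence -->
    <p:output port="result" sequence="true"/>

    <p:for-each>
        <!-- select splits the input into one document per book element -->
        <p:with-input select="/library/book"/>

        <!-- Runs once per book; marks the root element of each document -->
        <p:add-attribute match="/*"
                         attribute-name="processed"
                         attribute-value="true"/>
    </p:for-each>
</p:declare-step>
```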

				
			

Error Handling

XProc handles errors with the p:try and p:catch elements. The steps inside p:try run normally, and if one of them fails, the steps inside p:catch run instead.

Example: Error Handling

				
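<p:try>
    <p:xslt>
        <p:with-input port="source" href="input.xml"/>
        <p:with-input port="stylesheet" href="style.xsl"/>
    </p:xslt>
    <p:catch>
        <!-- p:log is not a step, so return a fallback document instead;
             the <error> element name is an arbitrary choice -->
        <p:identity>
            <p:with-input port="source">
                <p:inline><error>Transformation failed.</error></p:inline>
            </p:with-input>
        </p:identity>
    </p:catch>
</p:try>

In this example, if the XSLT transformation fails, the p:catch branch runs and the pipeline continues with a small <error> document instead of stopping.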

Common Use Cases for XProc

XProc is useful in many different XML processing scenarios. Some common use cases include:

Data Transformation

XProc can be used to apply transformations to XML data, such as converting XML to HTML using XSLT.

Data Validation

You can validate XML documents against XML Schema or Schematron rules using the p:validate-with-xml-schema or p:validate-with-schematron steps.

Example: XML Schema Validation

				
<p:validate-with-xml-schema>
    <!-- The step's result port is built in, so it is not declared here -->
    <p:with-input port="source" href="input.xml"/>
    <p:with-input port="schema" href="schema.xsd"/>
</p:validate-with-xml-schema>

				
			

This step validates the XML document against the specified XML Schema.
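
Schematron validation follows the same pattern. The fragment below is a sketch in which rules.sch is a hypothetical Schematron schema file; as with the XML Schema step, a failed validation makes the step fail by default.

```xml
<p:validate-with-schematron>
    <p:with-input port="source" href="input.xml"/>
    <!-- rules.sch is a hypothetical Schematron schema file -->
    <p:with-input port="schema" href="rules.sch"/>
</p:validate-with-schematron>
```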

Document Combination

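XProc can combine multiple XML documents into one using the p:wrap-sequence step, which places a sequence of documents inside a new wrapper element. The related p:wrap and p:unwrap steps work within a single document: they add or remove a wrapper element around matching nodes.

Example: Combining XML Documents

<p:wrap-sequence wrapper="documents">
    <!-- wrapper names the new root element; "documents" is an example name -->
    <p:with-input port="source">
        <p:document href="doc1.xml"/>
        <p:document href="doc2.xml"/>
    </p:with-input>
</p:wrap-sequence>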

				
			

XProc offers a powerful and flexible framework for processing XML documents. By defining declarative pipelines, you can automate complex XML workflows, including transformations, validations, and document manipulation. Whether you are working with simple XML tasks or large-scale data processing, XProc simplifies the process, allowing you to build reusable and efficient workflows. Happy Coding!❤️


Copyright © 2025 Diginode
