
Ruby XML, XSLT and XPath Tutorial

What is XML?

XML stands for eXtensible Markup Language.

XML is a subset of the Standard Generalized Markup Language (SGML) and is used to mark up electronic documents so that they have a well-defined structure.

It can be used to tag data and define data types, and it lets users define their own markup languages. It is well suited to transmission over the Web and provides a uniform way to describe and exchange structured data independently of any application or vendor.

For more information, please see our XML tutorial.


XML parser architectures and APIs

There are two main kinds of XML parser: SAX and DOM.

  • SAX parsers are event-based. The parser scans the XML document from start to finish and, each time it encounters a syntactic construct (such as an opening tag or a run of text), it calls the event handler for that construct, sending the application an event.
  • DOM (Document Object Model) parsers build a hierarchical representation of the document. The parser constructs a DOM tree in memory in which each node is an object; once parsing is complete, the whole document is held in memory as that tree.

Parsing and creating XML with Ruby

Ruby can parse XML documents with the REXML library.

REXML is an XML toolkit written in pure Ruby that conforms to the XML 1.0 specification.

REXML has shipped with Ruby since version 1.8. (Since Ruby 3.0 it is distributed as a bundled gem, so projects managed with Bundler may need to list rexml in their Gemfile.)

The library is loaded from the path rexml/document.

All of its classes and methods are contained in the REXML module.
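
As a minimal sketch of what this means in practice, the snippet below loads the library and refers to the Document class through the REXML module rather than including the module. The one-line XML string is a made-up example, not part of this tutorial's data.

```ruby
require 'rexml/document'

# No `include REXML` here, so every class is referenced through the module.
doc = REXML::Document.new('<greeting lang="en">hello</greeting>')

puts doc.root.name                 # name of the root element
puts doc.root.attributes["lang"]   # value of the lang attribute
puts doc.root.text                 # text content of the root element
```

Writing REXML::Document in full avoids adding names to the top-level namespace, at the cost of slightly longer code.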

REXML has the following advantages over other parsers:

  • It is written 100% in Ruby.
  • It supports both SAX-style (stream) and DOM-style (tree) parsing.
  • It is lightweight: less than 2000 lines of code.
  • Its methods and classes are easy to understand.
  • It offers a SAX2-based API and full XPath support.
  • It is installed along with Ruby, so no separate installation is normally required.

The following is a sample XML document. Save it as movies.xml:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>
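
REXML can also create a document like this one programmatically. The following is only a sketch under stated assumptions: the movies array is hypothetical sample data (a subset of the fields above), and the result is written to a string rather than to movies.xml.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical sample data: a subset of the movies listed above.
movies = [
  { title: "Enemy Behind", type: "War, Thriller", format: "DVD", stars: 10 },
  { title: "Ishtar",       type: "Comedy",        format: "VHS", stars: 2 }
]

doc  = REXML::Document.new
root = doc.add_element("collection", { "shelf" => "New Arrivals" })

movies.each do |m|
  movie = root.add_element("movie", { "title" => m[:title] })
  movie.add_element("type").text   = m[:type]
  movie.add_element("format").text = m[:format]
  movie.add_element("stars").text  = m[:stars].to_s
end

# Serialize with two-space indentation into a string.
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true   # keep short text on the same line as its tags
xml = +""
formatter.write(doc, xml)
puts xml
```

To save the result, the string could be written out with File.write, or a File object opened for writing could be passed to formatter.write in place of the string.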

DOM parser

Let's start by parsing the XML data with the DOM-style (tree) parser. First we require the rexml/document library; for convenience, we often also include the REXML module in the top-level namespace:

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# Print the title of every movie
xmldoc.elements.each("collection/movie"){ 
   |e| puts "Movie Title : " + e.attributes["title"] 
}

# Print the type of every movie
xmldoc.elements.each("collection/movie/type") {
   |e| puts "Movie Type : " + e.text 
}

# Print the description of every movie
xmldoc.elements.each("collection/movie/description") {
   |e| puts "Movie Description : " + e.text 
}

The above example produces the following output:

Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
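
Because the tree parser keeps the whole document in memory, each movie element can be revisited and its children read in any order. The sketch below illustrates this by turning each movie into a Ruby hash and picking the highest-rated one. It uses a shortened inline copy of the data instead of movies.xml, and the hash keys are illustrative choices rather than anything REXML prescribes.

```ruby
require 'rexml/document'

# Shortened inline copy of the sample data (assumption: only these fields are needed).
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format><stars>10</stars></movie>
    <movie title="Transformers"><format>DVD</format><stars>8</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Convert every <movie> element into a plain hash.
movies = doc.elements.to_a("collection/movie").map do |movie|
  {
    title:  movie.attributes["title"],
    format: movie.elements["format"].text,
    stars:  movie.elements["stars"].text.to_i   # element text is always a string
  }
end

best = movies.max_by { |m| m[:stars] }
puts "#{best[:title]} (#{best[:stars]} stars)"
```

Element text is returned as a string, so numeric values such as the star rating have to be converted explicitly before they are compared.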

SAX parser

The following example processes the same data file, movies.xml, with the stream parser. SAX-style parsing is not usually recommended for a file this small, but it serves as a simple demonstration:

#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML


class MyListener
  include REXML::StreamListener
  def tag_start(*args)
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
  end

  def text(data)
    return if data =~ /\A\s*\z/   # skip whitespace-only text
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "  text   :   #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)

The above example produces the following output:

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
  text   :   "War, Thriller"
tag_start: "format", {}
  text   :   "DVD"
tag_start: "year", {}
  text   :   "2003"
tag_start: "rating", {}
  text   :   "PG"
tag_start: "stars", {}
  text   :   "10"
tag_start: "description", {}
  text   :   "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
  text   :   "Anime, Science Fiction"
tag_start: "format", {}
  text   :   "DVD"
tag_start: "year", {}
  text   :   "1989"
tag_start: "rating", {}
  text   :   "R"
tag_start: "stars", {}
  text   :   "8"
tag_start: "description", {}
  text   :   "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
  text   :   "Anime, Action"
tag_start: "format", {}
  text   :   "DVD"
tag_start: "episodes", {}
  text   :   "4"
tag_start: "rating", {}
  text   :   "PG"
tag_start: "stars", {}
  text   :   "10"
tag_start: "description", {}
  text   :   "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
  text   :   "Comedy"
tag_start: "format", {}
  text   :   "VHS"
tag_start: "rating", {}
  text   :   "PG"
tag_start: "stars", {}
  text   :   "2"
tag_start: "description", {}
  text   :   "Viewable boredom"
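
A stream listener sees one event at a time, so any context it needs has to be tracked by the listener itself. The sketch below shows one way to do this: it counts how many movies use each format by buffering the text between a format start tag and its end tag. The FormatCounter class name and the inline XML are assumptions made for illustration; only the StreamListener callbacks come from REXML.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that tallies the text of every <format> element.
class FormatCounter
  include REXML::StreamListener
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
    @buffer = nil            # non-nil only while inside a <format> element
  end

  def tag_start(name, _attrs)
    @buffer = +"" if name == "format"
  end

  def text(data)
    @buffer << data if @buffer
  end

  def tag_end(name)
    return unless name == "format" && @buffer
    @counts[@buffer.strip] += 1
    @buffer = nil
  end
end

xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format></movie>
    <movie title="Transformers"><format>DVD</format></movie>
    <movie title="Trigun"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

counter = FormatCounter.new
REXML::Document.parse_stream(xml, counter)
counter.counts.each { |format, n| puts "#{format}: #{n}" }
```

Only the current buffer and the running totals are held in memory, which is the main reason to prefer stream parsing for very large documents.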

XPath and Ruby

We can use XPath to query XML. XPath is a language for finding information in an XML document (see: XPath Tutorial).

XPath, the XML Path Language, is a language for addressing parts of an XML document. It models the document as a tree of nodes and provides ways to locate nodes in that tree.

Ruby supports XPath through REXML's XPath class, which operates on the parsed (Document Object Model) tree.

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Information about the first movie
movie = XPath.first(xmldoc, "//movie")
p movie

# Print all movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Get all movie formats, returned as an array
names = XPath.match(xmldoc, "//format").map {|x| x.text }
p names

The above example produces the following output:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
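
XPath expressions can also filter nodes with predicates in square brackets. The following sketch, which again uses a shortened inline copy of the data rather than movies.xml, selects one movie by its title attribute and then all movies whose format child equals a given value. The variable names are illustrative.

```ruby
require 'rexml/document'

# Shortened inline copy of the sample data.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format><stars>10</stars></movie>
    <movie title="Trigun"><format>DVD</format><stars>10</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Attribute predicate: the movie whose title attribute is "Trigun".
trigun = REXML::XPath.first(doc, "//movie[@title='Trigun']")
puts trigun.elements["stars"].text

# Child-element predicate: every movie whose <format> child is "DVD".
dvd_titles = REXML::XPath.match(doc, "//movie[format='DVD']").map do |movie|
  movie.attributes["title"]
end
puts dvd_titles.join(", ")
```

XPath.first returns a single node (or nil when nothing matches), while XPath.match returns an array, so the second query can be chained directly with map.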

XSLT and Ruby

Two XSLT processors are available for Ruby. A brief description of each is given below:

Ruby-Sablotron

This processor is written and maintained by Masayoshi Takahashi. It is written primarily for Linux and requires the following libraries:

  • Sablot
  • Iconv
  • Expat

You can find these libraries at Ruby-Sablotron.

XSLT4R

XSLT4R is written by Michael Neumann. It provides a simple command-line interface and can also be used from within a third-party application to transform XML documents.

XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is also a 100% Ruby module. Both modules can be installed using the standard Ruby installation method (i.e., ruby install.rb).

XSLT4R syntax is as follows:

ruby xslt.rb stylesheet.xsl document.xml [arguments]

To use XSLT4R from within your application, require the xslt library and pass in the parameters you need. For example:

require "xslt"

stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/....' }

sheet = XSLT::Stylesheet.new( stylesheet, arguments )

# output to StdOut
sheet.apply( xml_doc )

# output to 'str'
str = ""
sheet.output = [ str ]
sheet.apply( xml_doc )
