Valeri Tandilashvili's XML Notes
XML stands for Extensible Markup Language
application/xml is the official Internet media type for XML.
.xml is the XML filename extension.
XML is extensible
- XML allows us to create our own self-descriptive tags.
XML carries the data, does not present it
- XML allows us to store data irrespective of how it will be presented.
XML is a public standard
- XML was developed by the World Wide Web Consortium (W3C).
Any type of data can be expressed as an XML document
XML can be used to exchange information between companies and systems:
<?xml version="1.0" encoding="UTF-8"?>
<student>
<name>George</name>
<city>Tbilisi</city>
<phone>(011) 123-4567</phone>
</student>
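As a rough illustration of how a program might read this document, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The names STUDENT_XML and student are chosen for this example only and are not part of the notes.

```python
import xml.etree.ElementTree as ET

# The student document from the notes, as bytes so that the encoding
# declared in the XML declaration is honoured by the parser.
STUDENT_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<student>
  <name>George</name>
  <city>Tbilisi</city>
  <phone>(011) 123-4567</phone>
</student>"""

# fromstring() returns the root element of the parsed document.
root = ET.fromstring(STUDENT_XML)

# Iterating over an element yields its direct children.
student = {child.tag: child.text for child in root}

print(root.tag)
print(student)
```

The tag names become dictionary keys here, which only works because this document has no repeated child elements.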
The names of XML elements are enclosed in angle brackets < >
<element>
Each XML element needs to be closed, either with a matching end tag, as shown below, or as a self-closing tag
<element>...</element>
XML allows self-closing elements, for example when the element is empty:
<?xml version="1.0"?>
<student>
<name>George</name>
<city>Tbilisi</city>
<address/>
<phone>(011) 123-4567</phone>
</student>
Child elements must not overlap their parent elements, i.e., an element's end tag must follow the end tags of all of its children.
In the example below, <company> is opened inside <contact-info> but closed after the </contact-info> tag, which is wrong!
<?xml version="1.0"?>
<contact-info>
<company>Learn Practice Teach
</contact-info>
</company>
The following example shows correctly nested tags:
<?xml version="1.0"?>
<contact-info>
<company>Applications.ge</company>
</contact-info>
One root element is necessary for an XML document. In the example below, both the <x> and <y> elements are at the top level and do not share a single parent element, which is wrong:
<?xml version="1.0"?>
<x>...</x>
<y>...</y>
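The nesting and single-root rules can be checked programmatically. The sketch below assumes Python's standard xml.etree.ElementTree parser; the helper name well_formedness_error is hypothetical, and the error messages come from Python's expat-based parser, so their wording differs from the browser errors quoted in these notes.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None when xml_text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Compact copies of the examples from the notes.
samples = {
    "overlapping tags": "<contact-info><company>Learn Practice Teach</contact-info></company>",
    "correct nesting": "<contact-info><company>Applications.ge</company></contact-info>",
    "two root elements": "<x>...</x><y>...</y>",
}

for label, xml_text in samples.items():
    error = well_formedness_error(xml_text)
    print(label, "->", "well-formed" if error is None else error)
```

Only the correctly nested sample should be reported as well-formed; the other two are rejected before any data can be read from them.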
XML elements are case-sensitive, which means that the names in the start and end tags must be in exactly the same case. The following example is not a well-formed XML document, because the case of the tag names does not match:
<?xml version="1.0"?>
<student>
<name>George</name>
<city>Tbilisi</city>
<address/>
<phone>(011) 123-4567</Phone>
</student>
In this case the start tag <phone> and its end tag </Phone> are not in the same case.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
An XML document can optionally have an XML declaration, like the one above. An XML document without a declaration is also well-formed:
<student>
<name>George</name>
<city>Tbilisi</city>
<phone>(011) 123-4567</phone>
</student>
If the XML declaration is included, it must contain the version number attribute:
<?xml encoding="UTF-8" standalone="no"?>
<student>
<name>George</name>
<city>Tbilisi</city>
<phone>(011) 123-4567</phone>
</student>
It will generate the following error:
error on line 1 at column 7: Malformed declaration expecting version
The parameter names are always in lower case. Here Version is wrongly capitalized:
<?xml Version="1.0" encoding="UTF-8" standalone="no"?>
<student>
<name>George</name>
<city>Tbilisi</city>
<phone>(011) 123-4567</phone>
</student>
It will generate the following error:
error on line 1 at column 7: Malformed declaration expecting version
The XML declaration must begin with <?xml
<? version="1.0" encoding="UTF-8" standalone="no"?>
<student>
<name>George</name>
<city>Tbilisi</city>
<phone>(011) 123-4567</phone>
</student>
The following error will be generated:
error on line 1 at column 3: xmlParsePI : no target name
If the document contains an XML declaration, it must be the first statement
<student>
<name>George</name>
<city>Tbilisi</city>
<phone>(011) 123-4567</phone>
</student>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
The error will be the following:
error on line 6 at column 6: XML declaration allowed only at the start of the document
The order of the parameters is important. The correct order is: version, encoding and standalone
<?xml encoding="UTF-8" standalone="no" version="1.0"?>
<student>
<name>George</name>
<city>Tbilisi</city>
<phone>(011) 123-4567</phone>
</student>
This will generate the following error:
error on line 1 at column 7: Malformed declaration expecting version
Either single or double quotes may be used. Here is a well-formed XML document:
<?xml version='1.0' encoding='UTF-8' standalone="no"?>
<student>
<name>George</name>
<city>Tbilisi</city>
<phone>(011) 123-4567</phone>
</student>
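To see the declaration rules from the tool side, here is a small sketch, assuming Python 3.8 or later and the standard xml.etree.ElementTree module. It lets the library write the declaration (ElementTree happens to use single quotes) and then confirms that a declaration with misordered parameters is rejected. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Build a small student document in memory.
root = ET.Element("student")
ET.SubElement(root, "name").text = "George"
ET.SubElement(root, "city").text = "Tbilisi"

# xml_declaration=True asks ElementTree to emit the declaration as the
# first statement of the serialized document (available since Python 3.8).
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
print(data.decode("utf-8"))

# The generated document parses back to the same root element.
reparsed = ET.fromstring(data)

# A declaration whose parameters are in the wrong order is rejected.
misordered = b'<?xml encoding="UTF-8" standalone="no" version="1.0"?><student/>'
try:
    ET.fromstring(misordered)
    misordered_accepted = True
except ET.ParseError:
    misordered_accepted = False

print("misordered declaration accepted:", misordered_accepted)
```

Letting the serializer write the declaration avoids the ordering and spelling mistakes shown in the examples above.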
The HTTP protocol can override the value of encoding that we put in the declaration.
An XML schema describes the elements, attributes and data types of a document.
Here is what an XML schema looks like:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="company">
<xs:complexType>
<xs:sequence>
<xs:element name="id" type="xs:unsignedInt" />
<xs:element name="name" type="xs:string" />
<xs:element name="phone" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The XML schema is derived from the following XML:
<company>
<id>2859385</id>
<name>Tanmay Patil</name>
<phone>(011) 123-4567</phone>
</company>
The XML schema is generated using the following online tool:
https://www.liquid-technologies.com/online-xml-to-xsd-converter
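Python's standard library has no XSD validator, so the following is only a hand-rolled sketch of the idea behind the schema, not real schema validation. It reads the xs:sequence from the schema and checks that the company document has the same children in the same order, with a rough digits-only test for xs:unsignedInt. The helper names declared_children and matches_sequence are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as {namespace-uri}local-name.
XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="company">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:unsignedInt" />
        <xs:element name="name" type="xs:string" />
        <xs:element name="phone" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

DOCUMENT = """<company>
  <id>2859385</id>
  <name>Tanmay Patil</name>
  <phone>(011) 123-4567</phone>
</company>"""


def declared_children(schema_root, element_name):
    """List (name, type) pairs from the xs:sequence of a top-level element."""
    for element in schema_root.findall(XS + "element"):
        if element.get("name") == element_name:
            sequence = element.find(XS + "complexType/" + XS + "sequence")
            if sequence is None:
                return []
            return [(e.get("name"), e.get("type")) for e in sequence.findall(XS + "element")]
    return []


def matches_sequence(doc_root, expected):
    """Check child order, plus a rough digits-only test for xs:unsignedInt."""
    if [child.tag for child in doc_root] != [name for name, _ in expected]:
        return False
    for child, (_, type_name) in zip(doc_root, expected):
        if type_name == "xs:unsignedInt" and not (child.text or "").strip().isdigit():
            return False
    return True


expected = declared_children(ET.fromstring(SCHEMA), "company")
print(expected)
print(matches_sequence(ET.fromstring(DOCUMENT), expected))
```

A real validator also handles occurrence constraints, value ranges and namespaces, so a dedicated XSD library is the better choice outside of a demonstration.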
An XML tree starts at the root element and branches from the root to child elements
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
The terms parent, child, and sibling are used to describe the relationships between elements.
...
Parents have children.
Children have parents.
Siblings are children on the same level (brothers and sisters).
...
We can use any of the online XML tree viewer tools to see the document's tree structure, like this one:
https://www.xmlviewer.org/
Books XML document to view in the tree viewer:
<books>
<book category="cooking">
<title lang="en">The goals</title>
<author>Giada De Laurentiis</author>
<year>2007</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">12 rules for life</title>
<author>J K. Rowling</author>
<year>2002</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">The richest man in Babilon</title>
<author>Erik T. Ray</author>
<year>2001</year>
<price>39.95</price>
</book>
</books>
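The parent, child and sibling relationships can also be explored in code. This is a minimal sketch using Python's standard xml.etree.ElementTree module on a copy of the books document shortened to two books. ElementTree elements do not store a reference to their parent, so the sketch builds a lookup table; the name parent_of is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the books document (first two books only).
BOOKS_XML = """<books>
  <book category="cooking">
    <title lang="en">The goals</title>
    <author>Giada De Laurentiis</author>
    <year>2007</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">12 rules for life</title>
    <author>J K. Rowling</author>
    <year>2002</year>
    <price>29.99</price>
  </book>
</books>"""

root = ET.fromstring(BOOKS_XML)

# Children: iterating over an element yields its direct child elements.
categories = [book.get("category") for book in root]

# Parents: map every element in the tree to the element that contains it.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Siblings: the other children of the same parent.
first_title = root.find("book/title")
parent = parent_of[first_title]
siblings = [element.tag for element in parent if element is not first_title]

print(categories)
print(parent.tag, siblings)
```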
An XML attribute specifies a single property for the element, using a name/value pair. An XML element can have one or more attributes:
<a href="http://www.applications.ge/">Applications.ge</a>
In the example, href is the attribute name and http://www.applications.ge/ is the attribute value
...
XML attribute names (unlike HTML) are case sensitive, which means that HREF and href are considered two different XML attributes:
<?xml version="1.0"?>
<student>
<a href="http://www.applications.ge/" hreF="HTTP://applications.ge/">Applications.ge</a>
</student>
Several different values for the same attribute are not allowed. An attribute name must not appear more than once in the same tag:
<?xml version="1.0"?>
<student>
<a href="http://www.applications.ge/" href="HTTP://applications.ge/">Applications.ge</a>
</student>
The following error will appear in the browser:
error on line 3 at column 73: Attribute href redefined
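Both attribute rules can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module: it reads two attributes that differ only in case, then shows that a repeated attribute name makes the document unparseable. The variable names are illustrative, and the parser's message differs in wording from the browser error quoted above.

```python
import xml.etree.ElementTree as ET

# href and hreF differ in case, so they are two separate attributes.
link = ET.fromstring(
    '<a href="http://www.applications.ge/" hreF="HTTP://applications.ge/">Applications.ge</a>'
)
print(link.attrib)       # dictionary of attribute names and values
print(link.get("href"))  # value of the lower-case attribute
print(link.get("HREF"))  # None: no attribute with this exact name

# Repeating the same attribute name in one tag is not well-formed.
try:
    ET.fromstring(
        '<a href="http://www.applications.ge/" href="HTTP://applications.ge/">Applications.ge</a>'
    )
    duplicate_error = None
except ET.ParseError as err:
    duplicate_error = str(err)

print(duplicate_error)
```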
Attribute names must always be written without quotation marks, whereas attribute values must always appear in single or double quotation marks:
<?xml version="1.0"?>
<student>
<a "href"="http://www.applications.ge/">Applications.ge</a>
</student>
The following error will appear:
error on line 3 at column 8: error parsing attribute name
Attribute values must always be in quotation marks (single ' or double " quotes):
<?xml version="1.0"?>
<student>
<a href=http://www.applications.ge/>Applications.ge</a>
</student>
This incorrect syntax will generate the following error:
error on line 3 at column 13: AttValue: " or ' expected
The hyphen -, underscore _ and period . are allowed in element names. The following XML example is well-formed:
<?xml version="1.0" encoding="UTF-8"?>
<student>
<first-name>George</first-name>
<phone.mobile>(011) 123-4567</phone.mobile>
<native_language>English</native_language>
<city />
</student>
- Besides letters, the symbols allowed in names are the hyphen -, underscore _, period . and digits 0-9
- Names are case sensitive: Address, address, and ADDRESS are different names.
- The names in the start and end tags of an element must be the same.
- An element, which is a container, can contain text or elements
<?xml version="1.0" encoding="UTF-8"?>
<student>
<first-name>George</first-name>
<phone.mobile>(011) 123-4567</phone.mobile>
<native_language>English</native_language>
<city />
</student>
Note: an XML element name must not start with a period ., a hyphen - or a digit
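The naming rules in this note can be approximated with a regular expression. The sketch below is a deliberate simplification in Python: it covers only ASCII letters, while the XML specification also permits many non-ASCII letters and the colon (which is used for namespace prefixes). The function name is_simple_xml_name is hypothetical.

```python
import re

# First character: a letter or underscore.
# Remaining characters: letters, digits, period, underscore or hyphen.
# ASCII-only simplification of the XML name rules.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_simple_xml_name(name):
    """Return True when name satisfies the simplified rules above."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["first-name", "phone.mobile", "native_language", "1name", "-name", ".name"]:
    print(candidate, is_simple_xml_name(candidate))
```

The first three candidates come from the example document and pass; the last three start with a digit, a hyphen and a period, so they are rejected.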
CDATA means Character Data. CDATA sections are blocks of text that are not parsed by the parser, even if they contain characters that would otherwise be recognized as markup:
<?xml version="1.0" encoding="UTF-8"?>
<student>
<!-- Some comment about the student -->
<first-name>George</first-name>
<phone.mobile>(011) 123-4567</phone.mobile>
<city />
<description>
<![CDATA[
<p>
<a href="/mylink/article1"><img style="float: left; margin-right: 5px;" height="80" src="/mylink/image" alt=""/></a>
Author Names
<br/><em>Date</em>
<br/>Paragraph of text describing the article to be displayed</p>
]]>
</description>
</student>
CDATA start section - CDATA begins with the nine-character delimiter <![CDATA[
CDATA end section - the CDATA section ends with the ]]> delimiter
CDATA section - characters inside the CDATA section are interpreted as characters, and not as markup. It may contain the markup characters <, > and &, but the XML processor does not treat them as markup
CDATA compared with a comment:
1. CDATA is part of the document, while a comment is not
2. In CDATA we cannot include the string ]]>, while in a comment we cannot include --
3. CDATA content is visible on the web page if we set the xmlns attribute to http://www.w3.org/1999/xhtml, even if the file is saved as .xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>CDATA Example</title>
</head>
<body>
<h2>Using a Comment</h2>
<div id="commentExample">
<!--
You won't see this in the document
and can use reserved characters like
< > & "
-->
</div>
<h2>Using a CDATA Section</h2>
<div id="cdataExample">
<![CDATA[
You will see this in the document
and can use reserved characters like
< > & "
]]>
</div>
</body>
</html>
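The difference between a comment and a CDATA section also shows up when a program parses the document. The sketch below assumes Python's standard xml.etree.ElementTree module with its default parser, which discards comments; the document is a reduced, namespace-free version of the example above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Reduced version of the example: one div with a comment, one with CDATA.
DOC = """<body>
  <div id="commentExample"><!-- hidden: < > & " --></div>
  <div id="cdataExample"><![CDATA[visible: < > & "]]></div>
</body>"""

root = ET.fromstring(DOC)
comment_div, cdata_div = root  # the two div elements, in document order

# The default parser drops the comment, so the first div has no text.
print(repr(comment_div.text))

# The CDATA content arrives as ordinary text, markup characters included.
print(repr(cdata_div.text))

# When written back out, the same characters are escaped instead of being
# wrapped in a CDATA section again.
serialized = ET.tostring(cdata_div, encoding="unicode")
print(serialized)
```

The CDATA delimiters themselves are not preserved: after parsing, the content is plain character data like any other element text.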