Friday, July 13, 2012

Defeating X-Frame-Options with Scraping

Introduction

Iframes are an element of web design that are loved and hated. Web developers (used to) love them because they easily allowed resources from various sites to be loaded on-demand within a webpage. Security professionals hate them because they allow content of one site (such as a login page) to be loaded within another site that may not be trusted. This introduces a security concern known as click-jacking where a malicious site overlays invisible elements over what the user believes is a safe login form.

The Solution

Since these concerns arose, the X-Frame-Options header was developed to prevent the loading of one site within an iframe of another. This header is supported by all major browsers and includes two options:
  • SAMEORIGIN - the site can only be loaded within pages of the same domain
  • DENY - the page cannot be loaded in a frame at all

Page Scraping

The goal of X-Frame-Options, as described above, was to prevent the loading of one site within another, potentially malicious site. However, there are multiple ways a site's contents can be displayed, and an iframe is only one. Page scraping can be done via a server-side PHP, Python, or other language script. The code below is an example of how a page's code can be loaded using PHP:

<?php

$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
$url = "https://www.yahoo.com/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);

echo $html;

?>

This small bit of code, when loaded on any website, at any URL, will cause the contents of Yahoo's home page to be displayed. A malicious user could overlay hidden elements over the $html echoed out and easily execute the same attack X-Frame-Options prevents.

Protections

Luckily, some large websites like Google and Facebook are aware of issues like these and use a complex combination of user-agent and IP address checks to prevent server-based scripts from loading their content. Replace yahoo.com with facebook.com/your_username in the code above and you'll notice that nothing loads.

You Don't Even Need a Server!

This problem is not typically able to be replicated on a simple HTML page without server-side code because JavaScript has cross-domain policies that prevent it from retrieving content from a different domain (in other words, you can't scrape the HTML using just JavaScript). However, through some clever trickery, http://call.jsonlib.com/ has enabled JavaScript scraping. What is actually happening is the JavaScript makes a call to a server hosted on the public Internet which serves as a middle-man. So the same process is happening (server-side scraping), it is just being called locally using JavaScript.

For this to work, save the file: http://call.jsonlib.com/jsonlib.js locally. Then, create a simple HTML page with the following code:

<script type="text/javascript" src="jsonlib.js"></script>
<script type="text/javascript">
    function fetchPage(url) {
        jsonlib.fetch(url, function(m) { document.getElementById('test').innerHTML=m.content; });
    }
</script>

<body onload="fetchPage('https://donate.mozilla.org/page/contribute/join-mozilla?source=join_link')">
<div id="test">
</div>

Most likely many large websites block sites that enable this middle-man functionality. However, it is easy for anyone with access to a server to recreate the process. X-Frame-Options are very useful, but there is no need to use iframes any longer when copying the entire site's code locally works just as well. To protect yourself against these kinds of attacks, always make sure that you do not enter sensitive data on domains you do not trust. Always check the URL bar before typing!

1 comment:

  1. This won't work with forms that reference relative paths. Is there a way to overcome this?

    ReplyDelete