Michael Brunton-Spall on Django, Twitter, Web Development, The Guardian
  1. On 03rd of August, 2010 at 18:19
    Posted in Web Development, Security

    There have been a number of incidents recently where a public website I've been using has gone wrong and shown me a nice server-provided stack trace on the screen.  The most recent example was the Cineworld website.

    This is really really bad for a number of reasons.

    1. It's a bad user experience

    My wife was using the Cineworld website when it returned the error page.  Since she doesn't use the Apple Mac a lot, she didn't know what was wrong and told me that something was wrong with the internet connection.  Since Chrome does display a strange error page when the internet is not available, this is a perfectly valid assumption.  In fact, it appears to be everybody's assumption that any technical text you see after hitting a URL comes from your local computer rather than the website.  Normal non-geek people don't think that websites can serve one page fine and then fail when you follow a link.

    2. It reveals that you are having problems

    In fact, because it's a technical error statement, it reveals that you are having problems because of something you actually did, i.e. you've deployed a bad bit of software or your website is buggy.  The fact that you are revealing the error information shows that you have done something seriously wrong, and the implication is that you can't provide a good service.

    3. It reveals important technical information

    This one is important.  You might choose to reveal that your website is powered by Java, using the Tomcat server, but you probably won't go into much more detail than that.  In this case the error page revealed a number of technical things that you probably wouldn't want revealed.

    The stack trace consisted of two traces.  This is fairly common in Java and suggests that your web application uses a template system that defers to another thread to build up some portion of the page, and then composites the results.  Not a particularly big issue, but it tells any hostile attackers that you are using threads, and therefore multithreaded code, which might be susceptible to a number of issues.  There are worse things you could reveal, however, and the rest of the stack trace contains most of them.

    The thread for rendering the HTML has a stack trace that goes through code in the following packages: com.opensymphony.sitemesh, carbonfive.spring, org.springframework, com.cineworld, org.springframework.aop, net.sf.cglib.
    For those of you not in the Java world, this indicates that they are using SiteMesh and the Spring framework.  This is fairly common, but the packages also hint at which features they use (the aop and cglib packages point to Spring's proxy-based AOP, for example).  CarbonFive is far more interesting, since it's a creative web development agency that doesn't list Cineworld as one of its clients, so this error message is giving out information that could be considered commercially sensitive.

    Finally, the bottom of the page says it was rendered by Tomcat 6.0.18.  This is probably the worst thing to reveal.  For a start, knowing which specific version of a server you use tells people what potential vulnerabilities there might be, but in this case we can also look it up and find that Tomcat 6.0.18 was released in July 2008, over 2 years ago.  If this company hasn't upgraded their server software in over 2 years, you can imagine that the libraries and other servers might not have been upgraded in that long either.

    So what can you do?

    In this case, because the server was returning a stack trace (and a 500 result), the Java application server knew that something had gone wrong, and there are a number of things that you can do about it.  At the Guardian we catch these things with 3 different systems.

    When we render a component on our page, we pass through a piece of middleware that detects errors (essentially a large try... catch block) and, in the case of an error, replaces the content with an empty div.  This means that if just a single component plays up, we don't end up with an otherwise good-looking page that has a stack trace in a single block.
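
    To make the idea concrete, here is a minimal sketch of that kind of component barrier.  It is written in JavaScript rather than Java, it is not the Guardian's actual code, and renderSafely, renderPage and the log callback are hypothetical names invented for the illustration.

```javascript
// Hypothetical sketch: render one page component, swallowing any failure.
// `component` is assumed to be a function that returns an HTML string.
function renderSafely(component, log) {
  try {
    return component();
  } catch (err) {
    // Keep the details for the developers, never for the visitor.
    log(err);
    return '<div></div>';
  }
}

// Compose a page from several components. One broken component
// leaves a gap in the page rather than a stack trace.
function renderPage(components, log) {
  return components.map(function (component) {
    return renderSafely(component, log);
  }).join('\n');
}
```

    The important design choice is that the error is still logged somewhere private; the barrier hides the failure from the reader, not from the people who have to fix it.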

    Sometimes the exception is thrown by something higher up than the component renderer, for example the URL processors, the component renderer itself or any of the other supporting code, in which case the component renderer won't catch it.  We therefore have a J2EE filter that does a similar thing to our error-detecting component renderer: it catches any site-wide exceptions and replaces them with a generic error body.  It also ensures that we return the correct 500 HTTP error code.
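
    A site-wide barrier of this kind can be sketched as a wrapper around the whole request handler.  Again this is JavaScript standing in for a J2EE filter, and the handler shape ({ status, body }), withErrorBarrier and the error text are all assumptions made up for the example rather than any real framework's API.

```javascript
// Hypothetical sketch of a site-wide error barrier, written as a wrapper
// around a request handler. `handler(request)` is assumed to return
// an object of the form { status, body }.
var GENERIC_ERROR_BODY = '<h1>Sorry, something went wrong</h1>';

function withErrorBarrier(handler, log) {
  return function (request) {
    try {
      return handler(request);
    } catch (err) {
      // Full details go to the server log only.
      log(err);
      // The visitor gets a generic body and an honest 500 status.
      return { status: 500, body: GENERIC_ERROR_BODY };
    }
  };
}
```

    Returning a real 500 rather than a friendly-looking 200 matters, because anything sitting in front of the application can then recognise the failure from the status code alone.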

    As the final barrier, our frontend servers (Apache in our case) detect 500 status codes from the application servers and replace the entire page with a generic "There has been an error, the page you are looking for can't be served" type page.
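
    In a real frontend server this is a matter of configuration rather than code, but the logic is simple enough to sketch.  The following JavaScript is only an illustration of the rule being applied: maskServerErrors and the response shape are hypothetical, and treating the whole 5xx range as an error (rather than just 500) is my assumption.

```javascript
// Hypothetical sketch of the outermost barrier: whatever the application
// server sent, a 5xx response never reaches the visitor unmodified.
// `upstream` is assumed to be an object of the form { status, body }.
var BRANDED_ERROR_PAGE =
  '<h1>Sorry</h1>' +
  '<p>There has been an error, the page you are looking for can\'t be served.</p>';

function maskServerErrors(upstream) {
  if (upstream.status >= 500 && upstream.status <= 599) {
    // Preserve the status code so monitoring still sees the failure.
    return { status: upstream.status, body: BRANDED_ERROR_PAGE };
  }
  return upstream;
}
```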

     

    The error page that we serve up looks like a Guardian-served page because of its branding, and it makes it clear that there has been an error at the Guardian's end and that we are investigating it.

    This means that it is almost impossible for our Java application to display this kind of information back to the user, scaring them or revealing important information.  Obviously there is almost certainly some weird combination of errors that we have forgotten, and when we stumble upon it (as it is inevitable that we will) we will add a fourth barrier to stop these pages reaching end users.

    Following these conventions won't necessarily save you, but it will help you stop leaking information to attackers and scaring your customers.

  2. On 14th of April, 2010 at 22:40
    Posted in Twitter, @Anywhere, Tutorial, Coding

    I've been lucky enough over the last couple of weeks to have early access to the twitter anywhere platform and, with the eminent Chris Thorpe (@jaggeree), to build a few prototypes that we could use with twitter anywhere.

    I thought it might be a nice start to write a simple blog post detailing an introduction to the first few features of twitter anywhere.

    Twitter anywhere is a service provided by twitter that allows you to get access to twitter details in the webpage.  Written in JavaScript, and hosted via iframes in your site, it allows frictionless access to twitter.

    Getting started 

    First you need to sign up at http://dev.twitter.com/anywhere/apps/new and create a twitter client application.

    In your website code you make a JavaScript call to initialise the @anywhere library.  I'm using jQuery here to include the script and make a callback once it has loaded.  This ensures that loading the library does not hold up the document rendering.

    jQuery(function($) {
    	$.getScript('http://platform.twitter.com/anywhere.js?id=KEY&v=VERSION', function() {
    		/* Your Code */
    	});
    });
    

    At this point you need to provide a key to the anywhere JavaScript.  This is the key that you were assigned when you signed up to write an @anywhere application using the twitter API.  You also need to specify the version, for example 1.
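
    If you would rather not paste the key and version into the URL by hand, a small helper can build it.  This is just a sketch: anywhereScriptUrl is a name I have made up, and the URL shape is simply the one from the snippet above.

```javascript
// Build the @anywhere script URL from an API key and a version number.
// The function name is hypothetical; only the URL shape comes from the
// getScript example.
function anywhereScriptUrl(key, version) {
  return 'http://platform.twitter.com/anywhere.js' +
    '?id=' + encodeURIComponent(key) +
    '&v=' + encodeURIComponent(version);
}

// Possible usage with jQuery, where YOUR_KEY is a placeholder:
//   $.getScript(anywhereScriptUrl('YOUR_KEY', 1), function () { /* Your Code */ });
```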

    Once you have loaded the twitter anywhere library you can initialise it. Importing the twitter library will create a global javascript variable called twttr. The recommended method to initialise the library is to call the anywhere constructor.

    	twttr.anywhere(function(twitter) {
    	});
    

    The anywhere library acts a lot like jQuery, so your function is passed a reference to the twitter variable, allowing you to name it anything you want, but twitter is probably the best name for easiest code reading.

    Tweet from your website

    Once initialised, the twitter library gives you a few commands.  The simplest lets people post a tweet directly from your page.  Here is a simple version that does that:

    	twitter('#tweetbox').tweetBox({
    		defaultContent: 'Just read a great article by @bruntonspall at ' + window.location,
    		height: 100,
    		width: 250,
    		label: 'Tweet how awesome @bruntonspall is'
    	});
    

    It's really that simple.  The string passed to twitter() is a standard jQuery-style selector, and of course the tweetbox has a number of defaults; I've only overridden a couple here.  You can check out the API docs to see the other options.

    The important bit about this is that it is a frictionless design.  In the past, if you wanted to add this kind of "tweet this" functionality, you either had to run your own backend that did OAuth, and users would have to authenticate your application, or you had to create a link that went offsite, direct to twitter's website.  For a small blog like mine that is not a big deal, but for most larger organisations it is.

    The first time a user clicks the tweet button, a popup will be created to authorise your website to post to their tweet stream.  If the webpage that the user came from does not have the same document domain as the OAuth callback URL that you specified, you will get an OAuth failure and the tweet will fail.  That can be a bit of a pain for testing, but you could create a second app and provide your test environment with a separate key to get around that.
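
    One way to manage the two keys is to pick the key from the hostname the page is served on.  This is a sketch under obvious assumptions: the hostnames and keys below are placeholders, and anywhereKeyFor is a hypothetical helper, not part of the @anywhere library.

```javascript
// Hypothetical mapping from hostname to @anywhere API key. Each key is
// assumed to belong to a separate app whose OAuth callback URL is on
// that host.
var ANYWHERE_KEYS = {
  'www.example.com': 'LIVE_KEY',
  'test.example.com': 'TEST_KEY'
};

function anywhereKeyFor(hostname) {
  if (Object.prototype.hasOwnProperty.call(ANYWHERE_KEYS, hostname)) {
    return ANYWHERE_KEYS[hostname];
  }
  // Unknown host: better to skip @anywhere than to fail OAuth.
  return null;
}

// In the browser this would be called as anywhereKeyFor(window.location.hostname).
```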

    Once a user has authorised your website once, they will never need to authorise it again unless they explicitly deauthorise your website first.  That means that once a user has authorised you, tweeting becomes a single click that does not navigate away from the page.  Nice, huh?  That's what twitter means when it says frictionless tweeting.

    The follow button

    The next feature is the follow button.  For my website I might want to put a follow button on that allows you to follow me.  That's as easy as you would expect, simply find the div and call the followButton method, like so:

    	twitter('#followbutton').followButton('@bruntonspall');
    

    This inserts a follow button into the element you specify.  Again if the user has not authenticated it will ask for authorisation, but once authorised it's a single click frictionless follow for your website.

    If your blog had multiple authors, you could easily pull the author's twitter name into the follow button so that readers can follow the author, or you could hardcode it to your site-wide news feed twitter account.
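
    A sketch of how that choice might be made is below.  followTarget is a hypothetical helper of my own, and '@examplenews' is a placeholder account; the only thing taken from the library is the '@name' form passed to followButton above.

```javascript
// Hypothetical helper: turn an author's stored twitter name into the
// '@name' form used in the followButton call, falling back to a
// site-wide account when the author has none.
function followTarget(authorTwitterName, siteAccount) {
  var name = (authorTwitterName || '').trim().replace(/^@/, '');
  if (name === '') {
    name = siteAccount.trim().replace(/^@/, '');
  }
  return '@' + name;
}

// Possible usage:
//   twitter('#followbutton').followButton(followTarget(author, '@examplenews'));
```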

    Hovercards and Linkify

    I've left this feature late because I'm not a huge fan of the hovercards.  You've seen these on the twitter site: if you hover your mouse pointer over a tweeter's name, you get a little card popping up with some info about the tweeter.  Clicking on the more button gives you even more information, and you can follow and deal with lists inline.  Linkify is even simpler: it finds @bruntonspall in the text and wraps it with an A tag to link it to the relevant twitter page.

    You can enable twitter hovercards by calling the hovercards method.  It scans through the text on your page and translates @bruntonspall style twitter names that it finds.  You probably don't want it to parse your entire page though, so I'd recommend using it jQuery style:

    	twitter('.content').hovercards();
    

    This will only do the hovercard parse on the text in elements with the class 'content'.

    One thing to note: neither hovercards nor linkifyUsers parses text inside A, PRE, IFRAME, SCRIPT or STYLE tags by default.

    It's also worth noting that calling hovercards will implicitly call linkifyUsers, so don't bother linkifying if you are also going to enable the hovercards.
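
    To make the linkify behaviour concrete, here is a rough approximation of what it does to a piece of plain text.  This is not twitter's implementation: the regular expression, the 15 character limit and the profile URL pattern are all my assumptions, and linkifyText is a made-up name.

```javascript
// Rough approximation of linkifying @names in plain text.
// Assumptions: usernames are 1 to 15 word characters, profile pages live
// at http://twitter.com/<name>, and the input is already HTML-escaped.
function linkifyText(text) {
  // The leading group stops email addresses such as me@example.com
  // from being treated as twitter names.
  return text.replace(/(^|[^\w@])@(\w{1,15})\b/g, function (match, before, name) {
    return before + '<a href="http://twitter.com/' + name + '">@' + name + '</a>';
  });
}
```

    Unlike the real thing this works on a string rather than on the page, so it knows nothing about skipping A, PRE or SCRIPT tags.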

    What if you want to enable a hovercard for something non-text?  For example, in our prototype we wanted to hovercard twitter avatars.  You can pass a function to the hovercards method as the username parameter, which is called on each element to find out its username.  Simply return the username as a string and it will hovercard it correctly.  If you pass the username parameter hovercards won't linkify, so you need to do that by hand.  Our example might be:

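    	<!-- HTML: each avatar's id is the user's twitter name -->
    	<img src='...' id='bruntonspall'><img src='...' id='icklecat'>
    	// JavaScript: return the username for each matched element
    	twitter('img').hovercards({
    		username: function(e) {
    			return e.id;
    		}
    	});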
    

    The final bit is the most exciting: twitter anywhere provides you with access to the twitter account details if the user is signed into twitter and has authorised your website.  So once somebody has tweeted via your website, you can use those details for customisation or authentication purposes.  I'm sure that you can think of plenty of options here, but I've not had a chance to get them working, so you'll have to expect a follow-up post once all the final details are posted.

     

    I think that's enough for a brief overview.  You can check out the official docs for more and better information, and I hope you enjoy playing with twitter anywhere.  My experience so far has shown it to be a rather beautiful API to work with.

    As you can see, I've updated my site to use these features too, so feel free to tweet me using the box on the right, and check out my twitter hovercards and information in the article.

  3. On 03rd of February, 2010 at 17:48
    Posted in Django, Python, PyCharm

    Did you see my link a few days ago about PyCharm being released by JetBrains?  I hope so, because it is a very interesting IDE for Python and Django developers.

    Introduction

    PyCharm attempts to up the Python IDE stakes by bringing the expertise of the rather brilliant IntelliJ Java IDE to the Python world.  The idea is nice, and the execution is good, especially with the Django integration.

    Installation

    Installation was as simple as downloading the Linux tar.gz file and extracting it.

    My first issue was that on a 64-bit system, it fails to start up with the message:

    Error occurred during initialization of VM
    Could not find agent library on the library path or in the local directory: yjpagent

    This is because it attempts to load a library for Java profiling that is 32-bit only.  The easy workaround is to edit bin/pycharm.vmoptions and delete the line "-agentlib:yjpagent=disablej2ee,sessionname=PyCharm".  I think you could also download the 64-bit version of yjpagent and put it in the lib or bin directory, but I haven't tried that yet.
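
    If you have to do this on several machines, the edit is easy to script.  Here is a minimal sketch of the text transformation; stripProfilerAgent is a name I have made up, and it works on a string so that it can be tried out without touching the real file.

```javascript
// Remove any line that loads the yjpagent profiling library from the
// text of a .vmoptions file. All other lines are left exactly as they were.
function stripProfilerAgent(vmoptionsText) {
  return vmoptionsText
    .split('\n')
    .filter(function (line) {
      return line.trim().indexOf('-agentlib:yjpagent') !== 0;
    })
    .join('\n');
}

// Possible usage from Node.js, after taking a backup of the file:
//   var fs = require('fs');
//   var path = 'bin/pycharm.vmoptions';
//   fs.writeFileSync(path, stripProfilerAgent(fs.readFileSync(path, 'utf8')));
```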

    Google App Engine

    I used Open Directory to open an existing Google App Engine project, which opened quite nicely.  I immediately got some nice warnings to let me know that I was not overriding some methods correctly, along with a spurious error that self.request and self.response are unresolved attributes.  I guess that's a slight problem: the IDE can't possibly know that webapp.RequestHandler.initialize will be called before the get method, and since the __init__ method doesn't create those attributes, the IDE can't be sure that they exist.

    This kind of bug is the exact reason that Python IDEs are hard to build, and not as helpful as Java ones.  In Java, the fact that RequestHandler has a request attribute would have to be declared in the class.  Since in Python it doesn't, the IDE can't tell what type that attribute is, and therefore can't autocomplete the methods on it.  So using this for the App Engine webapp framework is no better than using any other application.

    Django

    I next opened the pure Django implementation for this website.  I had a weird issue here.  I use pip and virtualenv to ensure I have the correct version of Django installed for each Python project I build.  However, PyCharm doesn't support a per-project set of library dependencies, and my work machine doesn't actually have Django installed system-wide.  I installed Django 1.1 system-wide, and using Settings -> Python Interpreter -> Reload I was able to get PyCharm to reload.

    I was struck by some issues immediately.  Where I was previously doing import home.models, I now need to include the project name, so import bruntonspall.home.models.  I also have a funky bit of Python path manipulation in my settings.py, which unfortunately PyCharm doesn't seem to handle properly.  So several of my imports did not work anymore, which caused weird warnings everywhere.

    Final Thoughts

    This is only an early access program, and I will admit that for creating a new project from scratch, or for general Python programming, this is as good an IDE as I've used.  It doesn't handle all of the really weird behaviours that I've started to use in my Django projects, but for non-advanced systems it's probably highly useful.

    The biggest issue as far as I am concerned is the lack of good support for virtualenv, pip and pythonpath modification in settings.py.  It's possible that I've missed something, and over the next few days I'll be using it heavily to test it out and find out what it does well, and what it does poorly.

  4. On 01st of January, 2010 at 15:13
    Posted in The Guardian, Web Development, Python, Scale Camp, Personal

    So as 2009 draws to a close, I look back over the year and consider what has happened.  With this being the end of the decade for everyone but pedants (that will be another year yet), I've also thought about the previous decade.

    10 years ago I was halfway through my degree, determined that I would become a famous game designer, or at least be designing with a major games studio.  I was still determined not to become a programmer, and instead wanted to spend my days fiddling with Excel, scripting languages and writing design documents and so forth.  Fast forward 10 years and I'm now working as a programmer in the web industry, speaking at conferences and even organising conferences like Scale Camp!

    This year has been real fun.  I've really settled into my role at the Guardian newspaper, and the coming year will hopefully see me extending that role into more developer evangelism focused areas.  I was pleased to recently read a list of a number of technologies that one "must have used" in 2009, including a NoSQL datastore, a dynamic language framework and a non-blocking long poll framework such as node.js.  I've had the opportunity this year to spend time using Redis, Django, Python and even a brief play with node.js and the tornado framework.

    How about resolutions for 2010?  Well, I'd like to increase the quantity and quality of my blogging.  That will hopefully mean writing more often, as I'm repeatedly told that the best way to improve one's writing ability is to practice it as often as possible.  I won't promise any particular update schedule, as I've tried that before, but rest assured that, like all New Year's resolutions, January will be more active before I get bored and forget to do any more!

    I also wish to try my hand at some more new technologies, trying to keep up with the changes in the industry and the way in which programming is changing.  The main aims for the early part of the year are going to be having a play with Yahoo Application Framework and the Facebook platform, and trying out some more social tools to explore the concepts and patterns involved.

    I hope you have had a great 2009, and have a good new year and an enjoyable 2010.