Wednesday 28 November 2012

Arbitrary Java webapps and Heroku

Heroku has been around for a while now, and it's an excellent service. They make it stupidly easy to deploy and manage your web applications, including scaling resources up as necessary, and offer a whole host of plugins for all sorts of extra services.

The big problem with Heroku has always been that it only really works if you write your applications in a way that they support. The way it works is that you push your Git source tree up to them, and they build and deploy it automatically. They do support a large number of very popular systems for doing this, so it isn't all that limiting for the most part. But it suffers from the same problem that a lot of things do these days - the whole feeling of "If you do it our way, it's easy. If you don't do it our way, well, tough."

Well - the good news is that, for Java webapps at least, this is now history. I'm not sure how long it's been available, but I've just found a page on their wiki detailing WAR Deployment. This means that you can write your webapp in literally any way that you want, using any technologies and any build system you like, and as long as you can get a standard WAR file out of the build then you can deploy it to Heroku and get all the benefits of doing so. This is fantastic for anyone who wants to write Java webapps without tying themselves to certain technologies, but it also starts to open up the way for enterprise developers to write standard J2EE webapps and get them deployed on the Heroku cloud, instead of their companies needing to run their own container solution.
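
From memory of that wiki page, deployment then looks to be just a couple of commands using their deploy plugin - treat the exact command names here as an assumption and check the page itself for the current form:

    heroku plugins:install https://github.com/heroku/heroku-deploy
    heroku deploy:war --war target/myapp.war --app my-heroku-app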

Sunday 21 October 2012

User Only authentication

I remember reading somewhere not too long ago - I can't remember where, sorry! - about a suggestion for authentication that relied on browser sessions and the "Remember Me" idea to remove the need for passwords entirely. The basic idea was that when you need to log in, you enter your email address into a box and the server emails you a specially crafted link. Following that link works as a one-time login to the site, but in such a way that it keeps you logged in until you press the Log Out button. This has the advantage that you only need access to your email to be able to log into the site.
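
To make that concrete, here's a minimal sketch of the token side of the idea in Java - the names are mine, not from the original suggestion, with storage as an in-memory map and the actual mailing stubbed out:

    import java.math.BigInteger;
    import java.security.SecureRandom;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /** Sketch of the one-time login-link idea. Storage and mailing are deliberately stubbed. */
    public class LoginLinks {
        private final SecureRandom random = new SecureRandom();
        private final Map<String, String> pendingTokens = new ConcurrentHashMap<String, String>();

        /** Step 1: the user submits their email address and we mail them a link. */
        public String requestLogin(String email) {
            // 130 random bits rendered in base 32 - a standard unguessable-token idiom
            String token = new BigInteger(130, random).toString(32);
            pendingTokens.put(token, email);
            // mailLink(email, "https://example.com/login?token=" + token); // hypothetical mailer
            return token;
        }

        /** Step 2: the user follows the emailed link; the token works exactly once. */
        public String redeem(String token) {
            return pendingTokens.remove(token); // null means invalid or already used
        }
    }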

I've been dwelling on this idea, and have thought of some potential extensions to make life even easier for people. The first thought was to detect whether the email address is a Google Account, and if so use Google Authentication - that is, redirect through the Google account login process - to log you in. This has the advantage that if you are already logged into your Google account in the browser then you need do almost nothing to log into the website in question. The big problem with this idea is that, as best I can work out, there is no way to detect if a given email address is a valid Google Account or not. I have, however, found a way to detect if a domain name is a valid Google Apps domain, and this can be coupled with the fact that the standard Google addresses come from a very small set - gmail.com and googlemail.com are the only ones, I believe.

The next step was to extend the idea to support more than just email addresses. It would be really good if you could just enter a Twitter handle, or a Facebook account name, or an OpenID address, or any of a whole set of potential identifiers from external authentication providers, and have the site just do the correct thing. Of course, the big problem here is working out whether an entered string is actually a valid account with a given provider, and if so which one. What I've worked out below probably covers a fair few bases, but isn't totally reliable.

  • Google Account. Email address where the domain is either "gmail.com" or "googlemail.com"
  • Google Apps At My Domain. Make a request to https://www.google.com/accounts/o8/site-xrds?hd=<domain>. If you get a 400 back then it's not a valid Google Apps domain. Unfortunately this doesn't guarantee that you can authenticate the given email address with Google.
  • OpenID. These are always URLs, and you can make a request to the URL with an Accept header of "application/xrds+xml". This should give you a valid XRDS response.
  • Twitter. If the entered string starts with an "@" symbol, this might be a Twitter handle. It is possible to use the Twitter API to query a given screen name to see if it is a real Twitter account. Note that for this you need valid Twitter API credentials.
  • Facebook Accounts. It's possible to use the Facebook Graph API to determine if a given name is a valid Facebook account. Note that for this you need valid Facebook API credentials.
Once you've worked out that a given identifier is something we can handle, you can then do the appropriate magic to redirect the user through that provider's authentication system to handle that part of things, and come back to your site knowing that the user is now fully authenticated. All of the above also support fetching some user details to fill out a profile if the credentials represent a new account rather than an existing one.
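
As a rough sketch of how the detection might hang together (Java; the names are mine, and the actual network checks against Google, Twitter and Facebook are left as comments since they need HTTP calls and API credentials):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /** The providers we know how to hand an identifier off to. */
    enum Provider { GOOGLE_ACCOUNT, GOOGLE_APPS, OPENID, TWITTER, FACEBOOK }

    class IdentifierClassifier {
        private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@([^@\\s]+)$");

        static Provider classify(String id) {
            if (id.startsWith("@")) {
                // Possibly a Twitter handle - confirm via the Twitter API (needs credentials)
                return Provider.TWITTER;
            }
            if (id.startsWith("http://") || id.startsWith("https://")) {
                // OpenIDs are always URLs - confirm by requesting the URL with
                // an Accept header of "application/xrds+xml"
                return Provider.OPENID;
            }
            Matcher m = EMAIL.matcher(id);
            if (m.matches()) {
                String domain = m.group(1).toLowerCase();
                if (domain.equals("gmail.com") || domain.equals("googlemail.com")) {
                    return Provider.GOOGLE_ACCOUNT;
                }
                // Otherwise check https://www.google.com/accounts/o8/site-xrds?hd=<domain>;
                // a 400 response means it isn't a valid Google Apps domain
                return Provider.GOOGLE_APPS;
            }
            // Fall back to treating it as a candidate Facebook account name (Graph API lookup)
            return Provider.FACEBOOK;
        }
    }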

There are a whole host of other providers that you can do similar things with too, but obviously the more providers you opt to support, the more difficult it becomes to know which provider a given identifier should go through. For example, I could opt to support Facebook AND Github accounts, but if the same identifier at Facebook and Github actually maps onto two different people, this causes big problems. If the only supported options are those listed above then the identifier is guaranteed to map onto only one of those providers - because the identifier formats are all different. Of course, one workaround here is to detect when an identifier maps onto multiple providers and prompt the user to select the provider they meant, but this makes the authentication process that little bit clunkier, which is a real shame.

I'm going to try and build a proof of concept of the above ideas later on, to see how well it works in reality.

Edit: It turns out there's one huge problem with this idea, and unfortunately it's a bit of a show-stopper. OpenID is easy. Google Accounts should in theory be just as easy, because they support OpenID, but actually they fall into the same trap as Facebook and Twitter - namely that the APIs have no support for telling the authenticator which username we want to authenticate. This means that auto-detecting the authenticator based on the username is easy, but we'd end up redirecting the user to a login screen that wants them to enter those details all over again...

Monday 15 October 2012

REST authentication concerns

I've just read a blog post suggesting ways to do user authentication on a REST-based service, and - as is far too common - they got it totally wrong. They detailed a means of generating hashes based on the message and shared secrets that only that user would know.

REST is fundamentally based on a few things. One of these is to leverage the HTTP protocol wherever possible. You already see this in the use of HTTP verbs for actions, URLs to refer to resources, status codes for error handling, amongst other things. Following this through, logically the sensible way to do user authentication is to do HTTP auth. It's clearly defined by the HTTP specifications - RFC2617 is a decent place to start here - and gives exactly what is wanted. It also gives a number of different ways to manage how user credentials are sent to the server, and you get to define your own if you really need to. What more could you ask for?
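
For illustration, the Basic flavour of that exchange looks something like this (a hypothetical endpoint, with "alice:s3cret" as the base64-encoded credentials):

    GET /widgets/42 HTTP/1.1
    Host: api.example.com

    HTTP/1.1 401 Unauthorized
    WWW-Authenticate: Basic realm="api"

    GET /widgets/42 HTTP/1.1
    Host: api.example.com
    Authorization: Basic YWxpY2U6czNjcmV0

    HTTP/1.1 200 OK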

The obvious problem with it is that the standard HTTP auth mechanisms - Basic and Digest - can be vulnerable to man-in-the-middle attacks. But there's an answer to that too. You should never send anything sensitive - like authentication headers - over plain HTTP. Use HTTPS instead. It's so simple to set up these days, and certificates can be self-generated or bought for really not very much, so there's no excuse not to!
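
On the client side this amounts to very little code. A sketch using nothing but the JDK - the URL and credentials are placeholders:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import javax.xml.bind.DatatypeConverter;

    public class BasicAuthClient {
        public static void main(String[] args) throws Exception {
            // Always https for authenticated calls - never plain http
            URL url = new URL("https://api.example.com/widgets/42");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();

            // Basic auth is just "user:password", base64-encoded
            String encoded = DatatypeConverter.printBase64Binary("alice:s3cret".getBytes("UTF-8"));
            conn.setRequestProperty("Authorization", "Basic " + encoded);

            System.out.println("Status: " + conn.getResponseCode());
        }
    }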

Thursday 4 October 2012

New Features for your favourite C++ Compiler

Herb Sutter has a post up on Sutter's Mill asking what people would like to see added to their favourite C++ compiler. Unsurprisingly, Conformance has so far come out on top. This is nothing more than adding all of the features in the C++ specification (C++11 mostly, but I'm sure there are compilers out there still missing features from older versions) that haven't been implemented yet. What I found surprising was that Performance is in second place.

On modern hardware, unless you are doing realtime or embedded programming, do you really need your compiler to be able to make your code run that much faster? And by "that much faster", the entry for Performance was to increase speed by 5%. That's not a lot, and if you are writing native code to run on modern hardware and that 5% speed improvement is that important, then I have to wonder what you are doing.

I honestly expected that Safety would come out higher than Performance. Safety in this sense is split into two cases - STL Safety and Language Safety. STL Safety is making it so that actions on STL components are thread- and bounds-safe with minimal overhead, and Language Safety is the same but for non-STL components. It's true that a lot of this is considered unnecessary because you should know what your program is doing, and it shouldn't be doing any unsafe actions, but we all know that this isn't always the case, and having compiler guards that catch these things with minimal overhead would make a lot of our programs a lot safer to run. I personally think this is far more important than making programs run that little bit faster.

For the record, my last vote went to adding some more Reflection capabilities to the language. Certainly not to the level of languages like Groovy or Ruby, where you can totally re-write the interfaces of objects at runtime (that scares me), but being able to introspect types at runtime, instantiate them by name, and things like that can be very useful on occasion, and right now just can't be done in C++ without a lot of nasty hacks.

Sunday 19 August 2012

Consuming .groovy files as runtime classes

Groovy has the ability to load .groovy files at runtime as normal class files, without the need to compile them first. Here's how to make use of that from a Java program. It took a bit of fiddling to get working, but something along the lines of the sketch below does the trick.
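
A minimal sketch, assuming Groovy's standard GroovyClassLoader API - the directory and class name are placeholders:

    import groovy.lang.GroovyClassLoader;

    public class GroovyLoading {
        public static void main(String[] args) throws Exception {
            GroovyClassLoader loader =
                    new GroovyClassLoader(Thread.currentThread().getContextClassLoader());
            // Any .groovy file under this directory becomes loadable as a class
            loader.addClasspath("/base/directory/containing/groovy/files");

            // Compiled on the fly from MyService.groovy if no compiled class exists
            Class<?> myService = loader.loadClass("MyService");
            Object instance = myService.newInstance();
            System.out.println(instance);
        }
    }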



What this does is create a Groovy classloader that is configured to automatically compile all Groovy files under "/base/directory/containing/groovy/files". If you then ask this classloader for a class that exists as a normal class, it will provide it from its parent. If however the class is actually written in an uncompiled .groovy file in that directory, it will provide that too, having compiled the .groovy file automatically.


Friday 10 August 2012

REST APIs in Node.js. Part one - Routing

So, I'm playing with Node.js again. In particular, I'm playing with writing a pure REST webapp. That is, a webapp where the client page is static HTML, Javascript and CSS, and communicates with the server using a pure REST API. This has a couple of big advantages - namely that all of the state is stored in the Javascript in the current page, and that different clients - such as native Android or iOS apps - can be written against the same API.

One of the things that a lot of people miss when they come to write REST APIs is that the links between resources are important. That is, you should be able to visit any resource and discover correct links to related resources from there. You should also be able to visit the base page and discover links into other parts of the app.

Now, part of the problem here is that people tend to use IDs instead of Links when they refer to other parts of the system. For example:
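
Something like this hypothetical book resource, where the author is referred to purely by ID:

    {
      "id": 12345,
      "title": "A Book Title",
      "author": {
        "id": 54321
      }
    }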

Note the author object has an ID as part of it. This means that a consumer of this resource needs to know how to construct a URL to the Authors resources, using this ID as part of that URL, in order to look up the author's details. This is Bad™. What happens if the client doesn't know how to write the URLs? Or if the URLs change? Or any of the numerous other things that might happen?

A better way of doing this would be:
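
The same hypothetical resource, but with the author object carrying a link as well:

    {
      "id": 12345,
      "title": "A Book Title",
      "author": {
        "id": 54321,
        "link": "http://example.com/api/authors/54321"
      }
    }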

Note that we now include a Link in the author object as well. The consumer simply needs to follow this link and they will be rewarded with the author details. This means that if the consumer doesn't know how to navigate the resources, it doesn't matter. And if the URLs change, it doesn't matter. It all still works.

Now for the problem. The majority of REST frameworks - at least the ones I've seen - don't easily cater for this. That is, they don't make it easy to generate the URLs that you would need to use to visit another resource. And this isn't a Node.js problem - it's much more widespread than that. I've seen frameworks for Java, Scala, Groovy, Ruby and Node - to mention the ones I've looked at myself - that have this problem. From memory, there are only two frameworks I've seen that actually do make URL generation easy. I was reasonably certain that JAX-RS for Java allows you to do this, but I can't find any documentation saying how. And the Escort router for Node.js allows you to do this. In fact, Escort does it in a rather nice manner: when you register routes you can optionally give the route a name, and then later on you can look up the route by name and generate a URL from it.

What Escort doesn't (currently) do is generate full URLs. It generates the path part of the URL, since that's all it actually knows about. This is a problem if you want to spread your routing over multiple hostnames - which is unlikely - or if your routing is not at the base of your host - which is more likely. For example, if your entire REST application was mounted under /cmnd, Escort would need to know this to be able to write the URLs correctly. However, in theory that mounting is a deployment-time concern and not something the code should ever need to know about.

All in all, it's a tricky problem that has yet to be solved well. But at least there are partial solutions out there that can be used in the interim.

Wednesday 25 July 2012

C/C++ Build Systems

I've been looking a lot lately at build systems for developing C/C++ projects, and have come to a general state of despair about them.

I should say first that I am a professional Java developer, with some more recent experience with Groovy and Scala running on the JVM, and that some of this is likely coloured by that experience. In the Java/Groovy/Scala world there are a number of build systems that can be chosen from, chiefly Ant, Maven, Gradle, and SBT as the ones I've used. I've used Maven a whole lot, Ant a decent amount, and Gradle and SBT are both very new to me still. Gradle in particular I really like because of one overriding thing - it makes the simple case very very simple.

If you then go to look at the C/C++ ecosystem it's a very different story. Choices here that I've looked at include Make, Autotools, CMake, SCons, Waf, and Boost.Build. I'm very aware that this is an incomplete look, but it seems to cover the major ones (not counting Visual Studio, which isn't so much a build system as an entire environment that will also build your projects, but loses out on the fact that it's Windows only). Out of these build systems, the first thing to notice is that they all fail to make the simple case even remotely simple. In the case of SCons and Waf, which a lot of people champion as being very powerful and flexible, you need to first learn Python before you can use them, because the build script is actually a Python script! That - in my mind - is verging on insanity. The few examples of real projects using Waf and SCons that I've seen have had build scripts over 1,000 lines long!

What I really want from a C/C++ build system isn't - in my opinion - all that difficult.
  • Make the simple case simple. The simplest case is no dependencies, and just compiling every single source file in the project together into one binary (executable or library as appropriate). I see no reason why a build system couldn't achieve this from at most a couple of lines of configuration (see the sketch after this list).
  • Make simple dependency management simple. A lot of libraries out there make use of pkg-config. A lot of the ones that don't still have the same library and path names across platforms anyway. I should be able to add a dependency on "zlib" - which is in my pkg-config list - as easily as just stating that I now depend on zlib, optionally with version requirements.
  • Make multi-module projects simple. I should be able to have a multi-module project - that is a project that builds more than one output, all of which work together - as simply as specifying the outputs - most likely as separate subdirectories with their own source trees under them - and have it just work. When I do this, dependencies between the different modules should still be simple.
Unless I'm missing something, none of this seems like rocket science.
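
To show what I mean, here's roughly the build file I'd want for all three points - invented syntax for a build tool that doesn't exist, not any real system:

    # Simple case: compile everything under src/ into one output
    name: frobnicator
    type: executable

    # Simple dependencies, resolved via pkg-config where available
    depends:
      - zlib >= 1.2

    # Multi-module: each entry is a subdirectory with its own source tree
    modules:
      - libfrob       # a library
      - frob-cli      # an executable that depends on libfrob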

Sunday 24 June 2012

Scala and Spring MVC - JSON Mapping

So, after my earlier post I've been playing with a little test rig to do RESTful requests in Scala using the Spring MVC framework. After a few issues I've got something that I actually quite like, and it was relatively painless...

The biggest issue I had was that the Jackson JSON mapping - which Spring uses by default - doesn't work well with Scala types. Case classes it outright failed on, and Scala Maps it did weird things with. Everything else was OK, but obviously this isn't ideal. As such, I've put together a Spring HttpMessageConverter implementation that uses the Jerkson JSON mapper - which is based on Jackson but adds Scala support into the mix. I've even written the whole thing in Scala, and was quite impressed with how (relatively) painless it was...

So what does it look like? Well - this...
import java.io.OutputStreamWriter

import org.springframework.http.converter.HttpMessageConverter
import org.springframework.http.{HttpInputMessage, HttpOutputMessage, MediaType}

import com.codahale.jerkson.Json
import grizzled.slf4j.Logger // assuming grizzled-slf4j for the Logger[this.type] idiom

/**
 * Implementation of the Spring HttpMessageConverter to convert between Scala types and JSON using the Jerkson mapper
 * @tparam T The generic type of the object to convert
 */
class MappingJerksonHttpMessageConverter[T] extends HttpMessageConverter[T] {
  /** The Logger to use */
  val LOG = Logger[this.type]

  /** The media types that we are able to convert */
  val mediaTypes = List(MediaType.APPLICATION_JSON)

  /**
   * Helper to see if we support the given media type
   * @param mediaType the media type to support. If None here then we return True because of a quirk in Spring
   * @return True if we support the media type. False if not
   */
  def isSupported(mediaType: Option[MediaType]) = {
    LOG.debug("Comparing requested media type " + mediaType + " against supported list")
    mediaType match {
      case Some(mediaType) => mediaTypes.contains(mediaType)
      case None => true
    }
  }

  /**
   * Determine if we are able to read the requested type
   * @param clazz the class type to read
   * @param mediaType the media type to read
   * @return True if we support the media type. False if not
   */
  def canRead(clazz: Class[_], mediaType: MediaType) = isSupported(Option(mediaType))

  /**
   * Determine if we are able to write the requested type
   * @param clazz the class type to write
   * @param mediaType the media type to write
   * @return True if we support the media type. False if not
   */
  def canWrite(clazz: Class[_], mediaType: MediaType) = isSupported(Option(mediaType))

  /**
   * Get the supported media types
   * @return the supported media types, as a Java List
   */
  def getSupportedMediaTypes = java.util.Arrays.asList(mediaTypes.toArray: _*)

  /**
   * Actually attempt to read the data in the input message into the appropriate data type
   * @param clazz the class that we are reading into
   * @param inputMessage the input message containing the data
   * @return the unmarshalled object
   */
  def read(clazz: Class[_ <: T], inputMessage: HttpInputMessage) = {
    LOG.debug("About to read message")
    Json.parse[T](inputMessage.getBody)(Manifest.classType(clazz))
  }

  /**
   * Actually attempt to write the data to the output message
   * @param t the value that is to be written
   * @param contentType the media type to write as
   * @param outputMessage the output message to write to
   */
  def write(t: T, contentType: MediaType, outputMessage: HttpOutputMessage) {
    LOG.debug("About to write message")
    val jsonString = Json.generate(t)
    val writer = new OutputStreamWriter(outputMessage.getBody)
    writer.write(jsonString)
    writer.flush()
    writer.close()
  }
}
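
Wiring it in is then just a matter of registering it as a message converter - for example with the Spring 3.1 MVC XML namespace, assuming the class above lives in a com.example package:

    <mvc:annotation-driven>
      <mvc:message-converters>
        <bean class="com.example.MappingJerksonHttpMessageConverter"/>
      </mvc:message-converters>
    </mvc:annotation-driven>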

Saturday 23 June 2012

Scala and REST

I've been looking into Scala and REST recently, because I really like the idea of Scala - though to date I've not actually got on all that well with it in practice - and I really like the idea of writing webapps using the REST principles and Javascript in the client to pull it all together.

As such, I've been looking at the various frameworks that exist for writing webapps, and in particular ones that have REST support for Scala. I've come to the conclusion that the options aren't that great. I must confess that I am rather spoilt here, in that I've done a lot of work with Java frameworks (mainly Spring) that do provide all the things I'm looking for, and I think the Scala landscape is just too new to have full-featured support yet.

Frameworks

The frameworks I've looked at, and my thoughts of them are:

Play 2.0

Seems rather nice on the outside, and there's an awful lot of good press about it. However, it's one of those frameworks that seems to lack certain features that would just make it. It isn't possible to build a WAR file to run in a standard container using Play 2.0 (apparently that was possible in 1.0, so for some reason it's been taken out of the newer version!). It also doesn't have fantastic support for JSON marshalling. It has built-in support, but only if you specify the actual mappings yourself. What I'd really prefer is to be able to just give it a class (even better, a case class) and have it get on with it. To be fair, when you are writing JSON objects this is rather trivial, using the Json.generate method, but when you are consuming JSON this seems to be very difficult. I managed to actually achieve it, but not in a fantastic way...

Lift

Seems to be a much more fully featured framework than Play, with lots of support for other nice things - the documents mention JTA and JSR 303 Validation, for example. Lift also supports being packaged into a standard WAR file, and is, I think, the only one I've looked at that does support this! It falls down on JSON support again though, in that it will natively consume and produce JSON using the underlying raw JSON objects - the same as Play - and while it can be made to produce JSON from your own objects, it doesn't seem to have any easy way to consume JSON into your own objects.

Spray

Spray I need to play with some more. I thought it was lacking in a few areas, but I've just come across hints in the documentation that those areas actually are supported. Specifically, all the examples I had seen implied that you need to run your own web server in the Spray code, but the documentation just mentioned that you can run it in a Servlet container, and tells you to go and look at the examples for how to do that.

It does still look like it expects you to do a fair bit that you shouldn't have to - one of the examples has you wrapping your controller in a handler to automatically handle GZip-encoded data! That's something that shouldn't need to be handled by the controllers at all, but instead by the underlying web server.

The documentation claims that it has proper JSON support too - allowing marshalling from any type and unmarshalling to any type, instead of getting raw JSON objects back that you need to work with yourself. Unfortunately the wiki pages telling you how to do those aren't written yet, so that will mean more example diving.

Blue Eyes

Blue Eyes is an intriguing one, in that it is designed to only do RESTful web services and nothing else, and has a lot of support for various things like versioning of services, path handling, and apparently JSON support (though this seems to be totally undocumented). It even has built-in support for MongoDB - though why a web framework needs built-in support for a data store is beyond me. It does however require you to write your own web server first, and even goes so far as to give you lots of tools to do that - including the ability to do request logging and health monitoring of your services. All things that I wouldn't have to do if I ran it in my own container!

Pinky

Pinky does support Servlet containers, but in a bizarre way. You don't write Servlets, or anything that gets called by a Servlet. Instead you write code that all gets called by the Google Guice webapp filter. It does however let you write what it calls Controllers and Servlets (seemingly interchangeably). It also claims to have out-of-the-box support for numerous different data formats, including JSON, but is very low on documentation for those. The actual examples just feel a bit weird too - like a mash-up between Java and Scala. (This is the first time I've come across pure Scala code that uses annotations, but I suspect that's because it's built on top of Guice.)

Others

I'm well aware there are plenty of other frameworks to choose from - the ones I've seen mentioned include Bowler, Xitrum, Unfiltered, Finagle, Scalatra, and Play Mini, but there are loads more that I can't name off the top of my head. However, I'm only human, and looking through all of these frameworks for the One that fits what I want takes a very long time. I may get to look at some of the others in the future, and write up my thoughts on them, but to date I've had too brief a look to say anything meaningful.

Summary

When I started writing this, my conclusion was going to be that I'm going to go back to Spring MVC, but writing all my code in Scala. That means the slightly weird situation of having a mixture of Java and Scala in the controllers themselves - like I just described for Pinky - but it hopefully allows me to play with pure Scala code below that. It also gives me the full power of the Spring framework to leverage, including the automatic marshalling/unmarshalling of data that so many of the above frameworks are lacking, and running inside a standard Servlet container.

However, my actual conclusion is a bit less solid. I think I need to look closer at Spray and Lift before I outright drop them, and maybe even Play though I have spent a fair bit of time with that and not gotten very far...


Friday 22 June 2012

New Blog

I guess it's traditional to start a new blog with a post about why you're starting a new blog. Kinda circular reasoning, but ah well...

Difficult as it may be to believe, I'm starting this blog basically as an outlet for my rants, mostly about coding or related concerns. It seems I rant about these things a fair bit, and most people I know in real life don't really care about them quite to the same degree that I do, so by ranting here instead (ok, likely as well) they can actually reach a wider audience of people who care. Plus there's the (very vague) chance there may be things in here of use to people...

I should probably also point out that this is the blog of the Ranting Coder. Not the blog where I exclusively rant about coding. Mostly, but odds are there will be bits and pieces of other stuff as well...

So - Enjoy...