Monday, 25 February 2008

The time has come

I've been working on an Erlang project at home for a while, and it has reached its first public version! Here it is: http://www.dayfindr.com. It's a simple utility that can make your life easier.





It ticks some of the boxes that Paul Graham listed recently in Six Principles for Making New Things:

(a) "simple solutions": I hope so. This was one of the big design principles - to make it as simple as possible. That's why, for example, it leverages email instead of using some fancy social networking/address book features. There's also no login, since your email address already identifies you. TripIt uses the same idea.
(b) "to overlooked problems": Nothing that I know of addresses it.
(c) "that actually need to be solved, and": This tries to help with a real problem that I've encountered myself. Feedback on the idea has been quite positive, so I know it's something that people will find useful!
(d) "deliver them as informally as possible": Hope so.
(e) "starting with a very crude version 1, then": Relatively crude.
(f) "iterating rapidly": We'll see!

It's implemented on a Lyme stack (Linux, Yaws, Mnesia and Erlang). This project has generated some interesting questions, that I will relate in a series of posts. For example:

Why Erlang?
Why not ErlyWeb?
Why not Rails?
Why did I abandon the fully functional Lisp version?

So, use it, it's free. If you like it (or more importantly if you don't), lemme know at feedback@dayfindr.com

P.S. To the Internet Explorer users: You will notice that there's a JavaScript error on the page, because some function is "not implemented", which means you'll see check boxes and not coloured boxes in the calendar. The current versions of Opera, Firefox and Safari have no problem with it. Maybe you should consider an upgrade...

Friday, 22 February 2008

Lyme vs Lamp IV




The first Lyme vs Lamp comparison is here!

Let's recap. I have created a single web page, in Lyme and Lamp, based on database queries. The query is from a single table with 1000 "blog" entries, with Id, Timestamp, Title and Content fields. The web page displays the last 10 entries in the table.

I use Tsung to generate users at a progressive rate, starting from 2 users per second up to 1000 users per second:



Subjecting each web application stack to this test, they compare as follows:


But what does it mean?

It looks like Lyme and Lamp perform the same up until 240 seconds into the simulations, which is where we step up from 66 to 100 requests per second. At about that point, Lyme starts to handle fewer transactions per second, and the mean transaction time increases dramatically.

Where's the bottleneck? Good question. I suspect it's in the database query, but this is just a shot in the dark. Why could it be? Well, in the PHP version, it's very simple to retrieve the last 10 entries:

SELECT * FROM blog_entries ORDER BY `ID` DESC LIMIT 0 , 10


Whereas with Mnesia, I'm not aware of a better way to retrieve the last 10 rows, other than to iterate backwards from the end of the table:


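%% Walk backwards from Key, collecting up to N entries (newest first).
last_N_entries(_Key, 0, Acc) ->
    lists:reverse(Acc);
last_N_entries('$end_of_table', _N, Acc) ->
    %% The table holds fewer than N rows: stop early.
    lists:reverse(Acc);
last_N_entries(Key, N, Acc) ->
    {atomic, {Entry, NextKey}} =
        mnesia:transaction(
          fun() ->
                  [Entry] = mnesia:read({blog_entry, Key}),
                  NextKey = mnesia:prev(blog_entry, Key),
                  {Entry, NextKey}
          end),
    last_N_entries(NextKey, N-1, [Entry|Acc]).

%% Entry point: start from the last key in the table.
last_N_entries(N) ->
    {atomic, LastKey} = mnesia:transaction(fun() -> mnesia:last(blog_entry) end),
    last_N_entries(LastKey, N, []).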


I might try some different approaches to the Mnesia query (using ram_copies instead of disc_copies for example), but I'd rather put some effort into making the experiment a bit more focussed. I'd especially like to see what happens when multiple cores are thrown at the problem, but more importantly, what happens when the transactions become concurrent.

I have now realised that this scenario is essentially a sequential load test, which will not show up Lyme's strengths. The pages render rather quickly, so there is no concurrent page rendering. Thus, I think the next step should be to create another scenario that will create concurrent page requests...

Tuesday, 19 February 2008

Lyme vs Lamp III



Lyme vs Lamp continues...

I've been spending some time figuring out how Tsung outputs the data, how gnuplot works and how to create my own graphs using the Tsung output data. I now have a graph that shows the throughput rate (Kbits per second) and the arrival rate of users.

When you set up the testing scenarios using Tsung, you can specify the arrival rate of new users. I've set up a simple progression, that changes every 60 seconds, going from 2 users per second up to 1000 users per second. You can see the progression in the graph below.

The page that is being requested is generated from an Mnesia database, with Yaws. The database contains 1000 made-up blog "postings", and the page requested renders the last 5 postings. The machine is quite old, a 2004 laptop actually, and Tsung is also running on the same machine as Lyme.

I've plotted the data throughput rate against the arrival rate. (I've used Inkscape to polish the gnuplot SVG output. I really love it and use it all the time). Here's the result:


As you can see, the throughput rate increases proportionally to the user arrival rate. The "server" manages to handle 200 users/sec, but with 500 requests per second the increase in rate is no longer proportional. Looks like the max has been reached. Increasing the request rate to 1000 requests per second shows no difference in throughput. Please comment if you have any more insight into the data.

I should be able to get a PHP version soon, so I can compare it against something...

Thursday, 14 February 2008

Lyme vs Lamp II - The first graph arrives

The first graph of the Lyme vs Lamp debate has arrived!




What does it mean? It means that I've got the LYME stack working with a prototype mnesia-backed web page, and Tsung is doing something. But that's about it for now, the actual results are almost irrelevant...

Lyme vs Lamp I


As part of a presentation at SPA2008 that I'm involved in, I'm doing a bit of load testing on Lamp and Lyme. LAMP is Linux + Apache + MySQL + PHP, and LYME is Linux + Yaws + Mnesia + Erlang. (Mnesia is the Erlang database, Yaws is the Erlang web server).

Interestingly, it would seem that in the community there is now a precedent to name Erlang projects after diseases/conditions, since yaws and lyme are both diseases. AND Mnesia used to be called Amnesia. Maybe I'll develop a killer app in Erlang called ankylosing spondylitis. Just kidding.

So, there is a rather well-known comparison of Apache and Yaws, but I'd like to go a step further. I'd like to know how a complete web application stack performs under load testing. As an initial comparison, I'll do Lyme and Lamp, and then move on to some others. I would like to have 3 scenarios: a static page only, a dynamic page, and a dynamic page with a database backend.

Exactly how to construct these scenarios is still unclear, so suggestions are welcome.

Why test? Why yet another performance benchmark?
- Because it's easy. It's much easier to test something (relatively) objectively, and then wave the results in the air to prove your point. It's much harder to debate the merits of technical choices in the real world, where we have constraints such as budget, skills availability, culture, vested interests etc.
- It creates conversation. Which is a good thing. No flame wars please.

I'll be using Ubuntu 7.10 and Tsung and will publish everything, including configuration files, source files etc.

Updates:
Lyme vs Lamp II - The first graph arrives
Lyme vs Lamp III
Lyme vs Lamp IV

Monday, 11 February 2008

The Visitor Pattern eliminates enums from the Observer Pattern

To avoid boolean parameters, every now and then in a moment of weakness I do something like this:


public void addFile(File file, InputOrOutput inputOrOutput) {
    switch (inputOrOutput) {
        case INPUT: ... break;
        case OUTPUT: ... break;
        default: assert false;
    }
}


Which is not a very good solution. There are at least two problems:

  • I have to maintain an enum

  • There's a possibility that the parameter is null, which creates an opportunity for an error condition


I can achieve the same result by using two functions:

public void addInputFile(File file) {
    ...
}

public void addOutputFile(File file) {
    ...
}


This solves the two problems and also satisfies the readability requirement.
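To make the null problem concrete, here is a minimal sketch that puts both variants side by side. The class name FileSet and its two lists are hypothetical, chosen only for illustration; the method names follow the snippets above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

enum InputOrOutput { INPUT, OUTPUT }

// Hypothetical container class, used only to illustrate the two styles.
final class FileSet {
    final List<File> inputs = new ArrayList<>();
    final List<File> outputs = new ArrayList<>();

    // Enum-parameter version: a null argument compiles and fails at run time.
    void addFile(File file, InputOrOutput inputOrOutput) {
        switch (inputOrOutput) { // throws NullPointerException when inputOrOutput is null
            case INPUT:  inputs.add(file);  break;
            case OUTPUT: outputs.add(file); break;
            default: throw new AssertionError(inputOrOutput);
        }
    }

    // Two-method version: there is no mode argument left to get wrong.
    void addInputFile(File file)  { inputs.add(file); }
    void addOutputFile(File file) { outputs.add(file); }
}
```

Calling addFile(file, null) compiles but throws a NullPointerException at the switch, whereas the two dedicated methods have no mode argument that could be null.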

How does this relate to Observers? When I use the Java Observer interface and Observable class, I usually end up with a class defining the chunk of data that describes the change. This chunk is the delta of the object state since the last notification. An object of this class is what the observers receive.

This delta class can take this form:


public class ModelObservedData {

    enum Type { ADDED, REMOVED };

    // Members that contain the data
    ...

    public ModelObservedData (Type type, Data data) {
        ...
    }

    public Type getType() {
        ...
    }

    public Data getData() {
        ...
    }

}

and then in the observer

...

public void update(Observable o, Object arg) {
    // Assert on observable source and type or use an if when observing
    // multiple sources

    ModelObservedData data = (ModelObservedData) arg;
    switch (data.getType()) {
        case ADDED: doAdded(data.getData()); break;
        case REMOVED: doRemoved(data.getData()); break;
        default: assert false;
    }

}


I don't find this very elegant. Now, considering the whole multiple dispatch and visitor pattern that I wrote about, there is a more elegant solution.

Essentially, you visit the observed data instead of getting the type of the delta and switching your implementation on the type. The observed data now takes this form:


class ModelObservedData {

    ...

    public static interface IVisitor {

        void dataAdded(Data data);
        void dataRemoved(Data data);

    }

    public void accept(IVisitor visitor) {
        if (...) {
            visitor.dataAdded(addedData);
        } else if (...) {
            visitor.dataRemoved(removedData);
        }
    }
}


The client implements the IVisitor interface and the update with the switch becomes an acceptance.


...

public void update(Observable o, Object arg) {
    // Assert on observable source and type or ifs when observing
    // multiple sources
    ModelObservedData data = (ModelObservedData) arg;
    data.accept(this);
}

// IVisitor methods must be public in the implementing class
public void dataAdded(Data data) {
    ...
}

public void dataRemoved(Data data) {
    ...
}


This mechanism doesn't have the switch in the observer, there's no more enum, and the observed data can better encapsulate the state that defines the delta.

The last bit would be to eliminate constructors of the observed data that are difficult to read. For example, you would have one constructor that receives two data parameters, the added and removed data, and stores them as fields. But then the client needs to know the ordering, and have null values in the constructor. Instead, make that constructor private, and create two factory methods that describe the construction process better:


class ModelObservedData {

    Data added;
    Data removed;

    public static ModelObservedData createAdded(Data data) {
        return new ModelObservedData (data, null /* removed */);
    }


    public static ModelObservedData createRemoved(Data data) {
        return new ModelObservedData (null /* added */, data);
    }

    private ModelObservedData (Data dataAdded, Data dataRemoved) {
        this.added = dataAdded;
        this.removed = dataRemoved;
    }
}


You now have a more elegant solution (I think), with less possibility for error states and no enum to maintain. Also, the switching that each client would have to implement is now implemented only once in the observed class, so there is less duplication.
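Putting the pieces together, the following is one possible end-to-end sketch, not a definitive implementation. The Data payload, the Model class and LoggingObserver are hypothetical stand-ins for parts the snippets above leave out, and accept decides between the two cases by checking which field is non-null, which is an assumption about the elided conditions. Note that java.util.Observable and Observer have been deprecated since Java 9, although they are still available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Observable;
import java.util.Observer;

// Hypothetical payload type; the post leaves Data unspecified.
final class Data {
    final String name;
    Data(String name) { this.name = name; }
}

// The delta object handed to observers.
final class ModelObservedData {

    interface IVisitor {
        void dataAdded(Data data);
        void dataRemoved(Data data);
    }

    private final Data added;
    private final Data removed;

    static ModelObservedData createAdded(Data data) {
        return new ModelObservedData(data, null /* removed */);
    }

    static ModelObservedData createRemoved(Data data) {
        return new ModelObservedData(null /* added */, data);
    }

    private ModelObservedData(Data added, Data removed) {
        this.added = added;
        this.removed = removed;
    }

    // The only place that decides which kind of delta this is
    // (assumption: exactly one of the two fields is non-null).
    void accept(IVisitor visitor) {
        if (added != null) {
            visitor.dataAdded(added);
        } else if (removed != null) {
            visitor.dataRemoved(removed);
        }
    }
}

// Hypothetical observable model that publishes deltas.
@SuppressWarnings("deprecation")
final class Model extends Observable {
    void add(Data data) {
        setChanged();
        notifyObservers(ModelObservedData.createAdded(data));
    }

    void remove(Data data) {
        setChanged();
        notifyObservers(ModelObservedData.createRemoved(data));
    }
}

// Hypothetical client: update() no longer switches, it just accepts.
@SuppressWarnings("deprecation")
final class LoggingObserver implements Observer, ModelObservedData.IVisitor {
    final List<String> log = new ArrayList<>();

    @Override
    public void update(Observable o, Object arg) {
        ((ModelObservedData) arg).accept(this);
    }

    @Override
    public void dataAdded(Data data) { log.add("added " + data.name); }

    @Override
    public void dataRemoved(Data data) { log.add("removed " + data.name); }
}

final class VisitorObserverDemo {
    public static void main(String[] args) {
        Model model = new Model();
        LoggingObserver observer = new LoggingObserver();
        model.addObserver(observer);
        model.add(new Data("a"));
        model.remove(new Data("a"));
        System.out.println(observer.log); // prints [added a, removed a]
    }
}
```

Adding a further kind of delta in this design means adding a method to IVisitor, so every client that has not handled the new case fails to compile instead of silently falling through a default branch.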