Monday, July 30, 2007

Dynamic generation of javascript code

One of the biggest problems for every developer/software architect is the level of generalization/specialization of their code.
If the code is more coupled to the problem , it solves the problem better, but when the problem changes it is harder to change the code. And the oposite .. when the code is more abstract it is easier to change it, but then (generally speaking) the solution is not so good (as it could be in first case).

I am not going to write about the right level of abstraction, because I don’t know it :)
Instead, I am going to write about javascript and some of its interesting features.

Assume that we have the following problem:
We need to write a function that calculates the sum of first n integers.
The first solution that would probably come to everyone’s mind would be the classic for loop that loops from 1 to n and adds value of the counter to some variable..
It could be written as:

function getSum1(n)
{
var s = 0;
for(var i = 1; i <= n; i++)
s += i;
return s;
}


This is, of course, totally correct solution, but is it the best one?
Well, it solves the problem.. and is flexible enough to calculate the sum for every given integer.. so, it probably is the best.
But!
What if we know the number that will be passed to the function before it is called?
If so, it would be better if we wrote the function like this:

function getSum2()
{
return 1+2+3+4+5+6+7+8+9+10;
}

It would be much faster than the first one.
But, as the beginning of this post says.. it is not flexible. It solves the problem, but it is very coupled to it, so when the problem changes (the number n) it is useless.
We have 2 solutions. Which is better? Well, it depends on the fact how many times is function going to be called with the same parameter, and the importance of the execution speed.

I wrote some tests that can be found here.
There are functions that calculate sum of the first 100 integers, and are called 100000 times.

First one calls the function with the classic for loop (getSum1):

var howManyTimes = 100000;
timeStart = new Date();
for (var i = 0; i<howManyTimes; i++)
{
result = getSum1(100);
}
timeStop = new Date();

It then prints duration and the result.

The second test calls the function that generates the code for getSum2 function. After generating , the code is evaluated using the eval function (so the new function is created dynamically):

function createFunction1(val,functName)
{
var code = "function "+functName+"\n{ \n return 0";
for (var i=1; i<=val;i++)
code+="+"+i;
code += "; \n }";
return code;
}

timeStart = new Date();
var code = createFunction1(100,"getSum2()");
eval(code);

for (var i = 0; i<howManyTimes; i++)
{
result = getSum2();
}
timeStop = new Date();

After that it also prints duration and the result.
And finally the third test generates the function using javascript Function object.

function createFunction2(val)
{
var code = " return 0";
for (var i=1; i<=val;i++)
code+="+"+i;
code += ";";
return code;
}

timeStart = new Date();
var code = createFunction2(100);
var getSum3 = new Function(code);

for (var i = 0; i<howManyTimes; i++)
{
result = getSum3();
}
timeStop = new Date();

You can see the results of the tests if you run the test file in your browser.
My average results are:

In firefox: getSum1 ~ 4900ms
getSum2 ~ 550ms
getSum3 ~ 450ms
In IE6: getSum1 ~ 4800ms
getSum2 ~ 840ms
getSum3 ~ 740ms


After all this you can say… ok, this is fine, but.. is there any chance that we can use this in real world, in something more complex than the sum of n numbers?
Well, there might be.
Some of today’s most popular java and .NET frameworks (Spring, Spring.NET) use dynamic code generation in their AOP libraries.

I think that in near future some applications could have "smart execution controllers" that would know whether to call an abstract code or to generate concrete code and call it. Especially in applications where performance (speed) is bottle neck.

6 comments:

Anonymous said...

The performance gain is a tradeoff for readabiity and maintenability.

Unrolling loops is what we did in demo programming 25 years ago with 6502 and later in 68k assembler. Unrolling everything to boost performance and remove the loop condition and jump.
Assembler was wonderful for code generation.

Nice to see the techniques to survive so many years.

Peace
-stephan

--
Stephan Schmidt :: stephan@reposita.org
Reposita Open Source - Monitor your software development
http://www.reposita.org
Blog at http://stephan.reposita.org - No signal. No noise.

Damjan said...

Hi Stephan,
You're totally right about the tredeoff. It will always be there..

Speaking of assemblies.. I was playing with .NET System.Reflection.Emit namespace recently. It has everything you need for generating IL code (.NET assemblies) in runtime. So, many things you've been doing with "real" assembler can be done using IL Code Generator.

I am glad that someone who was programming before I was born wrote a comment :)

Regards,
Damjan.

Anonymous said...

Oh, I'm not that old, I started around 1980 to code when I was 8 :-)

Peace
-stephan

--
Stephan Schmidt :: stephan@reposita.org
Reposita Open Source - Monitor your software development
http://www.reposita.org
Blog at http://stephan.reposita.org - No signal. No noise.

Anonymous said...

This is interesting damjan !

I would add also that precalculating some needed results in a database can even be faster than unrolling loops.

Databases can store an impressive amount of data, so you'd need to calculate them once, and then just retrieve them from the database.

But yeah it doesn't apply to Javascript ! ^^

Damjan said...

Well, any part of the system is a potential place for some sort of optimization.. cacheing of data or doing something with code (unrolling loops :-) ), etc.

Your point is true.
Databases are the slowest part of many systems, so any type of optimization there is more than welcome..

We had the situation on my last project that the query that was executed very often was very,very slow because of complex datamodel..
Creating just one "helper" table and writing some trigers that populate it made application a couple of times faster.

At the end.. programming is very easy.. it is all about a good level of abstraction..
But finding it out is.. well, not so easy :)

Anonymous said...

All true, but generally best not to optimise unless you need to, it makes the code so hard to understand, and coding is hard enough already.

Also, often a better algorithm is more effective than clever code, so for the first n integers, use sum = (n+1)*n/2, this would probably be much quicker than either of your solutions. So sum of first billion integers is 500000000500000000 (did that in my head in 5 seconds, serial addition would have taken longer ;-)