This site uses 3rd-party cookies for targeted advertising. If you do not agree to this, then please leave this site now. Otherwise please click on "Ok" to continue.
Ok
 
 

Benchmarking C, Java, JavaScript and FreePascal

15-Jan-2015

tl;dr - C was fastest and JavaScript slowest. But the performance difference was a lot smaller than I expected.

Why to benchmark these languages

During most of my time as a software-developer I have been using Pascal as my primary language. Mostly using Delphi on Windows. About a year ago I switched to FreePascal on Linux.

But I have also done some programming in C and have recently started using JavaScript, mostly on the server-side with Node.js.

So why would someone who is used to languages which compile directly to native code, start using JavaScript? After all, people who have been around the IT-industry for a decade or so will remember a time when JavaScript was interpreted and was thus, very slow.

But times have changed, and JavaScript is just-in-time compiled to native code nowadays. So it's worth taking a look at how these languages compare performance-wise today.

I wanted to see if JavaScript's performance would make it usable for what I intend to do with it. Which is some pretty low-level stuff. And the answer is: Yes

How I measured performance

For this benchmark I have implemented a very simple algorithm, the Sieve of Eratosthenes, in C, FreePascal, JavaScript and Java. Although I personally have no intention of using Java, I included it, so that I would have another language in this test which uses a JIT-compiler.

Sieve of Eratosthenes is a simple algorithm for finding prime-numbers. In this test I used it to find the prime-numbers between 2 and 100,000. This is then repeated 10,000 times so that each complete test takes a few seconds to complete. You can find the actual code at the end of this article.

I used GCC 4.8.3 for C, FreePascal 2.6.4, Node.js 0.10.32 and javac 1.8.0_40 for this benchmark. I was running the tests on an Intel i7-4771 3,5GHz using a 64-bit OpenSuse 13.2. With GCC I used "gcc -O3" and with FreePascal I used "fpc -O3" too, to get the best optimized compiler output.

For each language I ran the test 7 times and then used the median time as my result.

Here are the results:

Time (seconds)
C3.137
Java3.543
FreePascal3.628
JavaScript4.231

Although JavaScript takes about 35% longer than C, the difference was smaller than I expected. After all this is a fairly low-level benchmark, the kind of code a language C is basically made for. The fact that JavaScript runs at this kind of speed is quite amazing in my opinion.

And let's not forget that computers are getting faster all the time. In a year from now a brandnew computer running JavaScript will probably be faster than today's computer running C.

JavaScript is not JavaScript

But there is more to it. I actually tested not one, but three different versions of JavaScript-code. The result above is for the fastest one.

One version used a regular Array, one used a Typed-Array and one used a Buffer. And the performance-difference between the fastest and the slowest was BIG.

People who have used Node.js may already be able to guess which was fastest and which was slowest. The version using a regular Array took 9x as long as the one using a Buffer. Typed-Array came in close to the speed of using a Buffer:

Time (seconds)
Buffer4.231
Typed-Array4.911
Regular Array38.803

What surprised me here is the difference between Typed-Array and Buffer. I don't know why this is. Maybe someone with more knowledge about Node.js can explain this.

The actual code used

I will only show the code of the actual Sieve of Eratosthenes. To keep it as short as possible I will leave-out the code that measures elapsed time and the loop that calls the benchmark 10,000 times.

C:


void Sieve(int maxNum) {
    int i, j;
    char *Data;

    Data = (char *) malloc(maxNum + 1);
    memset(Data, 1, maxNum+1);

    for (i=2; i<=maxNum; i++) {
        if (Data[i]) {
            for (j=i+i; j<=maxNum; j+=i) {
                Data[j]=0;
            }
        }
    }

    free(Data);
}
FreePascal:


procedure Sieve(maxNum: int32);
var
    Data: array of boolean;
    i,j: int32;
begin
    SetLength( Data, maxNum+1 );
    FillChar( Data[0], maxNum+1, true );

    for i := 2 to maxNum do
    begin
        if Data[i] then
        begin
            j := i+i;
            while j<=maxNum do
            begin
		Data[j] := false;
                Inc( j, i );
            end;
        end;
    end;
end;
Java:


private static void SieveJava(int maxNum) {
    int i, j;
    boolean[] Data = new boolean[maxNum+1];

    for (i=2; i<=maxNum; i++) {
        Data[i] = true;
    }

    for (i=2; i<=maxNum; i++) {
        if (Data[i]) {
            for (j=i+i; j<=maxNum; j+=i) {
                Data[j]= false;
            }
        }
    }
}
JavaScript:


function SieveRegularArray(maxNum) {
    var i;
    var j;

    var Data = new Array();
    Data.length = maxNum + 1;
    for (i=2; i<=maxNum; i++) {
        Data[i] = true;
    }

    for (i=2; i<=maxNum; i++) {
        if (Data[i]) {
            for (j=i+i; j<=maxNum; j+=i) {
                Data[j] = false;
            }
        }
    }
}



function SieveBuffer(maxNum) {
    var i;
    var j;

    var Data = new Buffer(maxNum + 1);
    Data.fill(1);

    for (i=2; i<=maxNum; i++) {
	if (Data[i]) {
            for (j=i+i; j<=maxNum; j+=i) {
		Data[j] = 0;
            }
        }
    }
}



function SieveTypedArray(maxNum) {
    var i;
    var j;

    var Data = new Uint8Array(maxNum + 1);
    for (i=2; i<=maxNum; i++) {
        Data[i] = 1;
    }

    for (i=2; i<=maxNum; i++) {
        if (Data[i]) {
            for (j=i+i; j<=maxNum; j+=i) {
                Data[j] = 0;
            }
        }
    }
}

Comments:

Author: (Unknown)
2015-01-17 12:31 UTC
 

Hey, have you checked the disassembly of the C code? I think that it can be optimised somewhat! Try using ints instead of chars because of the overhead on read and write operations. This also restricts the use of some SSE instructions on unaligned data.

@indiecoderdanny

Author: Michael Schöbel
2015-01-17 12:58 UTC
 

Thanks for your suggestion. I just tested that. Using "int" instead of "char" turned out to take about 40% more time.

"int" takes up 4x as much memory as "char", so there had to be more data written and read to/from RAM. That's probably what slowed it down.

The test as it is causes a total of about 3.5 billion writes. So for char that means writing 3.5gb to memory (or to cache), but for int it means writing 14gb.

The added memory-usage makes memory-caching less effective. L1 cache-size on an i7-4771 is 256kb. Using char the entire array fits into L1, probably eliminating most of the writes. But with int the now 400kb array exceeds L1 cache-size. It's still small enough to fit into L2-cache, but certainly there has to be some slowdown because of that.

Author: (Unknown)
2015-01-17 13:35 UTC
 

How does the direct translation of the C code to Pascal fare? Should be faster than your version, seeing as your code initializes the Data array twice. (Once by SetLength, once by FillChar).

I mean something like the following:

procedure Sieve(maxNum: NativeInt);
var
  i, j: Integer;
  Data: PByte;
begin
  GetMem(Data, maxNum + 1);
  FillChar(Data^, maxNum + 1, 1);

for i := 2 to maxNum do begin if Data[i] <> 0 then begin j := i + i; while j <= maxNum do begin Data[j] := 0; j := j + i; end; end; end;

FreeMem(Data); end;

-mrk

Author: Michael Schöbel
2015-01-17 14:04 UTC
 

That turned out to be about 0.3 seconds slower at first. But when I changed "j:=j+i" back to "Inc(j,i)" it turned out to be marginally faster than my version.

Median time of 3.580s instead of 3.628s.

Note that for simple data-types SetLength does NOT initialize the data. It only reserves the memory, but leaves it untouched.

SetLength only initializes Strings or data-types that have to be initialized to work properly, like Variants.

My guess is that the slight performance increase was caused by SetLength doing some additional housekeeping stuff. Probably releasing whatever little memory got reserved automatically when entering the procedure.

Author: (Unknown)
2015-01-17 14:43 UTC
 

I dont know if it will save time, but in Java, boolean[] arrays are set to false by default. So in the loop, set Data[j] to true.

And then if you want to print the primes show the false value.

(Sorry for my english, still learning)

Author: Michael Schöbel
2015-01-17 15:08 UTC
 

Thanks. My knowledge of Java is very small. Just barely enough to put together that little test.

I tried your suggestion and to my surprise it didn't make any difference. I'm guessing some compiler-optimization improved my inefficient code automatically.

Author: (Unknown)
2015-01-17 15:20 UTC
 

I think it's because maxnum is about 100.000, so init the array mean 100.000 operations and thats not that much compared to the 1.5 billion.

Author: (Unknown)
2015-01-17 15:27 UTC
 

Did you check what part of the execution time is spent in the memory allocation? Maybe you ended up benchmarking this allocation?

I'd guess that all of the languages use malloc() after all so you could remove it from the benchmark by preallocating an array once.

Author: Michael Schöbel
2015-01-17 16:15 UTC
 

I just tried in C to move *Data outside the function and allocate the memory just before the loop that calls Sieve(), so the memory only gets allocated once.

And this ended up to be about 10% slower than the original.

I had to look at the assembler output of GCC to see why. When *Data is declared inside the function, it is stored in a CPU-register. GCC knows that only this function has access to it and is therefor able to store it in a register.

But if *Data is global, then it gets reloaded from memory (actually probably from L1 cache) each time it is accessed.

And I think this is how it needs to be. For a global variable GCC has to assume that *Data may be modified by something outside the scope of the inner function. Maybe by another thread or whatever. So storing it in a register in Sieve() is not possible in that case.

That was an interesting scenario. I hadn't thought about this before. I've learned something new. :)

Author: (Unknown)
2015-01-17 16:21 UTC
 

Hi,

I made a few alterations to the C code which improved it by just over a second on my machine (my setup is very similar to yours, except i7 4770k @ 3.5ghz). Allocating slightly more memory and unrolling the loop a little bit seemed to save quite a lot. I tried ints too and you were correct, it is slower :(

char* Sieve(int maxNum) {
	int i, j;

maxNum = (maxNum + 3) & ~3;

char* Data = new char[maxNum + 1]; memset(Data, 0x01, sizeof(char)*(maxNum+1));

for (i = 2; i <= maxNum / 4;) { if (Data[i]) { for (j = i * 2; j <= maxNum; j += i) { Data[j] = 0; } } i++;

if (Data[i]) { for (j = i * 2; j <= maxNum; j += i) { Data[j] = 0; } } i++; if (Data[i]) { for (j = i * 2; j <= maxNum; j += i) { Data[j] = 0; } } i++; if (Data[i]) { for (j = i * 2; j <= maxNum; j += i) { Data[j] = 0; } } i++; } return Data; }

@indiecoderdanny

Author: (Unknown)
2015-01-17 16:22 UTC
 

I benchmarked the memory allocation time in the C example. The benchmark improved by about 200ms on my system, from 3100ms to 2900ms. It was compiled on MSVC 2013 with full optimizations.

Here's the modified function:

void benchAfter(int maxNum, char* data)
{
    memset(data, 1, maxNum+1);

for (int i = 2; i <= maxNum; ++i) { if (data[i]) { for (int j = i + i; j <= maxNum; j += i) { data[j] = 0; } } } }

Here's how it's called:

const int ITERATIONS = 10000;
const int MAX_NUM = 100000;

char* data = new char[MAX_NUM + 1]; for (int i = 0; i != ITERATIONS; ++i) { benchAfter(MAX_NUM, data); } delete data;

Author: Michael Schöbel
2015-01-17 16:37 UTC
 

@indiecoderdanny

I can confirm that. Your version is almost a second faster.

Author: Michael Schöbel
2015-01-17 16:46 UTC
 

And in reference to two comments above...

Once I passed Data as a parameter instead of just accessing a global variable, the performance-hit was gone.

Performance-gain compared to the original version was only about 50ms for me though.

Author: (Unknown)
2015-01-18 02:25 UTC
 

I just don't understand why you use so large memory size. I'm afraid memory access and cache miss take up most of exec time in you example.

Author: (Unknown)
2015-01-18 10:11 UTC
 

^ There is a mistake in my code posted above, it should read for (i = 2; i <= maxNum;) for the outer loop - which caused it to do less work. Weird that my unit test did not assert when it was doing a quarter of what it should have been! The savings were not as large as previously mentioned, although slightly faster not as significant as a second.

@indiecoderdanny

Author: Michael Schöbel
2015-01-18 10:25 UTC
 

100kb still fits into L1 cache on my CPU.

And given that I saw clearly differing times, the memory-access alone couldn't be that big of a factor.

And as memory was the same for each of the tests, and that the point was to compare how different languages handle the code, I think it worked out well enough.

Author: Michael Schöbel
2015-01-18 10:28 UTC
 

@indiecoderdanny

Damn, I missed that one too. After correcting it, performance is only marginally faster than my original.

Anyway, the point of the exercise was not to hand-optimize the hell out of it, but to compare how different languages perform with basically the same code.

But we've proven another point here: If real programmers see a benchmark, they just *have* to optimize it. It's a compulsion. :)

Author: (Unknown)
2015-01-18 10:44 UTC
 

Yes, it is certainly a compulsion! Luckily I work in the game industry and can fulfill my compulsions hand writing assembler :p Also I can't believe how close the results were, even the Java vs C is extremely close where I would of thought it would be worlds apart.

Anyway, it's a great article. @indiecoderdanny

Author: Michael Schöbel
2015-01-18 11:18 UTC
 

Yep. Java was a surprise. But what really surprised me was how close JavaScript was. After all it is a dynamically-typed language. Hard to generate optimized code for that.

Author: (Unknown)
2015-01-22 14:30 UTC
 

About a year ago I made a comparison between three naive matrix multiplication implementations(O(N^3), 600x600 matrices, multiplication repeated 600 times on the same data): JS, JS+Typed Arrays and C (with and without -O3):

On a i5-3210M with DDR3@1600MHz running Windows 8.1 the results were:

C: ~2s
C with -O3: ~1.1s
JS: ~32s
JS Typed Arrays: ~1.5s

Initially I didn't include the version with -O3, so I was really baffled by the results. Turns out that execution speed was in fact constrained by the memory bandwidth.

Author: (Unknown)
2015-01-22 18:40 UTC
 

is not a surprise the speed of java, it has been very optimized throug years, see this benchmarks of several frameworks, java is between the first http://www.techempower.com/benchmarks/#section=data-r9

Author: (Unknown)
2015-02-26 22:24 UTC
 

The time difference between the regular Array and the Typed-Array JavaScript code could be because they are being created differently.

Regular Array:
var Data = new Array();
Data.length = maxNum + 1;

Typed:
var Data = new Uint8Array(maxNum + 1);

Regular Array can be created like Typed-Array is:
var Data = new Array(maxNum + 1);

Also, I would be interested in seeing how it compares with:
var Data = [];
for (i=2; i<=maxNum; i++) {
  Data[i] = true;
}
Data.length = maxNum + 1;

- Jack Slocum

Author: (Unknown)
2015-09-30 18:31 UTC
 

Small correction for the cache sizes you mentioned - L1 data cache for single core of your i7-4771 is only 32kB. So even a 100k byte array doesn't fit in L1, and given the irregular memory access, I guess the programs spend most of the time waiting for data from L2 cache. Unless they do something stupid in the inner loop, as the Javascript's worst case seems to do...

FPC usually generates slightly more redundant code when accessing memory as an array instead of a pointer access; it would be interesting to compare the resulting assembly as well.

Author: (Unknown)
2016-01-23 20:33 UTC
 

Your results are not credible. I just tried to reproduce your results and C performed 72% faster than JavaScript with optimizations turned on.

Author: Michael Schöbel
2016-01-23 21:02 UTC
 

You may want to have a look at:

https://geekregator.com/2015-01-21-node_js_0_11_15_and_io_js_1_0_3_very_much_the_same_in_performance.html

The speed of Node.js can depend a LOT on how exactly you use it. On which additional parameters you give it when you start it.

That may explain the difference you see.

-- Michael Schöbel

Author: Udo Schmal
2016-06-20 22:04 UTC
 

Free Pascal

Use Cardinal instead of int32. Use PByte and GetMem/FreeMem. Saves 1/4 of time.

procedure Sieve(maxNum: Cardinal);
var
  Data: PByte;
  i,j: Cardinal;
begin
  GetMem(Data, (maxNum+1)*SizeOf(Byte));
  try
    FillByte(Data^, maxNum+1, 1);
    for i := 2 to maxNum do
      if Data[i]=1 then
      begin
        j := i+i;
        while j<=maxNum do
        begin
          Data[j] := 0;
          Inc(j, i);
        end;
      end;
  finally
    FreeMem(Data, (maxNum+1)*SizeOf(Byte));
  end;
end;

How do you measure the time?

Author: Michael Schöbel
2016-06-21 09:06 UTC
 

I tried your version and in my tests it was originally WAY slower than my original. Then I noticed the try/finally and removed it. Which made your version 4-5% faster than mine.

Learning-point: Setting up a try/finally block takes up quite an amount of CPU-time.

Time is measured this way:

StartTi:=GetTickCount;
for i:=1 to 10000 do
begin
    Sieve(100 * 1000);
end;
WriteLn('Time spent = ',GetTickCount-StartTi,'ms');

I used FreePascal 3.0.0 for this test on 64-bit Linux and compiled with "-O3 -Mdelphi".

Author: Udo Schmal
2016-06-21 12:44 UTC
 

I tried it on 64-bit Windows, obviously there are differences.

But I can not believe that you measure the time within the program, the advantage of native languages is then lost, of course!

I would measure the entire time of program execution, for example with a small program like the following:

program Launcher;
uses SysUtils, Process;
var
  AProcess: TProcess;
  StartTime, EndTime: QWord;
begin
  AProcess := TProcess.Create(nil);
  StartTime := GetTickCount64();
  AProcess.Executable := ParamStr(1);
  AProcess.Options := AProcess.Options + [poWaitOnExit];
  AProcess.Execute;
  EndTime := GetTickCount64();
  WriteLn('Time spent: ' + IntToStr(EndTime - StartTime) + ' ms');
  AProcess.Free;
end.

Author: Michael Schöbel
2016-06-21 12:53 UTC
 

Depends on what you want to measure.

If you want to measure the total time to run the program, then you are right.

But what I wanted to measure was the execution time for running that procedure. I didn't want to measure the overhead for starting the process, or for running any JIT-compilers. I wanted to measure the performance of the end-result.

-- Michael Schöbel

Author: Udo Schmal
2016-06-21 13:31 UTC
 

But so also lacks the time to interpret the code, loading depending librarys and so on.

So you get only the execution time of the "generated machine code", not the execution time of the code!

That's exactly what you get always felt, the program may be fast but until it starts a long time passes.

That may be not so important in such small routines but in a large project, it is often a massive problem. Specifically, when the routine is called several times even at the same time, the efficiency decreases extreme.

Author: (Unknown)
2016-12-09 01:39 UTC
 

Almir Bispo:Autor

function milis(tempo_inicio:Ttime):string;
var meut:Ttimestamp;
begin
    meut:=DateTimeToTimeStamp(now - tempo_inicio);
    result:= copy(Variant(TimeStampToMSecs( Ttimestamp(meut) ) *10), length(Variant(TimeStampToMSecs( Ttimestamp(meut) )*10 ) )-4 ,4) ;
end;

//how to use

var timi:Ttime;
begin
    timi:=Ttime(now);
    sleep(1234);
    writeln(milis(timi)); //will show 1234 the time between (now - time current)
end;

You want to comment on this blog-post? If yes, then simply enter your comment in the field below and click on "Submit".

Comments are moderated at the moment thanks to some $%&# who thought it would be funny to post total nonsense here.

 


Back to the Homepage

A Programmer's Diary