Gustavo 'Gus' Carreno

Official launch of the 1 Billion Row Challenge in Object Pascal


Hey Cornelius,

 

Had a look at your fork and if you feel that your changes make more sense in the Delphi side of things, I'm quite happy to merge the PR and push a v1.1 tag.

 

Sorry to use this channel to convey that, but you don't have Issues or Discussions active on your fork.

 

Cheers,

Gus


P.S.: The README had an update to include the input file SHA256 hash and a list of the generator usage.

3 hours ago, Gustavo 'Gus' Carreno said:

Had a look at your fork and if you feel that your changes make more sense in the Delphi side of things, I'm quite happy to merge the PR and push a v1.1 tag.

I almost made a pull request last night but then backed it out after I made another test.

 

The generator program parses the parameters and checks to make sure they're valid but in Delphi, the way it's done adds a CR/LF which causes 'h' <> 'h$D$A' and thus fails even a valid command-line. I fixed that but didn't take into account that some parameters have a second part (like --line-count or --input-file) so I need to add that. I'll certainly make a PR when it's ready.
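A minimal sketch of how a stray line break can creep into that kind of comparison, assuming the parameter passes through a TStringList and is read back via its Text property. That is a guess at the mechanism, not a copy of the generator's code; the program and variable names are made up for illustration:

```pascal
program TrimParamSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

var
  Opts: TStringList;   // hypothetical holder for one parsed option
  Raw: string;
begin
  Opts := TStringList.Create;
  try
    Opts.Add('h');
    // By default TStrings.Text appends a line break after every item,
    // so Raw is 'h' followed by the platform line ending.
    Raw := Opts.Text;
    WriteLn('Untrimmed match: ', Raw = 'h');
    // Trim strips the trailing CR/LF before the comparison.
    WriteLn('Trimmed match:   ', Trim(Raw) = 'h');
  finally
    Opts.Free;
  end;
end.
```

Parameters with a second part (like --line-count) would still need their value handled separately; this only covers the comparison itself.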


Hey Cornelius,

 

I was gonna make that version myself using a package to take care of the command line params.
But then my brain entered stooopid mode, someone else offered to make it and I just went with it.

 

The fact that Free Pascal has TCustomApplication that has inbuilt params parsing and checking makes it a breeze to use out of the box.
I'm just sad that Delphi hasn't invested in something like that out of the box and leaves the programmer a bit in the lurch to do the same boilerplate stuff every time.
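For anyone who hasn't seen it, here is a rough Free Pascal sketch of what that looks like with TCustomApplication. The class name, the -n short option and the messages are invented for illustration and are not taken from the generator; only CheckOptions, HasOption and GetOptionValue are the actual CustApp methods:

```pascal
program ParamsSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, CustApp;

type
  // Hypothetical application class, for illustration only
  TGeneratorSketch = class(TCustomApplication)
  protected
    procedure DoRun; override;
  end;

procedure TGeneratorSketch.DoRun;
var
  ErrorMsg: String;
begin
  // A trailing ':' marks an option that requires a value.
  // CheckOptions returns an empty string when the command line is valid.
  ErrorMsg := CheckOptions('hn:', 'help line-count:');
  if ErrorMsg <> '' then
  begin
    WriteLn('Error: ', ErrorMsg);
    Terminate;
    Exit;
  end;

  if HasOption('h', 'help') then
    WriteLn('Usage: ', ExtractFileName(ExeName), ' -n <count>')
  else if HasOption('n', 'line-count') then
    WriteLn('Line count: ', GetOptionValue('n', 'line-count'));

  Terminate;
end;

var
  App: TGeneratorSketch;
begin
  App := TGeneratorSketch.Create(nil);
  try
    App.Run;
  finally
    App.Free;
  end;
end.
```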

 

I'll be waiting for your PR!!

 

Cheers,

Gus

26 minutes ago, Gustavo 'Gus' Carreno said:

The fact that Free Pascal has TCustomApplication that has inbuilt params parsing and checking makes it a breeze to use out of the box.
I'm just sad that Delphi hasn't invested in something like that out of the box and leaves the programmer a bit in the lurch to do the same boilerplate stuff every time.

QuickLib.Parameters would've prevented you from needing to reinvent the technique--but it's a little overkill and it would require bringing a whole lot of other stuff along with it.


Lots of options!

 

Command-line parameter validation fixed for Delphi--pull request made.

 

Generated a few different sized files for testing, the last two:

  • 100-million row file created in 9 seconds
  • 1-billion row file created in 2 minutes, 23 seconds.

Hey Cornelius,

 

Quote

Lots of options!

 

Indeed. But expected, right?
When the out of the box is missing, the FOSS side of things kinda fills the void, right?

 

1 hour ago, corneliusdavid said:

Command-line parameter validation fixed for Delphi--pull request made.

Okidokes, merged!!

 

1 hour ago, corneliusdavid said:

Generated a few different sized files for testing, the last two:

  • 100-million row file created in 9 seconds
  • 1-billion row file created in 2 minutes, 23 seconds.

On my system, writing to the SSD took ~1 minute, while writing to /dev/null took 30 seconds.
Those are still quite impressive values. paweld did a wonderful job optimising the code that is shared between Delphi and Free Pascal. Quite proud of his work!!

 

Cheers,

Gus


Hey Cornelius,

 

We are coordinating, live, on Discord.
 

I'm not sure you're a Discord person, or even if you're into online chat things, but I thought you should have that piece of info.

We are assembling on these:

  • The "Delphi Community" Discord server on the events channel
  • The "Unofficial Free Pascal" Discord server on the off-topic channel (mainly because the server owner did not create a channel for the event)

 

Cheers,

Gus


Hey Y'All,

 

We are officially in the 16 to 17 second range:

$ time ./bin/sbalazs /tmp/measurements-1_000_000_000.txt 32
{...}
real 0m16.422s
user 3m46.669s
sys 1m29.654s

Only 15 to 16 seconds to go :D

 

Cheers,
Gus


Hey Y'All,

We are now in the 2-second region:

    ******** Run All ********
     
    ===== Arnaud Bouchez ======
    -- SSD --
    Benchmark 1: abouchez
      Time (mean ± σ):      2.472 s ±  0.061 s    [User: 27.787 s, System: 1.720 s]
      Range (min … max):    2.386 s …  2.588 s    10 runs
     
    ===========

That's 1 warmup, 10 runs, 16 threads on SSD

I'm having issues with the Linux watchdog killing my shell when 2 of the entries run on the HDD.
Need to solve that and I'll have results for those.

Cheers,
Gus


Hey Y'All,

After some extensive testing from @paweld:

  • Tested with RAMDisk, SSD and HDD
  • Tested with the original input file containing only 400 weather stations

He came to the conclusion that there's no meaningful difference in timings between the storage types above.

I've eliminated the HDD column in the results.

Cheers,
Gus


Hey my challenge peeps,

 

Someone on Telegram has mentioned that our rounding code may be flawed with regard to negative temperatures.
He's looking into this.
He's also hacking away at a Delphi version of the baseline.
Be ready to jump on that PR button once we've got to the bottom of this.

 

Cheers,
Gus


Hey Y'All,

After a bit of talking to a prospective entrant, I've come to the realisation that we had a very complicated rounding implementation.

The code that came out of that talk is now the new official rounding code.

I've also altered the README.md file to include the code and the new SHA256 hash of the output, alongside an archived file containing the new baseline output.

Cheers,
Gus

On 3/10/2024 at 9:06 AM, Alexander Sviridenkov said:

HTML Library / SQL framework: 0.33s. Zero lines of code)

 


 

 

Easy, and wrong.

You are reading the weather-station reference data, which has one row per station, and computing a min/max/average over a single data point per station.

 

The challenge is to read 1 billion (1,000,000,000) rows of CSV data covering all those 41343 stations, and compute the min/max/average per station.

There is a generator of a 16GB CSV file to ingest and process.
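To make the difference concrete, here is a naive, single-threaded sketch of the per-station aggregation. It assumes each line looks like `StationName;12.3` (the layout used by the original 1BRC, not stated in this thread) and that the file path is the first command-line argument. The record, variable names and output layout are made up for illustration, and the mean is printed with plain formatting rather than the challenge's official rounding:

```pascal
program AggregateSketch;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Generics.Collections;

type
  // Running aggregate for one station (hypothetical, not from the repo)
  TStats = record
    Min, Max, Sum: Double;
    Count: Int64;
  end;

var
  Stations: TDictionary<string, TStats>;
  Src: TextFile;
  Line, Name: string;
  Stats: TStats;
  Temp: Double;
  P, Code: Integer;
begin
  Stations := TDictionary<string, TStats>.Create;
  try
    AssignFile(Src, ParamStr(1));
    Reset(Src);
    try
      while not Eof(Src) do
      begin
        ReadLn(Src, Line);
        Line := Trim(Line);               // drop a stray CR, if any
        P := Pos(';', Line);
        if P = 0 then
          Continue;                       // skip malformed lines
        Name := Copy(Line, 1, P - 1);
        // Val always expects '.' as the decimal separator
        Val(Copy(Line, P + 1, MaxInt), Temp, Code);
        if Code <> 0 then
          Continue;
        if Stations.TryGetValue(Name, Stats) then
        begin
          if Temp < Stats.Min then Stats.Min := Temp;
          if Temp > Stats.Max then Stats.Max := Temp;
          Stats.Sum := Stats.Sum + Temp;
          Inc(Stats.Count);
        end
        else
        begin
          Stats.Min := Temp;
          Stats.Max := Temp;
          Stats.Sum := Temp;
          Stats.Count := 1;
        end;
        Stations.AddOrSetValue(Name, Stats);
      end;
    finally
      CloseFile(Src);
    end;

    // Unsorted output: min/mean/max per station
    for Name in Stations.Keys do
    begin
      Stats := Stations[Name];
      WriteLn(Name, '=', Stats.Min:0:1, '/',
        (Stats.Sum / Stats.Count):0:1, '/', Stats.Max:0:1);
    end;
  finally
    Stations.Free;
  end;
end.
```

Something this simple will be nowhere near the timings posted in this thread; the point is only that every one of the billion rows has to be read and folded into its station's aggregate.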

So 0.33s for 41343 rows would make around 8000 seconds, i.e. roughly 2.2 hours.
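For reference, the extrapolation behind the seconds figure, assuming the run time scales linearly with the number of rows (a simplification):

```latex
\frac{10^{9}\ \text{rows}}{41343\ \text{rows}} \approx 24188,
\qquad
24188 \times 0.33\,\text{s} \approx 7982\,\text{s} \approx 8000\,\text{s}
```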

17 minutes ago, Arnaud Bouchez said:

Easy, and wrong.

You are reading the weather-station reference data, which has one row per station, and computing a min/max/average over a single data point per station.

The challenge is to read 1 billion (1,000,000,000) rows of CSV data covering all those 41343 stations, and compute the min/max/average per station.

There is a generator of a 16GB CSV file to ingest and process.

So 0.33s for 41343 rows would make around 8000 seconds, i.e. roughly 2.2 hours.

 


I missed the info, sorry.

 

For a SQL solution, it is very good.

Out of curiosity, how much memory does it need for the 16.5 GB file?
Does it use SQLite3 and its virtual tables internally for its SQL dialect (something like https://www.sqlite.org/csv.html)?

9 minutes ago, Arnaud Bouchez said:

I missed the info, sorry.

 

For a SQL solution, it is very good.

Out of curiosity, how much memory does it need for the 16.5 GB file?
Does it use SQLite3 and its virtual tables internally for its SQL dialect (something like https://www.sqlite.org/csv.html)?

Memory usage is around 20 MB and doesn't depend on file size.

No SQLite or other external solutions are used, everything is written in plain Delphi - SQL parser, SQL execution classes, etc.
https://delphihtmlcomponents.com/sql/


Hey Y'All,

 

Looks like there is a simple fix: use Currency instead of Double.

function RoundExDouble(x: Currency): Double;
begin
  Result := Ceil(x * 10) / 10;
end;

This does not fix the underlying issue that Delphi is not consistent with Double across Windows 32-bit, Windows 64-bit and Linux 64-bit.

But at least we now have consistency on our end!!!
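Here is a small self-contained sketch for trying the function on a couple of values. The function body is the one quoted above; the program name and sample values are my own picks for illustration, not part of the official baseline:

```pascal
program RoundingSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Math;

// Scale to tenths, round towards positive infinity, scale back.
// Taking the argument as Currency (a fixed-point type with 4 decimals)
// keeps the multiplication by 10 out of binary floating point.
function RoundExDouble(x: Currency): Double;
begin
  Result := Ceil(x * 10) / 10;
end;

var
  Sample: Currency;
begin
  Sample := -1.25;   // negative value, halfway between two tenths
  WriteLn(RoundExDouble(Sample):0:1);
  Sample := 1.25;    // positive counterpart
  WriteLn(RoundExDouble(Sample):0:1);
end.
```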

 

Many thanks to paweld for spotting the fix!!

 

Cheers,

Gus


var
  c : currency;
begin
  c := Wert * 10;

  ....

 

should make it calculate the same on all platforms, but I'm AFK at the moment.


Hey Attila,

 

42 minutes ago, Attila Kovacs said:

var
  c : currency;
begin
  c := Wert * 10;

  ....

 

should make it calculate the same on all platforms, but I'm AFK at the moment.

 

That was indeed the solution that Paweld came upon and is the one we are using now.

Nonetheless, many, many thanks for the tip!!!

 

Cheers,

Gus
