challenge Official launch of the 1 Billion Row Challenge in Object Pascal

Gustavo 'Gus' Carreno · March 10, 2024

Hey Y'All,

Official launch of the 1 Billion Row Challenge

Happy coding and don't forget to have fun!!

Cheers,
Gus

Alexander Sviridenkov · March 10, 2024

HTML Library / SQL framework: 0.33s. Zero lines of code)

Edited March 10, 2024 by Alexander Sviridenkov

Gustavo 'Gus' Carreno · March 10, 2024

2 minutes ago, Alexander Sviridenkov said:

HTML Library / SQL framework: 0.33s. Zero lines of code)

Hey Alexander,

I quite like your sense of humour 😁 !!

Doesn't quite satisfy the rules, and a line of SQL is still a line, but yeah, good one !!

Cheers,

Gus

Attila Kovacs · March 10, 2024

1 hour ago, Gustavo 'Gus' Carreno said:

Doesn't quite satisfy the rules, and a line of SQL is still a line, but yeah, good one !!

Why not? The whole app on the screenshot is written in object pascal, AFAIK it will even compile to your ubuntu.

Gustavo 'Gus' Carreno · March 10, 2024

5 minutes ago, Attila Kovacs said:

Why not? The whole app on the screenshot is written in object pascal, AFAIK it will even compile to your ubuntu.

Welp, it has to be a command line program I can time with hyperfine, as stated in the rules.

It needs to output to STDOUT, as stated in the rules.

It needs to be pure Object Pascal with no external libs or package dependencies, as stated in the rules.

Must I go on 😉 ?

Cheers,

Gus

Attila Kovacs · March 10, 2024

Best of luck with that. The issue with these challenges isn't the problem they aim to solve, but rather, who on earth has the time for them.

Gustavo 'Gus' Carreno · March 10, 2024

3 minutes ago, Attila Kovacs said:

Best of luck with that. The issue with these challenges isn't the problem they aim to solve, but rather, who on earth has the time for them.

Hey Attila,

Thank you very much !!!

I don't see that as an issue. I see that as a fact of life, like everyone having a personal life.
Then you choose to make the time for it or not, and that depends on your schedule and your will to participate.

I'm not putting a gun to anyone's head, just proposing a fun, and quite optional, exercise in programming.

Cheers,

Gus

Edited March 10, 2024 by Gustavo 'Gus' Carreno

dummzeuch · March 10, 2024

I wonder how much of the time depends on where the file is read from: RAM Disk, SSD, HDD and for the latter whether it's already in the cache or not.

2 (American billion) = 2.000.000.000 lines of about 15 characters makes it about 30.000.000.000 bytes, that's 30 Gig of data to read, split into lines, then split into name and value and then aggregate by name.

32 bit Delphi won't be able to handle that with a StringList because it won't fit into memory. I wonder whether there are any bugs in the RTL that would prevent that with a 64 bit Delphi program. But anyway: using a StringList is probably not the most efficient way of reading the data. Plain old ReadLn would likely do the trick faster. Some kind of buffering might speed it up, and maybe parsing based on a PChar pointer rather than strings.

Then selecting a suitable data structure, probably some hash-based dictionary.

The rest is not much of a challenge.
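The steps described above (split into lines, split each line into name and value, aggregate by name in a hash-based dictionary) can be sketched roughly as follows. This is only an illustration in Python for brevity, not a challenge entry, which would have to be pure Object Pascal. It assumes the `name;value` line layout and the min/mean/max aggregates of the original 1BRC; the function names `aggregate` and `summarize` and the sample readings are made up for the example.

```python
from typing import Dict, Iterable, List, Tuple


def aggregate(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Aggregate readings by station name.

    Returns a dictionary mapping name -> [min, max, sum, count].
    Assumes every non-empty line looks like "name;value" (an assumption
    taken from the original 1BRC, not stated in this thread).
    """
    stats: Dict[str, List[float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split into name and value at the first semicolon.
        name, _, raw = line.partition(";")
        value = float(raw)
        entry = stats.get(name)
        if entry is None:
            # First reading for this station.
            stats[name] = [value, value, value, 1]
        else:
            if value < entry[0]:
                entry[0] = value
            if value > entry[1]:
                entry[1] = value
            entry[2] += value
            entry[3] += 1
    return stats


def summarize(stats: Dict[str, List[float]]) -> List[Tuple[str, float, float, float]]:
    """Return (name, min, mean, max) tuples sorted by name."""
    return [(name, s[0], s[2] / s[3], s[1]) for name, s in sorted(stats.items())]


# Made-up sample data, only to show the flow end to end.
sample = ["Hamburg;12.0", "Lisbon;18.5", "Hamburg;8.0", "Lisbon;21.5"]
result = summarize(aggregate(sample))
for name, lowest, mean, highest in result:
    print(f"{name}: min={lowest:.1f} mean={mean:.1f} max={highest:.1f}")
```

Because `aggregate` consumes the lines one at a time, an open file object can be passed in directly, so the whole file never has to sit in memory the way it would with a StringList.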

Edited March 10, 2024 by dummzeuch

Gustavo 'Gus' Carreno · March 10, 2024

Quote

I wonder how much of the time depends on where the file is read from: RAM Disk, SSD, HDD and for the latter whether it's already in the cache or not.

I'm performing tests on both an SSD and an HDD and the results reflect that: https://github.com/gcarreno/1brc-ObjectPascal#results

I'm using hyperfine to run the program 10 times. This will give the system ( Ubuntu 23.10 64b ) the opportunity to cache what it needs to cache.
The specs of my machine are listed on the GitHub repository.

Quote

2 (American billion) = 2.000.000.000 lines of about 15 characters makes it about 30.000.000.000 bytes, that's 30 Gig of data to read, split into lines, then split into name and value and then aggregate by name.

The input file has 1 (American) billion = 1.000.000.000 lines and is ~16 GiB in size.
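As a quick sanity check on those two figures, the average line length can be worked out as below. This is a rough sketch in Python that treats the approximate "~16 GiB" as exactly 16 GiB, so the result is only an estimate.

```python
GIB = 2**30  # bytes in one gibibyte

file_size_bytes = 16 * GIB      # "~16 GiB", taken as exact for the estimate
line_count = 1_000_000_000      # one (American) billion lines

bytes_per_line = file_size_bytes / line_count
print(f"{bytes_per_line:.1f} bytes per line on average")  # 17.2
```

So the guess of about 15 characters per line was close; it is the line count that was doubled.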

Quote

32 bit Delphi won't be able to handle that with a StringList because it won't fit into memory. I wonder whether there are any bugs in the RTL that would prevent that with a 64 bit Delphi program. But anyway: using a StringList is probably not the most efficient way of reading the data. Plain old ReadLn would likely do the trick faster. Some kind of buffering might speed it up, and maybe parsing based on a PChar pointer rather than strings.

Yeah, sorry, don't have the necessary knowledge to even comment on that 😅

Quote

Then selecting a suitable data structure, probably some hash-based dictionary.

Yeah, agreed!

Quote

The rest is not much of a challenge.

Depends on the opinion you have about using threads and their implicit complexity. But yeah, the real challenge is to make it as blazing fast as you can!!
And the time to match, or beat, is one second. This comes from the results of the original challenge, done in Java.

Cheers,

Gus

Edited March 10, 2024 by Gustavo 'Gus' Carreno
Mention of speed from original challenge

Gustavo 'Gus' Carreno · March 10, 2024

Hey dummzeuch,

BTW... Writing 5 paragraphs with a conjecture of how to do it and then dismissing the entire thing as being "not much of a challenge" is a bit of a crappy dismissal, no?

Instead of just resting on your thought experiment, why don't you put your money where your mouth is and prove what you claim?

Shooting from the hip is rather easy, but making an entry and proving your chops is something entirely different, right?

I'm a bit miffed and my choice of words may seem harsh, but the lack of usefulness or any point in the answers I got just gave me a very bad brogrammer machismo vibe that I've only seen on Stack Overflow.

If this is the type of welcome you peeps extend to a newcomer... I dunno... It's pretty toxic...

Even in the case that this could not be the type of thing that the regulars here have an interest in, at least a sense of community is the least to expect, no?

I deeply regret the thought I had of attempting to post here! I just hope that name calling and dumb shaming is not the next thing I'm to be dealt...

Cheers,

Gus

Attila Kovacs · March 10, 2024

8 minutes ago, Gustavo 'Gus' Carreno said:

If this is the type of welcome you peeps extend to a newcomer... I dunno... It's pretty toxic...

It's Sunday man. You haven't even met the hardcore yet. Be patient. Perhaps you could create a table to track how long it takes until you become angry again 😛

Edited March 10, 2024 by Attila Kovacs

Brian Evans · March 10, 2024

Would have been received better if you hadn't left out the background of the challenge - that it was a Java challenge originally and has subsequently been picked up by other languages.

GitHub - gunnarmorling/1brc: 1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

Gustavo 'Gus' Carreno · March 10, 2024

Hey Attila,

Quote

It's Sunday man. You haven't even met the hardcore yet. Be patient. Perhaps you could create a table to track how long it takes until you become angry again 😛

I am known to go off on tangents and angry rants, sometimes, for not much.

I'm also involved in a bunch of other communities like Telegram (English and Portuguese, Lazarus and Delphi), Discord (3 servers of Lazarus and Delphi) and the Lazarus Forums.

In none of those have I ever had such a response.
In all of the above I try really hard to welcome the padawans that wander in with the most devilishly incomplete and weird questions, trying to hang on to the patience of a saint.
Heck, I even got made MVP by Ian because of that alone!!

When I come to a new place and I'm greeted this way, welp, my short fuse did get lit, consumed and passed the spark to the gun powder!!

I probably need to apologise for the words I've used. But I'm not apologising for the message conveyed!

Cheers,

Gus

Gustavo 'Gus' Carreno · March 10, 2024

2 minutes ago, Brian Evans said:

Would have been received better if you hadn't left out the background of the challenge - that it was a Java challenge originally and has subsequently been picked up by other languages.

GitHub - gunnarmorling/1brc: 1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

Hey Brian Evans,

I was only trying to be brief, since the README file on the GitHub repository has all the needed information, plus the necessary attributions.

I left the correct trail to be followed for the ones that would have the interest of getting to the bottom of it all.

I don't think that me leaving a breadcrumb trail is to be used as an excuse to just dismiss the whole thing entirely.

But again, maybe due to my short fuse, while it did take more than a couple of hours to actually come back and write a less honourable post, I will apologise for the type of wording I used. But not for the content itself!!

Cheers,

Gus

dummzeuch · March 10, 2024

25 minutes ago, Gustavo 'Gus' Carreno said:

Writing 5 paragraphs with a conjecture of how to do it ... Instead of just resting on your thought experiment, why don't you put your money where your mouth is and prove what you claim?

I can't be bothered, sorry. I was only "thinking aloud". Maybe I shouldn't have written it as a comment, though.

Gustavo 'Gus' Carreno · March 10, 2024

Just now, dummzeuch said:

I can't be bothered, sorry. I was only "thinking aloud". Maybe I shouldn't have written it as a comment, though.

Hey Dummzeuch,

I really enjoyed the way you laid it out, that I really enjoyed, truly.

Just that last paragraph did put the flame on my short fuse.

If you had just said something along the lines of: Nice thing to have a go at, if ever I had the time for it.

I would just be beaming with contentment and would never have shot off my big mouth.
I would have been quite grateful for your input and gone to do something else. The end...

Cheers,

Gus

kolbasz · March 10, 2024

10 hours ago, Attila Kovacs said:

Why not? The whole app on the screenshot is written in object pascal, AFAIK it will even compile to your ubuntu.

Because he forgot to mention how long it takes to populate the table weather_station with one billion records. I'm guessing many minutes, unless he has very performant hardware. Running a query is only half of the story. The post is funny though.

Brian Evans · March 11, 2024

9 hours ago, Gustavo 'Gus' Carreno said:

Hey Brian Evans,

I was only trying to be brief, since the README file on the GitHub repository has all the needed information, plus the necessary attributions.

I left the correct trail to be followed for the ones that would have the interest of getting to the bottom of it all.

I don't think that me leaving a breadcrumb trail is to be used as an excuse to just dismiss the whole thing entirely.

But again, maybe due to my short fuse, while it did take more than a couple of hours to actually come back and write a less honourable post, I will apologise for the type of wording I used. But not for the content itself!!

Cheers,

Gus

It is missing WHY this specific task was chosen and WHY somebody might want to tackle it. Without either WHY, the task itself seems silly and not worth much time. Read the blog post and readme from the point of view of somebody who had never heard of the "1 Billion Row Challenge". Only by following and reading some of the LINKS in the readme would they find or deduce answers for the two WHYs.

This observation is not really meant as criticism but as feedback on why the response here has been so lackluster: at first look it seems like a very silly contest, so it got silly and "who cares" answers.

Gustavo 'Gus' Carreno · March 11, 2024

Hey Brian,

Okydokes, I get it!!

I completely forgot to account that I'm a 53 year old person that lives in an era where the average attention span is... oopsss, it's gone!!

And for that I deeply apologise !! I shoulda known better, cuz I do have 2 kids that show those symptoms and I completely forgot about that fact.

Sorry!!

Cheers,

Gus

corneliusdavid · March 11, 2024

Hey Gus,

It seems like a fun challenge that people who have time and interest will look at and participate. We've got a couple of months, so plenty of time to get involved if you're so inclined.

One question: the .CSV in the repository has only 44,691 entries. So, the idea is that we need to run the generator program first to generate the 1B file, right? I suppose this could also, then, be used to generate smaller files for development testing.

Thanks for the links and for bringing it to the Delphi community!

Gustavo 'Gus' Carreno · March 11, 2024

Hey Cornelius,

Quote

It seems like a fun challenge that people who have time and interest will look at and participate.

Absolutely!!

Quote

We've got a couple of months, so plenty of time to get involved if you're so inclined.

Correctomundo!!

Quote

One question: the .CSV in the repository has only 44,691 entries. So, the idea is that we need to run the generator program first to generate the 1B file, right?
I suppose this could also, then, be used to generate smaller files for development testing.

You are correct. Like any well-behaved Linux command, well at least I made an effort on the Lazarus side, if you run it with the `-h` or `--help` param it will print its usage.
The Delphi one also has the same behaviour. And we also made an effort to make the Delphi and Lazarus side of things match in terms of generation.
The main objective of having a generator is the fact that anyone can practice with the exact same content.
The other objective is simply the fact that the file that contains the full 1 billion rows is ~16GiB. No way we were going to store that on a free GitHub repository.

$ ./bin/generator -h
Generates the measurement file with the specified number of lines

USAGE
  generator <flags>

FLAGS
  -h|--help                      Writes this help message and exits
  -v|--version                   Writes the version and exits
  -i|--input-file <filename>     The file containing the Weather Stations
  -o|--output-file <filename>    The file that will contain the generated lines
  -n|--line-count <number>       The amount of lines to be generated ( Can use 1_000_000_000 )

The input and output files are needed. As is the number of lines to generate.
The input file being the one you mentioned having the ~44K entries.
The output file is of your choice.

The number of lines can be in the normal base 10 format, or use underscores for the thousands separator, as shown on the usage printed above.

Most have been running on test files of about 100 million rows, but this is just an example.

Quote

Thanks for the links and for bringing it to the Delphi community!

You're more than welcome !!

Hope you can make the time to participate and have a ton of fun while doing it!!

Cheers,

Gus

Edited March 11, 2024 by Gustavo 'Gus' Carreno
Better context

Stefan Glienke · March 11, 2024

On 3/10/2024 at 9:06 AM, Alexander Sviridenkov said:

HTML Library / SQL framework: 0.33s. Zero lines of code)

So assuming that your code scales linearly it will only take 92 days for 1 billion rows

Attila Kovacs · March 11, 2024

18 minutes ago, Stefan Glienke said:

So assuming that your code scales linearly it will only take 92 days for 1 billion rows

USA Billion 😉

Alexander Sviridenkov · March 11, 2024

43 minutes ago, Stefan Glienke said:

So assuming that your code scales linearly it will only take 92 days for 1 billion rows

Scaling is not linear there. A real 10^9 file (16.5 GB) is processed in 15 minutes (936 sec) on a Ryzen 5 4600H notebook (single thread).
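For a sense of scale, those numbers can be turned into throughput figures. This is a back-of-the-envelope sketch in Python; it reads the stated file size as 16.5 decimal gigabytes, which is an assumption, and compares against the one-second target mentioned earlier in the thread.

```python
rows = 10**9               # lines in the full input file
file_size_bytes = 16.5e9   # file size, read as decimal gigabytes (assumption)
elapsed_s = 936            # reported single-thread run time in seconds

rows_per_s = rows / elapsed_s
mb_per_s = file_size_bytes / elapsed_s / 1e6

# The time to match from the original challenge is one second.
target_s = 1.0
speedup_needed = elapsed_s / target_s

print(f"{rows_per_s:,.0f} rows/s")                 # 1,068,376 rows/s
print(f"{mb_per_s:.1f} MB/s")                      # 17.6 MB/s
print(f"{speedup_needed:.0f}x faster to reach {target_s:.0f} s")
```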

Alexander Sviridenkov · March 11, 2024

18 hours ago, kolbasz said:

Because he forgot to mention how long it takes to populate the table weather_station with one billion records. I'm guessing many minutes, unless he has very performant hardware. Running a query is only half of the story. The post is funny though.

There are no tables. Query is executed directly on CSV file.
