Clément

Storing a large amount of elements in a 50k lines unit


Hi,

I'm about to release "dhsPinger v2.00".
One of its features is recognizing a network device's manufacturer from the device's MAC address.
[Screenshot attached]

 

Since the first version I have been generating a list of all available MAC vendors in a unit. This latest version has 49978 rows (the very first release had 16k rows).
My MACvendors.data unit is autogenerated, and has 50028 lines. It looks like this:

var
  glbVendors : TDictionary<String,String>;
  
procedure InitializeVendors;
begin
  glbVendors.Add('286FB9','Nokia Shanghai Bell Co., Ltd.');
  glbVendors.Add('08EA44','Extreme Networks Headquarters');
  glbVendors.Add('F4EAB5','Extreme Networks Headquarters');
  {... 49978 rows ...}
  glbVendors.Add('8C1F64657','Bright Solutions PTE LTD');
  glbVendors.Add('8C1F64C9B','J.M. Voith SE & Co. KG ');
  glbVendors.Add('8C1F64B01','Blue Ocean UG');

end;

initialization
  glbVendors := TDictionary<String,String>.Create(49978);
  InitializeVendors;

finalization
  glbVendors.Free;


There's no impact on loading, and I have had no problems (so far) using this structure.
Getting the MAC and its vendor is done from within a thread.
The IDE (Delphi 11) is handling it fine. When rebuilding, almost a minute is required just to compile this unit.
But it's a freaking 50k-line unit.
Is there any known limit I should be aware of?
(The project compiles for Windows only; both 32-bit and 64-bit builds are required.)

TIA,

Clément

Edited by Clément


I can't remember which unit it was, and which Delphi release, but there was a constructor similar to this where some graphics or hash table was filled with thousands of lines.

After a certain size, the compiler silently cropped the code and linked the exe without any error.

 

Nevertheless, I would rewrite the generator to produce 16 (0-F first byte) array constants, sorted by the MAC address rather than the vendor. Additionally, I would substitute the vendor string with a lookup table to eliminate the redundant entries.
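A minimal sketch of what such generated output might look like, assuming a single sorted array rather than the sixteen per-first-digit arrays suggested above (splitting by first digit would only add an outer index). The names `TMacEntry`, `VendorNames`, `MacTable` and `FindVendor` are hypothetical, and the rows are the sample entries from the first post. Whether this layout avoids any compiler limit has not been tested here.

```delphi
program MacTableSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TMacEntry = record
    Prefix: string;      // hex prefix, upper case
    VendorIdx: Integer;  // index into VendorNames
  end;

const
  // Each distinct vendor name is stored once (hypothetical generated table).
  VendorNames: array[0..2] of string = (
    'Extreme Networks Headquarters',
    'Nokia Shanghai Bell Co., Ltd.',
    'Bright Solutions PTE LTD');

  // Sorted ascending by Prefix so it can be binary-searched.
  MacTable: array[0..3] of TMacEntry = (
    (Prefix: '08EA44';    VendorIdx: 0),
    (Prefix: '286FB9';    VendorIdx: 1),
    (Prefix: '8C1F64657'; VendorIdx: 2),
    (Prefix: 'F4EAB5';    VendorIdx: 0));

// Binary search for an exact prefix; returns '' when it is not in the table.
function FindVendor(const Prefix: string): string;
var
  Lo, Hi, Mid, Cmp: Integer;
begin
  Result := '';
  Lo := Low(MacTable);
  Hi := High(MacTable);
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    Cmp := CompareStr(MacTable[Mid].Prefix, Prefix);
    if Cmp = 0 then
      Exit(VendorNames[MacTable[Mid].VendorIdx]);
    if Cmp < 0 then
      Lo := Mid + 1
    else
      Hi := Mid - 1;
  end;
end;

begin
  Writeln(FindVendor('286FB9'));
end.
```

The two 'Extreme Networks Headquarters' rows share one string through `VendorIdx`, which is the redundancy the lookup table is meant to remove.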

  • Like 2

1 hour ago, Clément said:

Is there any known limit I should be aware of?

No; I think our scorn for this design choice will be limitless 🙂

 

4 minutes ago, Attila Kovacs said:

Nevertheless, I would rewrite the generator to produce 16 (0-F first byte) array constants, sorted by the MAC address rather than the vendor. Additionally, I would substitute the vendor string with a lookup table to eliminate the redundant entries. 

https://en.wikipedia.org/wiki/Trie

And store the data in a resource.

  • Like 1
  • Haha 1


Why are you hard-coding so much data directly in your source code to begin with? Why not simply store the data in an external file or database and then load it from there at runtime?

 

If you absolutely need the data to be present statically in your app's executable, I would suggest having the auto-generator store the data in a separate file that is then linked into the app's resources at compile time, and then you can load the data from that resource at runtime. This much data really DOES NOT belong in the source code directly at all.
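As a rough sketch of the resource approach, assuming the generator writes a tab-separated text file (prefix, tab, vendor name) and the project links it as an RCDATA resource named `MACVENDORS`, for example through an `.rc` file containing `MACVENDORS RCDATA "macvendors.txt"`. The unit name, resource name, file name and file format are all assumptions for illustration, not something stated in this thread.

```delphi
unit MacVendorsRes;

interface

uses
  Winapi.Windows, System.SysUtils, System.Classes,
  System.Generics.Collections;

// Parses "PREFIX<TAB>Vendor name" lines from any stream into Vendors.
procedure LoadVendorsFromStream(Source: TStream;
  Vendors: TDictionary<string, string>);

// Loads the same format from an RCDATA resource named MACVENDORS
// (hypothetical name; the resource must be linked into the executable).
procedure LoadVendorsFromResource(Vendors: TDictionary<string, string>);

implementation

procedure LoadVendorsFromStream(Source: TStream;
  Vendors: TDictionary<string, string>);
var
  Lines: TStringList;
  Line: string;
  TabPos: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromStream(Source, TEncoding.UTF8);
    for Line in Lines do
    begin
      TabPos := Pos(#9, Line);
      // Skip lines without a tab or with an empty prefix.
      if TabPos > 1 then
        Vendors.AddOrSetValue(Copy(Line, 1, TabPos - 1),
          Trim(Copy(Line, TabPos + 1, MaxInt)));
    end;
  finally
    Lines.Free;
  end;
end;

procedure LoadVendorsFromResource(Vendors: TDictionary<string, string>);
var
  Res: TResourceStream;
begin
  // Raises an exception if the resource is not present in the module.
  Res := TResourceStream.Create(HInstance, 'MACVENDORS', RT_RCDATA);
  try
    LoadVendorsFromStream(Res, Vendors);
  finally
    Res.Free;
  end;
end;

end.
```

Keeping the parser stream-based means the same routine could read an external file or a downloaded update, so switching the data source later would not touch the parsing code.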

 

Another benefit of this approach (using either a file, database, resource, etc) is that you can update the data on the user's machine without having to deliver a new executable from your dev machine every time (in the case of using a resource, there are plenty of 3rd party tools available to update an app's resources directly). You can, of course, also update the data on your dev machine and recompile if you really want to.

Edited by Remy Lebeau
  • Like 2
  • Thanks 1


I just couldn't release the app like that. There will be a Beta V in the near future.
Thanks all for the insight!

1 hour ago, Clément said:

I just couldn't release the app like that.

Why not? The user won't notice a difference, and your code/project will be easier to manage. You are loading the Dictionary at runtime anyway, so what does it matter where the data originates from at runtime? Make things easier on yourself.

Edited by Remy Lebeau
  • Like 2

6 minutes ago, Remy Lebeau said:

Why not? The user won't notice a difference, and your code/project will be easier to manage. You are loading the Dictionary at runtime either way, so what does it matter where the data originates from? Make it easier on yourself. 

I meant that I couldn't release the application the way it is now (with a 50k-line unit file). I will generate an external file and load it at startup, so a Beta V is on the way.

 

  • Like 2

5 hours ago, Clément said:

I couldn't release the application the way it is now ( 50k unit file). I will generate an external file, and load it at start up....

I don't know what processing you need to do with the data. I guess it is a simple lookup. Instead of loading the data into a TDictionary, you could use a SQLite table and an SQL query to do the lookup. With SQLite, the SQL engine is linked into your executable and there is zero installation. It is also very fast.

  • Like 2

6 hours ago, FPiette said:

I don't know what processing you need to do with the data. I guess it is a simple lookup. Instead of loading the data into a TDictionary, you could use a SQLite table and an SQL query to do the lookup. With SQLite, the SQL engine is linked into your executable and there is zero installation. It is also very fast.

It's a simple lookup. I never expected it to grow so fast. A pre-build event downloads some files, processes them, and generates a unit that is compiled with the project. Everything is packed into a single exe file.
I noticed some hiccups when compiling and traced them to that unit.
I know this is not an excuse, but it was supposed to be a simple lookup in a simple application.
Thanks for the advice.


What's the problem with a 50k-line unit?)) You can split it into sub-units, each filling the same container with its own data. You can also keep pre-compiled binaries of these units to save compile time and rebuild them only when the data changes.

However, in this case I would also suggest storing the data outside the source. I'd use a resource, but a simple external file also has its pros.

2 hours ago, Fr0sT.Brutal said:

What's the problem with a 50k-line unit?)) You can split it into sub-units, each filling the same container with its own data. You can also keep pre-compiled binaries of these units to save compile time and rebuild them only when the data changes.

However, in this case I would also suggest storing the data outside the source. I'd use a resource, but a simple external file also has its pros.

If I ignore the compilation hiccups, I have had no issues. But for this project, an external file has its advantages.
This application runs on some servers where the internet connection is fiercely controlled. Among other things, the new engine in this release will also run from a Windows Service. No outside connection will be allowed from this service (for security reasons).
The download will happen from the interface (the picture I posted in the first message). In the Service version, a notification will be sent to a registered administrator every time a "new device" is detected. One of the requirements is to have as much data as possible to identify it.
Up until now, I would have had to compile that 50k-line unit and include it in both the interface and the service. Now both will load an external file.
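A sketch of how both executables might load such a file and look up a MAC, assuming a tab-separated file named `macvendors.txt` next to the executable (file name, location and format are assumptions). The keys in the generated unit are 6 or 9 hex digits long; the sketch also tries 7 digits, on the assumption that the list may contain IEEE 28-bit (MA-M) blocks. All routine names are hypothetical.

```delphi
program VendorFileSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

// Keeps only hex digits, upper-cased, so that separators such as
// ':' or '-' in the input do not matter.
function NormalizeMac(const Mac: string): string;
var
  Ch: Char;
begin
  Result := '';
  for Ch in UpperCase(Mac) do
    if CharInSet(Ch, ['0'..'9', 'A'..'F']) then
      Result := Result + Ch;
end;

// Reads "PREFIX<TAB>Vendor name" lines into Vendors.
procedure LoadVendorFile(const FileName: string;
  Vendors: TDictionary<string, string>);
var
  Lines: TStringList;
  Line: string;
  TabPos: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    for Line in Lines do
    begin
      TabPos := Pos(#9, Line);
      if TabPos > 1 then
        Vendors.AddOrSetValue(UpperCase(Copy(Line, 1, TabPos - 1)),
          Trim(Copy(Line, TabPos + 1, MaxInt)));
    end;
  finally
    Lines.Free;
  end;
end;

// Tries the longest prefix first, so a 9-digit block wins over the
// 6-digit block that contains it. Returns '' when nothing matches.
function LookupVendor(Vendors: TDictionary<string, string>;
  const Mac: string): string;
const
  PrefixLengths: array[0..2] of Integer = (9, 7, 6);
var
  Hex: string;
  Len: Integer;
begin
  Hex := NormalizeMac(Mac);
  for Len in PrefixLengths do
    if (Length(Hex) >= Len) and
       Vendors.TryGetValue(Copy(Hex, 1, Len), Result) then
      Exit;
  Result := '';
end;

var
  Vendors: TDictionary<string, string>;
  FileName: string;
begin
  FileName := ExtractFilePath(ParamStr(0)) + 'macvendors.txt';
  Vendors := TDictionary<string, string>.Create;
  try
    if FileExists(FileName) then
      LoadVendorFile(FileName, Vendors);
    Writeln(LookupVendor(Vendors, '8C:1F:64:65:70:00'));
  finally
    Vendors.Free;
  end;
end.
```

If the file is missing, the dictionary simply stays empty and every lookup returns an empty string, which may be acceptable for a service that is not allowed to download the list itself.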

5 minutes ago, Angus Robertson said:

ICS has a simple function, IcsGetMacVendor, that loads the tab-separated https://linuxnet.ca/ieee/oui/nmap-mac-prefixes list into a simple TStringList, sorts it, then accesses it by partial Find: very quick, simple and efficient. It also checks for randomly generated MACs that fail the look-up and reports that.


Angus

 

Cool. I'm using ICS to ping, from threads.
Thanks for the link.
 


Look at the new OverbyteNetTools sample. The LAN Devices tab scans the LAN for devices in various ways and shows the MAC vendor, which is often useful for identifying all those IoT devices that our LANs seem to accumulate, often announcing themselves as Amazon or Google, NEST, Tuya, Espressif, and others, just on my LAN. For reasons unknown, they sometimes change MAC address to something random and back again.

 

Angus

 

2 hours ago, Angus Robertson said:

Look at the new OverbyteNetTools sample. The LAN Devices tab scans the LAN for devices in various ways and shows the MAC vendor, which is often useful for identifying all those IoT devices that our LANs seem to accumulate, often announcing themselves as Amazon or Google, NEST, Tuya, Espressif, and others, just on my LAN. For reasons unknown, they sometimes change MAC address to something random and back again.

 

Angus

 

I will take a look at this sample. I'm still using V8.71.

 

Thanks

