Clément, posted October 28, 2023 (edited)

Hi,

I'm about to release "dhsPinger v2.00". One of its features is recognizing a network device's manufacturer from the device's MAC address. Since the first version I have been generating a unit that contains a list of all available MAC vendors. The latest version has 49978 rows (the very first release had 16k rows). My MACvendors.data unit is autogenerated and has 50028 lines. It looks like this:

    var
      glbVendors : TDictionary<String,String>;

    procedure InitializeVendors;
    begin
      glbVendors.Add('286FB9','Nokia Shanghai Bell Co., Ltd.');
      glbVendors.Add('08EA44','Extreme Networks Headquarters');
      glbVendors.Add('F4EAB5','Extreme Networks Headquarters');
      {... 49978 rows ...}
      glbVendors.Add('8C1F64657','Bright Solutions PTE LTD');
      glbVendors.Add('8C1F64C9B','J.M. Voith SE & Co. KG ');
      glbVendors.Add('8C1F64B01','Blue Ocean UG');
    end;

    initialization
      glbVendors := TDictionary<String,String>.Create(49978);
      InitializeVendors;

    finalization
      glbVendors.free;

There's no impact on loading, and I have had no problems (so far) using this structure. Looking up a MAC and its vendor is done from within a thread. The IDE (Delphi 11) handles it fine. When rebuilding, almost a minute is required just to compile this unit. But it's a freaking 50k-line unit.

Is there any known limit I should be aware of? (The project compiles for Windows only; both 32-bit and 64-bit are required.)

TIA,
Clément

Edited October 28, 2023 by Clément

Attila Kovacs, posted October 28, 2023

I can't remember which unit it was, or which Delphi release, but there was a constructor similar to this where some graphics or hash table was filled with thousands of lines. After a certain size, the compiler silently cropped the code and linked the exe without any error.

Nevertheless, I would rewrite the generator to produce 16 (0-F first byte) array constants, sorted by the MAC address rather than the vendor. Additionally, I would replace the vendor string with an index into a lookup table to eliminate the redundant entries.
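
A minimal sketch of the sorted-array idea, under some assumptions: it uses one array instead of the sixteen suggested above, and the names TMacEntry, VendorNames, MacTable and FindVendor are hypothetical. Each vendor name is stored once and referenced by index, and the lookup is a plain binary search over the sorted prefixes.

```pascal
program MacArrayLookup;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TMacEntry = record
    Prefix: string;      // hex prefix, upper case, no separators
    VendorIdx: Integer;  // index into VendorNames
  end;

const
  // Each vendor name appears only once (sample rows taken from the first post).
  VendorNames: array[0..1] of string = (
    'Extreme Networks Headquarters',
    'Nokia Shanghai Bell Co., Ltd.');

  // Must be sorted by Prefix (ordinal order) for the binary search to work.
  MacTable: array[0..2] of TMacEntry = (
    (Prefix: '08EA44'; VendorIdx: 0),
    (Prefix: '286FB9'; VendorIdx: 1),
    (Prefix: 'F4EAB5'; VendorIdx: 0));

// Binary search for an exact prefix; returns False when it is not listed.
function FindVendor(const APrefix: string; out AVendor: string): Boolean;
var
  Lo, Hi, Mid, Cmp: Integer;
begin
  Result := False;
  AVendor := '';
  Lo := Low(MacTable);
  Hi := High(MacTable);
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    Cmp := CompareStr(APrefix, MacTable[Mid].Prefix);
    if Cmp = 0 then
    begin
      AVendor := VendorNames[MacTable[Mid].VendorIdx];
      Exit(True);
    end;
    if Cmp < 0 then
      Hi := Mid - 1
    else
      Lo := Mid + 1;
  end;
end;

var
  Vendor: string;
begin
  if FindVendor('286FB9', Vendor) then
    Writeln(Vendor)
  else
    Writeln('unknown');
end.
```

The generator would have to emit the rows already sorted, because nothing in this sketch verifies the order at run time.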

Anders Melander, posted October 28, 2023

1 hour ago, Clément said:
Is there any known limit I should be aware of?

No; I think our scorn for this design choice will be limitless 🙂

4 minutes ago, Attila Kovacs said:
Nevertheless, I would rewrite the generator to produce 16 (0-F first byte) array constants, sorted by the MAC address rather than the vendor. Additionally, I would substitute the vendor string with a lookup table to eliminate the redundant entries.

https://en.wikipedia.org/wiki/Trie

And store the data in a resource.

Remy Lebeau, posted October 28, 2023 (edited)

Why are you hard-coding so much data directly in your source code to begin with? Why not simply store the data in an external file or database and then load it from there at runtime?

If you absolutely need the data to be present statically in your app's executable, I would suggest having the auto-generator store the data in a separate file that is then linked into the app's resources at compile time, and then you can load the data from that resource at runtime. This much data really DOES NOT belong in the source code directly at all.

Another benefit of this approach (using either a file, database, resource, etc.) is that you can update the data on the user's machine without having to deliver a new executable from your dev machine every time (in the case of using a resource, there are plenty of 3rd-party tools available to update an app's resources directly). You can, of course, also update the data on your dev machine and recompile if you really want to.

Edited October 29, 2023 by Remy Lebeau
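
A rough sketch of the resource approach, with loudly stated assumptions: the resource name MACVENDORS, the file names macvendors.rc, macvendors.res and macvendors.txt, and the unit and procedure names are all hypothetical. It assumes the generator writes one text line per prefix and that a resource script containing the line `MACVENDORS RCDATA "macvendors.txt"` has been compiled to macvendors.res before the unit is built.

```pascal
unit MacVendorResource;

interface

uses
  System.Classes, System.SysUtils, Winapi.Windows;

// Loads the text lines stored in the RCDATA resource into AList.
procedure LoadVendorsFromResource(AList: TStrings);

implementation

// Assumption: macvendors.res was compiled from a script containing
//   MACVENDORS RCDATA "macvendors.txt"
{$R macvendors.res}

procedure LoadVendorsFromResource(AList: TStrings);
var
  Stream: TResourceStream;
begin
  // Raises an exception if the resource is missing from the executable.
  Stream := TResourceStream.Create(HInstance, 'MACVENDORS', RT_RCDATA);
  try
    // Assumes the generator writes the text file as UTF-8.
    AList.LoadFromStream(Stream, TEncoding.UTF8);
  finally
    Stream.Free;
  end;
end;

end.
```

The dictionary (or any other container) can then be filled from AList at startup, exactly as it would be from an external file.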

Clément, posted October 29, 2023

I just couldn't release the app like that. There will be a Beta V in the near future.

Thanks all for the insight!

Remy Lebeau, posted October 29, 2023 (edited)

1 hour ago, Clément said:
I just couldn't release the app like that.

Why not? The user won't notice a difference, and your code/project will be easier to manage. You are loading the Dictionary at runtime anyway, so what does it matter where the data originates from at runtime? Make things easier on yourself.

Edited October 29, 2023 by Remy Lebeau

Clément, posted October 29, 2023

6 minutes ago, Remy Lebeau said:
Why not? The user won't notice a difference, and your code/project will be easier to manage. You are loading the Dictionary at runtime either way, so what does it matter where the data originates from? Make it easier on yourself.

I meant that I couldn't release the application the way it is now (with a 50k-line unit file). I will generate an external file and load it at startup, so a Beta V is on the way.
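
A minimal sketch of the external-file plan, assuming a tab-separated "PREFIX<TAB>Vendor" layout (the thread does not specify the file format) and hypothetical names LoadVendors and LookupVendor. Because the list in the first post contains both 6-digit and 9-digit keys, and the IEEE registries assign 24-, 28- and 36-bit prefixes, the lookup tries the longest prefix first. The demo writes a small sample file to the temp folder so it can run on its own.

```pascal
program MacVendorFile;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IOUtils,
  System.Generics.Collections;

// Reads "PREFIX<TAB>Vendor" lines (an assumed layout) into ADict.
procedure LoadVendors(const AFileName: string;
  ADict: TDictionary<string, string>);
var
  Lines: TStringList;
  Line: string;
  P: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(AFileName, TEncoding.UTF8);
    for Line in Lines do
    begin
      P := Pos(#9, Line);
      if P > 1 then  // skip lines without a prefix or a tab
        ADict.AddOrSetValue(UpperCase(Copy(Line, 1, P - 1)),
          Trim(Copy(Line, P + 1, MaxInt)));
    end;
  finally
    Lines.Free;
  end;
end;

// Strips ':' and '-' from AMac and tries 9-, 7- and 6-digit prefixes,
// longest first.
function LookupVendor(ADict: TDictionary<string, string>;
  const AMac: string; out AVendor: string): Boolean;
const
  PrefixLens: array[0..2] of Integer = (9, 7, 6);
var
  Hex: string;
  Len: Integer;
begin
  Hex := UpperCase(AMac);
  Hex := StringReplace(Hex, ':', '', [rfReplaceAll]);
  Hex := StringReplace(Hex, '-', '', [rfReplaceAll]);
  for Len in PrefixLens do
    if (Length(Hex) >= Len) and
       ADict.TryGetValue(Copy(Hex, 1, Len), AVendor) then
      Exit(True);
  AVendor := '';
  Result := False;
end;

var
  Dict: TDictionary<string, string>;
  FileName, Vendor: string;
begin
  // Sample data only; the real file would come from the generator.
  FileName := TPath.Combine(TPath.GetTempPath, 'macvendors_sample.txt');
  TFile.WriteAllText(FileName,
    '286FB9'#9'Nokia Shanghai Bell Co., Ltd.'#13#10 +
    '8C1F64B01'#9'Blue Ocean UG'#13#10, TEncoding.UTF8);

  Dict := TDictionary<string, string>.Create;
  try
    LoadVendors(FileName, Dict);
    if LookupVendor(Dict, '8c:1f:64:b0:12:34', Vendor) then
      Writeln(Vendor)
    else
      Writeln('unknown');
  finally
    Dict.Free;
  end;
end.
```

AddOrSetValue is used instead of Add so that a duplicate prefix in the input overwrites the earlier row rather than raising an exception.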

FPiette, posted October 29, 2023

5 hours ago, Clément said:
I couldn't release the application the way it is now ( 50k unit file). I will generate an external file, and load it at start up....

I don't know what processing you need to do with the data. I guess it is a simple lookup. Instead of loading the data into a TDictionary, you could use a SQLite table and use an SQL query to do the lookup. With SQLite, the SQL engine is linked with your executable and there is zero installation. It is also very fast.

Clément, posted October 29, 2023

6 hours ago, FPiette said:
I don't know which processing you need to do with the data. I guess it is a simple lookup. Instead of loading the data into a TDictionary, you could use a SQLite table and use SQL request to do the lookup. With SQLite, the SQL engine is linked with your executable and there is zero installation. It is also very fast.

It's a simple lookup. I never expected it to grow so fast. A pre-build event downloads some files, processes them, and generates a unit that is compiled with the project. Everything is packed into a single exe file. I noticed some hiccups when compiling and traced them to that unit. I know this is not an excuse, but it was supposed to be a simple lookup in a simple application.

Thanks for the advice.

Rollo62, posted October 29, 2023

You could also consider loading it from an "official" vendors list or an API provider at runtime.

https://regauth.standards.ieee.org/standards-ra-web/pub/view.html#registries

Fr0sT.Brutal, posted October 30, 2023

What's the problem with a 50k-line unit?)) You can split it into sub-units, each filling the same container with its own data. You can also keep pre-compiled binaries of these units to save compile time and rebuild them only when the data changes.

However, in this case I would also suggest storing the data outside the source. I'd use a resource, but a simple external file also has its pros.

Clément, posted October 30, 2023

21 hours ago, Rollo62 said:
You could also consider to load that from an "official" vendors list or an API provider, during runtime. https://regauth.standards.ieee.org/standards-ra-web/pub/view.html#registries

I'm downloading the CSV files from https://standards-oui.ieee.org/. Those are the files that, once processed, grew to 50k rows.

Angus Robertson, posted October 30, 2023

ICS has a simple function, IcsGetMacVendor, that loads the tab-separated https://linuxnet.ca/ieee/oui/nmap-mac-prefixes list into a simple TStringList, sorts it, then accesses it by partial Find: very quick, simple and efficient. It also checks for randomly generated MACs that fail look-up and reports that.

Angus
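
The following is not the ICS implementation, only a sketch of how a partial Find on a sorted TStringList can work, with hypothetical names TOrdinalStringList and FindVendor. TStringList.Find returns False when the key is not an exact match but still reports the insertion position, which is where a line starting with that key would sit. The subclass forces ordinal comparison so that the result does not depend on locale-aware sorting.

```pascal
program MacSortedList;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Ordinal comparison keeps "PREFIX<TAB>..." lines in a predictable order,
  // independent of the locale-aware default.
  TOrdinalStringList = class(TStringList)
  protected
    function CompareStrings(const S1, S2: string): Integer; override;
  end;

function TOrdinalStringList.CompareStrings(const S1, S2: string): Integer;
begin
  Result := CompareStr(S1, S2);
end;

// Returns the vendor for an exact prefix, or '' when it is not listed.
function FindVendor(AList: TStringList; const APrefix: string): string;
var
  Idx: Integer;
  Key: string;
begin
  Result := '';
  Key := APrefix + #9;
  // The key is shorter than any stored line, so Find reports "not found"
  // and sets Idx to the position where the key would be inserted.
  AList.Find(Key, Idx);
  if (Idx < AList.Count) and AList[Idx].StartsWith(Key) then
    Result := AList[Idx].Substring(Length(Key));
end;

var
  List: TOrdinalStringList;
begin
  List := TOrdinalStringList.Create;
  try
    // Sample rows from the first post, in "PREFIX<TAB>Vendor" form.
    List.Add('F4EAB5'#9'Extreme Networks Headquarters');
    List.Add('286FB9'#9'Nokia Shanghai Bell Co., Ltd.');
    List.Add('8C1F64B01'#9'Blue Ocean UG');
    List.Sorted := True;  // sorts the list and enables Find
    Writeln(FindVendor(List, '286FB9'));
  finally
    List.Free;
  end;
end.
```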

Clément, posted October 30, 2023

2 hours ago, Fr0sT.Brutal said:
What's the problem in 50k units?)) You can split them into sub-units each filling the same container with their personal data. You can also have pre-compiled binaries of these units to save compile time and rebuild them only when data changes. However in this case I as well suggest storing the data outside from source. I'd use resource but a simple external file also has its pro's

If I choose to ignore the compilation hiccups, I have had no issues. But for this project, an external file has its advantages. This application runs on some servers where the internet connection is fiercely controlled. Among other things, the new engine in this release will also run from a Windows Service. No outside connection will be allowed from this service (security issues). The download will happen from the interface (the picture I posted in the first message).

In the Service version, a notification will be sent to a registered administrator every time a "new device" is detected. One of the requirements is to have as much data as possible to identify it. Up until now, I would have had to compile that 50k-line unit and include it in both the interface and the service. Now both will load an external file.

Clément, posted October 30, 2023

5 minutes ago, Angus Robertson said:
ICS has a simple function IcsGetMacVendor that loads the tab-separated https://linuxnet.ca/ieee/oui/nmap-mac-prefixes list into a simple TStringList, sorts it, then accesses by partial Find, very quick, simple and efficient. It also checks for randomly generated MACs that fail look-up and reports that. Angus

Cool. I'm using ICS to ping, from threads.

Thanks for the link.

Angus Robertson, posted October 30, 2023

Look at the new OverbyteNetTools sample: the LAN Devices tab scans the LAN for devices in various ways and shows the MAC vendor. That is often useful for identifying all those IoT devices that our LANs seem to accumulate, often announcing themselves as Amazon or Google, NEST, Tuya, Espressif, and others, just on my LAN. For reasons unknown, they sometimes change their MAC address to something random and back again.

Angus
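
The thread does not say how ICS recognizes random MACs. One common heuristic, offered here only as an assumption, is to test the "locally administered" bit (value $02 in the first octet), which vendor-assigned prefixes leave clear and randomized addresses normally set. The function name IsLocallyAdministered and the sample addresses are hypothetical.

```pascal
program MacRandomCheck;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// True when the locally administered bit ($02 of the first octet) is set.
// Expects AMac to start with two hex digits, e.g. 'DA:A1:19:00:00:01'.
function IsLocallyAdministered(const AMac: string): Boolean;
var
  FirstOctet: Integer;
begin
  Result := TryStrToInt('$' + Copy(AMac, 1, 2), FirstOctet) and
    ((FirstOctet and $02) <> 0);
end;

begin
  // $DA has the $02 bit set, $28 does not.
  Writeln(IsLocallyAdministered('DA:A1:19:00:00:01'));
  Writeln(IsLocallyAdministered('28:6F:B9:00:00:01'));
end.
```

An address flagged this way would not be expected to appear in the vendor list, so a failed lookup could be reported as "random or locally assigned" rather than "unknown vendor".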

Clément, posted October 30, 2023

2 hours ago, Angus Robertson said:
Look at the new OverbyteNetTools sample, the LAN Devices tab scans the LAN for devices in various ways and shows the MAC vendor, often useful for identifying all those IoT devices that our LAN seem to accumulate, often announcing themselves as Amazon or Google, NEST, Tuya, Espressif, and others, just on my LAN. For reasons unknown, they sometimes change MAC address to something random and back again. Angus

I will take a look at this sample. I'm still using V8.71.

Thanks