
David Schwartz

Members
  • Content Count: 1191
  • Joined
  • Last visited
  • Days Won: 24

Everything posted by David Schwartz

  1. I don't see multi-threading as a solution for reducing the complexity of having too many global variables. I imagine it just creates more complexity. Also, I don't say this to be critical, but you don't seem to understand OOP. There are a LOT of programmers who don't. And that's fine.

     Delphi's language is called "Object Pascal" to differentiate it from the original Pascal, which was more like 'C'. The Delphi IDE generates forms that are all objects and a 'main' procedure that's used to create everything. The entire Visual Component Library (VCL) as well as FireMonkey (FMX) is based around objects. The component libraries are all objects. Everything in Delphi is oriented around objects.

     If there are a bunch of global variables in a Delphi form unit, that means that whoever wrote it was probably not skilled with OOP. You should probably refactor the code to get all of those global vars into classes, along with all of the methods that read and write to them. It's not entirely straightforward, but a lot of it is.

     There's a free book available that teaches a lot of this basic OOP stuff: Object Pascal Handbook by Marco Cantu. If you log in to your Embarcadero Code Central portal, it's hiding there somewhere as a free download. I also found this link: http://forms.embarcadero.com/DownloadMarcoCantueBook
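
     To make that concrete, here's a minimal sketch of the kind of refactoring I mean. TAppSettings and its fields are made-up names, not from anyone's actual code:

       type
         TAppSettings = class
         private
           FServerURL: string;
           FTimeoutSecs: Integer;
         public
           constructor Create;
           property ServerURL: string read FServerURL write FServerURL;
           property TimeoutSecs: Integer read FTimeoutSecs write FTimeoutSecs;
         end;

       constructor TAppSettings.Create;
       begin
         inherited Create;
         FServerURL := '';      // whatever the old unit initialization did
         FTimeoutSecs := 30;
       end;

     The globals become private fields, the routines that read and write them become methods or properties, and the rest of the code talks to an instance (or a class var, if there really is only ever one).
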
  2. I think that if Delphi was created today, the DFM file would be a JSON file. Since you're familiar with JSON, you know that if there's nothing in it, it's very small. The format used by DFM files is like JSON only simpler. DFM files are used to save property values defined in the Object Inspector at design time for a given form or DM. If there are no components on the form or DM canvas, then the DFM file is basically empty.

     Also, since it's managed the same way as any other form in Delphi, you have the same degree of control over when they're created. By default, they're created automatically. But you can right-click on your project, select Options... then Forms, and move it to the right-hand column, which represents all of the forms and DMs that you do NOT want to be created automatically. A simpler way is to just right-click on the project, select View Source, and delete the line where a form or DM is created, then save it.

     In your code, simply call Create where you want an instance and save the reference in a variable somewhere. You can use the one that's emitted in the file by default (a global var in the unit), or you can use any other variable. You can also call it multiple times to create multiple instances. When you're done, just Free them.

     I'm sorry for so many words, but I take a lot of words to explain stuff because it seems a lot of people cannot read between the lines of one-line answers. Since I have taken so many words to explain this, I hope you'll actually try it out and see that what I said earlier is correct.
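
     A minimal sketch of that, using made-up names (TMyDataModule, MyDM, and DoWork are hypothetical; Application.CreateForm is the standard auto-create line the IDE emits in the .dpr):

       // in the project source (.dpr, via View Source), delete the auto-create line:
       //   Application.CreateForm(TMyDataModule, MyDM);

       // then create and free the instance yourself where you need it:
       procedure TSomeForm.DoWork;
       var
         dm: TMyDataModule;
       begin
         dm := TMyDataModule.Create(nil);
         try
           // use the components that sit on the data module here
         finally
           dm.Free;
         end;
       end;
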
  3. Yeah, I caught the "... and so on" part. That's where you wave your hands to refer to everything that happens when you push the button to launch the space shuttle, right?

     It does not matter what kind of bucket or box or container you put global variables into -- they're still defined at the global level, and they do not belong there. A Data Module is simply a class, which is a container just like records. It's referenced by the global instance var that every auto-created form in Delphi has, and there's usually just one of them. You can put anything into it that you want because, as I said, it's just a class. If one is senseless to use as a container, then the other is equally senseless for the same reasons. It's not worth arguing about or defending. You are free to have a preference for records over classes, and forms over data modules. It doesn't matter. They're all containers available to Delphi programmers and mostly interchangeable.

     Encapsulation in OOP terms is a way of creating abstractions around collections of related data and methods. Classes are an abstraction mechanism that lets you encapsulate state and model related behaviors that maintain the state, all in a single "object". Forms and Data Modules are both classes. Records are very similar, only they have "pass by value" semantics rather than "pass by reference", so local instances usually take up a LOT of space on the stack. You also need to manage dynamic instances of them with additional syntactical decorations.

     I'm not talking about your personal preferences here, I'm talking about basic OOP principles, and particularly encapsulation. Because that's what the OP asked for help with.
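
     A tiny illustration of that value-vs-reference difference, with made-up types (TPointRec / TPointObj):

       type
         TPointRec = record
           X, Y: Integer;
         end;

         TPointObj = class
         public
           X, Y: Integer;
         end;

       procedure Demo;
       var
         r1, r2: TPointRec;
         o1, o2: TPointObj;
       begin
         r1.X := 1;
         r2 := r1;        // full copy; r2 is independent of r1
         r2.X := 99;      // r1.X is still 1

         o1 := TPointObj.Create;
         try
           o1.X := 1;
           o2 := o1;      // copies the reference only; both names point at one object
           o2.X := 99;    // o1.X is now 99 too
         finally
           o1.Free;
         end;
       end;
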
  4. why? That might make sense if they're all related. It's roughly equivalent to moving them into a Data Module, because there's usually only one of them ever instantiated. But most of the time a bunch of global vars represent collections of related state vars that belong inside of distinct classes, with the setter and getter methods moved into the class as well.

     So you're simply putting a bunch of global vars into a bucket that itself is a global var? All you've accomplished is adding a namespace to the global variables that makes references to the vars longer. It's like taking a long list of variables in a C program and putting struct { ... } myGlobals; around them. You're still dealing with the same bunch of global variables, and if you try defining multiple instances, you'll end up with quite a mess.

  5. David Schwartz

    Parallel.Join.OnStopInvoke(...

     I have this code:

       join := Parallel.Join
         .NumTasks( NumThreads_spinner.Value )
         .OnStopInvoke(
           // called after EACH task completes
           procedure
           begin
             doSomething;
           end )
         .TaskConfig( Parallel.TaskConfig.OnTerminated(
           // called when ALL tasks have completed
           procedure (const task: IOmniTaskControl)
           begin
             doSomethingElse;
           end ) );

     The OnStopInvoke method only takes a TProc parameter. However, Parallel.TaskConfig.OnTerminated takes both a TProc and a procedure (const task: IOmniTaskControl). This seems sort of backwards, because the first one operates on a single task, while the second one is the entire list of tasks. (What task is being passed to OnTerminated? The last one that finished up?) Anyway, my question is ... how can I get the OnStopInvoke proc to know which task just finished?

  6. I've been going through the videos, and I'm left wondering what high-level abstraction is best to use for a batch of REST requests? The requests themselves are different, but they're all going through the same routine to the same service. E.g., the same Google queries for different cities, or hotel room availability at a bunch of different properties in the same hotel chain. That is, situations where you need to qualify the same query with a different geographical location, or different features at the same location. The point is, the API vendor requires you to submit the same query multiple times by varying a piece of the data each time. In this case, I can see using at least half of the abstractions that OTL provides. Indeed, they show loading multiple web pages in most of the examples.

     The program's users will want to process a batch of these requests, from 4 to 100 per batch, but typically 10-20 per batch on average. They typically take 8-12 seconds each. The purpose of these queries is to collect different sets of data to be analyzed at the next step. The user needs to wait for them all to complete before proceeding. They're all displayed in a list that I want to update in real-time to show what's going on with each one. When they've all completed, I want to switch to another tab and take the user to the next step that relies on all of the data that was just obtained.

     Async/Await, Join, ForEach, ParallelTask, and ForkJoin all seem like equally viable candidates in this situation. How would you choose one over another if they all seem equally capable?

  7. David Schwartz

    The Curiously Recurring Generic Pattern

     Ok, after reading the article, it makes more sense. It will take some pondering to come up with legitimate use cases. The article ends with: A while back, I was trying to translate a Swagger spec into some Delphi code, and I ran into some issues that this might solve. The Swagger spec is read at run-time and the program generates Delphi code that gets compiled later on. The problem is that the typing isn't really known at run-time when you're reading the Swagger spec. But emitting code that resolves it at a subsequent compile time without having to know it when it's emitted might solve the problem...
  8. David Schwartz

    Windows Software Development Kit - why?

    I think it's optional unless there's some dependency that requires it. Which version of Delphi is this?
  9. Global variables in a unit often suggest there are classes that could be created to encapsulate them, possibly as static / class variables. How are they being used? Two things come to mind most frequently: they may be associated with singleton patterns; or they could be acting as buffers for data loaded from a DB or INI files. The presence of lots of global vars tells me the original writers probably used VB a lot and weren't very skilled with OOP concepts.

     (I once saw a Delphi app that had a ton of global vars in it, and a couple of classes that were used to "collect" a few dozen methods together. There were no data members in these classes; the methods all used the global vars. Yet the code had numerous places where they created and freed these "objects". It was the goofiest code I've ever seen. Overall, it looked like a big VB program simply translated into Pascal, and it made me want to gag. I told my boss I wouldn't touch it with a 10' pole.)

     I don't think moving methods that access global vars into TThreads is advisable, because that can cause contention issues if they're ever updated, and that would require even more coding to resolve. If the algorithm goes through a process first and loads up data into the global vars, then treats it all as read-only at run-time, then maybe having multiple threads could be useful, but that won't help in the main thread. And that's an optimization that I'd leave for later. Why did you think of doing this first? (Just curious.) Off-hand, it sounds like a recipe for disaster. The threads will NOT run in the main form thread (they're separate threads!), and simply using properties won't buy you any kind of protection against contention at run-time between multiple threads unless you build that into the getters and setters -- which, BTW, won't know what threads they're dealing with unless you add even MORE global vars! They need to be encapsulated into classes FIRST or you're just creating a bigger, more complex mess for yourself.

     You need to study the code and see what it's doing. But start encapsulating things based on space first (with vars moved into classes), not time (based on threading). Identify groups of related variables and the methods that refer to them, and put them into separate classes and give each one a meaningful name. If the vars are only initialized once and then read-only, then make them static / class vars. They're probably used as configuration parameters. If they're used for buffering values being read in or processed, then they're going to be private data members that need properties defined to access them.

     The logic will be working like this:
     - initialize operational / config parameters (some of the global vars)
     - for each line or batch of input data do
       -- read the data and put it into some global vars
       -- process the data that was just read
       -- save the results somewhere
     - go on to the next step (if needed) -- eg, display results
     - wrap everything up and shut down

     If you've got some global vars that are all set up as arrays of the same length, this is a sign that you can put them into a class (as single vars, not arrays) and you'd have as many instances as the length of the arrays. You could then put the class instances into a single array or list.
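
     For that last case, a minimal sketch (TCustomer, the field names, and BuildCustomerList are all made up for illustration; the original parallel arrays become one object per slot, held in a single owning list):

       uses System.Generics.Collections;

       type
         TCustomer = class
         public
           Name: string;
           Balance: Currency;
         end;

       var
         Customers: TObjectList<TCustomer>;

       procedure BuildCustomerList(const Names: array of string; const Balances: array of Currency);
       var
         i: Integer;
         c: TCustomer;
       begin
         Customers := TObjectList<TCustomer>.Create(True);   // True = the list frees its objects
         for i := 0 to High(Names) do
         begin
           c := TCustomer.Create;
           c.Name := Names[i];
           c.Balance := Balances[i];
           Customers.Add(c);
         end;
       end;
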
  10. David Schwartz

    Which option to use for a large batch of REST queries?

    This is a test jig and I've been juggling things around trying to see what the differences are.
  11. David Schwartz

    task thread priority?

    In general, is there a way to set task thread priority in Windows? (Both in the PPL and OTL.) It seems that if you have more threads than cores, a thread that you want to run on a regular interval can get starved for CPU time by the others that are getting shuffled in and out of cores until they're all done.
  12. David Schwartz

    task thread priority?

     great to know. My test jig is just calling Sleep() because ... that's what's used in all of the threading examples I've seen. My main app isn't working yet, but it will block on calls to the HTTPGet routines. Is there a better way to do that if you're multi-threading and want to minimize your overall processing time for a big batch of REST calls to the same API?
  13. David Schwartz

    task thread priority?

     As I said, this is just a test jig for me to get familiar with the OTL and driving the UI properly. I'll be sending out a bunch of requests to my own middle-tier service that forwards them on to the same destination service. My testing on up to 50 queries has shown they take between 3 and 12 seconds to process each one. My middle-tier service extracts a handful of details from the full response packets and returns that to the waiting client threads. The overhead in processing the replies is minimal, so there's not much point in holding the data in the middle tier until all replies have been received, then dumping it all down to the client at that point. The client would have nothing to do at that point while waiting for the reply data to arrive.

     I chose to use the Join abstraction because my application needs to process all of these requests in parallel, then wait to get all of the response data back before it can proceed to do anything further. If you have 100 requests to process and they take an average of 10 seconds each, that's 1000 seconds to process them all in sequence. But in parallel, it'll be closer to 30 seconds. THAT is the ONLY metric I'm trying to shrink by using multi-threading. All of the rest is unaffected.

     Do you mean "cores" instead of "kernals"? If so, my tests show otherwise. And intuitively it makes no sense either. Let's say you have 40 threads and 4 cores. Each thread can run and send off its request, then go to sleep waiting for a reply. It's very possible that all 40 of those threads could send out their queries before the first reply is received. I don't understand why a bunch of threads all waiting on something to happen are saturating all of the CPU cores while basically doing NOTHING! At least by increasing the thread pool size you have a far better chance that each thread will send its request, then go to sleep and let another thread do the same. From watching how this test jig behaves, Windows does a really poor job of reallocating threads to cores when threads have nothing to do. It's clear that increasing the size of the thread pool results in lower overall processing time, up to some point, whereupon the total processing time starts to creep up as the thread pool size grows.

     That said, I've seen numerous benchmarks from Java and C#/.NET applications that show the overhead in their threading code is so high that the break-even point on saving time by multi-threading is absurdly high. So this test jig shows I can get a serious reduction in overall processing time with this approach.
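
     The back-of-the-envelope estimate I'm working from looks roughly like this (a sketch only, assuming the requests really do spend almost all their time blocked waiting for replies):

       uses System.Math;   // for Ceil

       function EstimatedBatchSecs(NumRequests, PoolSize: Integer; AvgSecsPerRequest: Double): Double;
       begin
         // each "wave" of PoolSize requests runs concurrently and takes roughly one request's time
         Result := Ceil(NumRequests / PoolSize) * AvgSecsPerRequest;
       end;

       // 100 requests, pool of 40, ~10 s each  ->  Ceil(100/40) * 10 = 30 s
       // 100 requests, pool of 1,  ~10 s each  ->  1000 s
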
  14. David Schwartz

    task thread priority?

     The project in question is attached to my big post at the end of the Third-Party -> OTL board. (The code is in a Zip file, but the main logic is shown in the post.) I wrote it to test OTL's Join abstraction. However, the original form used Delphi's PPL to update some statistics in the form. I moved them to a status bar at the bottom of the window. Under the hood, both PPL and OTL are using the same Windows logic, so the differences in the higher-level abstractions are irrelevant for testing purposes. (This IS just a TEST jig.)

     I can see what's going on in my head, but I'm having trouble getting it expressed correctly in code. I'm testing out different things, and have gotten this far, changing the TSystemInfo.Create method in the Classes.SystemInfo.pas unit to this:

       class constructor TSystemInfo.Create;
       begin
         // create "dummy" task, that will start System.Threading's monitoring, and keeps it running
         // required to get meaningful values for CPU Usage
         //TTask.Run(
         var aThread := TThread.CreateAnonymousThread(
           procedure
           begin
             Sleep(100);
           end);
         aThread.Priority := tpHigher;
         aThread.Start;

         //platform won't change at runtime
         FPlatform.Detect;
       end;

     I changed TTask.Run to the line below it, then set the Priority to tpHigher and Started the thread. Happily, this corrected the problem with the Join logic not updating the UI for a lot of tasks, which is not obvious but it makes sense to me. Here's why...

     My CPU has 4 cores. When I set the test to use just 1 core, everything runs as if it's synchronous; ie, there's no parallelism going on. (That makes sense to me given a message-driven system and nothing limiting the amount of time threads can bogart a core. In an interrupt-driven system with caps on CPU time for each task, I don't think the cores would be so saturated.)

     The test generates a bunch of tasks that have delays and a variance that is randomly added or subtracted from the base value. By default, it's 8 +/- 3, so the delay values will be 5-11 seconds. Now, based on the # of cores and the range of delays, there will be some ratio of threads to cores where all of the threads will essentially start at the same time, and will finish after their appointed delay times (all rounded to the nearest full second). When a thread finishes, it will be moved to the right-hand list. Sometimes they will be sorted by delay time, and will appear in ascending order, even when the #tasks exceeds #cores by quite a bit. (Try it, you'll see.)

     As you adjust the delay factors and the ratio of threads to cores, at some point there will be enough contention for cores that some delays will occur, and the threads will no longer be sorted as nicely in the right-hand list. At some point, the contention for CPU time will become so high that the ordering of completion will appear fairly random vs. their specified delay times.

     Without boosting the priority on the thread above, all of the requests to update the UI get added to the end of the thread queue, and the contention starts right away -- I've never seen a situation where the threads are sorted in the right-hand list by ascending delay times except where the #threads <= #cores. Again, this makes perfect sense to me, although it's not very intuitive. That said, there may be little things I'm overlooking, because I think the point at which the ratio of #threads to #cores starts to mess up the ordering of things in the right-hand list should be much higher than what I'm seeing.

     There is one side-effect of changing that TSystemInfo.Create method from using TTask.Run to a TThread instead, and that has to do with the way GetCurrentCPUUsage works -- it always displays 0%. This has no effect on the above logic; it's just a quirk of something in the PPL. I'd like to replace the PPL code with OTL code, but I'm not clear which abstraction is best to use for it.

  15. David Schwartz

    Delphi MRU Project Manager

    Either one ... but feel free to address both. (The images suggest neither one, which is why I asked.)
  16. David Schwartz

    task thread priority?

     Adjusting priorities is a primary way you can ensure that tasks intended to run periodically actually DO run periodically. That's the problem I'm faced with here: the default settings are causing event messages that should run an FG process periodically to all get clumped at the end of the message list (or task queue). They need to run when they're triggered, not when everything else is finished.

     I have a lot of experience doing real-time control stuff on single-CPU (ie, single-core) systems. We didn't have this problem for a variety of reasons. One was that the Scheduler would wake up periodically and see if there were any higher-priority tasks that needed to run. It would also send tasks that had been running "too long" to the end of the line. And idle tasks would not eat up any CPU time at all.

     In my case, I've got threads for tasks that take many wall-clock seconds to run, although 99% of that time is spent waiting for a reply to arrive from off-site. In theory, they should be stuffed into an "idle queue" while they're blocked so they don't bogart the cores. I set the thread pool to a relatively large number to ensure as many tasks are waiting for replies as possible. But what happens is they saturate all available cores instead of sitting in an idle queue, and the thread that's supposed to update the UI never gets a chance to run. If you have 50 tasks that all sit for 10 seconds waiting for a reply, the CPU should not be saturated with only 'n' tasks running (where n = #cores). If the response time varies from 5 - 15 secs randomly, the CPU cores should not have a few tasks saturating them waiting on their replies while other tasks in the queue that HAVE received replies are sitting waiting to get time to run. This is how things are working right now, and Windows does not seem to be doing a very intelligent job of managing any of it.

     The periodic task needs to have its priority raised so it runs ahead of the others when it wakes up. The others would do well to have their priority dropped when they begin waiting for their reply so others can get CPU time, and when a reply arrives it would restore the priority of its sleeping task. If anybody has any suggestions other than adjusting task priorities, I'm all ears.
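
     For reference, a minimal sketch of that raise/lower idea in plain Win32 terms. FetchOneReply and CallMiddleTierAndWait are made-up names for the worker body; SetThreadPriority and GetCurrentThread are the standard Winapi.Windows calls:

       uses Winapi.Windows;

       procedure FetchOneReply;
       begin
         // drop this thread's priority while it sits blocked on the slow remote call
         SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_BELOW_NORMAL);
         try
           CallMiddleTierAndWait;   // hypothetical blocking request
         finally
           // restore it so the post-processing and UI-update work isn't starved
           SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_NORMAL);
         end;
       end;
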
  17. David Schwartz

    task thread priority?

     Thanks! Well, I can plainly see that the Windows thread scheduler is failing to do what I want. The UI is locking up after processing a couple of things, even though there's a 300ms timer being used to update it, so nothing further is visible to the user until the number of tasks left in the queue is less than the number of cores, at which point it's like dumping 10 gallons of water on someone all at once who was expecting to see one gallon per minute for 10 minutes.

     Oddly, in Windows, when you set the thread pool size to 1, the whole asynchronous threading model breaks down and everything runs serially with no asynchronism at all. Which is why Windows had (maybe still has?) this odd "Yield" method that you have to sprinkle in liberally throughout your code to ensure no one task hogs too much CPU time. There are warnings I've read that say to beware of this situation where a single thread can hijack and saturate the CPU, because everything runs at the same process priority by default.

     You can solve this by boosting the priority of tasks that are intended to run periodically (ie, on a timer), for example, to ensure they actually run when their timer triggers them, rather than having the timer stuff a message at the end of the message queue that's not processed until everything else has finished. The task triggered by the timer needs to actually RUN periodically, not just at the end.

     I found out that the OTL also has a way to set a thread's priority, but it took quite a while to track down in the manual.

  18. David Schwartz

    Which option to use for a large batch of REST queries?

     I've decided that Parallel.Join works best for my needs. Here's part of a test I built. The full code is attached in a zip file. Note that I prefer to use Raize Components (Konopka) but you can probably replace them with regular versions here.

       var
         join : IOmniParallelJoin;

       // I have 2 pages in a PageControl.
       // The first has a TListView that contains a list of things to process, with objects attached to the .Data property.
       // The second has a TListView that shows completed items.
       // Each item in the first one is processed, then moved to the second one.
       // This way, you can see the list shrinking as tasks complete.

       procedure TThreadingTest3_frm.Go_btnClick(Sender: TObject);
       begin
         Main_ntbk.ActivePageIndex := 0;
         Completed_lview.Items.Clear;
         if (Ready_lview.Items.Count = 0) then
           ShowMessage( 'add some tasks first!' )
         else
         begin
           Go_btn.Enabled := False;
           StartProcessing( Ready_lview );
         end;
       end;

       procedure TThreadingTest3_frm.StartProcessing( aLV : TRzListView );
       begin
         var ntasks := 5;   // if this is = 1 ==> everything runs synchronously, not async

         join := Parallel.Join
           .NumTasks( ntasks )
           .OnStopInvoke(
             procedure   // AFTER <<EACH TASK>> COMPLETES, run this in the main thread
             begin
               UpdateLV;   // update the ListView in the main thread
             end )
           .TaskConfig( Parallel.TaskConfig.OnTerminated(
             // AFTER <<ALL TASKS>> COMPLETE, run this in the main thread
             procedure (const task: IOmniTaskControl)
             begin
               UpdateLV;   // update the ListView in the main thread
             end ) );

         start_tm := Now;

         for var li in aLV.Items do   // join is a list; this adds tasks to it
           ProcessListItem( li );

         join.NoWait.Execute;   // this schedules everything to run, then returns
       end;

       procedure TThreadingTest3_frm.ProcessListItem( aLI : TListItem );
       begin
         join.Task(   // creates a new task and adds it to the list
           procedure (const joinState: IOmniJoinState)
           begin
             var obj := TmyClass(aLI.Data);
             obj.ElapsedTm := 0;   // signifies we've started
             joinState.task.Invoke( UpdateLV );   // update the ListView in the main thread

             var elapsed := DSiTimeGetTime64;
             // do something
             DSiuSecDelay( obj.DelayAmt * 1000 * 1000 );   // micro-seconds
             // done
             obj.ElapsedTm := DSiElapsedTime64( elapsed );   // in msecs

             joinState.task.Invoke( UpdateLV );   // update the ListView in the main thread
           end );
       end;

       procedure TThreadingTest3_frm.UpdateLV;   // this must run in the main thread
       begin
         var we_are_done := (Ready_lview.Items.Count = 0);

         for var li in Ready_lview.Items do
         begin
           var obj := TmyClass(li.Data);
           if Assigned(obj) then
           begin
             if obj.isReady then
               li.SubItems[1] := 'Started...'
             else if obj.isFinished then
             begin
               var li2 := Completed_lview.Items.Add;
               li2.Caption := li.Caption;
               li2.SubItems.Assign( li.SubItems );
               li2.SubItems[1] := 'Finished!';
               li2.SubItems[3] := Format( '%f', [obj.ElapsedTm / 1000] );
               obj.markComplete;
               obj.Free;
               li.Data := NIL;
               li.Delete;
             end;
           end;
           Application.ProcessMessages;
         end;
         Application.ProcessMessages;

         if we_are_done then
         begin
           if (Main_ntbk.ActivePage <> Completed_TabSheet) then
           begin
             Main_ntbk.ActivePage := Completed_TabSheet;
             ShowMessage( 'ALL DONE! Total time = ' + Format( '%f', [SecondSpan( Now, start_tm )] ) );
           end;
           Go_btn.Enabled := True;
         end;
       end;

     parallel_join_test3.zip

  19. David Schwartz

    Frequent and/or annoying typos you make while coding

    'g' gets flopped a lot, esp. in -ing -> -ign. It happens so often that I've gotten to the point where if there's not a spell-check line under it, I have to really think about it.
  20. David Schwartz

    Which option to use for a large batch of REST queries?

    Ok, got it. But I like abstractions! 🙂
  21. David Schwartz

    Which option to use for a large batch of REST queries?

    I can imagine many different uses for this, for sure! It's a way of implementing a *nix command-line pipe expression in code. In my case, I'm just grabbing a bunch of data, then when it has all been fetched I let the user select pieces of it and it displays a bunch of related items in a heatmap. There's no "import" process required.
  22. David Schwartz

    Which option to use for a large batch of REST queries?

     Thanks, but you're pointing at something that amounts to mere milliseconds of added overhead on a 2GHz CPU. Meanwhile, these tasks are spending ~95% of their time waiting for a reply, which is MANY orders of magnitude greater than any inefficiencies caused by the abstraction layers. So why should I care? The difference won't even be perceptible to the user. I suspect I can set up a thread pool of 50 for a 2-core CPU and I'm guessing it will still not saturate either core. I'm looking forward to testing it. Anyway, this library offers 8 different high-level abstractions, and it's a library I'm just getting familiar with. So I'm still curious what approach others with more experience using it might choose.
  23. David Schwartz

    Which option to use for a large batch of REST queries?

    I guess "huge" is relative. 🙂 As I stated: I'm sending these requests to my own service which forwards on the requests, waits for replies, extracts the exact details needed and returns them to my client app. The rest of the data can be downloaded as well if desired, but it's superfluous in most cases. (And the overhead of downloading that data is minor.) If we have 100 requests per batch and most are just sitting there waiting for a reply, I'm guessing they can all be processed in ~20 seconds of wall-clock time instead of 100 x ~12 secs if processed in series. Even if we're only processing 10, the total processing time is STILL going to be ~12 seconds because that's how long it typically takes for one request, and we cannot speed that up. I cannot do anything about the +/- 25% processing time variations of the service that's being called, so I don't see that optimizing anything will improve overall performance. Do you?
  24. David Schwartz

    Which option to use for a large batch of REST queries?

    Sorry, what does this have to do with calling a REST API in OTL?
  25. David Schwartz

    Which option to use for a large batch of REST queries?

    I think they all do that in slightly different ways.