Gerben Abbink 0 Posted June 30, 2022 I want to count newlines in a huge string (C-like string that ends with a #0), like this: while P^ <> #0 do begin case P^ of #10: begin NewlineCount := NewlineCount + 1; P := P + 1; end; else begin P := P + 1; end; end; end; The string is huge (5 GBytes) so I want to do thing in parallel. I tried Delphi's TParallel.For() but was really slow. OmniThread's Parallel.For() does not support Int64 (and is probably also slow?). So I came up with this: // Create 4 tasks (my CPU has 4 cores) Task1 := CreateTask(TaskProcedure); Task2 := CreateTask(TaskProcedure); Task3 := CreateTask(TaskProcedure); Task4 := CreateTask(TaskProcedure); // Give each task 1/4 of the string Task1.SetParameter('P', P + 0 * Count / 4); Task1.SetParameter('Count', Count / 4); Task2.SetParameter('P', P + 1 * Count / 4); Task2.SetParameter('Count', Count / 4); Task3.SetParameter('P', P + 2 * Count / 4); Task3.SetParameter('Count', Count / 4); Task4.SetParameter('P', P + 3 * Count / 4); Task4.SetParameter('Count', Count / 4); // Return values Task1.SetParameter('PNewlineCount', @NewlineCount1); Task2.SetParameter('PNewlineCount', @NewlineCount2); Task3.SetParameter('PNewlineCount', @NewlineCount3); Task4.SetParameter('PNewlineCount', @NewlineCount4); // Run the tasks in parallel Task1.Run(); Task2.Run(); Task3.Run(); Task4.Run(); // Wait for each task to end Task1.WaitFor(INFINITE); Task2.WaitFor(INFINITE); Task3.WaitFor(INFINITE); Task4.WaitFor(INFINITE); // Total NewlineCount := NewlineCount1 + NewlineCount2 + NewlineCount3 + NewlineCount4; This is the actual thread: procedure TaskProcedure(const OmniTask: IOmniTask); var Count: Int64; NewlineCount: Int64; P: PChar; PEnd: PChar; begin P := OmniTask.Param['P'].AsPointer; Count := OmniTask.Param['Count'].AsInt64; PEnd := P + Count; NewLineCount := 0; while P <> PEnd do begin case P^ of #10: begin NewlineCount := NewlineCount + 1; P := P + 1; end; else begin P := P + 1; end; end; end; (OmniTask.Param['PNewlineCount].AsPointer)^ := NewlineCount; end; This works really great! It is indeed almost 4x faster (actually 3x) than the original non-Parallel procedure. Actually, if I create 8 tasks instead of 4 it is even faster, even if my CPU only has 4 cores. I have some questions: I create 4 tasks for a CPU with 4 cores, but I am not sure if each task is actually a different thread. Should I create 4 threads instead of 4 tasks? Is that possible in OmniThread? The main thread does nothing and waits for each task to end (Task.WaitFor(INFINITE)). Does this mean that 1 CPU core is doing nothing? Should I calculate part of the problem in the main thread? Share this post Link to post
David Heffernan 2353 Posted June 30, 2022 1 hour ago, Gerben Abbink said: Should I create 4 threads instead of 4 tasks? No. The scheduler will put the tasks onto different threads for you. 1 hour ago, Gerben Abbink said: Does this mean that 1 CPU core is doing nothing? No. It means that the main thread is doing nothing, but since there are as many worker threads as cores, the CPU will be fully busy. 1 hour ago, Gerben Abbink said: Should I calculate part of the problem in the main thread? No. Looking at your code, I guess you need to generalise it to work with an arbitrary number of tasks. This will require you to use some arrays in place of variables named 1, 2, 3, etc. It also looks like you are doing floating point division and I'm not convinced that you will count every single character, and also that you won't count some characters twice. I think you need to be more precise about the bounds of each task's work. Share this post Link to post
Stefan Glienke 2019 Posted June 30, 2022 You don't need to manually create tasks - just use a TParallel.For and do proper partitioning - I wrote an answer on Stackoverflow about that subject some while ago. The code there which did some array processing should be easily be adaptable to your usecase Share this post Link to post
Gerben Abbink 0 Posted July 8, 2022 You are right, this is wrong: NewlineCount1 = Count / 4; NewlineCount2 = Count / 4; NewlineCount3 = Count / 4; NewlineCount4 = Count / 4; Now I do this: NewlineCount1 = Count div 4; Count := Count - NewlineCount1; NewlineCount2 = Count div 3; Count := Count - NewlineCount2; NewlineCount3 = Count div 2; Count := Count - NewlineCount3; NewlineCount4 = Count div 1; Count := Count - NewlineCount4; I will also look at Stackoverflow. Share this post Link to post