I want to count newlines in a huge string (C-like string that ends with a #0), like this:
while P^ <> #0 do begin
case P^ of
#10: begin
NewlineCount := NewlineCount + 1;
P := P + 1;
end;
else begin
P := P + 1;
end;
end;
end;
The string is huge (5 GBytes) so I want to do thing in parallel.
I tried Delphi's TParallel.For() but was really slow. OmniThread's Parallel.For() does not support Int64 (and is probably also slow?).
So I came up with this:
// Create 4 tasks (my CPU has 4 cores)
Task1 := CreateTask(TaskProcedure);
Task2 := CreateTask(TaskProcedure);
Task3 := CreateTask(TaskProcedure);
Task4 := CreateTask(TaskProcedure);
// Give each task 1/4 of the string
Task1.SetParameter('P', P + 0 * Count / 4);
Task1.SetParameter('Count', Count / 4);
Task2.SetParameter('P', P + 1 * Count / 4);
Task2.SetParameter('Count', Count / 4);
Task3.SetParameter('P', P + 2 * Count / 4);
Task3.SetParameter('Count', Count / 4);
Task4.SetParameter('P', P + 3 * Count / 4);
Task4.SetParameter('Count', Count / 4);
// Return values
Task1.SetParameter('PNewlineCount', @NewlineCount1);
Task2.SetParameter('PNewlineCount', @NewlineCount2);
Task3.SetParameter('PNewlineCount', @NewlineCount3);
Task4.SetParameter('PNewlineCount', @NewlineCount4);
// Run the tasks in parallel
Task1.Run();
Task2.Run();
Task3.Run();
Task4.Run();
// Wait for each task to end
Task1.WaitFor(INFINITE);
Task2.WaitFor(INFINITE);
Task3.WaitFor(INFINITE);
Task4.WaitFor(INFINITE);
// Total
NewlineCount := NewlineCount1 + NewlineCount2 + NewlineCount3 + NewlineCount4;
This is the actual thread:
procedure TaskProcedure(const OmniTask: IOmniTask);
var
Count: Int64;
NewlineCount: Int64;
P: PChar;
PEnd: PChar;
begin
P := OmniTask.Param['P'].AsPointer;
Count := OmniTask.Param['Count'].AsInt64;
PEnd := P + Count;
NewLineCount := 0;
while P <> PEnd do begin
case P^ of
#10: begin
NewlineCount := NewlineCount + 1;
P := P + 1;
end;
else begin
P := P + 1;
end;
end;
end;
(OmniTask.Param['PNewlineCount].AsPointer)^ := NewlineCount;
end;
This works really great! It is indeed almost 4x faster (actually 3x) than the original non-Parallel procedure.
Actually, if I create 8 tasks instead of 4 it is even faster, even if my CPU only has 4 cores.
I have some questions:
I create 4 tasks for a CPU with 4 cores, but I am not sure if each task is actually a different thread. Should I create 4 threads instead of 4 tasks? Is that possible in OmniThread?
The main thread does nothing and waits for each task to end (Task.WaitFor(INFINITE)). Does this mean that 1 CPU core is doing nothing? Should I calculate part of the problem in the main thread?