Here's how I would solve it - in theory:
Assign each tile a sequential number.
Since you know the size of the target bitmap (TargetSizeX*TargetSizeY) and the size of each tile (TileSizeX*TileSizeY), calculating the number of tiles, and the position of any given tile, is simple:
TileCountX := ((TargetSizeX + TileSizeX-1) div TileSizeX);
TileCountY := ((TargetSizeY + TileSizeY-1) div TileSizeY);
TileCount := TileCountX * TileCountY; // Tile number goes from 0 to TileCount-1
// Tile coords from Tile number
TileX := (TileNumber mod TileCountX) * TileSizeX;
TileY := (TileNumber div TileCountX) * TileSizeY; // row index = number div tiles per row
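
As a quick sanity check of the arithmetic, here is a minimal console sketch. The function names and the example sizes (a 1000x600 target with 256x256 tiles) are made up for illustration. Note that with row-by-row numbering the row index comes from dividing the tile number by the number of tiles per row (TileCountX).

```pascal
program TileCoords;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Number of tiles needed to cover TargetSize pixels, rounding up.
function TileCountFor(TargetSize, TileSize: Integer): Integer;
begin
  Result := (TargetSize + TileSize - 1) div TileSize;
end;

// Top-left pixel of a tile; tiles are numbered row by row, starting at 0.
procedure TileOrigin(TileNumber, TileCountX, TileSizeX, TileSizeY: Integer;
  out X, Y: Integer);
begin
  X := (TileNumber mod TileCountX) * TileSizeX; // column index * tile width
  Y := (TileNumber div TileCountX) * TileSizeY; // row index * tile height
end;

var
  CountX, CountY, X, Y: Integer;
begin
  // Example values (assumed): 1000x600 target, 256x256 tiles.
  CountX := TileCountFor(1000, 256); // 4 columns
  CountY := TileCountFor(600, 256);  // 3 rows
  TileOrigin(5, CountX, 256, 256, X, Y);
  // prints "4x3 tiles; tile 5 starts at (256,256)"
  WriteLn(CountX, 'x', CountY, ' tiles; tile 5 starts at (', X, ',', Y, ')');
end.
```

Tiles in the last column and row may extend past the target bitmap, so the drawing code would have to clip them.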
The job of reading a tile from the database can be delegated to one or more tasks, depending on how you choose to partition the workload.
A DB task reads a request from a (thread-safe) queue, performs the database request, stores the result in the request object and notifies the requester that the result is ready. The request object contains: 1) Tile number, 2) Result bitmap and 3) Signal object (e.g. an event handle).
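
A rough sketch of what the request object and a single DB task could look like in Delphi, assuming TThreadedQueue<T> from System.Generics.Collections as the thread-safe queue and TEvent as the signal object. TTileRequest and LoadTileFromDatabase are hypothetical names, and a TBytes buffer stands in for the result bitmap so the example does not depend on Graphics32.

```pascal
program DbRequestSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, System.Generics.Collections;

type
  // Hypothetical request object: tile number in, pixel data and signal out.
  TTileRequest = class
  public
    TileNumber: Integer;
    Pixels: TBytes; // stand-in for the result bitmap
    Ready: TEvent;  // signalled once Pixels has been filled in
    constructor Create(ATileNumber: Integer);
    destructor Destroy; override;
  end;

constructor TTileRequest.Create(ATileNumber: Integer);
begin
  inherited Create;
  TileNumber := ATileNumber;
  Ready := TEvent.Create(nil, True, False, ''); // manual reset, initially unset
end;

destructor TTileRequest.Destroy;
begin
  Ready.Free;
  inherited;
end;

// Placeholder for the real database read (assumption: returns raw tile data).
function LoadTileFromDatabase(TileNumber: Integer): TBytes;
begin
  SetLength(Result, 4);
  FillChar(Result[0], Length(Result), Byte(TileNumber));
end;

var
  Queue: TThreadedQueue<TTileRequest>;
  DbThread: TThread;
  Request: TTileRequest;
begin
  Queue := TThreadedQueue<TTileRequest>.Create(64);
  try
    // DB task: serve requests until the queue is shut down.
    DbThread := TThread.CreateAnonymousThread(
      procedure
      var
        Req: TTileRequest;
      begin
        while True do
        begin
          // Treat a failed wait or an empty (nil) item as the signal to stop.
          if (Queue.PopItem(Req) <> wrSignaled) or (Req = nil) then
            Break;
          Req.Pixels := LoadTileFromDatabase(Req.TileNumber);
          Req.Ready.SetEvent; // notify the requester
        end;
      end);
    DbThread.FreeOnTerminate := False; // we wait for and free it ourselves
    DbThread.Start;

    // Requester side: queue a request and block until it has been served.
    Request := TTileRequest.Create(7);
    try
      Queue.PushItem(Request);
      Request.Ready.WaitFor;
      WriteLn('Tile ', Request.TileNumber, ': ', Length(Request.Pixels), ' bytes');
    finally
      Request.Free;
    end;

    Queue.DoShutDown; // wakes the DB task so it can exit
    DbThread.WaitFor;
    DbThread.Free;
  finally
    Queue.Free;
  end;
end.
```

The requester owns the request object here, so the DB task never frees it. With several DB tasks, each would run the same loop against the same queue.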
Create a number of tasks to render the tiles.
Each task grabs a Tile Number (just use an InterlockedIncrement() on a shared integer), creates a DB request object, queues it and waits for the result. Once the result is ready the task draws the tile onto the target bitmap (*) and starts over. To avoid cache conflicts it would be best if the Tile Numbers currently being worked on are as far apart as possible, but I guess the DB overhead will make this optimization irrelevant.
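
The "grab a tile number" loop could be sketched like this. It uses TInterlocked.Increment from System.SyncObjs, which I'm assuming as the portable equivalent of InterlockedIncrement(). RenderTile is a hypothetical placeholder for "queue the request, wait, draw", and an integer array stands in for the target bitmap; the counts are example values.

```pascal
program TileWorkersSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  TileCount = 12;  // example value
  WorkerCount = 4; // example value

var
  NextTile: Integer = -1; // shared counter; the first Increment yields 0
  RenderedBy: array[0..TileCount - 1] of Integer; // one slot per tile

// Placeholder for "request the tile, wait for it, draw it" (hypothetical).
procedure RenderTile(TileNumber, WorkerId: Integer);
begin
  // Each tile has its own slot, so no two workers write to the same memory.
  RenderedBy[TileNumber] := WorkerId;
end;

// A separate function so each thread captures its own WorkerId.
function StartWorker(WorkerId: Integer): TThread;
begin
  Result := TThread.CreateAnonymousThread(
    procedure
    var
      TileNumber: Integer;
    begin
      while True do
      begin
        // Atomically claim the next unprocessed tile number.
        TileNumber := TInterlocked.Increment(NextTile);
        if TileNumber >= TileCount then
          Break; // nothing left to do
        RenderTile(TileNumber, WorkerId);
      end;
    end);
  Result.FreeOnTerminate := False; // we wait for and free it ourselves
  Result.Start;
end;

var
  Workers: array[0..WorkerCount - 1] of TThread;
  I: Integer;
begin
  for I := 0 to High(Workers) do
    Workers[I] := StartWorker(I + 1);
  for I := 0 to High(Workers) do
  begin
    Workers[I].WaitFor;
    Workers[I].Free;
  end;
  for I := 0 to TileCount - 1 do
    WriteLn('Tile ', I, ' rendered by worker ', RenderedBy[I]);
end.
```

Which worker ends up with which tile varies from run to run, but every tile number is handed out exactly once.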
*) A TBitmap32 is just a chunk of sequential memory and since none of the tile tasks will write to the same memory it is not necessary to lock the target bitmap.
So in short: One thread pool to render the tiles and one thread pool to read from the database. A work queue in between them.
However, as @Cristian Peța said, unless you are using some super fast ninja science fiction database, there's no reason to try to optimize the rendering much. All the tiles can probably be rendered in the time it takes to make a single database request. In fact, using graphics32, a thread context switch will take far longer than drawing a single tile.
So in practice I would probably just do away with the DB tasks and execute the database request directly from the rendering tasks.
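
For that simplified variant, one option (my substitution, not the only way to do it) is to let TParallel.For from System.Threading hand out the tile numbers and do the blocking database read inside each iteration. LoadTileFromDatabase is again a hypothetical placeholder, and an integer array stands in for the target bitmap.

```pascal
program DirectRenderSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Threading;

const
  TileCount = 12; // example value

var
  TileBytes: array[0..TileCount - 1] of Integer; // one slot per tile

// Placeholder for the real database read (hypothetical).
function LoadTileFromDatabase(TileNumber: Integer): TBytes;
begin
  SetLength(Result, 16 + TileNumber);
end;

begin
  // Both bounds are inclusive; iterations run on the RTL thread pool.
  TParallel.For(0, TileCount - 1,
    procedure(TileNumber: Integer)
    var
      Data: TBytes;
    begin
      Data := LoadTileFromDatabase(TileNumber); // blocking read in the render task
      TileBytes[TileNumber] := Length(Data);    // stand-in for drawing the tile
    end);
  WriteLn('Tile 3 holds ', TileBytes[3], ' bytes');
end.
```

A real implementation would probably need a separate database connection per thread, depending on the database library.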