Your disk can read data into memory without the CPU being involved, up to some buffer size. So while the disk is reading a chunk into memory, the CPU is free to do other things. Newer types of disks, like SSDs, can even read different parts of the flash in parallel, which is one reason SSDs are so much faster than traditional drives. To do that, SSDs come with their own micro-controller that handles the parallelism, again freeing the CPU to do other things.
So in general, parallel reads from disk should not be faster, but with SSDs they often are to some extent, and this can even be true of HDDs. There are a few reasons I know of:
- It might be that you're not saturating disk IO. If you read a chunk, go process it, and only then read another chunk, the disk sits idle the whole time you were processing, so you wasted time you could have spent reading. If you instead parallelize your IO requests, then while the CPU is processing the first chunk, the disk can already be reading the next one, so as soon as the CPU is done, the next chunk is ready for it, and so on.
- Some OS and hardware combinations can be smarter about IO scheduling. If you tell the scheduler your next 5 IO requests up front, it might realize it is faster to do them in a different order, due to where the data sits on the disk. But if you tell it one request, go process the result, then tell it the next, and so on, it won't have a chance to come up with a better ordering.
- Some OS and hardware combinations actually support parallel reads. SSDs, for example, can read from multiple places at once. So again, if the scheduler and IO controller know about a whole set of requests you want, they can decide to actually run some of them in parallel.
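To make the first point concrete, here's a minimal Python sketch of overlapping reads with processing, using one background thread to fetch the next chunk while the main thread processes the current one. The `process` function and the 1 MiB chunk size are placeholders, not anything from your code:

```python
# Sketch: overlap disk reads with CPU work via a background thread.
from concurrent.futures import ThreadPoolExecutor

def process(chunk: bytes) -> int:
    # Placeholder CPU work; substitute your real per-chunk processing.
    return len(chunk)

def read_chunks(path, chunk_size=1 << 20):
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

def pipelined_process(path):
    results = []
    chunks = read_chunks(path)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the first read before we have anything to process.
        pending = pool.submit(next, chunks, None)
        while (chunk := pending.result()) is not None:
            # Start the next read immediately; it runs while we process.
            pending = pool.submit(next, chunks, None)
            results.append(process(chunk))
    return results
```

The key line is submitting the next read *before* calling `process`, so the disk and the CPU are busy at the same time instead of taking turns.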
EDIT: By the way, in your case, I don't think the disk read is the slowdown, since you mentioned things that take multiple seconds. On my machine, I can read 20 files of 50k lines of random numbers each in 50ms. I suspect the time goes into parsing each line, preparing the query, connecting to the DB, and the network round trip for the DB to receive the request and answer it, not really the read from disk. That said, keep in mind the network behaves somewhat like disk IO: your network card can send and receive packets from/into buffers without the CPU being involved, and obviously the DB can process things while your machine's CPU does other work. So there's lots of parallelization going on here between CPU and IO.
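If you want to check for yourself whether the reads are the bottleneck, here's a quick sketch that reads a batch of files in parallel and times just the read part (the list of paths is whatever your actual files are; `workers=8` is an arbitrary choice):

```python
# Sketch: time raw read throughput for a batch of files, issuing the
# reads in parallel so the OS/SSD can overlap and reorder them.
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def read_all(paths, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: Path(p).read_bytes(), paths))

def time_reads(paths):
    start = time.perf_counter()
    contents = read_all(paths)
    elapsed = time.perf_counter() - start
    total = sum(len(c) for c in contents)
    print(f"read {total} bytes from {len(paths)} files in {elapsed * 1000:.1f} ms")
    return contents
```

If the reported time is a few tens of milliseconds while your whole job takes seconds, the time is going to parsing and the DB round trips, not the disk.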