Reading a million files from a share in Windows

Hi, has anyone had experience reading the metadata (created, modified, file name, extension, etc.), not the contents, of 1 million files? The files are on an ordinary shared drive backed by non-SSD storage, under a single root directory with a few subdirectories. We only need the metadata of the files; no directory or subdirectory info is needed. I am mainly a C# programmer and would do this using the Task Parallel Library. I have heard that Go is much faster for tasks like this because of goroutines. Has anyone done a similar task before, and do you have any stats on how long it took and the approach you took to build the solution?
The program needs to run on Windows Server 2019, not a Linux-based OS. Thanks for looking.

For querying metadata of a million files from a single non-SSD disk, I suspect you could just use PowerShell or bash. Neither the programming language nor the CPU is likely to be the bottleneck here; the filesystem and the physical device characteristics will most likely limit the performance. I had to query file modified dates and read the contents of 16 million files a few years ago and used Go for it because I wanted to use goroutines, but I don’t think the performance will be significantly different from the same thing in C# with async/await.

Yes.

You are proposing to attempt parallel access (CPUs) to a serial device (HDD).

HDD seek time and rotational latency are on the order of several milliseconds (see Wikipedia: Hard disk drive performance characteristics).

Here are some demonstration results. These are Linux results.

The test tree is a root directory with 102 subdirectories, which contain further subdirectories.

With one goroutine for the root directory and everything under it, 29 seconds:

1 goroutine
29.049492835s 236536 files
real 0m29.113s
user 0m1.668s
sys 0m3.516s

With 102 goroutines, one for each top-level subdirectory and everything under it, 52 seconds:

102 goroutines
52.177539348s 236483 files
real 0m52.207s
user 0m2.253s
sys 0m4.944s

There is very little CPU time (user + sys) and a lot of waiting for HDD I/O (real is wall-clock time). With 102 goroutines, there is a lot of HDD contention.

YMMV
