Feels good to post after a long time. I always hear HPC systems people flapping their mouths about Direct I/O in the context of I/O performance measurements on distributed file systems like Lustre or CXFS, so I thought let's dig into this. You might have seen the dd command used on my blog with the oflag parameter, but for those who don't know, I have briefly reviewed it here.
dd if=/dev/zero of=/tmp/testfile bs=2M count=250 oflag=direct
oflag here tells dd to perform the write operation according to the provided symbols, in this case direct. What exactly direct means, we will see a little later in the post.
dd if=/tmp/testfile of=/dev/null bs=2M iflag=direct
Similar to the write operation, the read operation takes iflag as a parameter with the same symbols; the one of particular interest to us is direct.
If you fire these commands with or without oflag/iflag, you will notice a significant I/O performance difference in the statistics reported by dd. This is basically the effect of the caches employed by modern storage systems and the Linux kernel.
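If you want to see this for yourself, here is a quick sketch of the four runs side by side (throughput numbers will vary with your hardware, and the direct runs need a filesystem and kernel that honor O_DIRECT):

# buffered write: data lands in the page cache first, so dd can report inflated throughput
dd if=/dev/zero of=/tmp/testfile bs=2M count=250
# direct write: bypasses the page cache and goes to the device
dd if=/dev/zero of=/tmp/testfile bs=2M count=250 oflag=direct
# buffered read: likely served straight from the page cache after the writes above
dd if=/tmp/testfile of=/dev/null bs=2M
# direct read: forced to fetch from the device every time
dd if=/tmp/testfile of=/dev/null bs=2M iflag=direct

Running the buffered read twice back to back makes the effect even more obvious: drop the page cache first with echo 3 > /proc/sys/vm/drop_caches (as root) and the first read is slow while the second one is served from cache.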
Now these caches can be multilevel, going right from the operating system's buffer cache to the storage controller cache to the hard drive cache, so cache effects will appear depending upon the underlying system architecture. Filesystem software also plays a huge role in caching behavior. A traditional distributed filesystem might leverage multiple caches on multiple LUNs spread across multiple storage controllers. An object-based filesystem such as Lustre will have multiple OSSes (Object Storage Servers), each of which leverages its own independent OS buffer cache to enhance performance. I am going to do a separate detailed post shortly about the Lustre performance impact of the OSS cache.

My point is that benchmarks of the cache effects of one HPC system as a whole are not comparable to those of another system unless all the granular details are known and acting in the same direction. The cache effect cannot be completely removed in today's complex systems; we can only try to tell the underlying components not to use their caches, if they are configured to accept such requests.

When you open a disk file without any of the flags mentioned below, a call to read() or write() on that file returns as soon as the data is copied into a kernel address space buffer; the actual operation happens later on, depending upon the operating system. The buffer usually defaults to 2.5% of physical memory, but this is subject to change across different Linux kernel trees. We will also see the difference between the "buffers" section and the "cached" section of the free command later in this post.
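As a small illustration of that deferred write, here is a sketch you can try on any Linux box; the Dirty counter in /proc/meminfo shows data sitting in kernel buffers waiting to be written out (the exact counter values and field names can differ slightly between kernel versions):

# dirty pages before the write
grep -E 'Dirty|Writeback' /proc/meminfo
# buffered write: write() returns once the data is in the kernel buffer
dd if=/dev/zero of=/tmp/testfile bs=2M count=250
# Dirty jumps up: the data has not necessarily reached the disk yet
grep -E 'Dirty|Writeback' /proc/meminfo
# force the actual device writes to happen now
sync
grep -E 'Dirty|Writeback' /proc/meminfo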