Mathieu AVILA put forth on 9/22/2010 5:26 AM:
> I have run my test again with default parameters for mkfs.
> I still have this issue. For 20 seconds, the writes are either stalled
> or very slow.
> I have run "vmstat" at the same time as "dd", and it appears that the
> block device continues to receive write requests while "dd" is blocked
> in the kernel.
> With blktrace, I can see that during this period the block device
> receives a lot of small write requests spread across the volume, from
> the start up to the point where the file stopped being written. During
> the other periods, the volume is written normally, starting at
> offset 0 and filling the disk continuously.

What happens with "dd if=/dev/zero of=/DATA/big oflag=direct"? You said
the copy is hanging in the kernel. Maybe a buffer cache issue?
What fstab mount options are you using for this filesystem?
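
As a rough sketch of the two checks asked about above: the first helper repeats the write with O_DIRECT so the page cache is bypassed, and the second prints the options the filesystem is actually mounted with. The helper names direct_write and mount_opts, the 1 MiB block size, and the default count are assumptions for illustration; /DATA is assumed to be the mount point because the dd target lives there.

```shell
#!/bin/sh
# Hypothetical helper: write COUNT_MIB MiB of zeros to TARGET with O_DIRECT,
# bypassing the page cache. bs=1M and the default count are assumptions.
direct_write() {
    target=$1
    count_mib=${2:-1024}
    dd if=/dev/zero of="$target" bs=1M count="$count_mib" oflag=direct
}

# Hypothetical helper: print the mount options recorded for MOUNTPOINT.
# Reads /proc/mounts by default; another mount table can be passed as $2.
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# Example (not run automatically):
#   direct_write /DATA/big 4096
#   mount_opts /DATA
```

If the stall disappears with oflag=direct, that would point toward writeback of cached pages rather than the filesystem itself. Note that /proc/mounts shows the options in effect, which may differ from what fstab lists.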

> Could this be an effect of tree rebalancing for extent management (both
> the big file's inode and the free space trees)? Can it be a hardware
> problem? Have you ever seen that issue before?

WRT tree rebalancing, that's beyond my knowledge level and someone else
will need to jump into this thread. If it's a hardware problem you
should be seeing something in dmesg or the kernel log, or both. If
you're not seeing controller or device errors it's probably not a
hardware problem. Have you tried this same test with only one of those
two 500GB drives, no mdraid stripe? That would eliminate any possible
issues with your mdraid implementation. Speaking of which, could you
please share your mdraid parameters for this stripe set? That could be
a factor as well.
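
A minimal sketch of how the hardware and mdraid checks above might be gathered. The function names storage_errors and md_params, the error patterns, and the array name /dev/md0 are assumptions; the pattern list is not exhaustive and the real array name may differ.

```shell
#!/bin/sh
# Hypothetical filter: keep only log lines that look like controller or
# device errors. The pattern list is an assumption and is not exhaustive.
storage_errors() {
    grep -Ei 'i/o error|medium error|exception emask|hard resetting link|failed command'
}

# Hypothetical helper: show the layout of an md array (RAID level, chunk
# size, member devices). /dev/md0 is an assumed device name.
md_params() {
    md_dev=${1:-/dev/md0}
    cat /proc/mdstat
    mdadm --detail "$md_dev"
}

# Example (not run automatically, mdadm typically needs root):
#   dmesg | storage_errors
#   md_params /dev/md0
```

An empty result from the filter does not prove the hardware is healthy, but it makes a controller or drive fault less likely.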