Török Edwin wrote:
> On 2008-07-18 13:31, Martin Peschke wrote:
>> recent changes:
>> - added man page
>> - beautified human-readable output
>> - fixed an x86 compile error caused by incomplete endianness handling
>> - fixed some x86 __u64 vs. unsigned long compiler warnings
>> - fixed checking of a command line argument
>>
>> blkiomon periodically generates per-device request size and request
>> latency statistics from blktrace data. It provides histograms as well
>> as data that
>
> Does it also measure latency caused by the request queues being full?
> (This happens around here, in blk-core.c:get_request():)
>
>     /*
>      * The queue is full and the allocating
>      * process is not a "batcher", and not
>      * exempted by the IO scheduler
>      */
>     goto out;
>
> That would be very useful for tracing some high latencies I am seeing;
> see the discussion here:
> http://lkml.org/lkml/2008/7/12/104
>
> Best regards,
> --Edwin

Hi Edwin -

If you could try this, it might provide some more data. This is very
hastily thrown together, so it may need some work.

Basically: I wrote a Python script which chains together blktrace |
blkparse and scans the output looking for sleep requests followed by a
successful get-request on that same device. It then outputs min/avg/max,
as in:

1 sleepers 0.142157151
1 sleepers 0.174224178
3 sleepers min= 0.027375521 avg= 0.048117339 max= 0.075909947
1 sleepers 0.132267452
1 sleepers 0.030572955
2 sleepers min= 0.060603135 avg= 0.071087077 max= 0.081571020
1 sleepers 0.002082554
1 sleepers 0.033337675
1 sleepers 0.010796369

(With 1 sleeper, I leave off the min/max stuff.) The values are in
seconds, so above I'm seeing 0.01 to 0.17+ seconds of sleeping...

To run it (as root):

# ./qsg.py <device> [<device> ...]

So, for me:

# ./qsg.py /dev/sda

This will produce output only when sleeps-for-requests occur. (I think
other logic in the block I/O layer will put off other potential
requesters - I'm looking into how to measure that next.)

For now the script just assumes that the debugfs stuff is mounted at
/sys/kernel/debug...

Let me know if this needs some tweaking...

BTW: I can't seem to reproduce your problem (I do see the sleeps from my
script, but the system seems responsive otherwise). I have a 4-core
(dual-socket) Xeon box w/ 8 GB RAM plus 5 SAS drives. I'm using Ubuntu
8.04 with an ext3 FS for root (where I was running the test). I tried
bumping the dd counts (given the large amount of RAM), but that didn't
make a difference. I will try some more dd's just in case...

Alan
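
A minimal sketch of the core logic described above, for anyone curious.
To be clear, this is a hypothetical reconstruction, not the actual
qsg.py (which isn't attached here): it assumes blkparse's default output
field order (dev cpu seq time pid action ...), treats 'S' (sleeprq) as
the start of a wait and the next 'G' (getrq) for the same device/pid as
its end, and prints a min/avg/max line once no sleepers remain
outstanding:

#!/usr/bin/env python
# Sketch only: pair each 'S' (slept on a full request queue) with the
# next 'G' (finally got a request) for the same device and pid, and
# report how long the process waited. The real qsg.py may differ.
import subprocess
import sys

def report(batch):
    # Mirror the output shown above: a bare value for a single sleeper,
    # min/avg/max when several processes slept in the same interval.
    if len(batch) == 1:
        print('%d sleepers %.9f' % (len(batch), batch[0]))
    else:
        print('%d sleepers min= %.9f avg= %.9f max= %.9f'
              % (len(batch), min(batch),
                 sum(batch) / len(batch), max(batch)))

def main(devs):
    # Chain blktrace | blkparse, as described in the message above.
    dev_args = []
    for dev in devs:
        dev_args += ['-d', dev]
    trace = subprocess.Popen(['blktrace', '-o', '-'] + dev_args,
                             stdout=subprocess.PIPE)
    parse = subprocess.Popen(['blkparse', '-i', '-'],
                             stdin=trace.stdout,
                             stdout=subprocess.PIPE,
                             universal_newlines=True)

    sleeping = {}  # (dev, pid) -> timestamp of the 'S' event
    batch = []     # wait times of the current group of sleepers

    for line in parse.stdout:
        fields = line.split()
        # Default blkparse format: dev cpu seq time pid action rwbs ...
        if len(fields) < 7:
            continue
        try:
            now = float(fields[3])
        except ValueError:
            continue  # skip blkparse's summary lines
        key = (fields[0], fields[4])
        action = fields[5]
        if action == 'S':
            sleeping[key] = now
        elif action == 'G' and key in sleeping:
            batch.append(now - sleeping.pop(key))
            if not sleeping:  # all sleepers woke up; flush the batch
                report(batch)
                batch = []

if __name__ == '__main__':
    main(sys.argv[1:])

Run as root with one or more devices; like the original, it prints
nothing until a sleep-for-request actually occurs.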
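
On the debugfs assumption: a small startup guard (again, just a sketch)
could make the /sys/kernel/debug requirement explicit instead of letting
blktrace fail less obviously:

import os
import sys

# Hypothetical check: blktrace expects debugfs mounted here by default.
if not os.path.ismount('/sys/kernel/debug'):
    sys.exit('debugfs not mounted; try: '
             'mount -t debugfs none /sys/kernel/debug')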