Re: [linux-lvm] Tracing IO requests?

From: Ray Morris <support@bettercgi.com>
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] Tracing IO requests?
Date: Wed, 2 Mar 2011 17:44:12 -0600
Message-ID: <20110302174412.06eecd67@bettercgi.com>
In-Reply-To: <4D6ECC25.8010502@redhat.com>

> > On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> > I know the IO is only being caused by a "cp -a" command, but the
> > issue is why all the reads?  It should be 99% writes. 

cp has to read something before it can write it elsewhere.
-- 
Ray Morris
support@bettercgi.com

On Wed, 02 Mar 2011 18:00:53 -0500
Dave Sullivan <dsulliva@redhat.com> wrote:

> http://sourceware.org/systemtap/examples/
> 
> look at traceio.stp and disktop.stp
> 
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/SystemTap_Beginners_Guide/index.html
> 
> On 03/02/2011 05:19 PM, Bart Kus wrote:
> > On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> >> I once used a tool called dstat. dstat has modules which can tell
> >> you which processes are using disk IO. I haven't used dstat in a
> >> while so maybe someone else can chime in
> >
> > I know the IO is only being caused by a "cp -a" command, but the
> > issue is why all the reads?  It should be 99% writes.  Another
> > thing I noticed is the average request size is pretty small:
> >
> > 14:06:20  DEV               tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
> > [...snip!...]
> > 14:06:21  sde            219.00  11304.00  30640.00    191.53      1.15      5.16      2.10     46.00
> > 14:06:21  sdf            209.00  11016.00  29904.00    195.79      1.06      5.02      2.01     42.00
> > 14:06:21  sdg            178.00  11512.00  28568.00    225.17      0.74      3.99      2.08     37.00
> > 14:06:21  sdh            175.00  10736.00  26832.00    214.67      0.89      4.91      2.00     35.00
> > 14:06:21  sdi            206.00  11512.00  29112.00    197.20      0.83      3.98      1.80     37.00
> > 14:06:21  sdj            209.00  11264.00  30264.00    198.70      0.79      3.78      1.96     41.00
> > 14:06:21  sds            214.00  10984.00  28552.00    184.75      0.78      3.60      1.78     38.00
> > 14:06:21  sdt            194.00  13352.00  27808.00    212.16      0.83      4.23      1.91     37.00
> > 14:06:21  sdu            183.00  12856.00  28872.00    228.02      0.60      3.22      2.13     39.00
> > 14:06:21  sdv            189.00  11984.00  31696.00    231.11      0.57      2.96      1.69     32.00
> > 14:06:21  md5            754.00      0.00 153848.00    204.04      0.00      0.00      0.00      0.00
> > 14:06:21  DayTar-DayTar  753.00      0.00 153600.00    203.98     15.73     20.58      1.33    100.00
> > 14:06:21  data           760.00      0.00 155800.00    205.00   4670.84   6070.91      1.32    100.00
> >
> > Looks to be about 205 sectors/request, which is 104,960 bytes.
> > This might be causing read-modify-write cycles if for whatever
> > reason md is not taking advantage of the stripe cache.
> > stripe_cache_active shows about 128 blocks (512kB) of RAM in use,
> > per hard drive.  Given the chunk size is 512kB, and the writes
> > being requested are linear, it should not be doing
> > read-modify-write.  And yet, there are tons of reads being logged,
> > as shown above.
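A minimal sketch of the request-size arithmetic above, assuming sar's usual 512-byte sectors. The helper names (avgrq_sectors, sectors_to_bytes, percent_of) are hypothetical, not part of sysstat, and the 512 KiB chunk and 4 MiB stripe are the figures given in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for reading sar -d output; sar reports 512-byte sectors.

# avgrq-sz in sectors = (rd_sec/s + wr_sec/s) / tps
avgrq_sectors() {
    awk -v rd="$1" -v wr="$2" -v tps="$3" 'BEGIN { printf "%.2f\n", (rd + wr) / tps }'
}

# Convert a (possibly fractional) sector count to bytes, truncated to an integer.
sectors_to_bytes() {
    awk -v s="$1" 'BEGIN { printf "%d\n", s * 512 }'
}

# Express a request size as a percentage of some unit (chunk or stripe).
percent_of() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

chunk=524288      # 512 KiB md chunk, as stated in the thread
stripe=4194304    # 512 KiB chunk * 8 data drives

sz=$(avgrq_sectors 0 155800 760)     # the "data" row: 205.00 sectors
bytes=$(sectors_to_bytes "$sz")      # 104960 bytes
echo "avg request: $sz sectors = $bytes bytes"
echo "share of one chunk:  $(percent_of "$bytes" "$chunk")%"
echo "share of one stripe: $(percent_of "$bytes" "$stripe")%"
```

With these numbers each request reaching the data volume is roughly a fifth of a chunk and about 1/40 of a full stripe, so md would have to gather around 40 of them in the stripe cache before it could write a whole stripe without reading anything back.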
> >
> > A couple more confusing things:
> >
> > jo ~ # blockdev --getss /dev/mapper/data
> > 512
> > jo ~ # blockdev --getpbsz /dev/mapper/data
> > 512
> > jo ~ # blockdev --getioopt /dev/mapper/data
> > 4194304
> > jo ~ # blockdev --getiomin /dev/mapper/data
> > 524288
> > jo ~ # blockdev --getmaxsect /dev/mapper/data
> > 255
> > jo ~ # blockdev --getbsz /dev/mapper/data
> > 512
> > jo ~ #
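The values blockdev prints can also be cross-checked in sysfs, which additionally exposes max_hw_sectors_kb (the largest single transfer the hardware/driver supports) next to the current max_sectors_kb limit. A rough sketch, assuming sysfs is mounted at /sys and that the device-mapper name ("data" here) appears in /sys/block/dm-*/dm/name; queue_limits and dm_queue_dir are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print request-queue limits from a sysfs queue directory.
# *_block_size and *_io_size are in bytes; max_*_kb are in KiB.
queue_limits() {
    q=$1
    for f in logical_block_size physical_block_size minimum_io_size \
             optimal_io_size max_sectors_kb max_hw_sectors_kb; do
        if [ -r "$q/$f" ]; then
            printf '%-20s %s\n' "$f" "$(cat "$q/$f")"
        fi
    done
}

# Hypothetical helper: map a device-mapper name to its /sys/block/dm-N/queue
# directory by matching /sys/block/dm-N/dm/name. Returns 1 if not found.
dm_queue_dir() {
    for d in /sys/block/dm-*; do
        if [ -r "$d/dm/name" ] && [ "$(cat "$d/dm/name")" = "$1" ]; then
            echo "$d/queue"
            return 0
        fi
    done
    return 1
}
```

For the volume above this would be run as queue_limits "$(dm_queue_dir data)". If max_hw_sectors_kb is also small, the 255-sector cap comes from the hardware or driver (or, for dm, the stacked limits of the devices underneath) rather than from a block-layer default.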
> >
> > If the optimum IO size is 4MB (as it SHOULD be: 512k chunk * 8 data
> > drives = 4MB stripe) but the maxsect count is 255 (255*512 = 130,560
> > bytes, just under 128k), how can optimal IO ever be done???  I
> > re-mounted XFS with sunit=1024,swidth=8192, but that hasn't
> > increased the average transaction size as expected.  Perhaps it's
> > respecting this maxsect limit?
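The same geometry can be checked with plain shell arithmetic. This sketch assumes the figures from this thread (512 KiB chunk, 8 data drives, a 255-sector request limit) and that XFS sunit/swidth mount options are expressed in 512-byte units; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for the geometry in this thread (sizes in bytes).

# blockdev --getmaxsect reports 512-byte sectors per request.
maxsect_bytes() {
    echo $(( $1 * 512 ))
}

# XFS sunit/swidth mount options are given in 512-byte units.
xfs_sunit()  { echo $(( $1 / 512 )); }         # $1 = chunk size
xfs_swidth() { echo $(( $1 * $2 / 512 )); }    # $1 = chunk, $2 = data drives

# Number of requests of a given size needed to cover one stripe, rounded up.
requests_per_stripe() {
    stripe=$1; req=$2
    echo $(( (stripe + req - 1) / req ))
}

chunk=524288
max=$(maxsect_bytes 255)                        # 130560 bytes
echo "max request: $max bytes"
echo "mount opts:  sunit=$(xfs_sunit $chunk),swidth=$(xfs_swidth $chunk 8)"
echo "max-size requests per stripe: $(requests_per_stripe $((chunk * 8)) "$max")"
```

This reproduces sunit=1024,swidth=8192 and shows that even maximum-size requests would need 33 of them to fill one stripe. The observed average of about 205 sectors is below the 255-sector cap, which suggests the cap is not the only thing keeping requests small.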
> >
> > --Bart
> >
> > PS: The RAID6 full stripe has 2 additional parity drives, for a total
> > of 10, but they're not included in the "data zone" definition of
> > stripe size, which is the only one that matters when figuring out how
> > large your writes should be.
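Put as a formula, the data width of a parity RAID stripe is chunk size * (total drives - parity drives). A minimal sketch; stripe_width is a hypothetical helper, and the parity counts used (2 for RAID6, 1 for RAID5) are the standard ones.

```shell
#!/bin/sh
# Data width of one full stripe: parity drives do not count.
# $1 = chunk size in bytes, $2 = total member drives, $3 = parity drives
stripe_width() {
    echo $(( $1 * ($2 - $3) ))
}

echo "RAID6, 10 drives: $(stripe_width 524288 10 2) bytes"           # 4194304
echo "parity wrongly counted: $(stripe_width 524288 10 0) bytes"     # 5242880
```

Counting the parity drives would suggest a 5 MiB stripe, which would not match the 4 MiB optimal IO size blockdev reports.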
> >
> > _______________________________________________
> > linux-lvm mailing list
> > linux-lvm@redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-lvm
> > read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/