linux-lvm.redhat.com archive mirror
* [linux-lvm] Tracing IO requests?
@ 2011-03-02 20:09 Bart Kus
  2011-03-02 20:13 ` Jonathan Tripathy
  2011-03-03 10:29 ` Bryn M. Reeves
  0 siblings, 2 replies; 9+ messages in thread
From: Bart Kus @ 2011-03-02 20:09 UTC (permalink / raw)
  To: linux-lvm

Hello,

I have the following setup:

md_RAID6(10x2TB) -> LVM2 -> cryptsetup -> XFS

When copying data onto the target XFS, I notice a large number of READs 
occurring on the physical hard drives.  Is there any way of monitoring 
what might be causing these read ops?
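
For reference, the per-device numbers below appear to be sar block-device 
output from the sysstat package; a minimal sketch of how stats in this 
format can be collected (the one-second interval is just an example):

sar -d -p 1                  # per-device block stats each second, with real device names
iostat -x 1                  # a similar extended per-device view from the same package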

I have set up the system to minimize read-modify-write cycles as best I 
can, but I fear I've missed some possible options in LVM2 or 
cryptsetup.  Here are the specifics:

                  DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
11:43:54          sde    162.00  12040.00  34344.00    286.32      0.40      2.47      1.67     27.00
11:43:54          sdf    170.00  12008.00  36832.00    287.29      0.62      3.65      2.12     36.00
11:43:54          sdg    185.00  10552.00  37920.00    262.01      0.49      2.65      1.84     34.00
11:43:54          sdh    152.00  11824.00  37304.00    323.21      0.29      1.78      1.71     26.00
11:43:54          sdi    140.00  13016.00  35216.00    344.51      0.68      4.71      3.21     45.00
11:43:54          sdj    181.00  11784.00  36240.00    265.33      0.43      2.38      1.55     28.00
11:43:54          sds    162.00  11824.00  34040.00    283.11      0.46      2.84      1.67     27.00
11:43:54          sdt    157.00  11264.00  35192.00    295.90      0.65      4.14      2.29     36.00
11:43:54          sdu    154.00  12584.00  35424.00    311.74      0.46      2.79      1.69     26.00
11:43:54          sdv    131.00  12800.00  33264.00    351.63      0.39      2.75      1.98     26.00
11:43:54          md5    752.00      0.00 153688.00    204.37      0.00      0.00      0.00      0.00
11:43:54    DayTar-DayTar    752.00      0.00 153688.00    204.37     12.42     16.76      1.33    100.00
11:43:54         data      0.00      0.00      0.00      0.00   7238.71      0.00      0.00    100.00

Here, md5 is the RAID6 made up of the drives listed above it, DayTar-DayTar 
is the device for the VG and LV (both named DayTar), and data is the 
cryptsetup device derived from the LV.  The hard drives are set with 
"blockdev --setra 1024", and md5 is set to a stripe_cache_size of 6553 and 
a preread_bypass_threshold of 0.  XFS is mounted with the following options:

/dev/mapper/data on /data type xfs 
(rw,noatime,nodiratime,allocsize=256m,nobarrier,noikeep,inode64,logbufs=8,logbsize=256k)
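
For reference, the drive and md settings described above correspond 
roughly to commands like the following; the device names are taken from 
the listing above and the sysfs paths are the standard md ones, so treat 
this as a sketch rather than the exact commands used:

for d in sde sdf sdg sdh sdi sdj sds sdt sdu sdv; do
    blockdev --setra 1024 /dev/$d                        # 1024-sector read-ahead on each member disk
done
echo 6553 > /sys/block/md5/md/stripe_cache_size          # md stripe cache entries (one 4k page per member each)
echo 0    > /sys/block/md5/md/preread_bypass_threshold   # preread bypass tuning mentioned above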

And here are the format options of XFS:

meta-data=/dev/mapper/data       isize=256    agcount=15, agsize=268435455 blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=3906993152, imaxpct=5
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

I wasn't sure how to do any kind of stripe alignment with the md RAID6 
given the layers in between.  Here are the LVM2 properties:

   --- Physical volume ---
   PV Name               /dev/md5
   VG Name               DayTar
   PV Size               14.55 TiB / not usable 116.00 MiB
   Allocatable           yes (but full)
   PE Size               256.00 MiB
   Total PE              59616
   Free PE               0
   Allocated PE          59616
   PV UUID               jwcRz9-Yl0k-OHRQ-p5yR-AbAP-j09z-PCgSFo

   --- Volume group ---
   VG Name               DayTar
   System ID
   Format                lvm2
   Metadata Areas        1
   Metadata Sequence No  2
   VG Access             read/write
   VG Status             resizable
   MAX LV                0
   Cur LV                1
   Open LV               1
   Max PV                0
   Cur PV                1
   Act PV                1
   VG Size               14.55 TiB
   PE Size               256.00 MiB
   Total PE              59616
   Alloc PE / Size       59616 / 14.55 TiB
   Free  PE / Size       0 / 0
   VG UUID               X8gbkZ-BOMq-D6x2-xx6y-r2wF-cePQ-JTKZQs

   --- Logical volume ---
   LV Name                /dev/DayTar/DayTar
   VG Name                DayTar
   LV UUID                cdebg4-EcCR-6QR7-sAhT-EN1h-20Lv-qIFSH8
   LV Write Access        read/write
   LV Status              available
   # open                 1
   LV Size                14.55 TiB
   Current LE             59616
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     16384
   Block device           253:0

And finally the cryptsetup properties:

/dev/mapper/data is active:
   cipher:  aes-cbc-essiv:sha256
   keysize: 256 bits
   device:  /dev/mapper/DayTar-DayTar
   offset:  8192 sectors
   size:    31255945216 sectors
   mode:    read/write

Anyone have any suggestions on how to tune this to do better at pure 
writing by eliminating needless reading?

Thanks,

--Bart


* Re: [linux-lvm] Tracing IO requests?
  2011-03-02 20:09 [linux-lvm] Tracing IO requests? Bart Kus
@ 2011-03-02 20:13 ` Jonathan Tripathy
  2011-03-02 22:19   ` Bart Kus
  2011-03-03 10:29 ` Bryn M. Reeves
  1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Tripathy @ 2011-03-02 20:13 UTC (permalink / raw)
  To: linux-lvm


On 02/03/11 20:09, Bart Kus wrote:
> Hello,
>
> I have the following setup:
>
> md_RAID6(10x2TB) -> LVM2 -> cryptsetup -> XFS
>
> When copying data onto the target XFS, I notice a large number of 
> READs occurring on the physical hard drives. Is there any way of 
> monitoring what might be causing these read ops?
>
> I have setup the system to minimize read-modify-write cycles as best I 
> can, but I fear I've missed some possible options in LVM2 or 
> cryptsetup. Here are the specifics:
>
> 11:43:54 sde 162.00 12040.00 34344.00 286.32 0.40 2.47 1.67 27.00
> 11:43:54 sdf 170.00 12008.00 36832.00 287.29 0.62 3.65 2.12 36.00
> 11:43:54 sdg 185.00 10552.00 37920.00 262.01 0.49 2.65 1.84 34.00
> 11:43:54 sdh 152.00 11824.00 37304.00 323.21 0.29 1.78 1.71 26.00
> 11:43:54 sdi 140.00 13016.00 35216.00 344.51 0.68 4.71 3.21 45.00
> 11:43:54 sdj 181.00 11784.00 36240.00 265.33 0.43 2.38 1.55 28.00
> 11:43:54 sds 162.00 11824.00 34040.00 283.11 0.46 2.84 1.67 27.00
> 11:43:54 sdt 157.00 11264.00 35192.00 295.90 0.65 4.14 2.29 36.00
> 11:43:54 sdu 154.00 12584.00 35424.00 311.74 0.46 2.79 1.69 26.00
> 11:43:54 sdv 131.00 12800.00 33264.00 351.63 0.39 2.75 1.98 26.00
> 11:43:54 md5 752.00 0.00 153688.00 204.37 0.00 0.00 0.00 0.00
> 11:43:54 DayTar-DayTar 752.00 0.00 153688.00 204.37 12.42 16.76 1.33 
> 100.00
> 11:43:54 data 0.00 0.00 0.00 0.00 7238.71 0.00 0.00 100.00
>
> Where md5 is the RAID6 holding the drives right above it, 
> DayTar-DayTar are the VG and LV respectively, and data is the 
> cryptsetup device derived from the LV. Hard drives are set to 
> "blockdev --setra 1024", md5 is set for stripe_cache_size of 6553 and 
> preread_bypass_threshold of 0. XFS is mounted with the following options:
>
> /dev/mapper/data on /data type xfs 
> (rw,noatime,nodiratime,allocsize=256m,nobarrier,noikeep,inode64,logbufs=8,logbsize=256k)
>
> And here are the format options of XFS:
>
> meta-data=/dev/mapper/data isize=256 agcount=15, agsize=268435455 blks
> = sectsz=512 attr=2
> data = bsize=4096 blocks=3906993152, imaxpct=5
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0
> log =internal bsize=4096 blocks=521728, version=2
> = sectsz=512 sunit=0 blks, lazy-count=0
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> I wasn't sure how to do any kind of stripe alignment with the md RAID6 
> given the layers in between. Here are the LVM2 properties:
>
> --- Physical volume ---
> PV Name /dev/md5
> VG Name DayTar
> PV Size 14.55 TiB / not usable 116.00 MiB
> Allocatable yes (but full)
> PE Size 256.00 MiB
> Total PE 59616
> Free PE 0
> Allocated PE 59616
> PV UUID jwcRz9-Yl0k-OHRQ-p5yR-AbAP-j09z-PCgSFo
>
> --- Volume group ---
> VG Name DayTar
> System ID
> Format lvm2
> Metadata Areas 1
> Metadata Sequence No 2
> VG Access read/write
> VG Status resizable
> MAX LV 0
> Cur LV 1
> Open LV 1
> Max PV 0
> Cur PV 1
> Act PV 1
> VG Size 14.55 TiB
> PE Size 256.00 MiB
> Total PE 59616
> Alloc PE / Size 59616 / 14.55 TiB
> Free PE / Size 0 / 0
> VG UUID X8gbkZ-BOMq-D6x2-xx6y-r2wF-cePQ-JTKZQs
>
> --- Logical volume ---
> LV Name /dev/DayTar/DayTar
> VG Name DayTar
> LV UUID cdebg4-EcCR-6QR7-sAhT-EN1h-20Lv-qIFSH8
> LV Write Access read/write
> LV Status available
> # open 1
> LV Size 14.55 TiB
> Current LE 59616
> Segments 1
> Allocation inherit
> Read ahead sectors auto
> - currently set to 16384
> Block device 253:0
>
> And finally the cryptsetup properties:
>
> /dev/mapper/data is active:
> cipher: aes-cbc-essiv:sha256
> keysize: 256 bits
> device: /dev/mapper/DayTar-DayTar
> offset: 8192 sectors
> size: 31255945216 sectors
> mode: read/write
>
> Anyone have any suggestions on how to tune this to do better at pure 
> writing by eliminating needless reading?
>
> Thanks,
>
> --Bart
>

I once used a tool called dstat; it has modules which can tell you 
which processes are using disk IO. I haven't used dstat in a while, so 
maybe someone else can chime in.
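
If it helps, newer dstat versions ship per-process I/O plugins; 
something along these lines (plugin names from memory, so double-check 
against your version):

dstat --top-io --top-bio 5      # most expensive I/O process and block-I/O process, 5s interval
dstat -d --disk-util 5          # per-disk throughput and utilisation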

Cheers


* Re: [linux-lvm] Tracing IO requests?
  2011-03-02 20:13 ` Jonathan Tripathy
@ 2011-03-02 22:19   ` Bart Kus
  2011-03-02 23:00     ` Dave Sullivan
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Kus @ 2011-03-02 22:19 UTC (permalink / raw)
  To: LVM general discussion and development

On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> I once used a tool called dstat. dstat has modules which can tell you 
> which processes are using disk IO. I haven't used dstat in a while so 
> maybe someone else can chime in

I know the IO is only being caused by a "cp -a" command, but the issue 
is why all the reads?  It should be 99% writes.  Another thing I noticed 
is that the average request size is pretty small:

14:06:20          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
[...snip!...]
14:06:21          sde    219.00  11304.00  30640.00    191.53      1.15      5.16      2.10     46.00
14:06:21          sdf    209.00  11016.00  29904.00    195.79      1.06      5.02      2.01     42.00
14:06:21          sdg    178.00  11512.00  28568.00    225.17      0.74      3.99      2.08     37.00
14:06:21          sdh    175.00  10736.00  26832.00    214.67      0.89      4.91      2.00     35.00
14:06:21          sdi    206.00  11512.00  29112.00    197.20      0.83      3.98      1.80     37.00
14:06:21          sdj    209.00  11264.00  30264.00    198.70      0.79      3.78      1.96     41.00
14:06:21          sds    214.00  10984.00  28552.00    184.75      0.78      3.60      1.78     38.00
14:06:21          sdt    194.00  13352.00  27808.00    212.16      0.83      4.23      1.91     37.00
14:06:21          sdu    183.00  12856.00  28872.00    228.02      0.60      3.22      2.13     39.00
14:06:21          sdv    189.00  11984.00  31696.00    231.11      0.57      2.96      1.69     32.00
14:06:21          md5    754.00      0.00 153848.00    204.04      0.00      0.00      0.00      0.00
14:06:21    DayTar-DayTar    753.00      0.00 153600.00    203.98     15.73     20.58      1.33    100.00
14:06:21         data    760.00      0.00 155800.00    205.00   4670.84   6070.91      1.32    100.00

Looks to be about 205 sectors/request, which is 104,960 bytes.  This 
might be causing read-modify-write cycles if, for whatever reason, md is 
not taking advantage of the stripe cache.  stripe_cache_active shows 
about 128 blocks (512kB) of RAM in use per hard drive.  Given that the 
chunk size is 512kB and the writes being requested are linear, it should 
not be doing read-modify-write.  And yet, tons of reads are being 
logged, as shown above.
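
For reference, the stripe cache figures above come from the md sysfs 
knobs; a quick way to watch them while the copy runs (assuming the 
standard sysfs paths for this array):

cat /sys/block/md5/md/stripe_cache_size                  # configured cache size, in entries
watch -n 1 cat /sys/block/md5/md/stripe_cache_active     # entries currently in use during the copy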

A couple more confusing things:

jo ~ # blockdev --getss /dev/mapper/data
512
jo ~ # blockdev --getpbsz /dev/mapper/data
512
jo ~ # blockdev --getioopt /dev/mapper/data
4194304
jo ~ # blockdev --getiomin /dev/mapper/data
524288
jo ~ # blockdev --getmaxsect /dev/mapper/data
255
jo ~ # blockdev --getbsz /dev/mapper/data
512
jo ~ #

If the optimum IO size is 4MB (as it SHOULD be: 512k chunk * 8 data 
drives = 4MB stripe), but the maxsect count is 255 (255*512 = ~128k), 
how can optimal IO ever be done?  I re-mounted XFS with 
sunit=1024,swidth=8192, but that hasn't increased the average 
transaction size as expected.  Perhaps it's respecting this maxsect 
limit?
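
One thing that might be worth checking is where the 255-sector cap 
comes from; a sketch of comparing the request-size limits at each layer 
of the stack (standard block-queue sysfs paths; the dm-0/dm-1 names for 
the LV and crypt device are assumptions):

for dev in sde md5 dm-0 dm-1; do
    echo "$dev: $(cat /sys/block/$dev/queue/max_sectors_kb) / $(cat /sys/block/$dev/queue/max_hw_sectors_kb) kB (soft/hard limit)"
done
# a drive's soft limit can be raised up to its hardware limit, e.g.:
# echo 512 > /sys/block/sde/queue/max_sectors_kb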

--Bart

PS: The RAID6 full stripe has +2 parity drives for a total of 10, but 
they're not included in the "data zone" definitions of stripe size, 
which are the only important ones for figuring out how large your writes 
should be.


* Re: [linux-lvm] Tracing IO requests?
  2011-03-02 22:19   ` Bart Kus
@ 2011-03-02 23:00     ` Dave Sullivan
  2011-03-02 23:44       ` Ray Morris
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Sullivan @ 2011-03-02 23:00 UTC (permalink / raw)
  To: linux-lvm

http://sourceware.org/systemtap/examples/

look at traceio.stp and disktop.stp

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/SystemTap_Beginners_Guide/index.html
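
A minimal way to try those examples, assuming systemtap and the 
matching kernel debuginfo are installed (script names as listed on the 
examples page):

stap -v disktop.stp          # periodically lists the processes generating the most disk I/O
stap -v traceio.stp          # running per-process read/write totals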

On 03/02/2011 05:19 PM, Bart Kus wrote:
> On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
>> I once used a tool called dstat. dstat has modules which can tell you 
>> which processes are using disk IO. I haven't used dstat in a while so 
>> maybe someone else can chime in
>
> I know the IO is only being caused by a "cp -a" command, but the issue 
> is why all the reads?  It should be 99% writes.  Another thing I 
> noticed is the average request size is pretty small:
>
> 14:06:20          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  
> avgqu-sz     await     svctm     %util
> [...snip!...]
> 14:06:21          sde    219.00  11304.00  30640.00    191.53      
> 1.15      5.16      2.10     46.00
> 14:06:21          sdf    209.00  11016.00  29904.00    195.79      
> 1.06      5.02      2.01     42.00
> 14:06:21          sdg    178.00  11512.00  28568.00    225.17      
> 0.74      3.99      2.08     37.00
> 14:06:21          sdh    175.00  10736.00  26832.00    214.67      
> 0.89      4.91      2.00     35.00
> 14:06:21          sdi    206.00  11512.00  29112.00    197.20      
> 0.83      3.98      1.80     37.00
> 14:06:21          sdj    209.00  11264.00  30264.00    198.70      
> 0.79      3.78      1.96     41.00
> 14:06:21          sds    214.00  10984.00  28552.00    184.75      
> 0.78      3.60      1.78     38.00
> 14:06:21          sdt    194.00  13352.00  27808.00    212.16      
> 0.83      4.23      1.91     37.00
> 14:06:21          sdu    183.00  12856.00  28872.00    228.02      
> 0.60      3.22      2.13     39.00
> 14:06:21          sdv    189.00  11984.00  31696.00    231.11      
> 0.57      2.96      1.69     32.00
> 14:06:21          md5    754.00      0.00 153848.00    204.04      
> 0.00      0.00      0.00      0.00
> 14:06:21    DayTar-DayTar    753.00      0.00 153600.00    203.98     
> 15.73     20.58      1.33    100.00
> 14:06:21         data    760.00      0.00 155800.00    205.00   
> 4670.84   6070.91      1.32    100.00
>
> Looks to be about 205 sectors/request, which is 104,960 bytes.  This 
> might be causing read-modify-write cycles if for whatever reason md is 
> not taking advantage of the stripe cache.  stripe_cache_active shows 
> about 128 blocks (512kB) of RAM in use, per hard drive.  Given the 
> chunk size is 512kB, and the writes being requested are linear, it 
> should not be doing read-modify-write.  And yet, there are tons of 
> reads being logged, as shown above.
>
> A couple more confusing things:
>
> jo ~ # blockdev --getss /dev/mapper/data
> 512
> jo ~ # blockdev --getpbsz /dev/mapper/data
> 512
> jo ~ # blockdev --getioopt /dev/mapper/data
> 4194304
> jo ~ # blockdev --getiomin /dev/mapper/data
> 524288
> jo ~ # blockdev --getmaxsect /dev/mapper/data
> 255
> jo ~ # blockdev --getbsz /dev/mapper/data
> 512
> jo ~ #
>
> If optimum IO size is 4MBs (as it SHOULD be: 512k chunk * 8 data 
> drives = 4MB stripe), but maxsect count is 255 (255*512=128k) how can 
> optimal IO ever be done???  I re-mounted XFS with 
> sunit=1024,swidth=8192 but that hasn't increased the average 
> transaction size as expected.  Perhaps it's respecting this maxsect 
> limit?
>
> --Bart
>
> PS: The RAID6 full stripe has +2 parity drives for a total of 10, but 
> they're not included in the "data zone" definitions of stripe size, 
> which are the only important ones for figuring out how large your 
> writes should be.
>


* Re: [linux-lvm] Tracing IO requests?
  2011-03-02 23:00     ` Dave Sullivan
@ 2011-03-02 23:44       ` Ray Morris
  2011-03-03  0:25         ` Bart Kus
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Morris @ 2011-03-02 23:44 UTC (permalink / raw)
  To: linux-lvm

> > On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> > I know the IO is only being caused by a "cp -a" command, but the
> > issue is why all the reads?  It should be 99% writes. 

cp has to read something before it can write it elsewhere.
-- 
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php




On Wed, 02 Mar 2011 18:00:53 -0500
Dave Sullivan <dsulliva@redhat.com> wrote:

> http://sourceware.org/systemtap/examples/
> 
> look at traceio.stp and disktop.stp
> 
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/SystemTap_Beginners_Guide/index.html
> 
> On 03/02/2011 05:19 PM, Bart Kus wrote:
> > On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> >> I once used a tool called dstat. dstat has modules which can tell
> >> you which processes are using disk IO. I haven’t used dstat in a
> >> while so maybe someone else can chime in
> >
> > I know the IO is only being caused by a "cp -a" command, but the
> > issue is why all the reads?  It should be 99% writes.  Another
> > thing I noticed is the average request size is pretty small:
> >
> > 14:06:20          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  
> > avgqu-sz     await     svctm     %util
> > [...snip!...]
> > 14:06:21          sde    219.00  11304.00  30640.00    191.53      
> > 1.15      5.16      2.10     46.00
> > 14:06:21          sdf    209.00  11016.00  29904.00    195.79      
> > 1.06      5.02      2.01     42.00
> > 14:06:21          sdg    178.00  11512.00  28568.00    225.17      
> > 0.74      3.99      2.08     37.00
> > 14:06:21          sdh    175.00  10736.00  26832.00    214.67      
> > 0.89      4.91      2.00     35.00
> > 14:06:21          sdi    206.00  11512.00  29112.00    197.20      
> > 0.83      3.98      1.80     37.00
> > 14:06:21          sdj    209.00  11264.00  30264.00    198.70      
> > 0.79      3.78      1.96     41.00
> > 14:06:21          sds    214.00  10984.00  28552.00    184.75      
> > 0.78      3.60      1.78     38.00
> > 14:06:21          sdt    194.00  13352.00  27808.00    212.16      
> > 0.83      4.23      1.91     37.00
> > 14:06:21          sdu    183.00  12856.00  28872.00    228.02      
> > 0.60      3.22      2.13     39.00
> > 14:06:21          sdv    189.00  11984.00  31696.00    231.11      
> > 0.57      2.96      1.69     32.00
> > 14:06:21          md5    754.00      0.00 153848.00    204.04      
> > 0.00      0.00      0.00      0.00
> > 14:06:21    DayTar-DayTar    753.00      0.00 153600.00
> > 203.98 15.73     20.58      1.33    100.00
> > 14:06:21         data    760.00      0.00 155800.00    205.00   
> > 4670.84   6070.91      1.32    100.00
> >
> > Looks to be about 205 sectors/request, which is 104,960 bytes.
> > This might be causing read-modify-write cycles if for whatever
> > reason md is not taking advantage of the stripe cache.
> > stripe_cache_active shows about 128 blocks (512kB) of RAM in use,
> > per hard drive.  Given the chunk size is 512kB, and the writes
> > being requested are linear, it should not be doing
> > read-modify-write.  And yet, there are tons of reads being logged,
> > as shown above.
> >
> > A couple more confusing things:
> >
> > jo ~ # blockdev --getss /dev/mapper/data
> > 512
> > jo ~ # blockdev --getpbsz /dev/mapper/data
> > 512
> > jo ~ # blockdev --getioopt /dev/mapper/data
> > 4194304
> > jo ~ # blockdev --getiomin /dev/mapper/data
> > 524288
> > jo ~ # blockdev --getmaxsect /dev/mapper/data
> > 255
> > jo ~ # blockdev --getbsz /dev/mapper/data
> > 512
> > jo ~ #
> >
> > If optimum IO size is 4MBs (as it SHOULD be: 512k chunk * 8 data 
> > drives = 4MB stripe), but maxsect count is 255 (255*512=128k) how
> > can optimal IO ever be done???  I re-mounted XFS with 
> > sunit=1024,swidth=8192 but that hasn't increased the average 
> > transaction size as expected.  Perhaps it's respecting this maxsect 
> > limit?
> >
> > --Bart
> >
> > PS: The RAID6 full stripe has +2 parity drives for a total of 10,
> > but they're not included in the "data zone" definitions of stripe
> > size, which are the only important ones for figuring out how large
> > your writes should be.
> >


* Re: [linux-lvm] Tracing IO requests?
  2011-03-02 23:44       ` Ray Morris
@ 2011-03-03  0:25         ` Bart Kus
  2011-03-07 16:02           ` Frank Ch. Eigler
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Kus @ 2011-03-03  0:25 UTC (permalink / raw)
  To: LVM general discussion and development

On 3/2/2011 3:44 PM, Ray Morris wrote:
>>> On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
>>> I know the IO is only being caused by a "cp -a" command, but the
>>> issue is why all the reads?  It should be 99% writes.
> cp has to read something before it can write it elsewhere.

Ray, my bad; I should have specified that the cp reads from a different 
volume/set of drives.

Thanks for the systemtap tip, Dave!

--Bart


* Re: [linux-lvm] Tracing IO requests?
  2011-03-02 20:09 [linux-lvm] Tracing IO requests? Bart Kus
  2011-03-02 20:13 ` Jonathan Tripathy
@ 2011-03-03 10:29 ` Bryn M. Reeves
  1 sibling, 0 replies; 9+ messages in thread
From: Bryn M. Reeves @ 2011-03-03 10:29 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Bart Kus

On 03/02/2011 08:09 PM, Bart Kus wrote:
> Hello,
> 
> I have the following setup:
> 
> md_RAID6(10x2TB) -> LVM2 -> cryptsetup -> XFS
> 
> When copying data onto the target XFS, I notice a large number of READs 
> occurring on the physical hard drives.  Is there any way of monitoring 
> what might be causing these read ops?

The blktrace command is extremely useful for this kind of I/O tracing. I've used
it numerous times to figure out where I/O is originating and also how it's
making its way through layers of stacked devices.

There's a conference talk overview here from a few years ago:

http://www.gelato.org/pdf/apr2006/gelato_ICE06apr_blktrace_brunelle_hp.pdf

And also a user's guide:

http://www.cse.unsw.edu.au/~aaronc/iosched/doc/blktrace.html
http://pdfedit.petricek.net/bt/file_download.php?file_id=17&type=bug
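
For what it's worth, a typical invocation for a stacked setup like this 
might look as follows; the device choice is only an example, but tracing 
the md device and one member disk lets you compare what arrives at each 
layer:

blktrace -d /dev/md5 -d /dev/sde -o - | blkparse -i -    # live trace of requests at both layers
# or capture for a while and post-process:
blktrace -d /dev/md5 -w 30                               # trace /dev/md5 for 30 seconds (writes md5.blktrace.*)
blkparse -i md5 | less                                   # decode; reads carry an 'R' in the RWBS field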

Regards,
Bryn.


* Re: [linux-lvm] Tracing IO requests?
  2011-03-03  0:25         ` Bart Kus
@ 2011-03-07 16:02           ` Frank Ch. Eigler
  2011-03-07 18:06             ` Wendy Cheng
  0 siblings, 1 reply; 9+ messages in thread
From: Frank Ch. Eigler @ 2011-03-07 16:02 UTC (permalink / raw)
  To: LVM general discussion and development

Bart Kus <me@bartk.us> writes:

>>>> issue is why all the reads?  It should be 99% writes.
>> cp has to read something before it can write it elsewhere.
> Ray, my bad, I should have specified, the cp reads from a different
> volume/set of drives. [...]

One way to try answering such "why" questions is to plop a systemtap
probe at an event that should not be happening much, and print a
backtrace.  In your case you could run this for a little while during
the copy:

# stap -c 'sleep 2' -e '
probe ioblock.request {
  if (devname == "sdg2")                # adjust to taste
    if ((rw & 1) == 0)                  # ! REQ_WRITE
      if (randint(100) < 2)             # 2% of occurrences, if you like
         { println(devname, rw, size)
           print_backtrace() }
}
'


- FChE


* Re: [linux-lvm] Tracing IO requests?
  2011-03-07 16:02           ` Frank Ch. Eigler
@ 2011-03-07 18:06             ` Wendy Cheng
  0 siblings, 0 replies; 9+ messages in thread
From: Wendy Cheng @ 2011-03-07 18:06 UTC (permalink / raw)
  To: LVM general discussion and development

My guess is that these reads come from the parity disk(s), and the more 
fragmented your blocks, the more reads you'll see.

-- Wendy

On Mon, Mar 7, 2011 at 8:02 AM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Bart Kus <me@bartk.us> writes:
>
>>>>> issue is why all the reads?  It should be 99% writes.
>>> cp has to read something before it can write it elsewhere.
>> Ray, my bad, I should have specified, the cp reads from a different
>> volume/set of drives. [...]
>
> One way to try answering such "why" questions is to plop a systemtap
> probe at an event that should not be happening much, and print a
> backtrace.  In your case you could run this for a little while during
> the copy:
>
> # stap -c 'sleep 2' -e '
> probe ioblock.request {
>   if (devname == "sdg2")                # adjust to taste
>     if ((rw & 1) == 0)                  # ! REQ_WRITE
>       if (randint(100) < 2)             # 2% of occurrences, if you like
>          { println(devname, rw, size)
>            print_backtrace() }
> }
> '
>
>
> - FChE
>


end of thread

Thread overview: 9+ messages
2011-03-02 20:09 [linux-lvm] Tracing IO requests? Bart Kus
2011-03-02 20:13 ` Jonathan Tripathy
2011-03-02 22:19   ` Bart Kus
2011-03-02 23:00     ` Dave Sullivan
2011-03-02 23:44       ` Ray Morris
2011-03-03  0:25         ` Bart Kus
2011-03-07 16:02           ` Frank Ch. Eigler
2011-03-07 18:06             ` Wendy Cheng
2011-03-03 10:29 ` Bryn M. Reeves
