* [linux-lvm] Tracing IO requests?
@ 2011-03-02 20:09 Bart Kus
2011-03-02 20:13 ` Jonathan Tripathy
2011-03-03 10:29 ` Bryn M. Reeves
0 siblings, 2 replies; 9+ messages in thread
From: Bart Kus @ 2011-03-02 20:09 UTC (permalink / raw)
To: linux-lvm
Hello,
I have the following setup:
md_RAID6(10x2TB) -> LVM2 -> cryptsetup -> XFS
When copying data onto the target XFS, I notice a large number of READs
occurring on the physical hard drives. Is there any way of monitoring
what might be causing these read ops?
I have set up the system to minimize read-modify-write cycles as best I
can, but I fear I've missed some possible options in LVM2 or
cryptsetup. Here are the specifics:
11:43:54 sde 162.00 12040.00 34344.00 286.32 0.40 2.47 1.67 27.00
11:43:54 sdf 170.00 12008.00 36832.00 287.29 0.62 3.65 2.12 36.00
11:43:54 sdg 185.00 10552.00 37920.00 262.01 0.49 2.65 1.84 34.00
11:43:54 sdh 152.00 11824.00 37304.00 323.21 0.29 1.78 1.71 26.00
11:43:54 sdi 140.00 13016.00 35216.00 344.51 0.68 4.71 3.21 45.00
11:43:54 sdj 181.00 11784.00 36240.00 265.33 0.43 2.38 1.55 28.00
11:43:54 sds 162.00 11824.00 34040.00 283.11 0.46 2.84 1.67 27.00
11:43:54 sdt 157.00 11264.00 35192.00 295.90 0.65 4.14 2.29 36.00
11:43:54 sdu 154.00 12584.00 35424.00 311.74 0.46 2.79 1.69 26.00
11:43:54 sdv 131.00 12800.00 33264.00 351.63 0.39 2.75 1.98 26.00
11:43:54 md5 752.00 0.00 153688.00 204.37 0.00 0.00 0.00 0.00
11:43:54 DayTar-DayTar 752.00 0.00 153688.00 204.37 12.42 16.76 1.33 100.00
11:43:54 data 0.00 0.00 0.00 0.00 7238.71 0.00 0.00 100.00
Where md5 is the RAID6 holding the drives right above it, DayTar-DayTar
are the VG and LV respectively, and data is the cryptsetup device
derived from the LV. Hard drives are set to "blockdev --setra 1024",
md5 is set for stripe_cache_size of 6553 and preread_bypass_threshold of
0. XFS is mounted with the following options:
/dev/mapper/data on /data type xfs
(rw,noatime,nodiratime,allocsize=256m,nobarrier,noikeep,inode64,logbufs=8,logbsize=256k)
And here are the format options of XFS:
meta-data=/dev/mapper/data isize=256 agcount=15, agsize=268435455 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=3906993152, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
I wasn't sure how to do any kind of stripe alignment with the md RAID6
given the layers in between. Here are the LVM2 properties:
--- Physical volume ---
PV Name /dev/md5
VG Name DayTar
PV Size 14.55 TiB / not usable 116.00 MiB
Allocatable yes (but full)
PE Size 256.00 MiB
Total PE 59616
Free PE 0
Allocated PE 59616
PV UUID jwcRz9-Yl0k-OHRQ-p5yR-AbAP-j09z-PCgSFo
--- Volume group ---
VG Name DayTar
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 1
Act PV 1
VG Size 14.55 TiB
PE Size 256.00 MiB
Total PE 59616
Alloc PE / Size 59616 / 14.55 TiB
Free PE / Size 0 / 0
VG UUID X8gbkZ-BOMq-D6x2-xx6y-r2wF-cePQ-JTKZQs
--- Logical volume ---
LV Name /dev/DayTar/DayTar
VG Name DayTar
LV UUID cdebg4-EcCR-6QR7-sAhT-EN1h-20Lv-qIFSH8
LV Write Access read/write
LV Status available
# open 1
LV Size 14.55 TiB
Current LE 59616
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 16384
Block device 253:0
And finally the cryptsetup properties:
/dev/mapper/data is active:
cipher: aes-cbc-essiv:sha256
keysize: 256 bits
device: /dev/mapper/DayTar-DayTar
offset: 8192 sectors
size: 31255945216 sectors
mode: read/write
Anyone have any suggestions on how to tune this to do better at pure
writing by eliminating needless reading?
Thanks,
--Bart
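For context, a rough sketch of the per-layer alignment knobs that apply when building a stack like this from scratch; the option names below are the generic mdadm/LVM2/cryptsetup/XFS ones and the values assume the 512k chunk and 8 data disks described above. They are illustrative only, not commands taken from this system, and several are destructive format-time operations.
# mdadm --detail /dev/md5 | grep -i chunk                         # confirm the 512k chunk size
# pvcreate --dataalignment 4m /dev/md5                            # start PEs on a full-stripe (4 MiB) boundary
# cryptsetup luksFormat --align-payload=8192 /dev/DayTar/DayTar   # payload offset of 8192 sectors = 4 MiB
# mkfs.xfs -d su=512k,sw=8 /dev/mapper/data                       # tell XFS the chunk size and data-disk count
# mount -o sunit=1024,swidth=8192 /dev/mapper/data /data          # same geometry in 512-byte units at mount time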
* Re: [linux-lvm] Tracing IO requests?
2011-03-02 20:09 [linux-lvm] Tracing IO requests? Bart Kus
@ 2011-03-02 20:13 ` Jonathan Tripathy
2011-03-02 22:19 ` Bart Kus
2011-03-03 10:29 ` Bryn M. Reeves
From: Jonathan Tripathy @ 2011-03-02 20:13 UTC (permalink / raw)
To: linux-lvm
On 02/03/11 20:09, Bart Kus wrote:
> Hello,
>
> I have the following setup:
>
> md_RAID6(10x2TB) -> LVM2 -> cryptsetup -> XFS
>
> When copying data onto the target XFS, I notice a large number of
> READs occurring on the physical hard drives. Is there any way of
> monitoring what might be causing these read ops?
>
> I have set up the system to minimize read-modify-write cycles as best I
> can, but I fear I've missed some possible options in LVM2 or
> cryptsetup. Here are the specifics:
>
> 11:43:54 sde 162.00 12040.00 34344.00 286.32 0.40 2.47 1.67 27.00
> 11:43:54 sdf 170.00 12008.00 36832.00 287.29 0.62 3.65 2.12 36.00
> 11:43:54 sdg 185.00 10552.00 37920.00 262.01 0.49 2.65 1.84 34.00
> 11:43:54 sdh 152.00 11824.00 37304.00 323.21 0.29 1.78 1.71 26.00
> 11:43:54 sdi 140.00 13016.00 35216.00 344.51 0.68 4.71 3.21 45.00
> 11:43:54 sdj 181.00 11784.00 36240.00 265.33 0.43 2.38 1.55 28.00
> 11:43:54 sds 162.00 11824.00 34040.00 283.11 0.46 2.84 1.67 27.00
> 11:43:54 sdt 157.00 11264.00 35192.00 295.90 0.65 4.14 2.29 36.00
> 11:43:54 sdu 154.00 12584.00 35424.00 311.74 0.46 2.79 1.69 26.00
> 11:43:54 sdv 131.00 12800.00 33264.00 351.63 0.39 2.75 1.98 26.00
> 11:43:54 md5 752.00 0.00 153688.00 204.37 0.00 0.00 0.00 0.00
> 11:43:54 DayTar-DayTar 752.00 0.00 153688.00 204.37 12.42 16.76 1.33 100.00
> 11:43:54 data 0.00 0.00 0.00 0.00 7238.71 0.00 0.00 100.00
>
> Where md5 is the RAID6 holding the drives right above it,
> DayTar-DayTar are the VG and LV respectively, and data is the
> cryptsetup device derived from the LV. Hard drives are set to
> "blockdev --setra 1024", md5 is set for stripe_cache_size of 6553 and
> preread_bypass_threshold of 0. XFS is mounted with the following options:
>
> /dev/mapper/data on /data type xfs
> (rw,noatime,nodiratime,allocsize=256m,nobarrier,noikeep,inode64,logbufs=8,logbsize=256k)
>
> And here are the format options of XFS:
>
> meta-data=/dev/mapper/data isize=256 agcount=15, agsize=268435455 blks
> = sectsz=512 attr=2
> data = bsize=4096 blocks=3906993152, imaxpct=5
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0
> log =internal bsize=4096 blocks=521728, version=2
> = sectsz=512 sunit=0 blks, lazy-count=0
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> I wasn't sure how to do any kind of stripe alignment with the md RAID6
> given the layers in between. Here are the LVM2 properties:
>
> --- Physical volume ---
> PV Name /dev/md5
> VG Name DayTar
> PV Size 14.55 TiB / not usable 116.00 MiB
> Allocatable yes (but full)
> PE Size 256.00 MiB
> Total PE 59616
> Free PE 0
> Allocated PE 59616
> PV UUID jwcRz9-Yl0k-OHRQ-p5yR-AbAP-j09z-PCgSFo
>
> --- Volume group ---
> VG Name DayTar
> System ID
> Format lvm2
> Metadata Areas 1
> Metadata Sequence No 2
> VG Access read/write
> VG Status resizable
> MAX LV 0
> Cur LV 1
> Open LV 1
> Max PV 0
> Cur PV 1
> Act PV 1
> VG Size 14.55 TiB
> PE Size 256.00 MiB
> Total PE 59616
> Alloc PE / Size 59616 / 14.55 TiB
> Free PE / Size 0 / 0
> VG UUID X8gbkZ-BOMq-D6x2-xx6y-r2wF-cePQ-JTKZQs
>
> --- Logical volume ---
> LV Name /dev/DayTar/DayTar
> VG Name DayTar
> LV UUID cdebg4-EcCR-6QR7-sAhT-EN1h-20Lv-qIFSH8
> LV Write Access read/write
> LV Status available
> # open 1
> LV Size 14.55 TiB
> Current LE 59616
> Segments 1
> Allocation inherit
> Read ahead sectors auto
> - currently set to 16384
> Block device 253:0
>
> And finally the cryptsetup properties:
>
> /dev/mapper/data is active:
> cipher: aes-cbc-essiv:sha256
> keysize: 256 bits
> device: /dev/mapper/DayTar-DayTar
> offset: 8192 sectors
> size: 31255945216 sectors
> mode: read/write
>
> Anyone have any suggestions on how to tune this to do better at pure
> writing by eliminating needless reading?
>
> Thanks,
>
> --Bart
>
I once used a tool called dstat. dstat has modules which can tell you
which processes are using disk IO. I haven't used dstat in a while, so
maybe someone else can chime in.
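For reference, a minimal invocation along those lines (assuming a dstat build that ships the top-io and top-bio plugins) might be:
# dstat --top-io --top-bio 5      # process generating the most I/O and the most block I/O, 5-second samples
# dstat -d --disk-util 1          # per-disk throughput and utilisation every second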
Cheers
* Re: [linux-lvm] Tracing IO requests?
2011-03-02 20:13 ` Jonathan Tripathy
@ 2011-03-02 22:19 ` Bart Kus
2011-03-02 23:00 ` Dave Sullivan
0 siblings, 1 reply; 9+ messages in thread
From: Bart Kus @ 2011-03-02 22:19 UTC (permalink / raw)
To: LVM general discussion and development
On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> I once used a tool called dstat. dstat has modules which can tell you
> which processes are using disk IO. I haven't used dstat in a while so
> maybe someone else can chime in
I know the IO is only being caused by a "cp -a" command, but the issue
is why all the reads? It should be 99% writes. Another thing I noticed
is the average request size is pretty small:
14:06:20 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
[...snip!...]
14:06:21 sde 219.00 11304.00 30640.00 191.53 1.15 5.16 2.10 46.00
14:06:21 sdf 209.00 11016.00 29904.00 195.79 1.06 5.02 2.01 42.00
14:06:21 sdg 178.00 11512.00 28568.00 225.17 0.74 3.99 2.08 37.00
14:06:21 sdh 175.00 10736.00 26832.00 214.67 0.89 4.91 2.00 35.00
14:06:21 sdi 206.00 11512.00 29112.00 197.20 0.83 3.98 1.80 37.00
14:06:21 sdj 209.00 11264.00 30264.00 198.70 0.79 3.78 1.96 41.00
14:06:21 sds 214.00 10984.00 28552.00 184.75 0.78 3.60 1.78 38.00
14:06:21 sdt 194.00 13352.00 27808.00 212.16 0.83 4.23 1.91 37.00
14:06:21 sdu 183.00 12856.00 28872.00 228.02 0.60 3.22 2.13 39.00
14:06:21 sdv 189.00 11984.00 31696.00 231.11 0.57 2.96 1.69 32.00
14:06:21 md5 754.00 0.00 153848.00 204.04 0.00 0.00 0.00 0.00
14:06:21 DayTar-DayTar 753.00 0.00 153600.00 203.98 15.73 20.58 1.33 100.00
14:06:21 data 760.00 0.00 155800.00 205.00 4670.84 6070.91 1.32 100.00
Looks to be about 205 sectors/request, which is 104,960 bytes. This
might be causing read-modify-write cycles if for whatever reason md is
not taking advantage of the stripe cache. stripe_cache_active shows
about 128 blocks (512kB) of RAM in use, per hard drive. Given the chunk
size is 512kB, and the writes being requested are linear, it should not
be doing read-modify-write. And yet, there are tons of reads being
logged, as shown above.
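For reference, the md sysfs knobs mentioned above can be inspected (and the cache enlarged) like this; the paths assume the usual /sys/block layout for this array, and the echo line is only an example value:
# cat /sys/block/md5/md/stripe_cache_size          # entries in the stripe cache (one page per member disk each)
# cat /sys/block/md5/md/stripe_cache_active        # entries currently in use
# cat /sys/block/md5/md/preread_bypass_threshold
# echo 8192 > /sys/block/md5/md/stripe_cache_size  # enlarge the cache; costs roughly entries * 4 KiB * member disks of RAM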
A couple more confusing things:
jo ~ # blockdev --getss /dev/mapper/data
512
jo ~ # blockdev --getpbsz /dev/mapper/data
512
jo ~ # blockdev --getioopt /dev/mapper/data
4194304
jo ~ # blockdev --getiomin /dev/mapper/data
524288
jo ~ # blockdev --getmaxsect /dev/mapper/data
255
jo ~ # blockdev --getbsz /dev/mapper/data
512
jo ~ #
If optimum IO size is 4MBs (as it SHOULD be: 512k chunk * 8 data drives
= 4MB stripe), but maxsect count is 255 (255*512=128k) how can optimal
IO ever be done??? I re-mounted XFS with sunit=1024,swidth=8192 but
that hasn't increased the average transaction size as expected. Perhaps
it's respecting this maxsect limit?
--Bart
PS: The RAID6 full stripe has +2 parity drives for a total of 10, but
they're not included in the "data zone" definitions of stripe size,
which are the only important ones for figuring out how large your writes
should be.
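One way to check whether that 255-sector cap is a per-device queue limit is sketched below, using the standard block-layer sysfs attributes; raising the soft limit only helps up to whatever max_hw_sectors_kb the controller reports:
# grep . /sys/block/sde/queue/max_sectors_kb /sys/block/sde/queue/max_hw_sectors_kb
# grep . /sys/block/md5/queue/max_sectors_kb /sys/block/dm-0/queue/max_sectors_kb
# echo 512 > /sys/block/sde/queue/max_sectors_kb   # raise the soft limit toward the hardware limit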
* Re: [linux-lvm] Tracing IO requests?
2011-03-02 22:19 ` Bart Kus
@ 2011-03-02 23:00 ` Dave Sullivan
2011-03-02 23:44 ` Ray Morris
0 siblings, 1 reply; 9+ messages in thread
From: Dave Sullivan @ 2011-03-02 23:00 UTC (permalink / raw)
To: linux-lvm
http://sourceware.org/systemtap/examples/
look at traceio.stp and disktop.stp
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/SystemTap_Beginners_Guide/index.html
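A sketch of how those example scripts are typically fetched and run; the examples/io/ paths are an assumption about the site layout, and stap needs root plus matching kernel debuginfo to resolve the probes:
# wget http://sourceware.org/systemtap/examples/io/traceio.stp
# wget http://sourceware.org/systemtap/examples/io/disktop.stp
# stap -v traceio.stp     # cumulative read/write totals per process
# stap -v disktop.stp     # periodic list of the heaviest disk readers/writers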
On 03/02/2011 05:19 PM, Bart Kus wrote:
> On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
>> I once used a tool called dstat. dstat has modules which can tell you
>> which processes are using disk IO. I haven't used dstat in a while so
>> maybe someone else can chime in
>
> I know the IO is only being caused by a "cp -a" command, but the issue
> is why all the reads? It should be 99% writes. Another thing I
> noticed is the average request size is pretty small:
>
> 14:06:20 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
> [...snip!...]
> 14:06:21 sde 219.00 11304.00 30640.00 191.53 1.15 5.16 2.10 46.00
> 14:06:21 sdf 209.00 11016.00 29904.00 195.79 1.06 5.02 2.01 42.00
> 14:06:21 sdg 178.00 11512.00 28568.00 225.17 0.74 3.99 2.08 37.00
> 14:06:21 sdh 175.00 10736.00 26832.00 214.67 0.89 4.91 2.00 35.00
> 14:06:21 sdi 206.00 11512.00 29112.00 197.20 0.83 3.98 1.80 37.00
> 14:06:21 sdj 209.00 11264.00 30264.00 198.70 0.79 3.78 1.96 41.00
> 14:06:21 sds 214.00 10984.00 28552.00 184.75 0.78 3.60 1.78 38.00
> 14:06:21 sdt 194.00 13352.00 27808.00 212.16 0.83 4.23 1.91 37.00
> 14:06:21 sdu 183.00 12856.00 28872.00 228.02 0.60 3.22 2.13 39.00
> 14:06:21 sdv 189.00 11984.00 31696.00 231.11 0.57 2.96 1.69 32.00
> 14:06:21 md5 754.00 0.00 153848.00 204.04 0.00 0.00 0.00 0.00
> 14:06:21 DayTar-DayTar 753.00 0.00 153600.00 203.98 15.73 20.58 1.33 100.00
> 14:06:21 data 760.00 0.00 155800.00 205.00 4670.84 6070.91 1.32 100.00
>
> Looks to be about 205 sectors/request, which is 104,960 bytes. This
> might be causing read-modify-write cycles if for whatever reason md is
> not taking advantage of the stripe cache. stripe_cache_active shows
> about 128 blocks (512kB) of RAM in use, per hard drive. Given the
> chunk size is 512kB, and the writes being requested are linear, it
> should not be doing read-modify-write. And yet, there are tons of
> reads being logged, as shown above.
>
> A couple more confusing things:
>
> jo ~ # blockdev --getss /dev/mapper/data
> 512
> jo ~ # blockdev --getpbsz /dev/mapper/data
> 512
> jo ~ # blockdev --getioopt /dev/mapper/data
> 4194304
> jo ~ # blockdev --getiomin /dev/mapper/data
> 524288
> jo ~ # blockdev --getmaxsect /dev/mapper/data
> 255
> jo ~ # blockdev --getbsz /dev/mapper/data
> 512
> jo ~ #
>
> If optimum IO size is 4MBs (as it SHOULD be: 512k chunk * 8 data
> drives = 4MB stripe), but maxsect count is 255 (255*512=128k) how can
> optimal IO ever be done??? I re-mounted XFS with
> sunit=1024,swidth=8192 but that hasn't increased the average
> transaction size as expected. Perhaps it's respecting this maxsect
> limit?
>
> --Bart
>
> PS: The RAID6 full stripe has +2 parity drives for a total of 10, but
> they're not included in the "data zone" definitions of stripe size,
> which are the only important ones for figuring out how large your
> writes should be.
>
* Re: [linux-lvm] Tracing IO requests?
2011-03-02 23:00 ` Dave Sullivan
@ 2011-03-02 23:44 ` Ray Morris
2011-03-03 0:25 ` Bart Kus
0 siblings, 1 reply; 9+ messages in thread
From: Ray Morris @ 2011-03-02 23:44 UTC (permalink / raw)
To: linux-lvm
> > On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> > I know the IO is only being caused by a "cp -a" command, but the
> > issue is why all the reads? It should be 99% writes.
cp has to read something before it can write it elsewhere.
--
Ray Morris
support@bettercgi.com
Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/
Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/
Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php
On Wed, 02 Mar 2011 18:00:53 -0500
Dave Sullivan <dsulliva@redhat.com> wrote:
> http://sourceware.org/systemtap/examples/
>
> look at traceio.stp and disktop.stp
>
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/SystemTap_Beginners_Guide/index.html
>
> On 03/02/2011 05:19 PM, Bart Kus wrote:
> > On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> >> I once used a tool called dstat. dstat has modules which can tell
> >> you which processes are using disk IO. I haven’t used dstat in a
> >> while so maybe someone else can chime in
> >
> > I know the IO is only being caused by a "cp -a" command, but the
> > issue is why all the reads? It should be 99% writes. Another
> > thing I noticed is the average request size is pretty small:
> >
> > 14:06:20 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
> > [...snip!...]
> > 14:06:21 sde 219.00 11304.00 30640.00 191.53 1.15 5.16 2.10 46.00
> > 14:06:21 sdf 209.00 11016.00 29904.00 195.79 1.06 5.02 2.01 42.00
> > 14:06:21 sdg 178.00 11512.00 28568.00 225.17 0.74 3.99 2.08 37.00
> > 14:06:21 sdh 175.00 10736.00 26832.00 214.67 0.89 4.91 2.00 35.00
> > 14:06:21 sdi 206.00 11512.00 29112.00 197.20 0.83 3.98 1.80 37.00
> > 14:06:21 sdj 209.00 11264.00 30264.00 198.70 0.79 3.78 1.96 41.00
> > 14:06:21 sds 214.00 10984.00 28552.00 184.75 0.78 3.60 1.78 38.00
> > 14:06:21 sdt 194.00 13352.00 27808.00 212.16 0.83 4.23 1.91 37.00
> > 14:06:21 sdu 183.00 12856.00 28872.00 228.02 0.60 3.22 2.13 39.00
> > 14:06:21 sdv 189.00 11984.00 31696.00 231.11 0.57 2.96 1.69 32.00
> > 14:06:21 md5 754.00 0.00 153848.00 204.04 0.00 0.00 0.00 0.00
> > 14:06:21 DayTar-DayTar 753.00 0.00 153600.00 203.98 15.73 20.58 1.33 100.00
> > 14:06:21 data 760.00 0.00 155800.00 205.00 4670.84 6070.91 1.32 100.00
> >
> > Looks to be about 205 sectors/request, which is 104,960 bytes.
> > This might be causing read-modify-write cycles if for whatever
> > reason md is not taking advantage of the stripe cache.
> > stripe_cache_active shows about 128 blocks (512kB) of RAM in use,
> > per hard drive. Given the chunk size is 512kB, and the writes
> > being requested are linear, it should not be doing
> > read-modify-write. And yet, there are tons of reads being logged,
> > as shown above.
> >
> > A couple more confusing things:
> >
> > jo ~ # blockdev --getss /dev/mapper/data
> > 512
> > jo ~ # blockdev --getpbsz /dev/mapper/data
> > 512
> > jo ~ # blockdev --getioopt /dev/mapper/data
> > 4194304
> > jo ~ # blockdev --getiomin /dev/mapper/data
> > 524288
> > jo ~ # blockdev --getmaxsect /dev/mapper/data
> > 255
> > jo ~ # blockdev --getbsz /dev/mapper/data
> > 512
> > jo ~ #
> >
> > If optimum IO size is 4MBs (as it SHOULD be: 512k chunk * 8 data
> > drives = 4MB stripe), but maxsect count is 255 (255*512=128k) how
> > can optimal IO ever be done??? I re-mounted XFS with
> > sunit=1024,swidth=8192 but that hasn't increased the average
> > transaction size as expected. Perhaps it's respecting this maxsect
> > limit?
> >
> > --Bart
> >
> > PS: The RAID6 full stripe has +2 parity drives for a total of 10,
> > but they're not included in the "data zone" definitions of stripe
> > size, which are the only important ones for figuring out how large
> > your writes should be.
> >
* Re: [linux-lvm] Tracing IO requests?
2011-03-02 23:44 ` Ray Morris
@ 2011-03-03 0:25 ` Bart Kus
2011-03-07 16:02 ` Frank Ch. Eigler
0 siblings, 1 reply; 9+ messages in thread
From: Bart Kus @ 2011-03-03 0:25 UTC (permalink / raw)
To: LVM general discussion and development
On 3/2/2011 3:44 PM, Ray Morris wrote:
>>> On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
>>> I know the IO is only being caused by a "cp -a" command, but the
>>> issue is why all the reads? It should be 99% writes.
> cp has to read something before it can write it elsewhere.
Ray, my bad, I should have specified, the cp reads from a different
volume/set of drives.
Thanks for the systemtap tip, Dave!
--Bart
* Re: [linux-lvm] Tracing IO requests?
2011-03-03 0:25 ` Bart Kus
@ 2011-03-07 16:02 ` Frank Ch. Eigler
2011-03-07 18:06 ` Wendy Cheng
0 siblings, 1 reply; 9+ messages in thread
From: Frank Ch. Eigler @ 2011-03-07 16:02 UTC (permalink / raw)
To: LVM general discussion and development
Bart Kus <me@bartk.us> writes:
>>>> issue is why all the reads? It should be 99% writes.
>> cp has to read something before it can write it elsewhere.
> Ray, my bad, I should have specified, the cp reads from a different
> volume/set of drives. [...]
One way to try answering such "why" questions is to plop a systemtap
probe at an event that should not be happening much, and print a
backtrace. In your case you could run this for a little while during
the copy:
# stap -c 'sleep 2' -e '
probe ioblock.request {
  if (devname == "sdg2")                # adjust to taste
    if ((rw & 1) == 0)                  # ! REQ_WRITE
      if (randint(100) < 2)             # 2% of occurrences, if you like
        { println(devname, rw, size)
          print_backtrace() }
}
'
- FChE
* Re: [linux-lvm] Tracing IO requests?
2011-03-07 16:02 ` Frank Ch. Eigler
@ 2011-03-07 18:06 ` Wendy Cheng
0 siblings, 0 replies; 9+ messages in thread
From: Wendy Cheng @ 2011-03-07 18:06 UTC (permalink / raw)
To: LVM general discussion and development
My guess is that these reads come from parity disk(s).... And the more
fragmented your blocks, the more read(s) you'll see.
-- Wendy
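One rough way to test that theory (a sketch; the device names are the ones from the listings earlier in the thread) is to watch whether member-disk reads track array writes while md5 itself reports no reads, which is the signature of parity read-modify-write rather than file data being read back:
# iostat -x 1 sde sdf sdg sdh sdi sdj sds sdt sdu sdv md5
# watch -n1 cat /proc/mdstat      # also rules out a resync/check running in the background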
On Mon, Mar 7, 2011 at 8:02 AM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Bart Kus <me@bartk.us> writes:
>
>>>>> issue is why all the reads? It should be 99% writes.
>>> cp has to read something before it can write it elsewhere.
>> Ray, my bad, I should have specified, the cp reads from a different
>> volume/set of drives. [...]
>
> One way to try answering such "why" questions is to plop a systemtap
> probe at an event that should not be happening much, and print a
> backtrace. In your case you could run this for a little while during
> the copy:
>
> # stap -c 'sleep 2' -e '
> probe ioblock.request {
>   if (devname == "sdg2")                # adjust to taste
>     if ((rw & 1) == 0)                  # ! REQ_WRITE
>       if (randint(100) < 2)             # 2% of occurrences, if you like
>         { println(devname, rw, size)
>           print_backtrace() }
> }
> '
>
>
> - FChE
>
* Re: [linux-lvm] Tracing IO requests?
2011-03-02 20:09 [linux-lvm] Tracing IO requests? Bart Kus
2011-03-02 20:13 ` Jonathan Tripathy
@ 2011-03-03 10:29 ` Bryn M. Reeves
1 sibling, 0 replies; 9+ messages in thread
From: Bryn M. Reeves @ 2011-03-03 10:29 UTC (permalink / raw)
To: LVM general discussion and development; +Cc: Bart Kus
On 03/02/2011 08:09 PM, Bart Kus wrote:
> Hello,
>
> I have the following setup:
>
> md_RAID6(10x2TB) -> LVM2 -> cryptsetup -> XFS
>
> When copying data onto the target XFS, I notice a large number of READs
> occurring on the physical hard drives. Is there any way of monitoring
> what might be causing these read ops?
The blktrace command is extremely useful for this kind of I/O tracing. I've used
it numerous times to figure out where I/O is originating and also how it's
making its way through layers of stacked devices.
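A minimal sketch of that kind of trace for the stack in this thread (device names taken from the earlier listings; btrace is the wrapper shipped with the blktrace package that pipes straight into blkparse):
# blktrace -d /dev/sde -d /dev/md5 -o - | blkparse -i -    # watch for R (read) actions on the member disk during the copy
# btrace /dev/mapper/DayTar-DayTar                         # same, one-shot, for the logical volume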
There's a conference talk overview here from a few years ago:
http://www.gelato.org/pdf/apr2006/gelato_ICE06apr_blktrace_brunelle_hp.pdf
And also a user's guide:
http://www.cse.unsw.edu.au/~aaronc/iosched/doc/blktrace.html
http://pdfedit.petricek.net/bt/file_download.php?file_id=17&type=bug
Regards,
Bryn.