From: Hannes Reinecke <hare@suse.de>
To: device-mapper development <dm-devel@redhat.com>
Cc: Neil F Brown <nfbrown@novell.com>,
linux-kernel@vger.kernel.org, agk@redhat.com,
Jens Axboe <jens.axboe@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
stable@kernel.org, devel@openvz.org
Subject: Re: Re: dm: bounce_pfn limit added
Date: Wed, 31 Oct 2007 08:36:01 +0100
Message-ID: <47283061.8080501@suse.de>
In-Reply-To: <47282B1D.8030501@sw.ru>
Vasily Averin wrote:
> Alasdair G Kergon wrote:
>> So currently we treat bounce_pfn as a property that does not need to be
>> propagated through the stack.
>>
>> But is that the right approach?
>> - Is there a blk_queue_bounce() missing either from dm or elsewhere?
>> (And BTW can the bio_alloc() that lurks within lead to deadlock?)
>>
>> Firstly, what's going wrong?
>> - What is the dm table you are using? (output of 'dmsetup table')
>> - Which dm targets and with how many underlying devices?
>> - Which underlying driver?
>> - Is this direct I/O to the block device from userspace, or via some
>> filesystem or what?
>
> On my test node I have 6 GB of memory (1 GB normal zone for i386 kernels),
> i2o hardware and LVM over i2o.
>
> [root@ts10 ~]# dmsetup table
> vzvg-vz: 0 10289152 linear 80:5 384
> vzvg-vzt: 0 263127040 linear 80:5 10289536
> [root@ts10 ~]# cat /proc/partitions
> major minor #blocks name
>
> 80 0 143374336 i2o/hda
> 80 1 514048 i2o/hda1
> 80 2 4096575 i2o/hda2
> 80 3 2040255 i2o/hda3
> 80 4 1 i2o/hda4
> 80 5 136721151 i2o/hda5
> 253 0 5144576 dm-0
> 253 1 131563520 dm-1
>
> Diotest from the LTP test suite, with a ~1 MB buffer size and files on
> dm-over-i2o partitions, corrupts the i2o_iop0_msg_inpool slab.
>
> The i2o controller on this node can only handle requests with up to 38 segments.
> Device mapper correctly creates such requests and, as you know, it uses
> bounce_pfn=BLK_BOUNCE_ANY. When such a request is passed down to the underlying
> device, dm clones the bio and clears the BIO_SEG_VALID flag.
>
> Because the flag is cleared, the underlying device calls
> blk_recalc_rq_segments() to recount the number of segments. However,
> blk_recalc_rq_segments() uses bounce_pfn=BLK_BOUNCE_HIGH taken from the
> underlying device. As a result, the number of segments exceeds the
> max_hw_segments limit.
>
> Unfortunately, nothing checks for this, and when the i2o driver handles the
> invalid request it writes to memory outside the i2o_iop0_msg_inpool slab.
>
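To make the recount effect described above concrete, here is a minimal userspace
sketch in Python. It is a toy model, not kernel code: count_segments(), LOW_LIMIT
and the page frame numbers are hypothetical, and it assumes only that a page above
the bounce limit is never merged with its neighbours, since it may be replaced by
an unrelated bounce page.

```python
def count_segments(pfns, bounce_pfn=None):
    """Count segments for a list of page frame numbers (toy model).

    Pages merge into one segment when they are physically contiguous.
    If bounce_pfn is given, any page above it is assumed to need bouncing
    and is therefore never merged with the page before or after it.
    """
    segments = 0
    prev_pfn = None
    prev_high = False
    for pfn in pfns:
        high = bounce_pfn is not None and pfn > bounce_pfn
        contiguous = prev_pfn is not None and pfn == prev_pfn + 1
        # Start a new segment unless this page can merge with the previous one.
        if not (contiguous and not high and not prev_high):
            segments += 1
        prev_pfn, prev_high = pfn, high
    return segments


# Hypothetical numbers: 256 contiguous 4 KB pages (~1 MB), all in high memory.
LOW_LIMIT = 0x38000 - 1  # assumed last low-memory pfn, for illustration only
pages = list(range(0x40000, 0x40000 + 256))

no_limit = count_segments(pages)               # BLK_BOUNCE_ANY-like view (dm)
with_limit = count_segments(pages, LOW_LIMIT)  # BLK_BOUNCE_HIGH-like view (driver)
print(no_limit, with_limit)
```

In this model the same buffer is a single segment when no bounce limit applies, but
one segment per page once the lower limit is used, which is far more than the 38
segments the hardware accepts.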
We actually had a similar issue with some RAID drivers (gdth, IIRC), and Neil Brown
wrote a patch along the same lines. These were his comments on it:
>
> dm handles max_hw_segments by using an 'io_restrictions' structure
> that keeps the most restrictive values from all component devices.
>
> So it should not allow more than max_hw_segments.
>
> However I just noticed that it does not preserve bounce_pfn as a restriction.
> So when the request gets down to the driver, it may be split up into more
>
So I guess we should take this.
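
The idea of carrying bounce_pfn along with the other restrictions can be sketched
as follows. This is an illustrative Python model only, not the patch itself: the
Limits class, combine() and all numeric values other than the 38-segment limit
are assumptions made for the example.

```python
from dataclasses import dataclass

BLK_BOUNCE_ANY = 2**64 - 1  # stand-in for "no bounce limit" (assumed value)


@dataclass
class Limits:
    """Hypothetical subset of the restrictions tracked for a stacked device."""
    max_hw_segments: int
    bounce_pfn: int


def combine(stacked, component):
    """Keep the most restrictive value of each field, including bounce_pfn."""
    return Limits(
        max_hw_segments=min(stacked.max_hw_segments, component.max_hw_segments),
        bounce_pfn=min(stacked.bounce_pfn, component.bounce_pfn),
    )


# Hypothetical starting values for the stacked device and one component.
dm_defaults = Limits(max_hw_segments=128, bounce_pfn=BLK_BOUNCE_ANY)
i2o_like = Limits(max_hw_segments=38, bounce_pfn=0x38000 - 1)

result = combine(dm_defaults, i2o_like)
print(result)
```

With the lower bounce_pfn inherited, the stacked device would count segments under
the same assumption as the driver below it, so the two counts should agree.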
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)