* Re: [dm-devel] dm-crypt: Fix error with too large bios
[not found] ` <alpine.LRH.2.11.1608181742260.10662@mail.ewheeler.net>
@ 2016-08-25 18:34 ` Mikulas Patocka
2016-08-25 20:13 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Mikulas Patocka @ 2016-08-25 18:34 UTC (permalink / raw)
To: Eric Wheeler
Cc: Christoph Hellwig, dm-devel, Mike Snitzer, johnstonj.public,
Jens Axboe, linux-block
On Thu, 18 Aug 2016, Eric Wheeler wrote:
> > On Wed, Jun 01 2016 at 9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote:
> >
> > > > > be dm-crypt.c. Maybe you've identified some indirect use of
> > > > > BIO_MAX_SIZE?
> > > >
> > > > I mean the recently introduced BIO_MAX_SIZE in -next tree:
> > > >
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486
> > >
> > > The crazy bcache bios striking back once again. I really think it's
> > > harmful having a _MAX value and then having a minor driver
> > > reinterpreting it and sending larger ones. Until we can lift the
> > > maximum limit in general nad have common code exercise it we really need
> > > to stop bcache from sending these instead of littering the tree with
> > > workarounds.
> >
> > The bio_kmalloc function allocates bios with up to 1024 vector entries (as
> > opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector
> > entries).
> >
> > Device mapper is using bio_alloc_bioset with a bio set, so it is limited
> > to 256 vector entries, but other kernel users may use bio_kmalloc and
> > create larger bios.
> >
> > So, if you don't want bios with more than 256 vector entries to exist, you
> > should impose this limit in bio_kmalloc (and fix all the callers that use
> > it).
>
> FYI, Kent Overstreet notes this about bcache from the other thread here:
> https://lkml.org/lkml/2016/8/15/620
>
> [paste]
> >> bcache originally had workaround code to split too-large bios when it
> >> first went upstream - that was dropped only after the patches to make
> >> generic_make_request() handle arbitrary size bios went in. So to do what
> >> you're suggesting would mean reverting that bcache patch and bringing that
> >> code back, which from my perspective would be a step in the wrong
> >> direction. I just want to get this over and done with.
> >>
> >> re: interactions with other drivers - bio_clone() has already been changed
> >> to only clone biovecs that are live for current bi_iter, so there
> >> shouldn't be any safety issues. A driver would have to be intentionally
> >> doing its own open coded bio cloning that clones all of bi_io_vec, not
> >> just the active ones - but if they're doing that, they're already broken
> >> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that
> >> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were
> >> created by bio_clone_fast).
> >>
> >> And the cloning and bi_vcnt usage stuff I audited very thoroughly back
> >> when I was working on immutable biovecs and such back in the day, and I
> >> had to do a fair amount of cleanup/refactoring before that stuff could go
> >> in.
> [/paste]
>
> They are making progress in the patch-v3 thread, so perhaps this can be
> fixed for now in generic_make_request().
>
> --
> Eric Wheeler
Device mapper can't split the bio in generic_make_request - it frees the
md->queue->bio_split bioset, to save one kernel thread per device. Device
mapper uses its own splitting mechanism.
So what is the final decision? - should device mapper split the big bio or
should bcache not submit big bios?
I think splitting big bios in the device mapper is better - simply because
it is much less code than reworking bcache to split bios internally.
BTW. In the device mapper, we have a layer dm-io, that was created to work
around bio size limitations - it accepts unlimited I/O request and splits
it to several bios. When bio size limitations are gone, we could simplify
dm-io too.
Mikulas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dm-devel] dm-crypt: Fix error with too large bios
2016-08-25 18:34 ` [dm-devel] dm-crypt: Fix error with too large bios Mikulas Patocka
@ 2016-08-25 20:13 ` Jens Axboe
2016-08-26 14:05 ` Mike Snitzer
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2016-08-25 20:13 UTC (permalink / raw)
To: Mikulas Patocka, Eric Wheeler
Cc: Christoph Hellwig, dm-devel, Mike Snitzer, johnstonj.public,
linux-block
On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
>
>
> On Thu, 18 Aug 2016, Eric Wheeler wrote:
>
>>> On Wed, Jun 01 2016 at 9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>>>>> be dm-crypt.c. Maybe you've identified some indirect use of
>>>>>> BIO_MAX_SIZE?
>>>>>
>>>>> I mean the recently introduced BIO_MAX_SIZE in -next tree:
>>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486
>>>>
>>>> The crazy bcache bios striking back once again. I really think it's
>>>> harmful having a _MAX value and then having a minor driver
>>>> reinterpreting it and sending larger ones. Until we can lift the
>>>> maximum limit in general nad have common code exercise it we really need
>>>> to stop bcache from sending these instead of littering the tree with
>>>> workarounds.
>>>
>>> The bio_kmalloc function allocates bios with up to 1024 vector entries (as
>>> opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector
>>> entries).
>>>
>>> Device mapper is using bio_alloc_bioset with a bio set, so it is limited
>>> to 256 vector entries, but other kernel users may use bio_kmalloc and
>>> create larger bios.
>>>
>>> So, if you don't want bios with more than 256 vector entries to exist, you
>>> should impose this limit in bio_kmalloc (and fix all the callers that use
>>> it).
>>
>> FYI, Kent Overstreet notes this about bcache from the other thread here:
>> https://lkml.org/lkml/2016/8/15/620
>>
>> [paste]
>>>> bcache originally had workaround code to split too-large bios when it
>>>> first went upstream - that was dropped only after the patches to make
>>>> generic_make_request() handle arbitrary size bios went in. So to do what
>>>> you're suggesting would mean reverting that bcache patch and bringing that
>>>> code back, which from my perspective would be a step in the wrong
>>>> direction. I just want to get this over and done with.
>>>>
>>>> re: interactions with other drivers - bio_clone() has already been changed
>>>> to only clone biovecs that are live for current bi_iter, so there
>>>> shouldn't be any safety issues. A driver would have to be intentionally
>>>> doing its own open coded bio cloning that clones all of bi_io_vec, not
>>>> just the active ones - but if they're doing that, they're already broken
>>>> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that
>>>> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were
>>>> created by bio_clone_fast).
>>>>
>>>> And the cloning and bi_vcnt usage stuff I audited very thoroughly back
>>>> when I was working on immutable biovecs and such back in the day, and I
>>>> had to do a fair amount of cleanup/refactoring before that stuff could go
>>>> in.
>> [/paste]
>>
>> They are making progress in the patch-v3 thread, so perhaps this can be
>> fixed for now in generic_make_request().
>>
>> --
>> Eric Wheeler
>
> Device mapper can't split the bio in generic_make_request - it frees the
> md->queue->bio_split bioset, to save one kernel thread per device. Device
> mapper uses its own splitting mechanism.
>
> So what is the final decision? - should device mapper split the big bio or
> should bcache not submit big bios?
>
> I think splitting big bios in the device mapper is better - simply because
> it is much less code than reworking bcache to split bios internally.
>
> BTW. In the device mapper, we have a layer dm-io, that was created to work
> around bio size limitations - it accepts unlimited I/O request and splits
> it to several bios. When bio size limitations are gone, we could simplify
> dm-io too.
The patch from Ming Lei was applied for 4.8 the other day.
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-25 20:13 ` Jens Axboe
@ 2016-08-26 14:05 ` Mike Snitzer
2016-08-27 15:09 ` Mikulas Patocka
0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2016-08-26 14:05 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel,
johnstonj.public, linux-block
On Thu, Aug 25 2016 at 4:13pm -0400,
Jens Axboe <axboe@kernel.dk> wrote:
> On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> >
> >Device mapper can't split the bio in generic_make_request - it frees the
> >md->queue->bio_split bioset, to save one kernel thread per device. Device
> >mapper uses its own splitting mechanism.
> >
> >So what is the final decision? - should device mapper split the big bio or
> >should bcache not submit big bios?
> >
> >I think splitting big bios in the device mapper is better - simply because
> >it is much less code than reworking bcache to split bios internally.
> >
> >BTW. In the device mapper, we have a layer dm-io, that was created to work
> >around bio size limitations - it accepts unlimited I/O request and splits
> >it to several bios. When bio size limitations are gone, we could simplify
> >dm-io too.
>
> The patch from Ming Lei was applied for 4.8 the other day.
See linux-block commit:
4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-26 14:05 ` Mike Snitzer
@ 2016-08-27 15:09 ` Mikulas Patocka
2016-08-29 5:22 ` Ming Lei
2016-08-29 18:19 ` Mike Snitzer
0 siblings, 2 replies; 14+ messages in thread
From: Mikulas Patocka @ 2016-08-27 15:09 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel,
johnstonj.public, linux-block
On Fri, 26 Aug 2016, Mike Snitzer wrote:
> On Thu, Aug 25 2016 at 4:13pm -0400,
> Jens Axboe <axboe@kernel.dk> wrote:
>
> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> > >
> > >Device mapper can't split the bio in generic_make_request - it frees the
> > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> > >mapper uses its own splitting mechanism.
> > >
> > >So what is the final decision? - should device mapper split the big bio or
> > >should bcache not submit big bios?
> > >
> > >I think splitting big bios in the device mapper is better - simply because
> > >it is much less code than reworking bcache to split bios internally.
> > >
> > >BTW. In the device mapper, we have a layer dm-io, that was created to work
> > >around bio size limitations - it accepts unlimited I/O request and splits
> > >it to several bios. When bio size limitations are gone, we could simplify
> > >dm-io too.
> >
> > The patch from Ming Lei was applied for 4.8 the other day.
>
> See linux-block commit:
> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
But this patch won't work for device mapper, blk_bio_segment_split is
called from blk_queue_split and device mapper doesn't use blk_queue_split
(it can't because it frees queue->bio_split).
We still need these two patches:
https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
Mikulas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-27 15:09 ` Mikulas Patocka
@ 2016-08-29 5:22 ` Ming Lei
2016-08-29 21:57 ` Mikulas Patocka
2016-08-29 18:19 ` Mike Snitzer
1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2016-08-29 5:22 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig,
open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block
On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Fri, 26 Aug 2016, Mike Snitzer wrote:
>
>> On Thu, Aug 25 2016 at 4:13pm -0400,
>> Jens Axboe <axboe@kernel.dk> wrote:
>>
>> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
>> > >
>> > >Device mapper can't split the bio in generic_make_request - it frees the
>> > >md->queue->bio_split bioset, to save one kernel thread per device. Device
>> > >mapper uses its own splitting mechanism.
>> > >
>> > >So what is the final decision? - should device mapper split the big bio or
>> > >should bcache not submit big bios?
>> > >
>> > >I think splitting big bios in the device mapper is better - simply because
>> > >it is much less code than reworking bcache to split bios internally.
>> > >
>> > >BTW. In the device mapper, we have a layer dm-io, that was created to work
>> > >around bio size limitations - it accepts unlimited I/O request and splits
>> > >it to several bios. When bio size limitations are gone, we could simplify
>> > >dm-io too.
>> >
>> > The patch from Ming Lei was applied for 4.8 the other day.
>>
>> See linux-block commit:
>> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
>
> But this patch won't work for device mapper, blk_bio_segment_split is
> called from blk_queue_split and device mapper doesn't use blk_queue_split
> (it can't because it frees queue->bio_split).
>
> We still need these two patches:
> https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
About the 2nd patch, it might not be good enough because in theory
a small size bio still may include big bvecs, such as, each bvec points
to 512byte buffer, so strictly speaking the bvec number should
be checked instead of bio size.
--
Ming Lei
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-27 15:09 ` Mikulas Patocka
2016-08-29 5:22 ` Ming Lei
@ 2016-08-29 18:19 ` Mike Snitzer
2016-08-29 22:01 ` Mikulas Patocka
1 sibling, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2016-08-29 18:19 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel,
johnstonj.public, linux-block
On Sat, Aug 27 2016 at 11:09am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Fri, 26 Aug 2016, Mike Snitzer wrote:
>
> > On Thu, Aug 25 2016 at 4:13pm -0400,
> > Jens Axboe <axboe@kernel.dk> wrote:
> >
> > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> > > >
> > > >Device mapper can't split the bio in generic_make_request - it frees the
> > > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> > > >mapper uses its own splitting mechanism.
> > > >
> > > >So what is the final decision? - should device mapper split the big bio or
> > > >should bcache not submit big bios?
> > > >
> > > >I think splitting big bios in the device mapper is better - simply because
> > > >it is much less code than reworking bcache to split bios internally.
> > > >
> > > >BTW. In the device mapper, we have a layer dm-io, that was created to work
> > > >around bio size limitations - it accepts unlimited I/O request and splits
> > > >it to several bios. When bio size limitations are gone, we could simplify
> > > >dm-io too.
> > >
> > > The patch from Ming Lei was applied for 4.8 the other day.
> >
> > See linux-block commit:
> > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
>
> But this patch won't work for device mapper, blk_bio_segment_split is
> called from blk_queue_split and device mapper doesn't use blk_queue_split
> (it can't because it frees queue->bio_split).
>
> We still need these two patches:
> https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
I'm left wondering whether it'd be better to revert commit dbba42d8a9e
("dm: eliminate unused "bioset" process for each bio-based DM device")
And also look for other areas of DM that could leverage the block core's
relatively new splitting capabilities.
Otherwise, we'll just be sprinking more BIO_MAX_PAGES-based splitting
code in DM targets because DM is so "special" that it doesn't need the
block core's splitting (even though it clearly would now benefit).
Mike
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-29 5:22 ` Ming Lei
@ 2016-08-29 21:57 ` Mikulas Patocka
2016-08-30 7:33 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Mikulas Patocka @ 2016-08-29 21:57 UTC (permalink / raw)
To: Ming Lei
Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig,
open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block
On Mon, 29 Aug 2016, Ming Lei wrote:
> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> > On Fri, 26 Aug 2016, Mike Snitzer wrote:
> >
> >> On Thu, Aug 25 2016 at 4:13pm -0400,
> >> Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> >> > >
> >> > >Device mapper can't split the bio in generic_make_request - it frees the
> >> > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> >> > >mapper uses its own splitting mechanism.
> >> > >
> >> > >So what is the final decision? - should device mapper split the big bio or
> >> > >should bcache not submit big bios?
> >> > >
> >> > >I think splitting big bios in the device mapper is better - simply because
> >> > >it is much less code than reworking bcache to split bios internally.
> >> > >
> >> > >BTW. In the device mapper, we have a layer dm-io, that was created to work
> >> > >around bio size limitations - it accepts unlimited I/O request and splits
> >> > >it to several bios. When bio size limitations are gone, we could simplify
> >> > >dm-io too.
> >> >
> >> > The patch from Ming Lei was applied for 4.8 the other day.
> >>
> >> See linux-block commit:
> >> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
> >
> > But this patch won't work for device mapper, blk_bio_segment_split is
> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> > (it can't because it frees queue->bio_split).
> >
> > We still need these two patches:
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
>
> About the 2nd patch, it might not be good enough because in theory
> a small size bio still may include big bvecs, such as, each bvec points
> to 512byte buffer, so strictly speaking the bvec number should
> be checked instead of bio size.
>
> Ming Lei
This is not a problem.
dm-crypt allocates new pages for the bio and copies (and encrypts) the
data from the original location to the new pages.
So yes, the original bio may have a long vector with many small fragments,
but the newly allocated memory will be allocated in full pages and the
outgoing bio's vector will have full pages.
Mikulas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-29 18:19 ` Mike Snitzer
@ 2016-08-29 22:01 ` Mikulas Patocka
0 siblings, 0 replies; 14+ messages in thread
From: Mikulas Patocka @ 2016-08-29 22:01 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel,
johnstonj.public, linux-block
On Mon, 29 Aug 2016, Mike Snitzer wrote:
> On Sat, Aug 27 2016 at 11:09am -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> >
> >
> > On Fri, 26 Aug 2016, Mike Snitzer wrote:
> >
> > > On Thu, Aug 25 2016 at 4:13pm -0400,
> > > Jens Axboe <axboe@kernel.dk> wrote:
> > >
> > > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> > > > >
> > > > >Device mapper can't split the bio in generic_make_request - it frees the
> > > > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> > > > >mapper uses its own splitting mechanism.
> > > > >
> > > > >So what is the final decision? - should device mapper split the big bio or
> > > > >should bcache not submit big bios?
> > > > >
> > > > >I think splitting big bios in the device mapper is better - simply because
> > > > >it is much less code than reworking bcache to split bios internally.
> > > > >
> > > > >BTW. In the device mapper, we have a layer dm-io, that was created to work
> > > > >around bio size limitations - it accepts unlimited I/O request and splits
> > > > >it to several bios. When bio size limitations are gone, we could simplify
> > > > >dm-io too.
> > > >
> > > > The patch from Ming Lei was applied for 4.8 the other day.
> > >
> > > See linux-block commit:
> > > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
> >
> > But this patch won't work for device mapper, blk_bio_segment_split is
> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> > (it can't because it frees queue->bio_split).
> >
> > We still need these two patches:
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
>
> I'm left wondering whether it'd be better to revert commit dbba42d8a9e
> ("dm: eliminate unused "bioset" process for each bio-based DM device")
That would create one more process per dm device.
There is no need to create another process if the bio can be split in the
device mapper.
> And also look for other areas of DM that could leverage the block core's
> relatively new splitting capabilities.
>
> Otherwise, we'll just be sprinking more BIO_MAX_PAGES-based splitting
> code in DM targets because DM is so "special" that it doesn't need the
> block core's splitting (even though it clearly would now benefit).
It would be possible to split bios on BIO_MAX_PAGES boundary in the dm
core. But I checked all the dm targets and I found problem only in
dm-crypt and dm-log-writes. So there is no need to split bio bios for the
other targets.
> Mike
Mikulas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-29 21:57 ` Mikulas Patocka
@ 2016-08-30 7:33 ` Ming Lei
2016-08-30 12:19 ` Mikulas Patocka
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2016-08-30 7:33 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig,
open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block
On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Mon, 29 Aug 2016, Ming Lei wrote:
>
>> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>> >
>> >
>> > On Fri, 26 Aug 2016, Mike Snitzer wrote:
>> >
>> >> On Thu, Aug 25 2016 at 4:13pm -0400,
>> >> Jens Axboe <axboe@kernel.dk> wrote:
>> >>
>> >> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
>> >> > >
>> >> > >Device mapper can't split the bio in generic_make_request - it frees the
>> >> > >md->queue->bio_split bioset, to save one kernel thread per device. Device
>> >> > >mapper uses its own splitting mechanism.
>> >> > >
>> >> > >So what is the final decision? - should device mapper split the big bio or
>> >> > >should bcache not submit big bios?
>> >> > >
>> >> > >I think splitting big bios in the device mapper is better - simply because
>> >> > >it is much less code than reworking bcache to split bios internally.
>> >> > >
>> >> > >BTW. In the device mapper, we have a layer dm-io, that was created to work
>> >> > >around bio size limitations - it accepts unlimited I/O request and splits
>> >> > >it to several bios. When bio size limitations are gone, we could simplify
>> >> > >dm-io too.
>> >> >
>> >> > The patch from Ming Lei was applied for 4.8 the other day.
>> >>
>> >> See linux-block commit:
>> >> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
>> >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
>> >
>> > But this patch won't work for device mapper, blk_bio_segment_split is
>> > called from blk_queue_split and device mapper doesn't use blk_queue_split
>> > (it can't because it frees queue->bio_split).
>> >
>> > We still need these two patches:
>> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
>> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
>>
>> About the 2nd patch, it might not be good enough because in theory
>> a small size bio still may include big bvecs, such as, each bvec points
>> to 512byte buffer, so strictly speaking the bvec number should
>> be checked instead of bio size.
>>
>> Ming Lei
>
> This is not a problem.
I meant the following code in your 2nd patch:
+ if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
+ (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
+ dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
And the check on .bi_size may not work.
>
> dm-crypt allocates new pages for the bio and copies (and encrypts) the
> data from the original location to the new pages.
>
> So yes, the original bio may have a long vector with many small fragments,
> but the newly allocated memory will be allocated in full pages and the
> outgoing bio's vector will have full pages.
>
> Mikulas
--
Ming Lei
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-30 7:33 ` Ming Lei
@ 2016-08-30 12:19 ` Mikulas Patocka
2016-08-30 20:57 ` Mike Snitzer
0 siblings, 1 reply; 14+ messages in thread
From: Mikulas Patocka @ 2016-08-30 12:19 UTC (permalink / raw)
To: Ming Lei
Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig,
open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block
On Tue, 30 Aug 2016, Ming Lei wrote:
> On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> > On Mon, 29 Aug 2016, Ming Lei wrote:
> >
> >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >> >
> >> > But this patch won't work for device mapper, blk_bio_segment_split is
> >> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> >> > (it can't because it frees queue->bio_split).
> >> >
> >> > We still need these two patches:
> >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> >>
> >> About the 2nd patch, it might not be good enough because in theory
> >> a small size bio still may include big bvecs, such as, each bvec points
> >> to 512byte buffer, so strictly speaking the bvec number should
> >> be checked instead of bio size.
> >>
> >> Ming Lei
> >
> > This is not a problem.
>
> I meant the following code in your 2nd patch:
>
> + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
> + (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
> + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
>
> And the check on .bi_size may not work.
kcryptd_crypt_write_convert calls:
crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size)
crypt_alloc_buffer does:
unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will
be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will
succeed.
(BTW. BIO_MAX_SIZE was removed in the current kernel, we should use
(BIO_MAX_PAGES << PAGE_SHIFT) instead).
Mikulas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-30 12:19 ` Mikulas Patocka
@ 2016-08-30 20:57 ` Mike Snitzer
2016-08-30 22:27 ` Mikulas Patocka
0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2016-08-30 20:57 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig,
open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block
On Tue, Aug 30 2016 at 8:19P -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Tue, 30 Aug 2016, Ming Lei wrote:
>
> > On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > >
> > > On Mon, 29 Aug 2016, Ming Lei wrote:
> > >
> > >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >> >
> > >> > But this patch won't work for device mapper, blk_bio_segment_split is
> > >> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> > >> > (it can't because it frees queue->bio_split).
> > >> >
> > >> > We still need these two patches:
> > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> > >>
> > >> About the 2nd patch, it might not be good enough because in theory
> > >> a small size bio still may include big bvecs, such as, each bvec points
> > >> to 512byte buffer, so strictly speaking the bvec number should
> > >> be checked instead of bio size.
> > >>
> > >> Ming Lei
> > >
> > > This is not a problem.
> >
> > I meant the following code in your 2nd patch:
> >
> > + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
> > + (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
> > + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
> >
> > And the check on .bi_size may not work.
>
> kcryptd_crypt_write_convert calls:
> crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size)
>
> crypt_alloc_buffer does:
> unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
>
> So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will
> be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will
> succeed.
>
> (BTW. BIO_MAX_SIZE was removed in the current kernel, we should use
> (BIO_MAX_PAGES << PAGE_SHIFT) instead).
Is this revised patch OK with you?
From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 30 Aug 2016 16:38:42 -0400
Subject: [PATCH] dm crypt: fix error with too large bcache bios
When dm-crypt processes writes, it allocates a new bio in
crypt_alloc_buffer(). The bio is allocated from a bio set and it can
have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
larger if it was allocated by bcache. If the incoming bio is larger,
bio_alloc_bioset() fails and an error is returned.
To avoid the error, we test for a too large bio in the function
crypt_map() and use dm_accept_partial_bio() to split the bio.
dm_accept_partial_bio() trims the current bio to the desired size and
asks DM core to send another bio with the rest of the data.
This fix is wrapped with a check for CONFIG_BCACHE because there
currently isn't any other code that generates too large bios. So unless
bcache is configured there is no point wasting time making this check.
Signed-off-by: Mikulas Patocka <mpatocka redhat com>
Cc: stable@vger.kernel.org # v3.16+
---
drivers/md/dm-crypt.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index eedba67..743f548 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
}
+#ifdef CONFIG_BCACHE
+ if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) &&
+ bio_data_dir(bio) == WRITE)
+ dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT));
+#endif
+
io = dm_per_bio_data(bio, cc->per_bio_data_size);
crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
io->ctx.req = (struct skcipher_request *)(io + 1);
--
2.7.4 (Apple Git-66)
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-30 20:57 ` Mike Snitzer
@ 2016-08-30 22:27 ` Mikulas Patocka
2016-08-31 6:26 ` Milan Broz
0 siblings, 1 reply; 14+ messages in thread
From: Mikulas Patocka @ 2016-08-30 22:27 UTC (permalink / raw)
To: Mike Snitzer
Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig,
open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block
On Tue, 30 Aug 2016, Mike Snitzer wrote:
> On Tue, Aug 30 2016 at 8:19P -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> >
> >
> > On Tue, 30 Aug 2016, Ming Lei wrote:
> >
> > > On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > >
> > > >
> > > > On Mon, 29 Aug 2016, Ming Lei wrote:
> > > >
> > > >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > >> >
> > > >> > But this patch won't work for device mapper, blk_bio_segment_split is
> > > >> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> > > >> > (it can't because it frees queue->bio_split).
> > > >> >
> > > >> > We still need these two patches:
> > > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> > > >>
> > > >> About the 2nd patch, it might not be good enough because in theory
> > > >> a small size bio still may include big bvecs, such as, each bvec points
> > > >> to 512byte buffer, so strictly speaking the bvec number should
> > > >> be checked instead of bio size.
> > > >>
> > > >> Ming Lei
> > > >
> > > > This is not a problem.
> > >
> > > I meant the following code in your 2nd patch:
> > >
> > > + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
> > > + (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
> > > + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
> > >
> > > And the check on .bi_size may not work.
> >
> > kcryptd_crypt_write_convert calls:
> > crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size)
> >
> > crypt_alloc_buffer does:
> > unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
> >
> > So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will
> > be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will
> > succeed.
> >
> > (BTW. BIO_MAX_SIZE was removed in the current kernel, we should use
> > (BIO_MAX_PAGES << PAGE_SHIFT) instead).
>
> Is this revised patch OK with you?
Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big
bio, not just bcache.
That one test has no performance impact, there is no need to hide it
behind #ifdef.
Mikulas
> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 30 Aug 2016 16:38:42 -0400
> Subject: [PATCH] dm crypt: fix error with too large bcache bios
>
> When dm-crypt processes writes, it allocates a new bio in
> crypt_alloc_buffer(). The bio is allocated from a bio set and it can
> have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
> larger if it was allocated by bcache. If the incoming bio is larger,
> bio_alloc_bioset() fails and an error is returned.
>
> To avoid the error, we test for a too large bio in the function
> crypt_map() and use dm_accept_partial_bio() to split the bio.
> dm_accept_partial_bio() trims the current bio to the desired size and
> asks DM core to send another bio with the rest of the data.
>
> This fix is wrapped with a check for CONFIG_BCACHE because there
> currently isn't any other code that generates too large bios. So unless
> bcache is configured there is no point wasting time making this check.
>
> Signed-off-by: Mikulas Patocka <mpatocka redhat com>
> Cc: stable@vger.kernel.org # v3.16+
> ---
> drivers/md/dm-crypt.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index eedba67..743f548 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
> return DM_MAPIO_REMAPPED;
> }
>
> +#ifdef CONFIG_BCACHE
> + if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) &&
> + bio_data_dir(bio) == WRITE)
> + dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT));
> +#endif
> +
> io = dm_per_bio_data(bio, cc->per_bio_data_size);
> crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
> io->ctx.req = (struct skcipher_request *)(io + 1);
> --
> 2.7.4 (Apple Git-66)
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-30 22:27 ` Mikulas Patocka
@ 2016-08-31 6:26 ` Milan Broz
2016-08-31 13:39 ` Mike Snitzer
0 siblings, 1 reply; 14+ messages in thread
From: Milan Broz @ 2016-08-31 6:26 UTC (permalink / raw)
To: Mikulas Patocka, Mike Snitzer
Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig,
johnstonj.public, linux-block
On 08/31/2016 12:27 AM, Mikulas Patocka wrote:
...
>
> Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big
> bio, not just bcache.
Yes. Please, do not hide it behind #ifdef.
If it is in code, it should be enabled always.
There can third party modules or some new code appears and creating strange
config dependence only adds more problems later.
Milan
>
> That one test has no performance impact, there is no need to hide it
> behind #ifdef.
>
> Mikulas
>
>> From: Mikulas Patocka <mpatocka@redhat.com>
>> Date: Tue, 30 Aug 2016 16:38:42 -0400
>> Subject: [PATCH] dm crypt: fix error with too large bcache bios
>>
>> When dm-crypt processes writes, it allocates a new bio in
>> crypt_alloc_buffer(). The bio is allocated from a bio set and it can
>> have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
>> larger if it was allocated by bcache. If the incoming bio is larger,
>> bio_alloc_bioset() fails and an error is returned.
>>
>> To avoid the error, we test for a too large bio in the function
>> crypt_map() and use dm_accept_partial_bio() to split the bio.
>> dm_accept_partial_bio() trims the current bio to the desired size and
>> asks DM core to send another bio with the rest of the data.
>>
>> This fix is wrapped with a check for CONFIG_BCACHE because there
>> currently isn't any other code that generates too large bios. So unless
>> bcache is configured there is no point wasting time making this check.
>>
>> Signed-off-by: Mikulas Patocka <mpatocka redhat com>
>> Cc: stable@vger.kernel.org # v3.16+
>> ---
>> drivers/md/dm-crypt.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
>> index eedba67..743f548 100644
>> --- a/drivers/md/dm-crypt.c
>> +++ b/drivers/md/dm-crypt.c
>> @@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
>> return DM_MAPIO_REMAPPED;
>> }
>>
>> +#ifdef CONFIG_BCACHE
>> + if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) &&
>> + bio_data_dir(bio) == WRITE)
>> + dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT));
>> +#endif
>> +
>> io = dm_per_bio_data(bio, cc->per_bio_data_size);
>> crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
>> io->ctx.req = (struct skcipher_request *)(io + 1);
>> --
>> 2.7.4 (Apple Git-66)
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios
2016-08-31 6:26 ` Milan Broz
@ 2016-08-31 13:39 ` Mike Snitzer
0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2016-08-31 13:39 UTC (permalink / raw)
To: Milan Broz
Cc: Mikulas Patocka, Ming Lei, Jens Axboe, Eric Wheeler,
Christoph Hellwig, johnstonj.public, linux-block
On Wed, Aug 31 2016 at 2:26am -0400,
Milan Broz <gmazyland@gmail.com> wrote:
> On 08/31/2016 12:27 AM, Mikulas Patocka wrote:
>
> ...
> >
> > Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big
> > bio, not just bcache.
>
> Yes. Please, do not hide it behind #ifdef.
> If it is in code, it should be enabled always.
>
> There can third party modules or some new code appears and creating strange
> config dependence only adds more problems later.
I did last night, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=4e870e948fbabf62b78e8410f04c67703e7c816b
Will go to Linus for v4.8-rc5 by the end of the week.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2016-08-31 13:39 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <alpine.LRH.2.11.1608102050200.10662@mail.ewheeler.net>
[not found] ` <alpine.LRH.2.02.1608131342250.3291@file01.intranet.prod.int.rdu2.redhat.com>
[not found] ` <20160814000740.GA5317@redhat.com>
[not found] ` <alpine.LRH.2.02.1608151245120.13026@file01.intranet.prod.int.rdu2.redhat.com>
[not found] ` <alpine.LRH.2.11.1608181742260.10662@mail.ewheeler.net>
2016-08-25 18:34 ` [dm-devel] dm-crypt: Fix error with too large bios Mikulas Patocka
2016-08-25 20:13 ` Jens Axboe
2016-08-26 14:05 ` Mike Snitzer
2016-08-27 15:09 ` Mikulas Patocka
2016-08-29 5:22 ` Ming Lei
2016-08-29 21:57 ` Mikulas Patocka
2016-08-30 7:33 ` Ming Lei
2016-08-30 12:19 ` Mikulas Patocka
2016-08-30 20:57 ` Mike Snitzer
2016-08-30 22:27 ` Mikulas Patocka
2016-08-31 6:26 ` Milan Broz
2016-08-31 13:39 ` Mike Snitzer
2016-08-29 18:19 ` Mike Snitzer
2016-08-29 22:01 ` Mikulas Patocka
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).