* Re: [dm-devel] dm-crypt: Fix error with too large bios [not found] ` <alpine.LRH.2.11.1608181742260.10662@mail.ewheeler.net> @ 2016-08-25 18:34 ` Mikulas Patocka 2016-08-25 20:13 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Mikulas Patocka @ 2016-08-25 18:34 UTC (permalink / raw) To: Eric Wheeler Cc: Christoph Hellwig, dm-devel, Mike Snitzer, johnstonj.public, Jens Axboe, linux-block On Thu, 18 Aug 2016, Eric Wheeler wrote: > > On Wed, Jun 01 2016 at 9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote: > > > > > > > be dm-crypt.c. Maybe you've identified some indirect use of > > > > > BIO_MAX_SIZE? > > > > > > > > I mean the recently introduced BIO_MAX_SIZE in -next tree: > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486 > > > > > > The crazy bcache bios striking back once again. I really think it's > > > harmful having a _MAX value and then having a minor driver > > > reinterpreting it and sending larger ones. Until we can lift the > > > maximum limit in general nad have common code exercise it we really need > > > to stop bcache from sending these instead of littering the tree with > > > workarounds. > > > > The bio_kmalloc function allocates bios with up to 1024 vector entries (as > > opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector > > entries). > > > > Device mapper is using bio_alloc_bioset with a bio set, so it is limited > > to 256 vector entries, but other kernel users may use bio_kmalloc and > > create larger bios. > > > > So, if you don't want bios with more than 256 vector entries to exist, you > > should impose this limit in bio_kmalloc (and fix all the callers that use > > it). > > FYI, Kent Overstreet notes this about bcache from the other thread here: > https://lkml.org/lkml/2016/8/15/620 > > [paste] > >> bcache originally had workaround code to split too-large bios when it > >> first went upstream - that was dropped only after the patches to make > >> generic_make_request() handle arbitrary size bios went in. So to do what > >> you're suggesting would mean reverting that bcache patch and bringing that > >> code back, which from my perspective would be a step in the wrong > >> direction. I just want to get this over and done with. > >> > >> re: interactions with other drivers - bio_clone() has already been changed > >> to only clone biovecs that are live for current bi_iter, so there > >> shouldn't be any safety issues. A driver would have to be intentionally > >> doing its own open coded bio cloning that clones all of bi_io_vec, not > >> just the active ones - but if they're doing that, they're already broken > >> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that > >> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were > >> created by bio_clone_fast). > >> > >> And the cloning and bi_vcnt usage stuff I audited very thoroughly back > >> when I was working on immutable biovecs and such back in the day, and I > >> had to do a fair amount of cleanup/refactoring before that stuff could go > >> in. > [/paste] > > They are making progress in the patch-v3 thread, so perhaps this can be > fixed for now in generic_make_request(). > > -- > Eric Wheeler Device mapper can't split the bio in generic_make_request - it frees the md->queue->bio_split bioset, to save one kernel thread per device. Device mapper uses its own splitting mechanism. So what is the final decision? - should device mapper split the big bio or should bcache not submit big bios? I think splitting big bios in the device mapper is better - simply because it is much less code than reworking bcache to split bios internally. BTW. In the device mapper, we have a layer dm-io, that was created to work around bio size limitations - it accepts unlimited I/O request and splits it to several bios. When bio size limitations are gone, we could simplify dm-io too. Mikulas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dm-devel] dm-crypt: Fix error with too large bios 2016-08-25 18:34 ` [dm-devel] dm-crypt: Fix error with too large bios Mikulas Patocka @ 2016-08-25 20:13 ` Jens Axboe 2016-08-26 14:05 ` Mike Snitzer 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2016-08-25 20:13 UTC (permalink / raw) To: Mikulas Patocka, Eric Wheeler Cc: Christoph Hellwig, dm-devel, Mike Snitzer, johnstonj.public, linux-block On 08/25/2016 12:34 PM, Mikulas Patocka wrote: > > > On Thu, 18 Aug 2016, Eric Wheeler wrote: > >>> On Wed, Jun 01 2016 at 9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote: >>> >>>>>> be dm-crypt.c. Maybe you've identified some indirect use of >>>>>> BIO_MAX_SIZE? >>>>> >>>>> I mean the recently introduced BIO_MAX_SIZE in -next tree: >>>>> >>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486 >>>> >>>> The crazy bcache bios striking back once again. I really think it's >>>> harmful having a _MAX value and then having a minor driver >>>> reinterpreting it and sending larger ones. Until we can lift the >>>> maximum limit in general nad have common code exercise it we really need >>>> to stop bcache from sending these instead of littering the tree with >>>> workarounds. >>> >>> The bio_kmalloc function allocates bios with up to 1024 vector entries (as >>> opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector >>> entries). >>> >>> Device mapper is using bio_alloc_bioset with a bio set, so it is limited >>> to 256 vector entries, but other kernel users may use bio_kmalloc and >>> create larger bios. >>> >>> So, if you don't want bios with more than 256 vector entries to exist, you >>> should impose this limit in bio_kmalloc (and fix all the callers that use >>> it). >> >> FYI, Kent Overstreet notes this about bcache from the other thread here: >> https://lkml.org/lkml/2016/8/15/620 >> >> [paste] >>>> bcache originally had workaround code to split too-large bios when it >>>> first went upstream - that was dropped only after the patches to make >>>> generic_make_request() handle arbitrary size bios went in. So to do what >>>> you're suggesting would mean reverting that bcache patch and bringing that >>>> code back, which from my perspective would be a step in the wrong >>>> direction. I just want to get this over and done with. >>>> >>>> re: interactions with other drivers - bio_clone() has already been changed >>>> to only clone biovecs that are live for current bi_iter, so there >>>> shouldn't be any safety issues. A driver would have to be intentionally >>>> doing its own open coded bio cloning that clones all of bi_io_vec, not >>>> just the active ones - but if they're doing that, they're already broken >>>> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that >>>> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were >>>> created by bio_clone_fast). >>>> >>>> And the cloning and bi_vcnt usage stuff I audited very thoroughly back >>>> when I was working on immutable biovecs and such back in the day, and I >>>> had to do a fair amount of cleanup/refactoring before that stuff could go >>>> in. >> [/paste] >> >> They are making progress in the patch-v3 thread, so perhaps this can be >> fixed for now in generic_make_request(). >> >> -- >> Eric Wheeler > > Device mapper can't split the bio in generic_make_request - it frees the > md->queue->bio_split bioset, to save one kernel thread per device. Device > mapper uses its own splitting mechanism. > > So what is the final decision? - should device mapper split the big bio or > should bcache not submit big bios? > > I think splitting big bios in the device mapper is better - simply because > it is much less code than reworking bcache to split bios internally. > > BTW. In the device mapper, we have a layer dm-io, that was created to work > around bio size limitations - it accepts unlimited I/O request and splits > it to several bios. When bio size limitations are gone, we could simplify > dm-io too. The patch from Ming Lei was applied for 4.8 the other day. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-25 20:13 ` Jens Axboe @ 2016-08-26 14:05 ` Mike Snitzer 2016-08-27 15:09 ` Mikulas Patocka 0 siblings, 1 reply; 14+ messages in thread From: Mike Snitzer @ 2016-08-26 14:05 UTC (permalink / raw) To: Mikulas Patocka Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel, johnstonj.public, linux-block On Thu, Aug 25 2016 at 4:13pm -0400, Jens Axboe <axboe@kernel.dk> wrote: > On 08/25/2016 12:34 PM, Mikulas Patocka wrote: > > > >Device mapper can't split the bio in generic_make_request - it frees the > >md->queue->bio_split bioset, to save one kernel thread per device. Device > >mapper uses its own splitting mechanism. > > > >So what is the final decision? - should device mapper split the big bio or > >should bcache not submit big bios? > > > >I think splitting big bios in the device mapper is better - simply because > >it is much less code than reworking bcache to split bios internally. > > > >BTW. In the device mapper, we have a layer dm-io, that was created to work > >around bio size limitations - it accepts unlimited I/O request and splits > >it to several bios. When bio size limitations are gone, we could simplify > >dm-io too. > > The patch from Ming Lei was applied for 4.8 the other day. See linux-block commit: 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-26 14:05 ` Mike Snitzer @ 2016-08-27 15:09 ` Mikulas Patocka 2016-08-29 5:22 ` Ming Lei 2016-08-29 18:19 ` Mike Snitzer 0 siblings, 2 replies; 14+ messages in thread From: Mikulas Patocka @ 2016-08-27 15:09 UTC (permalink / raw) To: Mike Snitzer Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel, johnstonj.public, linux-block On Fri, 26 Aug 2016, Mike Snitzer wrote: > On Thu, Aug 25 2016 at 4:13pm -0400, > Jens Axboe <axboe@kernel.dk> wrote: > > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote: > > > > > >Device mapper can't split the bio in generic_make_request - it frees the > > >md->queue->bio_split bioset, to save one kernel thread per device. Device > > >mapper uses its own splitting mechanism. > > > > > >So what is the final decision? - should device mapper split the big bio or > > >should bcache not submit big bios? > > > > > >I think splitting big bios in the device mapper is better - simply because > > >it is much less code than reworking bcache to split bios internally. > > > > > >BTW. In the device mapper, we have a layer dm-io, that was created to work > > >around bio size limitations - it accepts unlimited I/O request and splits > > >it to several bios. When bio size limitations are gone, we could simplify > > >dm-io too. > > > > The patch from Ming Lei was applied for 4.8 the other day. > > See linux-block commit: > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c But this patch won't work for device mapper, blk_bio_segment_split is called from blk_queue_split and device mapper doesn't use blk_queue_split (it can't because it frees queue->bio_split). We still need these two patches: https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html Mikulas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-27 15:09 ` Mikulas Patocka @ 2016-08-29 5:22 ` Ming Lei 2016-08-29 21:57 ` Mikulas Patocka 2016-08-29 18:19 ` Mike Snitzer 1 sibling, 1 reply; 14+ messages in thread From: Ming Lei @ 2016-08-29 5:22 UTC (permalink / raw) To: Mikulas Patocka Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig, open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Fri, 26 Aug 2016, Mike Snitzer wrote: > >> On Thu, Aug 25 2016 at 4:13pm -0400, >> Jens Axboe <axboe@kernel.dk> wrote: >> >> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote: >> > > >> > >Device mapper can't split the bio in generic_make_request - it frees the >> > >md->queue->bio_split bioset, to save one kernel thread per device. Device >> > >mapper uses its own splitting mechanism. >> > > >> > >So what is the final decision? - should device mapper split the big bio or >> > >should bcache not submit big bios? >> > > >> > >I think splitting big bios in the device mapper is better - simply because >> > >it is much less code than reworking bcache to split bios internally. >> > > >> > >BTW. In the device mapper, we have a layer dm-io, that was created to work >> > >around bio size limitations - it accepts unlimited I/O request and splits >> > >it to several bios. When bio size limitations are gone, we could simplify >> > >dm-io too. >> > >> > The patch from Ming Lei was applied for 4.8 the other day. >> >> See linux-block commit: >> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c > > But this patch won't work for device mapper, blk_bio_segment_split is > called from blk_queue_split and device mapper doesn't use blk_queue_split > (it can't because it frees queue->bio_split). > > We still need these two patches: > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html About the 2nd patch, it might not be good enough because in theory a small size bio still may include big bvecs, such as, each bvec points to 512byte buffer, so strictly speaking the bvec number should be checked instead of bio size. -- Ming Lei ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-29 5:22 ` Ming Lei @ 2016-08-29 21:57 ` Mikulas Patocka 2016-08-30 7:33 ` Ming Lei 0 siblings, 1 reply; 14+ messages in thread From: Mikulas Patocka @ 2016-08-29 21:57 UTC (permalink / raw) To: Ming Lei Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig, open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block On Mon, 29 Aug 2016, Ming Lei wrote: > On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Fri, 26 Aug 2016, Mike Snitzer wrote: > > > >> On Thu, Aug 25 2016 at 4:13pm -0400, > >> Jens Axboe <axboe@kernel.dk> wrote: > >> > >> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote: > >> > > > >> > >Device mapper can't split the bio in generic_make_request - it frees the > >> > >md->queue->bio_split bioset, to save one kernel thread per device. Device > >> > >mapper uses its own splitting mechanism. > >> > > > >> > >So what is the final decision? - should device mapper split the big bio or > >> > >should bcache not submit big bios? > >> > > > >> > >I think splitting big bios in the device mapper is better - simply because > >> > >it is much less code than reworking bcache to split bios internally. > >> > > > >> > >BTW. In the device mapper, we have a layer dm-io, that was created to work > >> > >around bio size limitations - it accepts unlimited I/O request and splits > >> > >it to several bios. When bio size limitations are gone, we could simplify > >> > >dm-io too. > >> > > >> > The patch from Ming Lei was applied for 4.8 the other day. > >> > >> See linux-block commit: > >> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs > >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c > > > > But this patch won't work for device mapper, blk_bio_segment_split is > > called from blk_queue_split and device mapper doesn't use blk_queue_split > > (it can't because it frees queue->bio_split). > > > > We still need these two patches: > > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html > > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html > > About the 2nd patch, it might not be good enough because in theory > a small size bio still may include big bvecs, such as, each bvec points > to 512byte buffer, so strictly speaking the bvec number should > be checked instead of bio size. > > Ming Lei This is not a problem. dm-crypt allocates new pages for the bio and copies (and encrypts) the data from the original location to the new pages. So yes, the original bio may have a long vector with many small fragments, but the newly allocated memory will be allocated in full pages and the outgoing bio's vector will have full pages. Mikulas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-29 21:57 ` Mikulas Patocka @ 2016-08-30 7:33 ` Ming Lei 2016-08-30 12:19 ` Mikulas Patocka 0 siblings, 1 reply; 14+ messages in thread From: Ming Lei @ 2016-08-30 7:33 UTC (permalink / raw) To: Mikulas Patocka Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig, open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 29 Aug 2016, Ming Lei wrote: > >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Fri, 26 Aug 2016, Mike Snitzer wrote: >> > >> >> On Thu, Aug 25 2016 at 4:13pm -0400, >> >> Jens Axboe <axboe@kernel.dk> wrote: >> >> >> >> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote: >> >> > > >> >> > >Device mapper can't split the bio in generic_make_request - it frees the >> >> > >md->queue->bio_split bioset, to save one kernel thread per device. Device >> >> > >mapper uses its own splitting mechanism. >> >> > > >> >> > >So what is the final decision? - should device mapper split the big bio or >> >> > >should bcache not submit big bios? >> >> > > >> >> > >I think splitting big bios in the device mapper is better - simply because >> >> > >it is much less code than reworking bcache to split bios internally. >> >> > > >> >> > >BTW. In the device mapper, we have a layer dm-io, that was created to work >> >> > >around bio size limitations - it accepts unlimited I/O request and splits >> >> > >it to several bios. When bio size limitations are gone, we could simplify >> >> > >dm-io too. >> >> > >> >> > The patch from Ming Lei was applied for 4.8 the other day. >> >> >> >> See linux-block commit: >> >> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs >> >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c >> > >> > But this patch won't work for device mapper, blk_bio_segment_split is >> > called from blk_queue_split and device mapper doesn't use blk_queue_split >> > (it can't because it frees queue->bio_split). >> > >> > We still need these two patches: >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html >> >> About the 2nd patch, it might not be good enough because in theory >> a small size bio still may include big bvecs, such as, each bvec points >> to 512byte buffer, so strictly speaking the bvec number should >> be checked instead of bio size. >> >> Ming Lei > > This is not a problem. I meant the following code in your 2nd patch: + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) && + (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE) + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT); And the check on .bi_size may not work. > > dm-crypt allocates new pages for the bio and copies (and encrypts) the > data from the original location to the new pages. > > So yes, the original bio may have a long vector with many small fragments, > but the newly allocated memory will be allocated in full pages and the > outgoing bio's vector will have full pages. > > Mikulas -- Ming Lei ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-30 7:33 ` Ming Lei @ 2016-08-30 12:19 ` Mikulas Patocka 2016-08-30 20:57 ` Mike Snitzer 0 siblings, 1 reply; 14+ messages in thread From: Mikulas Patocka @ 2016-08-30 12:19 UTC (permalink / raw) To: Ming Lei Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig, open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block On Tue, 30 Aug 2016, Ming Lei wrote: > On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Mon, 29 Aug 2016, Ming Lei wrote: > > > >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > >> > > >> > But this patch won't work for device mapper, blk_bio_segment_split is > >> > called from blk_queue_split and device mapper doesn't use blk_queue_split > >> > (it can't because it frees queue->bio_split). > >> > > >> > We still need these two patches: > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html > >> > >> About the 2nd patch, it might not be good enough because in theory > >> a small size bio still may include big bvecs, such as, each bvec points > >> to 512byte buffer, so strictly speaking the bvec number should > >> be checked instead of bio size. > >> > >> Ming Lei > > > > This is not a problem. > > I meant the following code in your 2nd patch: > > + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) && > + (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE) > + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT); > > And the check on .bi_size may not work. kcryptd_crypt_write_convert calls: crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size) crypt_alloc_buffer does: unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs); So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will succeed. (BTW. BIO_MAX_SIZE was removed in the current kernel, we should use (BIO_MAX_PAGES << PAGE_SHIFT) instead). Mikulas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-30 12:19 ` Mikulas Patocka @ 2016-08-30 20:57 ` Mike Snitzer 2016-08-30 22:27 ` Mikulas Patocka 0 siblings, 1 reply; 14+ messages in thread From: Mike Snitzer @ 2016-08-30 20:57 UTC (permalink / raw) To: Mikulas Patocka Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig, open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block On Tue, Aug 30 2016 at 8:19P -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Tue, 30 Aug 2016, Ming Lei wrote: > > > On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > On Mon, 29 Aug 2016, Ming Lei wrote: > > > > > >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > >> > > > >> > But this patch won't work for device mapper, blk_bio_segment_split is > > >> > called from blk_queue_split and device mapper doesn't use blk_queue_split > > >> > (it can't because it frees queue->bio_split). > > >> > > > >> > We still need these two patches: > > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html > > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html > > >> > > >> About the 2nd patch, it might not be good enough because in theory > > >> a small size bio still may include big bvecs, such as, each bvec points > > >> to 512byte buffer, so strictly speaking the bvec number should > > >> be checked instead of bio size. > > >> > > >> Ming Lei > > > > > > This is not a problem. > > > > I meant the following code in your 2nd patch: > > > > + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) && > > + (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE) > > + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT); > > > > And the check on .bi_size may not work. > > kcryptd_crypt_write_convert calls: > crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size) > > crypt_alloc_buffer does: > unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; > clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs); > > So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will > be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will > succeed. > > (BTW. BIO_MAX_SIZE was removed in the current kernel, we should use > (BIO_MAX_PAGES << PAGE_SHIFT) instead). Is this revised patch OK with you? From: Mikulas Patocka <mpatocka@redhat.com> Date: Tue, 30 Aug 2016 16:38:42 -0400 Subject: [PATCH] dm crypt: fix error with too large bcache bios When dm-crypt processes writes, it allocates a new bio in crypt_alloc_buffer(). The bio is allocated from a bio set and it can have at most BIO_MAX_PAGES vector entries, however the incoming bio can be larger if it was allocated by bcache. If the incoming bio is larger, bio_alloc_bioset() fails and an error is returned. To avoid the error, we test for a too large bio in the function crypt_map() and use dm_accept_partial_bio() to split the bio. dm_accept_partial_bio() trims the current bio to the desired size and asks DM core to send another bio with the rest of the data. This fix is wrapped with a check for CONFIG_BCACHE because there currently isn't any other code that generates too large bios. So unless bcache is configured there is no point wasting time making this check. Signed-off-by: Mikulas Patocka <mpatocka redhat com> Cc: stable@vger.kernel.org # v3.16+ --- drivers/md/dm-crypt.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index eedba67..743f548 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_REMAPPED; } +#ifdef CONFIG_BCACHE + if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) && + bio_data_dir(bio) == WRITE) + dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT)); +#endif + io = dm_per_bio_data(bio, cc->per_bio_data_size); crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector)); io->ctx.req = (struct skcipher_request *)(io + 1); -- 2.7.4 (Apple Git-66) ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-30 20:57 ` Mike Snitzer @ 2016-08-30 22:27 ` Mikulas Patocka 2016-08-31 6:26 ` Milan Broz 0 siblings, 1 reply; 14+ messages in thread From: Mikulas Patocka @ 2016-08-30 22:27 UTC (permalink / raw) To: Mike Snitzer Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig, open list:DEVICE-MAPPER (LVM), johnstonj.public, linux-block On Tue, 30 Aug 2016, Mike Snitzer wrote: > On Tue, Aug 30 2016 at 8:19P -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Tue, 30 Aug 2016, Ming Lei wrote: > > > > > On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > On Mon, 29 Aug 2016, Ming Lei wrote: > > > > > > > >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > >> > > > > >> > But this patch won't work for device mapper, blk_bio_segment_split is > > > >> > called from blk_queue_split and device mapper doesn't use blk_queue_split > > > >> > (it can't because it frees queue->bio_split). > > > >> > > > > >> > We still need these two patches: > > > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html > > > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html > > > >> > > > >> About the 2nd patch, it might not be good enough because in theory > > > >> a small size bio still may include big bvecs, such as, each bvec points > > > >> to 512byte buffer, so strictly speaking the bvec number should > > > >> be checked instead of bio size. > > > >> > > > >> Ming Lei > > > > > > > > This is not a problem. > > > > > > I meant the following code in your 2nd patch: > > > > > > + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) && > > > + (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE) > > > + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT); > > > > > > And the check on .bi_size may not work. > > > > kcryptd_crypt_write_convert calls: > > crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size) > > > > crypt_alloc_buffer does: > > unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; > > clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs); > > > > So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will > > be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will > > succeed. > > > > (BTW. BIO_MAX_SIZE was removed in the current kernel, we should use > > (BIO_MAX_PAGES << PAGE_SHIFT) instead). > > Is this revised patch OK with you? Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big bio, not just bcache. That one test has no performance impact, there is no need to hide it behind #ifdef. Mikulas > From: Mikulas Patocka <mpatocka@redhat.com> > Date: Tue, 30 Aug 2016 16:38:42 -0400 > Subject: [PATCH] dm crypt: fix error with too large bcache bios > > When dm-crypt processes writes, it allocates a new bio in > crypt_alloc_buffer(). The bio is allocated from a bio set and it can > have at most BIO_MAX_PAGES vector entries, however the incoming bio can be > larger if it was allocated by bcache. If the incoming bio is larger, > bio_alloc_bioset() fails and an error is returned. > > To avoid the error, we test for a too large bio in the function > crypt_map() and use dm_accept_partial_bio() to split the bio. > dm_accept_partial_bio() trims the current bio to the desired size and > asks DM core to send another bio with the rest of the data. > > This fix is wrapped with a check for CONFIG_BCACHE because there > currently isn't any other code that generates too large bios. So unless > bcache is configured there is no point wasting time making this check. > > Signed-off-by: Mikulas Patocka <mpatocka redhat com> > Cc: stable@vger.kernel.org # v3.16+ > --- > drivers/md/dm-crypt.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c > index eedba67..743f548 100644 > --- a/drivers/md/dm-crypt.c > +++ b/drivers/md/dm-crypt.c > @@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio) > return DM_MAPIO_REMAPPED; > } > > +#ifdef CONFIG_BCACHE > + if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) && > + bio_data_dir(bio) == WRITE) > + dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT)); > +#endif > + > io = dm_per_bio_data(bio, cc->per_bio_data_size); > crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector)); > io->ctx.req = (struct skcipher_request *)(io + 1); > -- > 2.7.4 (Apple Git-66) > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-30 22:27 ` Mikulas Patocka @ 2016-08-31 6:26 ` Milan Broz 2016-08-31 13:39 ` Mike Snitzer 0 siblings, 1 reply; 14+ messages in thread From: Milan Broz @ 2016-08-31 6:26 UTC (permalink / raw) To: Mikulas Patocka, Mike Snitzer Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig, johnstonj.public, linux-block On 08/31/2016 12:27 AM, Mikulas Patocka wrote: ... > > Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big > bio, not just bcache. Yes. Please, do not hide it behind #ifdef. If it is in code, it should be enabled always. There can third party modules or some new code appears and creating strange config dependence only adds more problems later. Milan > > That one test has no performance impact, there is no need to hide it > behind #ifdef. > > Mikulas > >> From: Mikulas Patocka <mpatocka@redhat.com> >> Date: Tue, 30 Aug 2016 16:38:42 -0400 >> Subject: [PATCH] dm crypt: fix error with too large bcache bios >> >> When dm-crypt processes writes, it allocates a new bio in >> crypt_alloc_buffer(). The bio is allocated from a bio set and it can >> have at most BIO_MAX_PAGES vector entries, however the incoming bio can be >> larger if it was allocated by bcache. If the incoming bio is larger, >> bio_alloc_bioset() fails and an error is returned. >> >> To avoid the error, we test for a too large bio in the function >> crypt_map() and use dm_accept_partial_bio() to split the bio. >> dm_accept_partial_bio() trims the current bio to the desired size and >> asks DM core to send another bio with the rest of the data. >> >> This fix is wrapped with a check for CONFIG_BCACHE because there >> currently isn't any other code that generates too large bios. So unless >> bcache is configured there is no point wasting time making this check. >> >> Signed-off-by: Mikulas Patocka <mpatocka redhat com> >> Cc: stable@vger.kernel.org # v3.16+ >> --- >> drivers/md/dm-crypt.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c >> index eedba67..743f548 100644 >> --- a/drivers/md/dm-crypt.c >> +++ b/drivers/md/dm-crypt.c >> @@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio) >> return DM_MAPIO_REMAPPED; >> } >> >> +#ifdef CONFIG_BCACHE >> + if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) && >> + bio_data_dir(bio) == WRITE) >> + dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT)); >> +#endif >> + >> io = dm_per_bio_data(bio, cc->per_bio_data_size); >> crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector)); >> io->ctx.req = (struct skcipher_request *)(io + 1); >> -- >> 2.7.4 (Apple Git-66) >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-block" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-31 6:26 ` Milan Broz @ 2016-08-31 13:39 ` Mike Snitzer 0 siblings, 0 replies; 14+ messages in thread From: Mike Snitzer @ 2016-08-31 13:39 UTC (permalink / raw) To: Milan Broz Cc: Mikulas Patocka, Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig, johnstonj.public, linux-block On Wed, Aug 31 2016 at 2:26am -0400, Milan Broz <gmazyland@gmail.com> wrote: > On 08/31/2016 12:27 AM, Mikulas Patocka wrote: > > ... > > > > Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big > > bio, not just bcache. > > Yes. Please, do not hide it behind #ifdef. > If it is in code, it should be enabled always. > > There can third party modules or some new code appears and creating strange > config dependence only adds more problems later. I did last night, see: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=4e870e948fbabf62b78e8410f04c67703e7c816b Will go to Linus for v4.8-rc5 by the end of the week. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-27 15:09 ` Mikulas Patocka 2016-08-29 5:22 ` Ming Lei @ 2016-08-29 18:19 ` Mike Snitzer 2016-08-29 22:01 ` Mikulas Patocka 1 sibling, 1 reply; 14+ messages in thread From: Mike Snitzer @ 2016-08-29 18:19 UTC (permalink / raw) To: Mikulas Patocka Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel, johnstonj.public, linux-block On Sat, Aug 27 2016 at 11:09am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Fri, 26 Aug 2016, Mike Snitzer wrote: > > > On Thu, Aug 25 2016 at 4:13pm -0400, > > Jens Axboe <axboe@kernel.dk> wrote: > > > > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote: > > > > > > > >Device mapper can't split the bio in generic_make_request - it frees the > > > >md->queue->bio_split bioset, to save one kernel thread per device. Device > > > >mapper uses its own splitting mechanism. > > > > > > > >So what is the final decision? - should device mapper split the big bio or > > > >should bcache not submit big bios? > > > > > > > >I think splitting big bios in the device mapper is better - simply because > > > >it is much less code than reworking bcache to split bios internally. > > > > > > > >BTW. In the device mapper, we have a layer dm-io, that was created to work > > > >around bio size limitations - it accepts unlimited I/O request and splits > > > >it to several bios. When bio size limitations are gone, we could simplify > > > >dm-io too. > > > > > > The patch from Ming Lei was applied for 4.8 the other day. > > > > See linux-block commit: > > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c > > But this patch won't work for device mapper, blk_bio_segment_split is > called from blk_queue_split and device mapper doesn't use blk_queue_split > (it can't because it frees queue->bio_split). > > We still need these two patches: > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html I'm left wondering whether it'd be better to revert commit dbba42d8a9e ("dm: eliminate unused "bioset" process for each bio-based DM device") And also look for other areas of DM that could leverage the block core's relatively new splitting capabilities. Otherwise, we'll just be sprinking more BIO_MAX_PAGES-based splitting code in DM targets because DM is so "special" that it doesn't need the block core's splitting (even though it clearly would now benefit). Mike ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: dm-crypt: Fix error with too large bios 2016-08-29 18:19 ` Mike Snitzer @ 2016-08-29 22:01 ` Mikulas Patocka 0 siblings, 0 replies; 14+ messages in thread From: Mikulas Patocka @ 2016-08-29 22:01 UTC (permalink / raw) To: Mike Snitzer Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel, johnstonj.public, linux-block On Mon, 29 Aug 2016, Mike Snitzer wrote: > On Sat, Aug 27 2016 at 11:09am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Fri, 26 Aug 2016, Mike Snitzer wrote: > > > > > On Thu, Aug 25 2016 at 4:13pm -0400, > > > Jens Axboe <axboe@kernel.dk> wrote: > > > > > > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote: > > > > > > > > > >Device mapper can't split the bio in generic_make_request - it frees the > > > > >md->queue->bio_split bioset, to save one kernel thread per device. Device > > > > >mapper uses its own splitting mechanism. > > > > > > > > > >So what is the final decision? - should device mapper split the big bio or > > > > >should bcache not submit big bios? > > > > > > > > > >I think splitting big bios in the device mapper is better - simply because > > > > >it is much less code than reworking bcache to split bios internally. > > > > > > > > > >BTW. In the device mapper, we have a layer dm-io, that was created to work > > > > >around bio size limitations - it accepts unlimited I/O request and splits > > > > >it to several bios. When bio size limitations are gone, we could simplify > > > > >dm-io too. > > > > > > > > The patch from Ming Lei was applied for 4.8 the other day. > > > > > > See linux-block commit: > > > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs > > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c > > > > But this patch won't work for device mapper, blk_bio_segment_split is > > called from blk_queue_split and device mapper doesn't use blk_queue_split > > (it can't because it frees queue->bio_split). > > > > We still need these two patches: > > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html > > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html > > I'm left wondering whether it'd be better to revert commit dbba42d8a9e > ("dm: eliminate unused "bioset" process for each bio-based DM device") That would create one more process per dm device. There is no need to create another process if the bio can be split in the device mapper. > And also look for other areas of DM that could leverage the block core's > relatively new splitting capabilities. > > Otherwise, we'll just be sprinking more BIO_MAX_PAGES-based splitting > code in DM targets because DM is so "special" that it doesn't need the > block core's splitting (even though it clearly would now benefit). It would be possible to split bios on BIO_MAX_PAGES boundary in the dm core. But I checked all the dm targets and I found problem only in dm-crypt and dm-log-writes. So there is no need to split bio bios for the other targets. > Mike Mikulas ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2016-08-31 13:39 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <alpine.LRH.2.11.1608102050200.10662@mail.ewheeler.net> [not found] ` <alpine.LRH.2.02.1608131342250.3291@file01.intranet.prod.int.rdu2.redhat.com> [not found] ` <20160814000740.GA5317@redhat.com> [not found] ` <alpine.LRH.2.02.1608151245120.13026@file01.intranet.prod.int.rdu2.redhat.com> [not found] ` <alpine.LRH.2.11.1608181742260.10662@mail.ewheeler.net> 2016-08-25 18:34 ` [dm-devel] dm-crypt: Fix error with too large bios Mikulas Patocka 2016-08-25 20:13 ` Jens Axboe 2016-08-26 14:05 ` Mike Snitzer 2016-08-27 15:09 ` Mikulas Patocka 2016-08-29 5:22 ` Ming Lei 2016-08-29 21:57 ` Mikulas Patocka 2016-08-30 7:33 ` Ming Lei 2016-08-30 12:19 ` Mikulas Patocka 2016-08-30 20:57 ` Mike Snitzer 2016-08-30 22:27 ` Mikulas Patocka 2016-08-31 6:26 ` Milan Broz 2016-08-31 13:39 ` Mike Snitzer 2016-08-29 18:19 ` Mike Snitzer 2016-08-29 22:01 ` Mikulas Patocka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).