From: Ming Lei <tom.leiming@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Keith Busch <keith.busch@intel.com>, Jens Axboe <axboe@fb.com>,
Stefan Haberland <sth@linux.vnet.ibm.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-s390 <linux-s390@vger.kernel.org>,
Sebastian Ott <sebott@linux.vnet.ibm.com>,
tom.leiming@gmail.com
Subject: Re: [BUG] Regression introduced with "block: split bios to max possible length"
Date: Fri, 22 Jan 2016 23:06:05 +0800 [thread overview]
Message-ID: <20160122230605.22859608@tom-T450> (raw)
In-Reply-To: <CA+55aFwGRQb_2SUTj8A9MzTC4WCVC37p64t5obUfTvLNyctjKA@mail.gmail.com>
On Thu, 21 Jan 2016 20:15:37 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Jan 21, 2016 at 7:21 PM, Keith Busch <keith.busch@intel.com> wrote:
> > On Thu, Jan 21, 2016 at 05:12:13PM -0800, Linus Torvalds wrote:
> >>
> >> I assume that in this case it's simply that
> >>
> >> - max_sectors is some odd number in sectors (ie 65535)
> >>
> >> - the block size is larger than a sector (ie 4k)
> >
> > Wouldn't that make max sectors non-sensical? Or am I mistaken to think max
> > sectors is supposed to be a valid transfer in multiples of the physical
> > sector size?
>
> If the controller interface is some 16-bit register, then the maximum
> number of sectors you can specify is 65535.
>
> But if the disk then doesn't like 512-byte accesses, but wants 4kB or
> whatever, then clearly you can't actually *feed* it that maximum
> number. Not because it's a maximal, but because it's not aligned.
>
> But that doesn't mean that it's non-sensical. It just means that you
> have to take both things into account. There may be two totally
> independent things that cause the two (very different) rules on what
> the IO can look like.
>
> Obviously there are probably games we could play, like always limiting
> the maximum sector number to a multiple of the sector size. That would
> presumably work for Stefan's case, by simply "artificially" making
> max_sectors be 65528 instead.
Yes, it is one problem, something like below does fix my test
with 4K block size.
--
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1699df5..49e0394 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -81,6 +81,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
unsigned front_seg_size = bio->bi_seg_front_size;
bool do_split = true;
struct bio *new = NULL;
+ unsigned max_sectors;
bio_for_each_segment(bv, bio, iter) {
/*
@@ -90,20 +91,23 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
goto split;
- if (sectors + (bv.bv_len >> 9) >
- blk_max_size_offset(q, bio->bi_iter.bi_sector)) {
+ /* I/O shoule be aligned to logical block size */
+ max_sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
+ max_sectors = ((max_sectors << 9) &
+ ~(queue_logical_block_size(q) - 1)) >> 9;
+ if (sectors + (bv.bv_len >> 9) > max_sectors) {
/*
* Consider this a new segment if we're splitting in
* the middle of this vector.
*/
if (nsegs < queue_max_segments(q) &&
- sectors < blk_max_size_offset(q,
- bio->bi_iter.bi_sector)) {
+ sectors < max_sectors) {
nsegs++;
- sectors = blk_max_size_offset(q,
- bio->bi_iter.bi_sector);
+ sectors = max_sectors;
}
- goto split;
+ if (sectors)
+ goto split;
+ /* It is OK to put single bvec into one segment */
}
if (bvprvp && blk_queue_cluster(q)) {
>
> But I do think it's better to consider them independent issues, and
> just make sure that we always honor those things independently.
>
> That "honor things independently" used to happen automatically before,
> simply because we'd never split in the middle of a bio segment. And
> since each bio segment was created with the limitations of the device
> in mind, that all worked.
>
> Now that it splits in the middle of a vector entry, that splitting
> just needs to honor _all_ the rules. Not just the max sector one.
>
> >> What I think it _should_ do is:
> >>
> >> (a) check against max sectors like it used to do:
> >>
> >> if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
> >> goto split;
> >
> > This can create less optimal splits for h/w that advertise chunk size. I
> > know it's a quirky feature (wasn't my idea), but the h/w is very slow
> > to not split at the necessary alignments, and we used to handle this
> > split correctly.
>
> I suspect few high-performance controllers will really have big issues
> with the max_sectors thing. If you have big enough IO that you could
> hit the maximum sector number, you're already pretty well off, you
> might as well split at that point.
>
> So I think it's ok to split at the max sector case early.
>
> For the case of nvme, for example, I think the max sector number is so
> high that you'll never hit that anyway, and you'll only ever hit the
> chunk limit. No?
>
> So in practice it won't matter, I suspect.
>
> Linus
next prev parent reply other threads:[~2016-01-22 15:06 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-21 14:57 [BUG] Regression introduced with "block: split bios to max possible length" Stefan Haberland
2016-01-21 21:34 ` Jens Axboe
2016-01-21 22:51 ` Keith Busch
2016-01-22 1:12 ` Linus Torvalds
2016-01-22 3:21 ` Keith Busch
2016-01-22 4:15 ` Linus Torvalds
2016-01-22 14:56 ` Keith Busch
2016-01-22 17:15 ` Jens Axboe
2016-01-22 15:06 ` Ming Lei [this message]
2016-01-22 17:03 ` Linus Torvalds
2016-01-22 17:48 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160122230605.22859608@tom-T450 \
--to=tom.leiming@gmail.com \
--cc=axboe@fb.com \
--cc=keith.busch@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=sebott@linux.vnet.ibm.com \
--cc=sth@linux.vnet.ibm.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox