All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <tom.leiming@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Keith Busch <keith.busch@intel.com>, Jens Axboe <axboe@fb.com>,
	Stefan Haberland <sth@linux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Sebastian Ott <sebott@linux.vnet.ibm.com>,
	tom.leiming@gmail.com
Subject: Re: [BUG] Regression introduced with "block: split bios to max possible length"
Date: Fri, 22 Jan 2016 23:06:05 +0800	[thread overview]
Message-ID: <20160122230605.22859608@tom-T450> (raw)
In-Reply-To: <CA+55aFwGRQb_2SUTj8A9MzTC4WCVC37p64t5obUfTvLNyctjKA@mail.gmail.com>

On Thu, 21 Jan 2016 20:15:37 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Jan 21, 2016 at 7:21 PM, Keith Busch <keith.busch@intel.com> wrote:
> > On Thu, Jan 21, 2016 at 05:12:13PM -0800, Linus Torvalds wrote:
> >>
> >> I assume that in this case it's simply that
> >>
> >>  - max_sectors is some odd number in sectors (ie 65535)
> >>
> >>  - the block size is larger than a sector (ie 4k)
> >
> > Wouldn't that make max sectors non-sensical? Or am I mistaken to think max
> > sectors is supposed to be a valid transfer in multiples of the physical
> > sector size?
> 
> If the controller interface is some 16-bit register, then the maximum
> number of sectors you can specify is 65535.
> 
> But if the disk then doesn't like 512-byte accesses, but wants 4kB or
> whatever, then clearly you can't actually *feed* it that maximum
> number. Not because it's a maximal, but because it's not aligned.
> 
> But that doesn't mean that it's non-sensical. It just means that you
> have to take both things into account.  There may be two totally
> independent things that cause the two (very different) rules on what
> the IO can look like.
> 
> Obviously there are probably games we could play, like always limiting
> the maximum sector number to a multiple of the sector size. That would
> presumably work for Stefan's case, by simply "artificially" making
> max_sectors be 65528 instead.

Yes, it is one problem, something like below does fix my test
with 4K block size.

--
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1699df5..49e0394 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -81,6 +81,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	unsigned front_seg_size = bio->bi_seg_front_size;
 	bool do_split = true;
 	struct bio *new = NULL;
+	unsigned max_sectors;
 
 	bio_for_each_segment(bv, bio, iter) {
 		/*
@@ -90,20 +91,23 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
 			goto split;
 
-		if (sectors + (bv.bv_len >> 9) >
-				blk_max_size_offset(q, bio->bi_iter.bi_sector)) {
+		/* I/O shoule be aligned to logical block size */
+		max_sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
+		max_sectors = ((max_sectors << 9) &
+				~(queue_logical_block_size(q) - 1)) >> 9;
+		if (sectors + (bv.bv_len >> 9) > max_sectors) {
 			/*
 			 * Consider this a new segment if we're splitting in
 			 * the middle of this vector.
 			 */
 			if (nsegs < queue_max_segments(q) &&
-			    sectors < blk_max_size_offset(q,
-						bio->bi_iter.bi_sector)) {
+			    sectors < max_sectors) {
 				nsegs++;
-				sectors = blk_max_size_offset(q,
-						bio->bi_iter.bi_sector);
+				sectors = max_sectors;
 			}
-			goto split;
+			if (sectors)
+				goto split;
+			/* It is OK to put single bvec into one segment */
 		}
 
 		if (bvprvp && blk_queue_cluster(q)) {


> 
> But I do think it's better to consider them independent issues, and
> just make sure that we always honor those things independently.
> 
> That "honor things independently" used to happen automatically before,
> simply because we'd never split in the middle of a bio segment. And
> since each bio segment was created with the limitations of the device
> in mind, that all worked.
> 
> Now that it splits in the middle of a vector entry, that splitting
> just needs to honor _all_ the rules. Not just the max sector one.
> 
> >> What I think it _should_ do is:
> >>
> >>  (a) check against max sectors like it used to do:
> >>
> >>                 if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
> >>                         goto split;
> >
> > This can create less optimal splits for h/w that advertise chunk size. I
> > know it's a quirky feature (wasn't my idea), but the h/w is very slow
> > to not split at the necessary alignments, and we used to handle this
> > split correctly.
> 
> I suspect few high-performance controllers will really have big issues
> with the max_sectors thing. If you have big enough IO that you could
> hit the maximum sector number, you're already pretty well off, you
> might as well split at that point.
> 
> So I think it's ok to split at the max sector case early.
> 
> For the case of nvme, for example, I think the max sector number is so
> high that you'll never hit that anyway, and you'll only ever hit the
> chunk limit. No?
> 
> So in practice it won't matter, I suspect.
> 
>                  Linus

  parent reply	other threads:[~2016-01-22 15:06 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-21 14:57 [BUG] Regression introduced with "block: split bios to max possible length" Stefan Haberland
2016-01-21 21:34 ` Jens Axboe
2016-01-21 21:34   ` Jens Axboe
2016-01-21 22:51   ` Keith Busch
2016-01-22  1:12     ` Linus Torvalds
2016-01-22  3:21       ` Keith Busch
2016-01-22  4:15         ` Linus Torvalds
2016-01-22 14:56           ` Keith Busch
2016-01-22 17:15             ` Jens Axboe
2016-01-22 15:06           ` Ming Lei [this message]
2016-01-22 17:03             ` Linus Torvalds
2016-01-22 17:48               ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160122230605.22859608@tom-T450 \
    --to=tom.leiming@gmail.com \
    --cc=axboe@fb.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=sebott@linux.vnet.ibm.com \
    --cc=sth@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.