linux-fsdevel.vger.kernel.org archive mirror
From: Eric Farman <farman@linux.ibm.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Keith Busch <kbusch@fb.com>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	axboe@kernel.dk, Kernel Team <Kernel-team@fb.com>,
	hch@lst.de, bvanassche@acm.org, damien.lemoal@opensource.wdc.com,
	ebiggers@kernel.org, pankydev8@gmail.com,
	Halil Pasic <pasic@linux.ibm.com>
Subject: Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
Date: Mon, 27 Jun 2022 11:21:20 -0400
Message-ID: <c5affe3096fd7b7996cb5fbcb0c41bbf3dde028e.camel@linux.ibm.com>
In-Reply-To: <e0038866ac54176beeac944c9116f7a9bdec7019.camel@linux.ibm.com>

On Thu, 2022-06-23 at 17:34 -0400, Eric Farman wrote:
> On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote:
> > On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> > > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > > 
> > > > > > > Use the address alignment requirements from the block_device for
> > > > > > > direct io instead of requiring addresses be aligned to the block
> > > > > > > size.
> > > > > 
> > > > > Hi Keith,
> > > > > 
> > > > > > Our s390 PV guests recently started failing to boot from a -next host,
> > > > > > and git blame brought me here.
> > > > > > 
> > > > > > As near as I have been able to tell, we start tripping up on this code
> > > > > > from patch 9 [1] that gets invoked with this patch:
> > > > > 
> > > > > > 	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > > > 		size_t len = i->iov[k].iov_len - skip;
> > > > > > 
> > > > > > 		if (len > size)
> > > > > > 			len = size;
> > > > > > 		if (len & len_mask)
> > > > > > 			return false;
> > > > > 
> > > > > > The iovec we're failing on has two segments, one with a len of x200
> > > > > > (and base of x...000) and another with a len of xe00 (and a base of
> > > > > > x...200), while len_mask is of course xfff.
> > > > > 
> > > > > > So before I go any further on what we might have broken, do you happen
> > > > > > to have any suggestions what might be going on here, or something I
> > > > > > should try?
> > > > 
> > > > > Thanks for the notice, sorry for the trouble. This check wasn't intended to
> > > > > have any difference from the previous code with respect to the vector lengths.
> > > > > 
> > > > > Could you tell me if you're accessing this through the block device direct-io,
> > > > > or through iomap filesystem?
> > 
> > Reasonably certain the failure's on iomap. I'd reverted the subject
> > patch from next-20220622 and got things in working order.
> > 
> > > If using iomap, the previous check was this:
> > > 
> > > > 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > 	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > 	...
> > > 	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > 		return -EINVAL;
> > > 
> > > 
> > ...
> > > > The result of "iov_iter_alignment()" would include "0xe00 | 0x200" in your
> > > > example, and checked against 0xfff should have been failing prior to this
> > > > patch. Unless I'm missing something...
> > 
> > Nope, you're not. I didn't look back at what the old check was doing,
> > just saw "0xe00 and 0x200" and thought "oh there's one page" instead of
> > noting the code was or'ing them. My bad.
> > 
> > That was the last entry in my trace before the guest gave up, as
> > everything else through this code up to that point seemed okay. I'll
> > pick up the working case and see if I can get a clearer picture between
> > the two.
> 
> Looking over the trace again, I realize I did dump iov_iter_alignment()
> as a comparator, and I see one pass through that had a non-zero
> response but bdev_iter_is_aligned() returned true...
> 
> count = x1000
> iov_offset = x0
> nr_segs = 1
> iov_len = x1000	(len_mask = xfff)
> iov_base = x...200 (addr_mask = x1ff)
> 
> That particular pass through is in the middle of the stuff it tried to
> do, so I don't know if that's the cause or not but it strikes me as
> unusual. Will look into that tomorrow and report back.
> 

Apologies, it took me an extra day to get back to this, but it is
indeed this pass through that's causing our boot failures. I note that
the old code (in iomap_dio_bio_iter) did:

        if ((pos | length | align) & ((1 << blkbits) - 1))
                return -EINVAL;

With blkbits equal to 12, the resulting mask was 0x0fff, and checking
it against an align value (from iov_iter_alignment) of x200 kicks us
out.

The new code (in iov_iter_aligned_iovec), meanwhile, compares this:

                if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
                        return false;

iov_base (and the output of the old iov_iter_alignment() routine) is
x200, but since addr_mask is x1ff, this check gives a different answer
than the old one did: x200 & x1ff is zero, so the buffer is now treated
as aligned.

To check this, I changed the comparator to len_mask (almost certainly
not the right answer since addr_mask is then unused, but it was good
for a quick test), and our PV guests are able to boot again with -next
running in the host.
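
For anyone who wants to poke at the arithmetic outside the kernel, here
is a minimal userspace sketch of the two checks as I understand them. It
is not the kernel code: struct seg, old_check(), new_check() and
is_low_bit_mask() are names I made up for illustration, the skip/size
clamping in iov_iter_aligned_iovec() is left out, and both helpers
return true for "aligned" rather than returning -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one iovec segment (not the kernel's struct iovec). */
struct seg {
	uintptr_t base;	/* user buffer address */
	size_t len;	/* segment length in bytes */
};

/* True if mask has the form 2^n - 1, e.g. 0x1ff or 0xfff. */
static bool is_low_bit_mask(uintptr_t mask)
{
	return (mask & (mask + 1)) == 0;
}

/*
 * Old-style check: OR every base and length together, then test the
 * result against a single block-size mask.
 */
static bool old_check(const struct seg *s, size_t n, unsigned int blkbits)
{
	uintptr_t align = 0;

	for (size_t k = 0; k < n; k++)
		align |= s[k].base | s[k].len;
	return (align & (((uintptr_t)1 << blkbits) - 1)) == 0;
}

/* New-style check: lengths against len_mask, addresses against addr_mask. */
static bool new_check(const struct seg *s, size_t n,
		      uintptr_t addr_mask, size_t len_mask)
{
	assert(is_low_bit_mask(addr_mask) && is_low_bit_mask(len_mask));

	for (size_t k = 0; k < n; k++) {
		if (s[k].len & len_mask)
			return false;
		if (s[k].base & addr_mask)
			return false;
	}
	return true;
}
```

Feeding it one segment with a base ending in x200 (I used 0x10200, an
arbitrary address) and a len of x1000: old_check() with blkbits 12
rejects it, new_check() with addr_mask x1ff and len_mask xfff accepts
it, and new_check() with xfff passed for both masks (my quick test)
rejects it again.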

Thanks,
Eric


