From: Eric Farman <farman@linux.ibm.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Keith Busch <kbusch@fb.com>,
linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org,
Christian Borntraeger <borntraeger@linux.ibm.com>,
axboe@kernel.dk, Kernel Team <Kernel-team@fb.com>,
hch@lst.de, bvanassche@acm.org, damien.lemoal@opensource.wdc.com,
ebiggers@kernel.org, pankydev8@gmail.com,
Halil Pasic <pasic@linux.ibm.com>
Subject: Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
Date: Mon, 27 Jun 2022 11:21:20 -0400
Message-ID: <c5affe3096fd7b7996cb5fbcb0c41bbf3dde028e.camel@linux.ibm.com>
In-Reply-To: <e0038866ac54176beeac944c9116f7a9bdec7019.camel@linux.ibm.com>

On Thu, 2022-06-23 at 17:34 -0400, Eric Farman wrote:
> On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote:
> > On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> > > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > >
> > > > > > > Use the address alignment requirements from the block_device
> > > > > > > for direct io instead of requiring addresses be aligned to
> > > > > > > the block size.
> > > > >
> > > > > Hi Keith,
> > > > >
> > > > > > Our s390 PV guests recently started failing to boot from a
> > > > > > -next host, and git blame brought me here.
> > > > > >
> > > > > > As near as I have been able to tell, we start tripping up on
> > > > > > this code from patch 9 [1] that gets invoked with this patch:
> > > > >
> > > > > > >     for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > > > >         size_t len = i->iov[k].iov_len - skip;
> > > > > > >
> > > > > > >         if (len > size)
> > > > > > >             len = size;
> > > > > > >         if (len & len_mask)
> > > > > > >             return false;
> > > > >
> > > > > > The iovec we're failing on has two segments, one with a len
> > > > > > of x200 (and base of x...000) and another with a len of xe00
> > > > > > (and a base of x...200), while len_mask is of course xfff.
> > > > > >
> > > > > > So before I go any further on what we might have broken, do
> > > > > > you happen to have any suggestions what might be going on
> > > > > > here, or something I should try?
> > > >
> > > > Thanks for the notice, sorry for the trouble. This check wasn't
> > > > intended to have any difference from the previous code with
> > > > respect to the vector lengths.
> > > >
> > > > Could you tell me if you're accessing this through the block
> > > > device direct-io, or through iomap filesystem?
> >
> > Reasonably certain the failure's on iomap. I'd reverted the subject
> > patch from next-20220622 and got things in working order.
> >
> > > If using iomap, the previous check was this:
> > >
> > >     unsigned int blkbits =
> > >         blksize_bits(bdev_logical_block_size(iomap->bdev));
> > >     unsigned int align = iov_iter_alignment(dio->submit.iter);
> > >     ...
> > >     if ((pos | length | align) & ((1 << blkbits) - 1))
> > >         return -EINVAL;
> > >
> > >
> > ...
> > > The result of "iov_iter_alignment()" would include "0xe00 | 0x200"
> > > in your example, and checked against 0xfff should have been failing
> > > prior to this patch. Unless I'm missing something...
> >
> > Nope, you're not. I didn't look back at what the old check was
> > doing, just saw "0xe00 and 0x200" and thought "oh there's one page"
> > instead of noting the code was or'ing them. My bad.
> >
> > That was the last entry in my trace before the guest gave up, as
> > everything else through this code up to that point seemed okay.
> > I'll pick up the working case and see if I can get a clearer picture
> > between the two.
>
> Looking over the trace again, I realize I did dump
> iov_iter_alignment() as a comparator, and I see one pass through that
> had a non-zero response but bdev_iter_is_aligned() returned true...
>
> count = x1000
> iov_offset = x0
> nr_segs = 1
> iov_len = x1000 (len_mask = xfff)
> iov_base = x...200 (addr_mask = x1ff)
>
> That particular pass through is in the middle of the stuff it tried to
> do, so I don't know if that's the cause or not but it strikes me as
> unusual. Will look into that tomorrow and report back.
>

Apologies, it took me an extra day to get back to this, but it is
indeed this pass through that's causing our boot failures. I note that
the old code (in iomap_dio_bio_iter) did:

        if ((pos | length | align) & ((1 << blkbits) - 1))
                return -EINVAL;

With blkbits equal to 12, the resulting mask was 0xfff, so an align
value (from iov_iter_alignment) of 0x200 kicked us out with -EINVAL.

The new code (in iov_iter_aligned_iovec), meanwhile, compares this:

        if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
                return false;

iov_base (and the output of the old iov_iter_alignment() routine) is
0x200, but since addr_mask is 0x1ff, 0x200 & 0x1ff is zero and this
check passes where the old one failed.
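
To make the arithmetic concrete, here is a minimal userspace sketch (not
kernel code) of the two comparisons, fed with the values from the trace
entry quoted earlier. The function names are hypothetical, pos is assumed
to be 0, and the elided upper bits of iov_base are taken as zero; only the
mask expressions themselves come from the snippets in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old iomap-style check: reject if pos, length, or the combined
 * alignment has any bit set below the logical block size. */
static bool old_check_rejects(uint64_t pos, uint64_t length,
                              unsigned long align, unsigned int blkbits)
{
    return ((pos | length | align) & ((1UL << blkbits) - 1)) != 0;
}

/* New-style address check: reject only if the base address has a bit
 * set inside addr_mask. */
static bool new_addr_check_rejects(unsigned long base,
                                   unsigned long addr_mask)
{
    return (base & addr_mask) != 0;
}

/* Replay the single-segment trace entry. Assumptions: pos = 0 and the
 * upper bits of iov_base (elided in the trace) are zero. */
static void check_trace_values(void)
{
    unsigned long base = 0x200;
    unsigned long len = 0x1000;

    /* For one segment the combined alignment is base | len. */
    assert(old_check_rejects(0, len, base | len, 12)); /* 0x200 & 0xfff */
    assert(!new_addr_check_rejects(base, 0x1ff));      /* 0x200 & 0x1ff */
}
```

Calling check_trace_values() from a test program should pass both
assertions: the same buffer is rejected by the old mask and accepted by
the new one.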

To check this, I changed the comparator to len_mask (almost certainly
not the right answer since addr_mask is then unused, but it was good
for a quick test), and our PV guests are able to boot again with -next
running in the host.

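
For anyone wanting to reproduce the quick test outside the kernel, below
is a rough userspace approximation of the per-segment loop. The loop head
and the two mask checks follow the snippets quoted in this thread; the
function name, the initialisation of size and skip, and the tail of the
loop are assumptions on my part, and struct iovec from <sys/uio.h> stands
in for the kernel's iov_iter. The buffers are never dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical userspace stand-in for the per-segment alignment walk. */
static bool iovec_aligned(const struct iovec *iov, size_t nr_segs,
                          size_t iov_offset, size_t count,
                          unsigned long addr_mask, unsigned long len_mask)
{
    size_t size = count;        /* assumed: bytes left to cover */
    size_t skip = iov_offset;   /* assumed: offset into first segment */

    for (size_t k = 0; k < nr_segs; k++, skip = 0) {
        size_t len = iov[k].iov_len - skip;

        if (len > size)
            len = size;
        if (len & len_mask)
            return false;
        if (((uintptr_t)iov[k].iov_base + skip) & addr_mask)
            return false;

        size -= len;            /* assumed loop tail */
        if (!size)
            break;
    }
    return true;
}

/* Replay the single-segment trace entry with both comparators. The
 * upper bits of iov_base are elided in the trace and assumed zero. */
static void replay_trace_entry(void)
{
    struct iovec seg = {
        .iov_base = (void *)(uintptr_t)0x200,
        .iov_len = 0x1000,
    };

    /* As posted: the address is tested against addr_mask = 0x1ff. */
    assert(iovec_aligned(&seg, 1, 0, 0x1000, 0x1ff, 0xfff));
    /* Quick test: the address is tested against len_mask = 0xfff. */
    assert(!iovec_aligned(&seg, 1, 0, 0x1000, 0xfff, 0xfff));
}
```

Passing 0xfff in the addr_mask slot is only a way to mimic the quick
test; it is not a proposal for what the mask should be.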
Thanks,
Eric