From: Dave Chinner <david@fromorbit.com>
To: "hch@lst.de" <hch@lst.de>
Cc: "Verma, Vishal L" <vishal.l.verma@intel.com>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"darrick.wong@oracle.com" <darrick.wong@oracle.com>
Subject: Re: 5.3-rc1 regression with XFS log recovery
Date: Tue, 20 Aug 2019 14:41:35 +1000 [thread overview]
Message-ID: <20190820044135.GC1119@dread.disaster.area> (raw)
In-Reply-To: <20190819044012.GA15800@lst.de>
On Mon, Aug 19, 2019 at 06:40:12AM +0200, hch@lst.de wrote:
> On Mon, Aug 19, 2019 at 06:29:05AM +0200, hch@lst.de wrote:
> > On Mon, Aug 19, 2019 at 02:22:59PM +1000, Dave Chinner wrote:
> > > That implies a kmalloc heap issue.
> > >
> > > Oh, is memory poisoning or something that modifies the alignment of
> > > slabs turned on?
> > >
> > > i.e. 4k/8k allocations from the kmalloc heap slabs might not be
> > > appropriately aligned for IO, similar to the problems we have with
> > > the xen blk driver?
> >
> > That is what I suspect, and as you can see in the attached config I
> > usually run with slab debugging on.
>
> Yep, looks like an unaligned allocation:
>
> root@testvm:~# mount /dev/pmem1 /mnt/
> [ 62.346660] XFS (pmem1): Mounting V5 Filesystem
> [ 62.347960] unaligned allocation, offset = 680
> [ 62.349019] unaligned allocation, offset = 680
> [ 62.349872] unaligned allocation, offset = 680
> [ 62.350703] XFS (pmem1): totally zeroed log
> [ 62.351443] unaligned allocation, offset = 680
> [ 62.452203] unaligned allocation, offset = 344
> [ 62.528964] XFS: Assertion failed: head_blk != tail_blk, file:
> fs/xfs/xfs_lo6
> [ 62.529879] ------------[ cut here ]------------
> [ 62.530334] kernel BUG at fs/xfs/xfs_message.c:102!
> [ 62.530824] invalid opcode: 0000 [#1] SMP PTI
>
> With the following debug patch. Based on that I think I'll just
> formally submit the vmalloc switch as we're at -rc5, and then we
> can restart the unaligned slub allocation drama..
This still doesn't make sense to me, because the pmem and brd code
have no alignment limitations in their make_request code - they can
handle byte addressing and so should have no problem at all with
8-byte-aligned memory in bios.

Digging a little further, I note that both brd and pmem use
identical mechanisms to marshal data in and out of bios, so they
are likely to have the same issue.
So, brd_make_request() does:
	bio_for_each_segment(bvec, bio, iter) {
		unsigned int len = bvec.bv_len;
		int err;

		err = brd_do_bvec(brd, bvec.bv_page, len, bvec.bv_offset,
				  bio_op(bio), sector);
		if (err)
			goto io_error;
		sector += len >> SECTOR_SHIFT;
	}
So, the code behind bio_for_each_segment() splits multi-page bvecs
into individual pages, which are passed to brd_do_bvec(). An
unaligned 4kB IO traces out as:
[ 121.295550] p,o,l,s 00000000a77f0146,768,3328,0x7d0048
[ 121.297635] p,o,l,s 000000006ceca91e,0,768,0x7d004e
i.e.	page			offset	len	sector
	00000000a77f0146	768	3328	0x7d0048
	000000006ceca91e	0	768	0x7d004e
You should be able to guess what the problems are from this.
Both pmem and brd are _sector_ based. We've done a partial sector
copy on the first bvec, and then the second bvec starts its copy at
the wrong offset into the sector we only partially filled.

IOWs, no error is reported when the bvec buffer isn't sector
aligned, no error is reported when the length of data to copy is not
a multiple of the sector size, and no error is reported when we copy
the same partial sector twice.
There's nothing quite like being repeatedly bitten by the same
misalignment bug because there's no validation in the infrastructure
that could catch it immediately and throw a useful warning/error
message.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com