From: "Darrick J. Wong" <djwong@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Brian Foster <bfoster@redhat.com>,
	xfs <linux-xfs@vger.kernel.org>,
	fstests <fstests@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: generic/068 crash on 5.18-rc2?
Date: Wed, 4 May 2022 21:24:05 -0700
Message-ID: <20220505042405.GB27195@magnolia>
In-Reply-To: <YnNQE6N7MwgLtR12@casper.infradead.org>

On Thu, May 05, 2022 at 05:18:27AM +0100, Matthew Wilcox wrote:
> On Wed, May 04, 2022 at 07:40:12PM -0700, Darrick J. Wong wrote:
> > On Tue, May 03, 2022 at 10:25:32AM -0700, Darrick J. Wong wrote:
> > > On Tue, May 03, 2022 at 05:31:01AM +0100, Matthew Wilcox wrote:
> > > > On Mon, May 02, 2022 at 08:25:34PM -0700, Darrick J. Wong wrote:
> > > > > On Mon, May 02, 2022 at 08:20:00AM -0400, Brian Foster wrote:
> > > > > > On Sat, Apr 30, 2022 at 10:40:31PM +0100, Matthew Wilcox wrote:
> > > > > > > On Sat, Apr 30, 2022 at 04:44:07AM +0100, Matthew Wilcox wrote:
> > > > > > > > (I do not love this, have not even compiled it; it's late. We may be
> > > > > > > > better off just storing next_folio inside the folio_iter).
> > > > > > >
> > > > > > > Does anyone have a preference for fixing this between Option A:
> > > > > > >
> > > > > >
> > > > > > After seeing the trace in my previous mail and several thousand
> > > > > > successful iterations of the test hack, I had reworked it into this
> > > > > > (which survived weekend testing until it ran into some other XFS problem
> > > > > > that looks unrelated):
> > > > > >
> > > > > > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > > > > > index 278cc81cc1e7..aa820e09978e 100644
> > > > > > --- a/include/linux/bio.h
> > > > > > +++ b/include/linux/bio.h
> > > > > > @@ -269,6 +269,7 @@ struct folio_iter {
> > > > > >  	size_t offset;
> > > > > >  	size_t length;
> > > > > >  	/* private: for use by the iterator */
> > > > > > +	struct folio *_next;
> > > > > >  	size_t _seg_count;
> > > > > >  	int _i;
> > > > > >  };
> > > > > > @@ -279,6 +280,7 @@ static inline void bio_first_folio(struct folio_iter *fi, struct bio *bio,
> > > > > >  	struct bio_vec *bvec = bio_first_bvec_all(bio) + i;
> > > > > > 
> > > > > >  	fi->folio = page_folio(bvec->bv_page);
> > > > > > +	fi->_next = folio_next(fi->folio);
> > > > > >  	fi->offset = bvec->bv_offset +
> > > > > >  			PAGE_SIZE * (bvec->bv_page - &fi->folio->page);
> > > > > >  	fi->_seg_count = bvec->bv_len;
> > > > > > @@ -290,13 +292,15 @@ static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio)
> > > > > >  {
> > > > > >  	fi->_seg_count -= fi->length;
> > > > > >  	if (fi->_seg_count) {
> > > > > > -		fi->folio = folio_next(fi->folio);
> > > > > > +		fi->folio = fi->_next;
> > > > > > +		fi->_next = folio_next(fi->folio);
> > > > > >  		fi->offset = 0;
> > > > > >  		fi->length = min(folio_size(fi->folio), fi->_seg_count);
> > > > > >  	} else if (fi->_i + 1 < bio->bi_vcnt) {
> > > > > >  		bio_first_folio(fi, bio, fi->_i + 1);
> > > > > >  	} else {
> > > > > >  		fi->folio = NULL;
> > > > > > +		fi->_next = NULL;
> > > > > >  	}
> > > > > >  }
> > > > > >
> > > > > > So FWIW, that is just to say that I find option A to be cleaner and more
> > > > > > readable.
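
(For anyone following along at home: here's roughly how this iterator
gets consumed on a write-completion path.  example_write_end_io() is a
made-up name and the loop body is only a sketch, not the actual iomap
code; the point is that the body may drop the last reference to
fi.folio, so the old bio_next_folio() could end up calling folio_next()
on a folio that has already been freed.  Caching _next while we still
hold the reference, as above, sidesteps that; and nulling _next
alongside folio keeps the iterator from carrying a stale pointer past
the last iteration.)

#include <linux/bio.h>
#include <linux/pagemap.h>

/*
 * Sketch of a bio completion handler that walks every folio in a bio.
 * bio_for_each_folio_all() expands to bio_first_folio() followed by
 * repeated calls to bio_next_folio().
 */
static void example_write_end_io(struct bio *bio)
{
	struct folio_iter fi;

	bio_for_each_folio_all(fi, bio) {
		/* per-folio completion work would go here */
		folio_end_writeback(fi.folio);	/* may drop the last ref */
	}
	bio_put(bio);
}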
> > > > >
> > > > > Me too. I'll queue up the usual nightly tests with that patch added and
> > > > > we'll see how that does.
> > > >
> > > > I've just pushed essentially that patch to my for-next tree in case
> > > > anybody does any testing with that. I'll give it a couple of days
> > > > before creating a folio-5.18f tag and asking Linus to pull the first two
> > > > commits on
> > > >
> > > > git://git.infradead.org/users/willy/pagecache.git for-next
> > > >
> > > > That is, commits
> > > >
> > > > 1a4c97e2dd5b ("block: Do not call folio_next() on an unreferenced folio")
> > > > 095099da208b ("mm/readahead: Fix readahead with large folios")
> > >
> > > Hmm. Well, I added 1a4c97 to my tree last night, and it seems to have
> > > cleared up all but two of the problems I saw with the for-next branch.
> > >
> > > generic/388 still fails (40 minutes in) with:
> > >
> > > WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
> > > VM_BUG_ON_FOLIO(i_blocks_per_folio(inode, folio) > 1 && !iop, folio);
> > >
> > > Which I think is the same problem where the fs goes down, XFS throws an
> > > error back to iomap_do_writepages, and it tries to discard a folio that
> > > already had writeback running on it.
> > >
> > > There's also the same problem I reported a few days ago in xfs/501
> > > on a 64k-page ARM64 VM where:
> > >
> > > run fstests xfs/501 at 2022-05-02 21:17:31
> > > XFS: Assertion failed: IS_ALIGNED((unsigned long)lv->lv_buf, sizeof(uint64_t)), file: fs/xfs/xfs_log_cil.c, line: 430
> > > XFS: Assertion failed: IS_ALIGNED((unsigned long)buf, sizeof(uint64_t)), file: fs/xfs/xfs_log.c, line: 137
> > > XFS: Assertion failed: IS_ALIGNED((unsigned long)buf, sizeof(uint64_t)), file: fs/xfs/xfs_log.c, line: 137
> > >
> > > But I think that's a new bug that came in with all the log buffer
> > > alignment changes in the 5.19 branch.
> > >
> > > Oh. My tree still had the "disable large folios" patch in it. I guess
> > > the "successful" results are mostly invalid then.
> >
> > Well... with large folios turned back on and those two patches added to
> > the branch, *most* of the problems go away. The generic/388 problem
> > persists, and last night's run showed that the weird xfs_dquot leak that
> > I've occasionally seen on 5.18 with xfs/43[46] also exists in 5.17.
>
> OK, so let me just restate what I understand here ... these two patches
> cure some, but not all of the currently observed problems with 5.18.
> The problems that remain with 5.18 were also present in either 5.17
> or in 5.18 with large folios disabled, so at this point we know of no
> functional regressions that large folios can be blamed for?

Not quite -- the generic/388 one is definitely new as of 5.18-rc1.
Frustratingly, the problems "go away" if you enable KASAN, so I might
try KFENCE (or whatever the new less molasses memory debugger is named)
next.

I suspect the xfs/43[46] problems are probably more of the ongoing log
abend whackamole, since every time we change something in the logging
code, weird latent bugs from 20 years ago start pouring out like spiders
fleeing the first winter rains.

> I'll send these patches to Linus tomorrow then, since they fix problems
> that have been observed, and I don't think there's any reason to keep
> them from him.

Right, I don't think there's a reason to hang on to those any longer.

--D