Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1

From: Matthew Wilcox <willy@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Daniel Dao <dqminh@cloudflare.com>,
	linux-fsdevel@vger.kernel.org,
	kernel-team <kernel-team@cloudflare.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	djwong@kernel.org
Subject: Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
Date: Tue, 25 Jul 2023 04:41:58 +0100
Message-ID: <ZL9EhledFQbN9djT@casper.infradead.org>
In-Reply-To: <ZL7w9dEH8BSXRzyu@dread.disaster.area>

On Tue, Jul 25, 2023 at 07:45:25AM +1000, Dave Chinner wrote:
> On Mon, Jul 24, 2023 at 12:23:31PM +0100, Daniel Dao wrote:
> > Hi again,
> > 
> > We had another example of xarray corruption involving xfs and zsmalloc. We are
> > running zram as swap. We have 2 tasks deadlock waiting for page to be released
> 
> Do your problems on 6.1 go away if you stop using zram as swap?

I think zram is the victim here, not the culprit.  I think what's
going on is that -- somehow -- there are stale pointers in the xarray.
zram allocates these pages (I suspect most of the memory in this machine
is allocated to zram or page cache) and then we blow up when finding
a folio in the page cache which has a ->mapping that is actually a
movable_ops structure.
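
To make the "->mapping that is actually a movable_ops structure" case concrete, here is a minimal userspace sketch of low-bit pointer tagging. It assumes the 6.1 convention in which the low two bits of page->mapping carry the anon/movable flags; the MODEL_* constants, struct model_movable_ops and both helpers are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's low-bit flags on page->mapping. */
#define MODEL_MAPPING_ANON    0x1u
#define MODEL_MAPPING_MOVABLE 0x2u
#define MODEL_MAPPING_FLAGS   (MODEL_MAPPING_ANON | MODEL_MAPPING_MOVABLE)

/* Hypothetical stand-in for a driver's movable operations table. */
struct model_movable_ops {
        int unused;
};

/* Store an ops pointer in a mapping-sized slot, tagged as movable. */
static void *model_tag_movable(const struct model_movable_ops *ops)
{
        uintptr_t raw = (uintptr_t)ops;

        /* The tag only works if the pointer's low bits are free. */
        assert((raw & MODEL_MAPPING_FLAGS) == 0);
        return (void *)(raw | MODEL_MAPPING_MOVABLE);
}

/* Same idea as PageMappingFlags(): true if any low flag bit is set. */
static bool model_mapping_flags(const void *mapping)
{
        return ((uintptr_t)mapping & MODEL_MAPPING_FLAGS) != 0;
}
```

A reader of the page cache that follows such a tagged value as if it were a struct address_space pointer ends up dereferencing something else entirely, which would fit the crash described above.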

But how do we get stale pointers in the xarray?  I've been worrying at
that problem for months.  At some point, the refcount must go down to
zero:

static inline void folio_put(struct folio *folio)
{
        if (folio_put_testzero(folio))
                __folio_put(folio);
}

(assume we're talking about a large folio; everything seems to point
that way):

__folio_put_large:
        if (!folio_test_hugetlb(folio))
                __page_cache_release(folio);
        destroy_large_folio(folio);

destroy_large_folio:
        free_transhuge_page()
free_transhuge_page:
        free_compound_page(page);
free_compound_page:
        free_the_page(page, compound_order(page));
free_the_page:
                __free_pages_ok(page, order, FPI_NONE);
__free_pages_ok:
        if (!free_pages_prepare(page, order, fpi_flags))
free_pages_prepare:
       if (PageMappingFlags(page))
                page->mapping = NULL;
(doesn't trigger; PageMappingFlags are false for page cache)
        if (is_check_pages_enabled()) {
                if (free_page_is_bad(page))
free_page_is_bad:
        if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)))
                return false;

        /* Something has gone sideways, find it */
        free_page_is_bad_report(page);
page_expected_state:
        if (unlikely((unsigned long)page->mapping | ...
                return false;

free_page_is_bad_report:
        bad_page(page,
                 page_bad_reason(page, PAGE_FLAGS_CHECK_AT_FREE));
page_bad_reason:
        if (unlikely(page->mapping != NULL))
                bad_reason = "non-NULL mapping";

So (assuming that Daniel has check_pages_enabled set and isn't ignoring
important parts of dmesg, which seem like reasonable assumptions), the
last put of a folio must be after the folio has had its ->mapping cleared.
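
The argument above can be modelled in userspace as a sketch: on the final put, a page that still has a non-NULL mapping gets reported, so a silent free implies the mapping was already cleared. struct model_page, model_bad_reason() and model_put() are hypothetical names for illustration, and the model checks only the mapping field, not the other conditions the real check covers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the fields needed here. */
struct model_page {
        void *mapping;
        int refcount;
};

/* Shaped like page_bad_reason(): NULL means nothing to report. */
static const char *model_bad_reason(const struct model_page *page)
{
        if (page->mapping != NULL)
                return "non-NULL mapping";
        return NULL;
}

/*
 * Drop one reference. Only the final put runs the free-time check;
 * the return value is the reason that would be reported, or NULL.
 */
static const char *model_put(struct model_page *page)
{
        assert(page->refcount > 0);
        if (--page->refcount != 0)
                return NULL;
        return model_bad_reason(page);
}
```

In this model a page freed with its mapping still set always produces a report, so the absence of "non-NULL mapping" in dmesg points at the mapping being cleared before the last reference went away.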

But we remove the folio from the page cache in page_cache_delete(),
right before we set the mapping to NULL.  The same two steps happen in
delete_from_page_cache_batch(), in the other order (I don't think the
order is relevant?)

So where do we set folio->mapping to NULL without removing folio from
the XArray?  I'm beginning to suspect it's a mishandled failure in
split_huge_page(), so I'll re-review that code path tomorrow.
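
The invariant being hunted can be written down as a small userspace model: every folio reachable from the cache's index must point back at that cache, and a path that clears the mapping while leaving the slot populated breaks it. This is a sketch only; the fixed-size array stands in for the XArray, and struct model_cache, model_delete(), model_buggy_detach() and model_count_stale() are hypothetical names, not a claim about where the real bug is.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SLOTS 4

/* Hypothetical stand-in for a folio; only ->mapping matters here. */
struct model_folio {
        void *mapping;
};

/* Hypothetical stand-in for an address_space and its index. */
struct model_cache {
        struct model_folio *slots[MODEL_SLOTS];
};

/* Correct removal: empty the slot and clear the back pointer. */
static void model_delete(struct model_cache *cache, size_t index)
{
        struct model_folio *folio;

        assert(index < MODEL_SLOTS);
        folio = cache->slots[index];
        if (folio == NULL)
                return;
        cache->slots[index] = NULL;
        folio->mapping = NULL;
}

/* Suspected failure shape: back pointer cleared, slot left in place. */
static void model_buggy_detach(struct model_cache *cache, size_t index)
{
        assert(index < MODEL_SLOTS);
        if (cache->slots[index] != NULL)
                cache->slots[index]->mapping = NULL;
}

/* Count slots whose folio no longer points back at this cache. */
static size_t model_count_stale(const struct model_cache *cache)
{
        size_t i, stale = 0;

        for (i = 0; i < MODEL_SLOTS; i++) {
                const struct model_folio *folio = cache->slots[i];

                if (folio != NULL && folio->mapping != (const void *)cache)
                        stale++;
        }
        return stale;
}
```

Once the detached folio is freed and handed to another owner that writes its own value into ->mapping, the slot still counts as stale in this model, which is the situation a later lookup would trip over.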
