From: Mel Gorman <mgorman@techsingularity.net>
To: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: linux-mm@kvack.org, mhocko@suse.com, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: Stuck looping on list_empty(list) in free_pcppages_bulk()
Date: Tue, 31 Aug 2021 13:44:49 +0100
Message-ID: <20210831124449.GB4128@techsingularity.net>
In-Reply-To: <YS1l83lmwEYXuQsY@sultan-box.localdomain>

On Mon, Aug 30, 2021 at 04:12:51PM -0700, Sultan Alsawaf wrote:
> I apologize in advance for reporting a bug on an EOL kernel. I don't see any
> changes as of 5.14 that could address something like this, so I'm emailing in
> case whatever happened here may be a bug affecting newer kernels.
>
> With gdb, it appears that the CPU got stuck in the list_empty(list) loop inside
> free_pcppages_bulk():
> ----------------8<----------------
> 	do {
> 		batch_free++;
> 		if (++migratetype == MIGRATE_PCPTYPES)
> 			migratetype = 0;
> 		list = &pcp->lists[migratetype];
> 	} while (list_empty(list));
> ---------------->8----------------
>
> Although this code snippet is slightly different in 5.14, it's still ultimately
> the same. Side note: I noticed that the way `migratetype` is incremented causes
> `&pcp->lists[1]` to get looked at first rather than `&pcp->lists[0]`, since
> `migratetype` will start out at 1. This quirk is still present in 5.14, though
> the variable in question is now called `pindex`.
>
> With some more gdb digging, I found that the `count` variable was stored in %ESI
> at the time of the stall. According to the register dump in the splat, %ESI was 7.
>
> It looks like, for some reason, the pcp count was 7 higher than the number of
> pages actually present in the pcp lists.
>
That's your answer -- the PCP count has been corrupted or misaccounted.
Given this is a Fedora kernel, check whether it carries any patches to
mm/page_alloc.c that are accounting related or that change the IRQ
disabling or zone lock acquisition. Another possibility is memory
corruption -- either in the kernel or in the hardware itself.
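
For illustration only, here is a minimal userspace sketch of why an
inflated count hangs the quoted loop. It is not kernel code: the struct,
the function name drain_model() and the NR_PCP_LISTS macro are
hypothetical, each list is reduced to a page counter, one page is freed
per pass (the real function frees batch_free pages), and a scan guard is
added so the model returns -1 where the kernel would spin forever.

```c
#define NR_PCP_LISTS 3	/* stands in for MIGRATE_PCPTYPES (assumed 3, as in 5.x) */

/* Hypothetical stand-in for struct per_cpu_pages: counters only. */
struct pcp_model {
	int count;			/* what the accounting claims is queued */
	int pages[NR_PCP_LISTS];	/* what is really on each list */
};

/*
 * Simplified model of the list selection in free_pcppages_bulk().
 * Returns the number of pages freed, or -1 if count is still nonzero
 * while every list is empty, which is where the real loop never exits.
 */
int drain_model(struct pcp_model *pcp)
{
	int migratetype = 0;
	int freed = 0;

	while (pcp->count > 0) {
		int scanned = 0;

		do {
			/* Pre-increment: list 1 is examined before list 0. */
			if (++migratetype == NR_PCP_LISTS)
				migratetype = 0;
			/* Guard that exists only in this model. */
			if (++scanned > NR_PCP_LISTS)
				return -1;
		} while (pcp->pages[migratetype] == 0);

		pcp->pages[migratetype]--;
		pcp->count--;
		freed++;
	}
	return freed;
}
```

With count = 10 and only 3 pages actually queued, the model frees 3
pages and then returns -1 with count left at 7, which matches the
reported symptom of a count 7 higher than the lists can satisfy.
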
> I tried to find some way that this could happen, but the only thing I could
> think of was that maybe an allocation had both __GFP_RECLAIMABLE and
> __GFP_MOVABLE set in its gfp mask, in which case the rmqueue() call in
> get_page_from_freelist() would pass in a migratetype equal to MIGRATE_PCPTYPES
> and then pages could be added to an out-of-bounds pcp list while still
> incrementing the overall pcp count. This seems pretty unlikely though.
It's unlikely because it would be an outright bug to specify both flags.
> As
> another side note, it looks like there's nothing stopping this from occurring;
> there's only a VM_WARN_ON() in gfp_migratetype() that checks if both bits are
> set.
>
There is no explicit check for it because the two flags should never be
set together. I don't think any in-kernel code does this, but if an
out-of-tree module did, it might corrupt adjacent PCPs.
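
As a rough sketch of the arithmetic behind this, the following userspace
model shows how setting both flags yields an index one past the last pcp
list. The MODEL_* constants are assumptions meant to mirror
___GFP_MOVABLE (0x08), ___GFP_RECLAIMABLE (0x10) and GFP_MOVABLE_SHIFT
(3) as found in 5.x trees; the function names are hypothetical and the
real gfp_migratetype() carries additional checks not shown here.

```c
/* Migratetype order assumed from 5.x include/linux/mmzone.h. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_PCPTYPES	/* number of pcp lists, not a valid index */
};

/* Assumed flag values; check them against the tree being debugged. */
#define MODEL_GFP_MOVABLE	0x08u
#define MODEL_GFP_RECLAIMABLE	0x10u
#define MODEL_GFP_MOVABLE_MASK	(MODEL_GFP_RECLAIMABLE | MODEL_GFP_MOVABLE)
#define MODEL_GFP_MOVABLE_SHIFT	3

/* Model of the mask-and-shift that turns gfp flags into a migratetype. */
int model_gfp_migratetype(unsigned int gfp_flags)
{
	return (int)((gfp_flags & MODEL_GFP_MOVABLE_MASK) >>
		     MODEL_GFP_MOVABLE_SHIFT);
}

/* Nonzero if migratetype is a valid index into pcp->lists[]. */
int model_pcp_index_valid(int migratetype)
{
	return migratetype >= 0 && migratetype < MIGRATE_PCPTYPES;
}
```

Passing MODEL_GFP_RECLAIMABLE | MODEL_GFP_MOVABLE gives 0x18 >> 3 = 3,
which equals MIGRATE_PCPTYPES and so fails the bounds check in this
model.
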
--
Mel Gorman
SUSE Labs