From: Rafael Aquini <aquini@redhat.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org
Subject: Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference
Date: Thu, 24 Sep 2020 02:30:38 -0400 [thread overview]
Message-ID: <20200924063038.GD1023012@optiplex-lnx> (raw)
In-Reply-To: <877dsjessq.fsf@yhuang-dev.intel.com>
On Thu, Sep 24, 2020 at 11:51:17AM +0800, Huang, Ying wrote:
> Rafael Aquini <aquini@redhat.com> writes:
> > The bug here is quite simple: split_swap_cluster() misses checking for
> > lock_cluster() returning NULL before committing to change cluster_info->flags.
>
> I don't think so. We shouldn't run into this situation firstly. So the
> "fix" hides the real bug instead of fixing it. Just like we call
> VM_BUG_ON_PAGE(!PageLocked(head), head) in split_huge_page_to_list()
> instead of returning if !PageLocked(head) silently.
>
Not the same thing, obviously, as you are going for an apples-to-carrots
comparison, but since you mentioned:
split_huge_page_to_list() asserts (in debug builds) *page is locked,
and later checks if *head bears the SwapCache flag.
deferred_split_scan(), OTOH, doesn't hand down the compound head locked,
but the 2nd page in the group instead.
This doesn't necessarely means it's a problem, though, but might help
on hitting the issue.
> > The fundamental problem has nothing to do with allocating, or not allocating
> > a swap cluster, but it has to do with the fact that the THP deferred split scan
> > can transiently race with swapcache insertion, and the fact that when you run
> > your swap area on rotational storage cluster_info is _always_ NULL.
> > split_swap_cluster() needs to check for lock_cluster() returning NULL because
> > that's one possible case, and it clearly fails to do so.
>
> If there's a race, we should fix the race. But the code path for
> swapcache insertion is,
>
> add_to_swap()
> get_swap_page() /* Return if fails to allocate */
> add_to_swap_cache()
> SetPageSwapCache()
>
> While the code path to split THP is,
>
> split_huge_page_to_list()
> if PageSwapCache()
> split_swap_cluster()
>
> Both code paths are protected by the page lock. So there should be some
> other reasons to trigger the bug.
As mentioned above, no they seem to not be protected (at least, not the
same page, depending on the case). While add_to_swap() will assure a
page_lock on the compound head, split_huge_page_to_list() does not.
> And again, for HDD, a THP shouldn't have PageSwapCache() set at the
> first place. If so, the bug is that the flag is set and we should fix
> the setting.
>
I fail to follow your claim here. Where is the guarantee, in the code, that
you'll never have a compound head in the swapcache?
> > Run a workload that cause multiple THP COW, and add a memory hogger to create
> > memory pressure so you'll force the reclaimers to kick the registered
> > shrinkers. The trigger is not heavy swapping, and that's probably why
> > most swap test cases don't hit it. The window is tight, but you will get the
> > NULL pointer dereference.
>
> Do you have a script to reproduce the bug?
>
Nope, a convoluted set of internal regression tests we have usually
triggers it. In the wild, customers running HANNA are seeing it,
occasionally.
> > Regardless you find furhter bugs, or not, this patch is needed to correct a
> > blunt coding mistake.
>
> As above. I don't agree with that.
>
It's OK to disagree, split_swap_cluster still misses the cluster_info NULL check,
though.
next prev parent reply other threads:[~2020-09-24 6:30 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-22 18:48 [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference Rafael Aquini
2020-09-22 19:47 ` Andrew Morton
2020-09-23 13:42 ` Rafael Aquini
2020-09-25 2:59 ` Andrew Morton
2020-09-25 3:06 ` Huang, Ying
2020-09-25 3:10 ` Andrew Morton
2020-09-23 2:21 ` Huang, Ying
2020-09-23 4:34 ` Rafael Aquini
2020-09-23 5:13 ` Huang, Ying
2020-09-23 13:01 ` Rafael Aquini
2020-09-24 0:59 ` Huang, Ying
2020-09-24 2:09 ` Rafael Aquini
2020-09-24 3:51 ` Huang, Ying
2020-09-24 6:30 ` Rafael Aquini [this message]
2020-09-24 6:57 ` Huang, Ying
2020-09-24 7:45 ` Huang, Ying
2020-09-24 15:08 ` Rafael Aquini
2020-09-25 3:21 ` Huang, Ying
2020-09-26 15:16 ` Rafael Aquini
2020-09-27 5:33 ` Huang, Ying
2020-10-01 14:31 ` Rafael Aquini
2020-10-05 13:39 ` Rafael Aquini
2020-10-09 0:18 ` Huang, Ying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200924063038.GD1023012@optiplex-lnx \
--to=aquini@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.