From: Suzuki K Poulose <suzuki.poulose@arm.com>
To: mgorman@techsingularity.net
Cc: mhocko@suse.com, kvm@vger.kernel.org, marc.zyngier@arm.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, cai@lca.pw,
akpm@linux-foundation.org, kvmarm@lists.cs.columbia.edu
Subject: Re: mm/compaction: BUG: NULL pointer dereference
Date: Fri, 24 May 2019 11:42:54 +0100
Message-ID: <98b93f38-64a7-dcd1-c027-6d1195f3380f@arm.com>
In-Reply-To: <20190524103924.GN18914@techsingularity.net>

Hi Mel,

Thanks for your quick response.

On 24/05/2019 11:39, Mel Gorman wrote:
> On Fri, May 24, 2019 at 10:20:19AM +0100, Suzuki K Poulose wrote:
>> Hi,
>>
>> We are hitting NULL pointer dereferences while running stress tests with KVM.
>> See splat [0]. The test is to spawn 100 VMs, all doing a standard Debian
>> installation (thanks to Marc's automated scripts, available here [1]).
>> The problem has been reproduced with a better rate of success from 5.1-rc6
>> onwards.
>>
>> The issue is only reproducible with swapping enabled, when the entire
>> memory is used up and the system is swapping heavily. It is also reproducible
>> on only one server, with 128GB, which has the following memory layout:
>>
>> [32GB@4GB, hole , 96GB@544GB]
>>
>> Here is my non-expert analysis of the issue so far.
>>
>> Under extreme memory pressure, kswapd could trigger reset_isolation_suitable()
>> to figure out the cached values for migrate/free pfn for a zone, by scanning through
>> the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
>> with the following hole: [ 0x20_0000, 0x880_0000 ].
>> In the failing case, we end up setting the cached migrate pfn to 0x508_0000, which
>> is right in the center of the zone pfn range, i.e. ( 0x10_0000 + 0xa00_0000 ) / 2,
>> with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.
>>
>> Now these cached values are used by fast_isolate_freepages() to find a pfn. However,
>> since we can't find anything during the search, we fall back to using the page belonging
>> to the min_pfn (which is the migrate_pfn), without proper checks to see whether that is a
>> valid PFN or not. This is then passed on to fast_isolate_around(), which tries to do
>> set_pageblock_skip(page) on the page, which blows up due to a NULL mem_section pointer.
>>
>> The following patch seems to fix the issue for me, but I am not quite convinced that
>> it is the right fix. Thoughts?
>>
>
> I think the patch is valid and the alternatives would be unnecessarily
> complicated. During a normal scan for free pages to isolate, there
> is a check for pageblock_pfn_to_page() which uses a pfn_valid check
> for non-contiguous zones in __pageblock_pfn_to_page. Now, while the
I had the initial version with pageblock_pfn_to_page(), but as you said,
it is a complicated way of performing the same check as pfn_valid().
> non-contiguous check could be made in the area you highlight, it would be a
> relatively small optimisation that would be unmeasurable overall. However,
> it is definitely the case that if the PFN you highlight is invalid that
> badness happens. If you want to express this as a signed-off patch with
> an adjusted changelog then I'd be happy to add
Sure, will send it right away.
>
> Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
>
Thanks.
Suzuki