From: Pavan Kondeti <quic_pkondeti@quicinc.com>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: <akpm@linux-foundation.org>, <osalvador@suse.de>,
<dan.j.williams@intel.com>, <david@redhat.com>, <vbabka@suse.cz>,
<mgorman@techsingularity.net>, <aneesh.kumar@linux.ibm.com>,
<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm/sparsemem: fix race in accessing memory_section->usage
Date: Mon, 16 Oct 2023 16:03:44 +0530
Message-ID: <e1102c1b-1773-4e59-8a70-00a7deff2bde@quicinc.com>
In-Reply-To: <1697202267-23600-1-git-send-email-quic_charante@quicinc.com>

On Fri, Oct 13, 2023 at 06:34:27PM +0530, Charan Teja Kalla wrote:
> The race below is observed on a PFN that falls into a device memory
> region, on a system memory layout where the PFNs are arranged as
> [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL]. Since the normal zone's start
> and end PFNs span the device memory PFNs as well, compaction is
> attempted on the device memory PFNs too, though it ends up as a NOP
> (pfn_to_online_page() returns NULL for ZONE_DEVICE memory sections).
> When, on another core, the section mappings are being removed for the
> ZONE_DEVICE region that the PFN currently under compaction belongs to,
> the result is a kernel crash with CONFIG_SPARSEMEM_VMEMMAP enabled.
>
> compact_zone()                        memunmap_pages()
> --------------                       ----------------
> __pageblock_pfn_to_page()
>   ......
>   (a) pfn_valid():
>         valid_section()  [returns true]
>                                       (b) __remove_pages() ->
>                                             sparse_remove_section() ->
>                                               section_deactivate():
>                                                 [frees the ms->usage array
>                                                  and sets ms->usage = NULL]
>   pfn_section_valid()
>     [accesses ms->usage, which is NULL]
>
> NOTE: From the above, the race reduces to one between
> pfn_valid()/pfn_section_valid() and section deactivation, with
> SPARSEMEM_VMEMMAP enabled.
>
> Commit b943f045a9af ("mm/sparse: fix kernel crash with
> pfn_section_valid check") tried to address the same problem by
> clearing SECTION_HAS_MEM_MAP, expecting valid_section() to then
> return false so that ms->usage is never accessed.
>
> Fix this issue with the below steps:
> a) Clear SECTION_HAS_MEM_MAP before freeing ->usage.
> b) An RCU-protected read-side critical section will either see NULL
>    once SECTION_HAS_MEM_MAP is cleared, or can safely access ->usage.
> c) On the write side, call synchronize_rcu() and then free ->usage.
>    No further attempt will be made to access ->usage after this, as
>    SECTION_HAS_MEM_MAP is cleared and valid_section() returns false.
>
> Since section_deactivate() is a rare operation on the hot-remove
> path, the impact of synchronize_rcu() should be negligible.
struct mem_section_usage has other fields, like pageblock_flags. Do we
need to protect their readers with RCU as well? Also, can we annotate
the usage field in struct mem_section with __rcu and use RCU accessors
such as rcu_dereference() wherever mem_section::usage is read?
>
> Fixes: f46edbd1b151 ("mm/sparsemem: add helpers track active portions of a section at boot")
> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
> ---
> include/linux/mmzone.h | 11 +++++++++--
> mm/sparse.c | 14 ++++++++------
> 2 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4106fbc..c877396 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1987,6 +1987,7 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> static inline int pfn_valid(unsigned long pfn)
> {
> struct mem_section *ms;
> + int ret;
>
> /*
> * Ensure the upper PAGE_SHIFT bits are clear in the
> @@ -2000,13 +2001,19 @@ static inline int pfn_valid(unsigned long pfn)
> if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> return 0;
> ms = __pfn_to_section(pfn);
> - if (!valid_section(ms))
> + rcu_read_lock();
> + if (!valid_section(ms)) {
> + rcu_read_unlock();
> return 0;
> + }
> /*
> * Traditionally early sections always returned pfn_valid() for
> * the entire section-sized span.
> */
> - return early_section(ms) || pfn_section_valid(ms, pfn);
> + ret = early_section(ms) || pfn_section_valid(ms, pfn);
> + rcu_read_unlock();
> +
> + return ret;
> }
> #endif
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 77d91e5..ca7dbe1 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -792,6 +792,13 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> /*
> + * Mark the section invalid so that valid_section()
> + * return false. This prevents code from dereferencing
> + * ms->usage array.
> + */
> + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> +
This trick may not be needed if we add proper NULL checks around
ms->usage. Since we are anyway introducing a new rule that this check
must be done under the RCU read lock, why not revisit it?
> + /*
> * When removing an early section, the usage map is kept (as the
> * usage maps of other sections fall into the same page). It
> * will be re-used when re-adding the section - which is then no
> @@ -799,16 +806,11 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> * was allocated during boot.
> */
> if (!PageReserved(virt_to_page(ms->usage))) {
> + synchronize_rcu();
> kfree(ms->usage);
> ms->usage = NULL;
> }
If we add NULL checks around ms->usage, this becomes:

	tmp = rcu_replace_pointer(ms->usage, NULL, hotplug_locked());
	synchronize_rcu();
	kfree(tmp);

BTW, do we come here holding any global locks? If yes, synchronize_rcu()
can add delays in releasing the lock. In that case we may have to go for
an async RCU free.
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> - /*
> - * Mark the section invalid so that valid_section()
> - * return false. This prevents code from dereferencing
> - * ms->usage array.
> - */
> - ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> }
>
> /*
>
Thanks,
Pavan