From: Pavan Kondeti <quic_pkondeti@quicinc.com>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: <akpm@linux-foundation.org>, <osalvador@suse.de>,
<dan.j.williams@intel.com>, <david@redhat.com>, <vbabka@suse.cz>,
<mgorman@techsingularity.net>, <aneesh.kumar@linux.ibm.com>,
<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm/sparsemem: fix race in accessing memory_section->usage
Date: Mon, 16 Oct 2023 16:03:44 +0530 [thread overview]
Message-ID: <e1102c1b-1773-4e59-8a70-00a7deff2bde@quicinc.com> (raw)
In-Reply-To: <1697202267-23600-1-git-send-email-quic_charante@quicinc.com>
On Fri, Oct 13, 2023 at 06:34:27PM +0530, Charan Teja Kalla wrote:
> The below race is observed on a PFN which falls into the device memory
> region with the system memory configuration where PFN's are such that
> [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL]. Since normal zone start and
> end pfn contains the device memory PFN's as well, the compaction
> triggered will try on the device memory PFN's too though they end up in
> NOP(because pfn_to_online_page() returns NULL for ZONE_DEVICE memory
> sections). When from other core, the section mappings are being removed
> for the ZONE_DEVICE region, that the PFN in question belongs to,
> on which compaction is currently being operated is resulting into the
> kernel crash with CONFIG_SPASEMEM_VMEMAP enabled.
>
> compact_zone() memunmap_pages
> ------------- ---------------
> __pageblock_pfn_to_page
> ......
> (a)pfn_valid():
> valid_section()//return true
> (b)__remove_pages()->
> sparse_remove_section()->
> section_deactivate():
> [Free the array ms->usage and set
> ms->usage = NULL]
> pfn_section_valid()
> [Access ms->usage which
> is NULL]
>
> NOTE: From the above it can be said that the race is reduced to between
> the pfn_valid()/pfn_section_valid() and the section deactivate with
> SPASEMEM_VMEMAP enabled.
>
> The commit b943f045a9af("mm/sparse: fix kernel crash with
> pfn_section_valid check") tried to address the same problem by clearing
> the SECTION_HAS_MEM_MAP with the expectation of valid_section() returns
> false thus ms->usage is not accessed.
>
> Fix this issue by the below steps:
> a) Clear SECTION_HAS_MEM_MAP before freeing the ->usage.
> b) RCU protected read side critical section will either return NULL when
> SECTION_HAS_MEM_MAP is cleared or can successfully access ->usage.
> c) Synchronize the rcu on the write side and free the ->usage. No
> attempt will be made to access ->usage after this as the
> SECTION_HAS_MEM_MAP is cleared thus valid_section() return false.
>
> Since the section_deactivate() is a rare operation and will come in the
> hot remove path, impact of synchronize_rcu() should be negligble.
struct mem_section_usage has other field like pageblock_flags. Do we
need to protect its readers with RCU? Also can we annotate usage field
in struct mem_section with __rcu and use RCU accessors like
rcu_dereference() while using memsection::usage field?
>
> Fixes: f46edbd1b151 ("mm/sparsemem: add helpers track active portions of a section at boot")
> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
> ---
> include/linux/mmzone.h | 11 +++++++++--
> mm/sparse.c | 14 ++++++++------
> 2 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4106fbc..c877396 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1987,6 +1987,7 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> static inline int pfn_valid(unsigned long pfn)
> {
> struct mem_section *ms;
> + int ret;
>
> /*
> * Ensure the upper PAGE_SHIFT bits are clear in the
> @@ -2000,13 +2001,19 @@ static inline int pfn_valid(unsigned long pfn)
> if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> return 0;
> ms = __pfn_to_section(pfn);
> - if (!valid_section(ms))
> + rcu_read_lock();
> + if (!valid_section(ms)) {
> + rcu_read_unlock();
> return 0;
> + }
> /*
> * Traditionally early sections always returned pfn_valid() for
> * the entire section-sized span.
> */
> - return early_section(ms) || pfn_section_valid(ms, pfn);
> + ret = early_section(ms) || pfn_section_valid(ms, pfn);
> + rcu_read_unlock();
> +
> + return ret;
> }
> #endif
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 77d91e5..ca7dbe1 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -792,6 +792,13 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> /*
> + * Mark the section invalid so that valid_section()
> + * return false. This prevents code from dereferencing
> + * ms->usage array.
> + */
> + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> +
This trick may not be needed if we add proper NULL checks around ms->usage. We are anyway
introducing a new rule this check needs to be done under RCU lock, so why not revisit it?
> + /*
> * When removing an early section, the usage map is kept (as the
> * usage maps of other sections fall into the same page). It
> * will be re-used when re-adding the section - which is then no
> @@ -799,16 +806,11 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> * was allocated during boot.
> */
> if (!PageReserved(virt_to_page(ms->usage))) {
> + synchronize_rcu();
> kfree(ms->usage);
> ms->usage = NULL;
> }
If we add NULL checks around ms->usage, this becomes
tmp = rcu_replace_pointer(ms->usage, NULL, hotplug_locked());
syncrhonize_rcu();
kfree(tmp);
btw, Do we come here with any global locks? if yes, synchronize_rcu() can add delays in releasing
the lock. In that case we may have to go for async RCU free.
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> - /*
> - * Mark the section invalid so that valid_section()
> - * return false. This prevents code from dereferencing
> - * ms->usage array.
> - */
> - ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> }
>
> /*
>
Thanks,
Pavan
next prev parent reply other threads:[~2023-10-16 10:33 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-13 13:04 [PATCH] mm/sparsemem: fix race in accessing memory_section->usage Charan Teja Kalla
2023-10-14 22:25 ` Andrew Morton
2023-10-16 8:23 ` David Hildenbrand
2023-10-16 13:38 ` Charan Teja Kalla
2023-10-16 22:34 ` Andrew Morton
2023-10-18 7:52 ` David Hildenbrand
2023-10-16 10:33 ` Pavan Kondeti [this message]
2023-10-17 14:10 ` Charan Teja Kalla
2023-10-17 14:53 ` David Hildenbrand
2023-10-25 21:35 ` Andrew Morton
2023-10-26 7:00 ` David Hildenbrand
2023-10-26 7:18 ` Charan Teja Kalla
2024-01-15 18:44 ` Alexander Potapenko
2024-01-15 20:34 ` Marco Elver
2024-01-17 19:18 ` Marco Elver
2024-01-18 9:01 ` Alexander Potapenko
2024-01-18 9:43 ` Marco Elver
2024-01-25 13:20 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e1102c1b-1773-4e59-8a70-00a7deff2bde@quicinc.com \
--to=quic_pkondeti@quicinc.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.ibm.com \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=osalvador@suse.de \
--cc=quic_charante@quicinc.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.