From: Michael Ellerman <mpe@ellerman.id.au>
To: Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>,
Nathan Lynch <nathanl@linux.ibm.com>,
Bharata B Rao <bharata@linux.ibm.com>,
linux-mm@kvack.org, Kirill Tkhai <ktkhai@virtuozzo.com>,
Mel Gorman <mgorman@suse.de>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org, Christopher Lameter <cl@linux.com>
Subject: Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
Date: Thu, 19 Mar 2020 12:11:51 +1100
Message-ID: <87mu8ddu7c.fsf@mpe.ellerman.id.au>
In-Reply-To: <87tv2ldw1k.fsf@mpe.ellerman.id.au>
Michael Ellerman <mpe@ellerman.id.au> writes:
> Vlastimil Babka <vbabka@suse.cz> writes:
>> On 3/18/20 11:02 AM, Michal Hocko wrote:
>>> On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
>>>> Calling kmalloc_node() on a possible node which is not yet onlined can
>>>> lead to a panic. Currently node_present_pages() doesn't verify the node is
>>>> online before accessing the pgdat for the node. However, the pgdat struct may
>>>> not be available, resulting in a crash.
>>>>
>>>> NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
>>>> LR [c0000000003d5b94] __slab_alloc+0x34/0x60
>>>> Call Trace:
>>>> [c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
>>>> [c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
>>>> [c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
>>>> [c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
>>>> [c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
>>>> [c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
>>>> [c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
>>>> [c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
>>>> [c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
>>>> [c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
>>>> [c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
>>>> [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
>>>>
>>>> Fix this by verifying the node is online before accessing the pgdat
>>>> structure. Fix the same for node_spanned_pages() too.
>>>>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: linux-mm@kvack.org
>>>> Cc: Mel Gorman <mgorman@suse.de>
>>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>>> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>> Cc: Christopher Lameter <cl@linux.com>
>>>> Cc: linuxppc-dev@lists.ozlabs.org
>>>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>>> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>>> Cc: Bharata B Rao <bharata@linux.ibm.com>
>>>> Cc: Nathan Lynch <nathanl@linux.ibm.com>
>>>>
>>>> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>>> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>>> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>>> ---
>>>> include/linux/mmzone.h | 6 ++++--
>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index f3f264826423..88078a3b95e5 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
>>>> @@ -756,8 +756,10 @@ typedef struct pglist_data {
>>>> atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS];
>>>> } pg_data_t;
>>>>
>>>> -#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
>>>> -#define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
>>>> +#define node_present_pages(nid) \
>>>> + (node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
>>>> +#define node_spanned_pages(nid) \
>>>> + (node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
>>>
>>> I believe this is a wrong approach. We really do not want to special
>>> case all the places which require NODE_DATA. Can we please go and
>>> allocate pgdat for all possible nodes?
>>>
>>> The current state of memoryless hacks and the subtle bugs popping up here and
>>> there just prove that we should have done that from the very beginning
>>> IMHO.
>>
>> Yes. So here's an alternative proposal for fixing the current situation in SLUB,
>> before the long-term solution of having all possible nodes provide valid pgdat
>> with zonelists:
>>
>> - fix SLUB with the hunk at the end of this mail - the point is to use NUMA_NO_NODE
>> as fallback instead of node_to_mem_node()
>> - this removes all uses of node_to_mem_node (luckily it's just SLUB),
>> kill it completely instead of trying to fix it up
>> - patch 1/4 is not needed with the fix
>> - perhaps many of your other patches are also not needed
>> - once we get the long-term solution, some of the !node_online() checks can be removed
>
> Seems like a nice solution to me :)
>
>> ----8<----
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 17dc00e33115..1d4f2d7a0080 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>> struct page *page;
>> unsigned int order = oo_order(oo);
>>
>> - if (node == NUMA_NO_NODE)
>> + if (node == NUMA_NO_NODE || !node_online(node))
>
> Why don't we need the node_present_pages() check here?
>
>> page = alloc_pages(flags, order);
>> else
>> page = __alloc_pages_node(node, flags, order);
>> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>>
>> if (node == NUMA_NO_NODE)
>> searchnode = numa_mem_id();
>> - else if (!node_present_pages(node))
>> - searchnode = node_to_mem_node(node);
>>
>> object = get_partial_node(s, get_node(s, searchnode), c, flags);
>> if (object || node != NUMA_NO_NODE)
>> @@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> redo:
>>
>> if (unlikely(!node_match(page, node))) {
>> - int searchnode = node;
>> -
>> - if (node != NUMA_NO_NODE && !node_present_pages(node))
>> - searchnode = node_to_mem_node(node);
>> -
>> - if (unlikely(!node_match(page, searchnode))) {
>> + /*
>> + * node_match() false implies node != NUMA_NO_NODE
>> + * but if the node is not online and has no pages, just
> The 'and' in the comment line above should be 'or' ?
Sorry I see you've already fixed this in the version you posted.
cheers
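
For readers following the thread, the two ideas discussed above can be
sketched as a small userspace model. This is only a sketch, not kernel code:
the model_* names, the node_data[] and online[] tables, and the four-node
layout are all hypothetical stand-ins for NODE_DATA(), node_online() and the
real pgdat. It assumes the 'or' reading of the comment in the ___slab_alloc()
hunk, i.e. fall back to NUMA_NO_NODE when the node is offline or has no pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES    4
#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for pg_data_t; only the field under discussion. */
struct pgdat_model {
	unsigned long node_present_pages;
};

/*
 * Assumed layout for illustration:
 *   node 0: online, has memory
 *   node 1: online, memoryless
 *   nodes 2-3: possible but not online, so no pgdat has been allocated
 */
static struct pgdat_model node0 = { .node_present_pages = 1024 };
static struct pgdat_model node1 = { .node_present_pages = 0 };
static struct pgdat_model *node_data[MAX_NODES] = { &node0, &node1, NULL, NULL };
static bool online[MAX_NODES] = { true, true, false, false };

bool model_node_online(int nid)
{
	return nid >= 0 && nid < MAX_NODES && online[nid];
}

/*
 * Shape of patch 1/4: check that the node is online before
 * dereferencing its pgdat, and report 0 pages otherwise.
 */
unsigned long model_node_present_pages(int nid)
{
	return model_node_online(nid) ? node_data[nid]->node_present_pages : 0;
}

/*
 * Shape of the alternative proposal: leave the accessor alone and have
 * the caller fall back to NUMA_NO_NODE. The online check comes first,
 * so short-circuit evaluation keeps the pgdat dereference safe.
 */
int model_pick_alloc_node(int node)
{
	assert(node == NUMA_NO_NODE || (node >= 0 && node < MAX_NODES));

	if (node == NUMA_NO_NODE)
		return NUMA_NO_NODE;
	if (!model_node_online(node) || node_data[node]->node_present_pages == 0)
		return NUMA_NO_NODE;
	return node;
}
```

In this model, a request for node 2 never touches node_data[2] in either
variant. The difference is where the check lives: inside every accessor in
the first variant, or once at the allocation site in the second.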