From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>,
Alexei Starovoitov <ast@kernel.org>,
kernel test robot <oliver.sang@intel.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
"houtao1@huawei.com" <houtao1@huawei.com>
Subject: Re: [linus:master] [bpf] c930472552: WARNING:at_kernel/bpf/memalloc.c:#bpf_mem_alloc_init
Date: Tue, 7 Nov 2023 08:04:37 -0800 [thread overview]
Message-ID: <40ba7904-ade6-4a06-bb40-26b3853d0c0b@linux.dev> (raw)
In-Reply-To: <f413cce7-db58-fd5c-63b9-5dfc5876c1aa@huaweicloud.com>
On 11/7/23 2:43 AM, Hou Tao wrote:
> Hi,
>
> On 11/4/2023 12:49 AM, Yonghong Song wrote:
>> On 11/2/23 11:54 PM, Hou Tao wrote:
>>> Hi,
>>>
>>> On 11/3/2023 12:08 AM, Yonghong Song wrote:
>>>> On 11/2/23 6:40 AM, Hou Tao wrote:
>>>>> Hi Alexei,
>>>>>
>>>>> On 10/31/2023 4:01 PM, Hou Tao wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 10/30/2023 10:11 PM, kernel test robot wrote:
>>>>>>> hi, Hou Tao,
>>>>>>>
>>>>>>> we noticed a WARN_ONCE added in this commit was hit in our tests.
>>>>>>> FYI.
>>>>>>>
>>>>>>>
> SNIP
>>>>> I see what has happened. The problem is twofold:
>>>>> (1) The object_size of kmalloc-cg-96 is adjust from 96 to 128 due to
>>>>> slab merge in __kmem_cache_alias(). For SLAB, SLAB_HWCACHE_ALIGN is
>>>>> enabled by default for kmalloc slab, so align is 64 and size is 128
>>>>> for
>>>>> kmalloc-cg-96. So when unit_alloc() does kmalloc_node(96,
>>>>> __GFP_ACCOUNT,
>>>>> node), ksize() will return 128 instead of 96 for the returned pointer.
>>>>> SLUB has a similar merge logic, but because its align is 8 under
>>>>> x86-64,
>>>>> so the warning doesn't happen for i386 + SLUB, but I think the similar
>>>>> problem may exist for other architectures.
>>>>> (2) kmalloc_size_roundup() returns the object_size of kmalloc-96
>>>>> instead
>>>>> of kmalloc-cg-96, so bpf_mem_cache_adjust_size() doesn't adjust
>>>>> size_index accordingly. The reason why the object_size of
>>>>> kmalloc-96 is
>>>>> 96 instead of 128 is that there is slab merge for kmalloc-96.
>>>>>
>>>>> About how to fix the problem, I have two ideas:
>>>>> The first is to introduce kmalloc_size_roundup_flags(), so
>>>>> bpf_mem_cache_adjust_size() could use kmalloc_size_roundup_flags(size,
>>>>> __GFP_ACCOUNT) to get the object_size of kmalloc-cg-xxx. It could fix
>>>>> the warning for now, but the warning may pop-up occasionally due to
>>>>> SLUB
>>>>> merge and unusual slab align. The second is just using the
>>>>> bpf_mem_cache
>>>>> pointer to get the unit_size which is saved before the to-be-free
>>>>> pointer. Its downside is that it may can not be able to skip the free
>>>>> operation for pointer which is not allocated from bpf ma, but I
>>>>> think it
>>>>> is acceptable. I prefer the latter solution. What do you think ?
>>>> Is it possible that in bpf_mem_cache_adjust_size(), we do a series of
>>>> kmalloc (for supported bucket size) and call ksize() to get the actual
>>>> allocated object size. So eventually all possible allocated object
>>>> sizes
>>>> will be used for size_index[]. This will avoid all kind of special
>>>> corner cases due to config/macro/arch etc. WDYT?
>>> It is basically the same as the first proposed solution and it has the
>>> same flaw. The problem is that slab merge can happen in any time, so the
>>> return value of ksize() may change even all passed pointers are
>>> allocated from the same slab. Considering the following case: during the
>>> invocation of bpf_mem_cache_adjust_size() or the initialization of
>>> bpf_global_ma, there is no slab merge and ksize() for a 96-bytes object
>>> returns 96. But after these invocations, a new slab created by a kernel
>>> module is merged to kmalloc-cg-96 and the object_size of kmalloc-cg-96
>>> is adjust from 96 to 128 (which is possible for x86-64 + CONFIG_SLAB,
>>> because it is alignment requirement is 64 for 96-bytes slab). So soon or
>> So, the object_size for allocated objects in that is adjusted from 96
>> to 128
>> while previously allocated objects should have no change, it is merely
>> ksize(old_obj)
>> previous return 96, now returns 128, right? Okay, so this is indeed a
>> problem
>> since we use ksize() to decide the bucket.
> Yes. The object_size of underlying slab changes, so the return value of
> ksize() will change as well.
>>
>>> later, when bpf_global_ma frees a 96-byte-sized pointer which is
>>> allocated from a bpf_mem_cache in which unit_size is 96, bpf_mem_free()
>>> will free the pointer through a bpf_mem_cache in which unit_size is 128,
>>> because the return value of ksize() changes. Maybe we should introduce a
>>> new API in mm which returns size instead of object_size of underlying
>>> slab, so the return value will not change due to slab merge.
>> In this case, to avoid the warning, indeed we need to use '96' instead
>> of '128'.
>> So use the original ksize() return value is indeed a solution.
>> We could use the mechanism similar to percpu alloc to save '96' in the
>> memory.
> We have already saved the pointer of bpf_mem_cache in the extra space
> (aka LLIST_NODE_SZ) which is allocated together with the returned
> pointer, so I think we could use bpf_mem_cache->unit_size to get the
> size of the free pointer directly. I will check whether or not there is
> performance degradation before posting the patch.
See:
bpf_local_storage.c: err = bpf_mem_alloc_init(&smap->selem_ma, smap->elem_size, false);
bpf_local_storage.c: err = bpf_mem_alloc_init(&smap->storage_ma, sizeof(struct bpf_local_storage), false);
core.c: ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
core.c: ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
cpumask.c: ret = bpf_mem_alloc_init(&bpf_cpumask_ma, sizeof(struct bpf_cpumask), false);
hashtab.c: err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
hashtab.c: err = bpf_mem_alloc_init(&htab->pcpu_ma,
memalloc.c:int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
Some 'size' parameter in core.c is zero.
Not sure how exactly you will resolve this issue based on
bpf_mem_cache->unit_size. But looking forward to your patch!
>
> Regards,
> Tao
>
next prev parent reply other threads:[~2023-11-07 16:04 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-30 14:11 [linus:master] [bpf] c930472552: WARNING:at_kernel/bpf/memalloc.c:#bpf_mem_alloc_init kernel test robot
2023-10-31 8:01 ` Hou Tao
2023-11-02 13:40 ` Hou Tao
2023-11-02 16:08 ` Yonghong Song
2023-11-03 6:54 ` Hou Tao
2023-11-03 16:49 ` Yonghong Song
2023-11-07 10:43 ` Hou Tao
2023-11-07 16:04 ` Yonghong Song [this message]
2023-11-18 13:12 ` Hou Tao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=40ba7904-ade6-4a06-bb40-26b3853d0c0b@linux.dev \
--to=yonghong.song@linux.dev \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=houtao1@huawei.com \
--cc=houtao@huaweicloud.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@intel.com \
--cc=oe-lkp@lists.linux.dev \
--cc=oliver.sang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.