public inbox for linux-riscv@lists.infradead.org
From: Yonghong Song <yonghong.song@linux.dev>
To: "Hou Tao" <houtao@huaweicloud.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	bpf@vger.kernel.org
Cc: linux-riscv@lists.infradead.org,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subject: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342
Date: Mon, 28 Aug 2023 20:54:21 -0400
Message-ID: <7b01b195-ee7e-b400-76f2-8b9085ac28bb@linux.dev>
In-Reply-To: <65c9e8d9-7682-2c8d-cd4d-9f0ca1213066@huaweicloud.com>



On 8/28/23 6:57 AM, Hou Tao wrote:
> Hi,
> 
> On 8/27/2023 10:53 PM, Yonghong Song wrote:
>>
>>
>> On 8/27/23 1:37 AM, Björn Töpel wrote:
>>> Björn Töpel <bjorn@kernel.org> writes:
>>>
>>>> Hou Tao <houtao@huaweicloud.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> On 8/26/2023 5:23 PM, Björn Töpel wrote:
>>>>>> Hou Tao <houtao@huaweicloud.com> writes:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 8/25/2023 11:28 PM, Yonghong Song wrote:
>>>>>>>>
>>>>>>>> On 8/25/23 3:32 AM, Björn Töpel wrote:
>>>>>>>>> I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
>>>>>>>>> selftests on bpf-next 9e3b47abeb8f.
>>>>>>>>>
>>>>>>>>> I'm able to reproduce the hang by multiple runs of:
>>>>>>>>>     | ./test_progs -a link_api -a linked_list
>>>>>>>>> I'm currently investigating that.
>>>>>>>>>
>>>>>>>>> But! Sometimes (once in a blue moon) I get a WARN_ON_ONCE hit:
>>>>>>>>>     | ------------[ cut here ]------------
>>>>>>>>>     | WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342 bpf_mem_refill+0x1fc/0x206
>>>>>>>>>     | Modules linked in: bpf_testmod(OE)
>>>>>>>>>     | CPU: 3 PID: 261 Comm: test_progs-cpuv Tainted: G           OE N 6.5.0-rc5-01743-gdcb152bb8328 #2
>>>>>>>>>     | Hardware name: riscv-virtio,qemu (DT)
>>>>>>>>>     | epc : bpf_mem_refill+0x1fc/0x206
>>>>>>>>>     |  ra : irq_work_single+0x68/0x70
>>>>>>>>>     | epc : ffffffff801b1bc4 ra : ffffffff8015fe84 sp : ff2000000001be20
>>>>>>>>>     |  gp : ffffffff82d26138 tp : ff6000008477a800 t0 : 0000000000046600
>>>>>>>>>     |  t1 : ffffffff812b6ddc t2 : 0000000000000000 s0 : ff2000000001be70
>>>>>>>>>     |  s1 : ff5ffffffffe8998 a0 : ff5ffffffffe8998 a1 : ff600003fef4b000
>>>>>>>>>     |  a2 : 000000000000003f a3 : ffffffff80008250 a4 : 0000000000000060
>>>>>>>>>     |  a5 : 0000000000000080 a6 : 0000000000000000 a7 : 0000000000735049
>>>>>>>>>     |  s2 : ff5ffffffffe8998 s3 : 0000000000000022 s4 : 0000000000001000
>>>>>>>>>     |  s5 : 0000000000000007 s6 : ff5ffffffffe8570 s7 : ffffffff82d6bd30
>>>>>>>>>     |  s8 : 000000000000003f s9 : ffffffff82d2c5e8 s10: 000000000000ffff
>>>>>>>>>     |  s11: ffffffff82d2c5d8 t3 : ffffffff81ea8f28 t4 : 0000000000000000
>>>>>>>>>     |  t5 : ff6000008fd28278 t6 : 0000000000040000
>>>>>>>>>     | status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
>>>>>>>>>     | [<ffffffff801b1bc4>] bpf_mem_refill+0x1fc/0x206
>>>>>>>>>     | [<ffffffff8015fe84>] irq_work_single+0x68/0x70
>>>>>>>>>     | [<ffffffff8015feb4>] irq_work_run_list+0x28/0x36
>>>>>>>>>     | [<ffffffff8015fefa>] irq_work_run+0x38/0x66
>>>>>>>>>     | [<ffffffff8000828a>] handle_IPI+0x3a/0xb4
>>>>>>>>>     | [<ffffffff800a5c3a>] handle_percpu_devid_irq+0xa4/0x1f8
>>>>>>>>>     | [<ffffffff8009fafa>] generic_handle_domain_irq+0x28/0x36
>>>>>>>>>     | [<ffffffff800ae570>] ipi_mux_process+0xac/0xfa
>>>>>>>>>     | [<ffffffff8000a8ea>] sbi_ipi_handle+0x2e/0x88
>>>>>>>>>     | [<ffffffff8009fafa>] generic_handle_domain_irq+0x28/0x36
>>>>>>>>>     | [<ffffffff807ee70e>] riscv_intc_irq+0x36/0x4e
>>>>>>>>>     | [<ffffffff812b5d3a>] handle_riscv_irq+0x54/0x86
>>>>>>>>>     | [<ffffffff812b6904>] do_irq+0x66/0x98
>>>>>>>>>     | ---[ end trace 0000000000000000 ]---
>>>>>>>>>
>>>>>>>>> Code:
>>>>>>>>>     | static void free_bulk(struct bpf_mem_cache *c)
>>>>>>>>>     | {
>>>>>>>>>     |     struct bpf_mem_cache *tgt = c->tgt;
>>>>>>>>>     |     struct llist_node *llnode, *t;
>>>>>>>>>     |     unsigned long flags;
>>>>>>>>>     |     int cnt;
>>>>>>>>>     |
>>>>>>>>>     |     WARN_ON_ONCE(tgt->unit_size != c->unit_size);
>>>>>>>>>     | ...
>>>>>>>>>
>>>>>>>>> I'm not well versed in the memory allocator. Before I dive into
>>>>>>>>> it: has anyone else hit this? Any ideas on why the WARN_ON_ONCE
>>>>>>>>> is hit?
>>>>>>>> Maybe take a look at the patch
>>>>>>>>     822fb26bdb55  bpf: Add a hint to allocated objects.
>>>>>>>>
>>>>>>>> In the above patch, we have
>>>>>>>>
>>>>>>>> +       /*
>>>>>>>> +        * Remember bpf_mem_cache that allocated this object.
>>>>>>>> +        * The hint is not accurate.
>>>>>>>> +        */
>>>>>>>> +       c->tgt = *(struct bpf_mem_cache **)llnode;
>>>>>>>>
>>>>>>>> I suspect that the warning may be related to the above.
>>>>>>>> I tried the above ./test_progs command line (running multiple
>>>>>>>> at the same time) and didn't trigger the issue.
>>>>>>> The extra 8 bytes before the freed pointer are used to save the
>>>>>>> pointer to the original bpf memory allocator that the freed pointer
>>>>>>> came from, so unit_free() can free the pointer back to the original
>>>>>>> allocator and prevent an alloc-and-free imbalance.
>>>>>>>
>>>>>>> I suspect that a wrong pointer was passed to bpf_obj_drop, but I did
>>>>>>> not find anything suspicious after checking linked_list. Another
>>>>>>> possibility is a write-after-free problem that corrupts the extra
>>>>>>> 8 bytes before the freed pointer. Could you please apply the
>>>>>>> following debug patch to check whether the extra 8 bytes are
>>>>>>> corrupted?
>>>>>> Thanks for getting back!
>>>>>>
>>>>>> I took your patch for a run, and there's a hit:
>>>>>>     | bad cache ff5ffffffffe8570: got size 96 work ffffffff801b19c8, cache ff5ffffffffe8980 exp size 128 work ffffffff801b19c8
>>>>>
>>>>> The extra 8 bytes are not corrupted. Both of these bpf_mem_cache
>>>>> instances are valid, and they are in the cache array defined in
>>>>> bpf_mem_caches. The BPF memory allocator allocated the pointer from
>>>>> the 96-byte cache, but it tried to free the pointer through the
>>>>> 128-byte cache.
>>>>>
>>>>> Now I suspect there is no 96-byte slab in your system and ksize(ptr -
>>>>> LLIST_NODE_SZ) returns 128, so the bpf memory allocator selected the
>>>>> 128-byte cache instead of the 96-byte cache. Could you please check
>>>>> the value of KMALLOC_MIN_SIZE in your kernel .config and use the
>>>>> following command to check whether there is a 96-byte slab in your
>>>>> system:
>>>>
>>>> KMALLOC_MIN_SIZE is 64.
>>>>
>>>>> $ cat /proc/slabinfo |grep kmalloc-96
>>>>> dma-kmalloc-96         0      0     96   42    1 : tunables    0    0    0 : slabdata      0      0      0
>>>>> kmalloc-96          1865   2268     96   42    1 : tunables    0    0    0 : slabdata     54     54      0
>>>>>
>>>>> In my system, slab has a 96-byte cache, so grep outputs something,
>>>>> but I think there will be no output in your system.
>>>>
>>>> You're right! No kmalloc-96.
>>>
>>> To get rid of the warning, limit available sizes from
>>> bpf_mem_alloc_init()?
> 
> It is not enough. We need to adjust size_index accordingly during
> initialization. Could you please try the attached patch below? It is
> not a formal patch, and I am considering disabling prefilling for these
> redirected bpf_mem_caches.
>>
>> Do you know why your system does not have kmalloc-96?
> 
> According to the implementation of setup_kmalloc_cache_index_table() and
> create_kmalloc_caches(),  when KMALLOC_MIN_SIZE is greater than 64,
> kmalloc-96 will be omitted. If KMALLOC_MIN_SIZE is greater than 128,
> kmalloc-192 will be omitted as well.

Thanks! This is indeed the case, except that 'greater than' above should
be 'greater than or equal to'. For example, KMALLOC_MIN_SIZE greater than
or equal to 64 means the minimum alignment is 64, so kmalloc-96 is removed.
Similarly, kmalloc-192 is removed when KMALLOC_MIN_SIZE is greater than or
equal to 128.
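
The boundary condition above can be illustrated with a minimal user-space
sketch. This is only a simplified model, not the kernel implementation:
the names model_kmalloc_bucket() and round_up_pow2() are hypothetical, and
the min_size parameter stands in for KMALLOC_MIN_SIZE.

```c
#include <stddef.h>

/* Hypothetical helper: smallest power of two that is >= n. */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Simplified model (an assumption, not kernel code) of which kmalloc
 * bucket serves a request of 'size' bytes when the minimum bucket size
 * is 'min_size'. The 96-byte bucket exists only if min_size < 64, and
 * the 192-byte bucket exists only if min_size < 128.
 */
static size_t model_kmalloc_bucket(size_t size, size_t min_size)
{
	if (size <= min_size)
		return min_size;
	if (size > 64 && size <= 96 && min_size < 64)
		return 96;
	if (size > 128 && size <= 192 && min_size < 128)
		return 192;
	return round_up_pow2(size);
}
```

With min_size set to 64, a 96-byte request in this model lands in the
128-byte bucket, which matches the missing kmalloc-96 reported on the
RISC-V system; with min_size set to 8 it stays in the 96-byte bucket.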

I agree it is a good idea to align the memalloc cache unit sizes with the
underlying kmalloc cache unit sizes.
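
The mismatch discussed in this thread (allocate from the 96-byte cache,
free through the 128-byte cache) can be reproduced in a small user-space
model. Everything here is an illustrative assumption: the unit_sizes[]
table is a shortened, sorted stand-in for the real bpf_mem_cache sizes,
and model_cache_idx() and model_ksize() are hypothetical names, not
kernel functions.

```c
#include <stddef.h>

/* Assumed, shortened list of cache unit sizes, sorted ascending. */
static const size_t unit_sizes[] = { 64, 96, 128, 256 };
#define NUM_UNITS (sizeof(unit_sizes) / sizeof(unit_sizes[0]))

/* Index of the smallest unit that fits 'size', or -1 if none does. */
static int model_cache_idx(size_t size)
{
	size_t i;

	for (i = 0; i < NUM_UNITS; i++)
		if (size <= unit_sizes[i])
			return (int)i;
	return -1;
}

/*
 * Model of the usable size reported for an allocation of 'size' bytes.
 * Buckets are powers of two starting at 64, plus an optional 96-byte
 * bucket controlled by 'has_kmalloc_96'.
 */
static size_t model_ksize(size_t size, int has_kmalloc_96)
{
	size_t p = 64;

	if (has_kmalloc_96 && size > 64 && size <= 96)
		return 96;
	while (p < size)
		p <<= 1;
	return p;
}

/* Nonzero if the cache chosen at free time differs from the one at alloc time. */
static int model_cache_mismatch(size_t size, int has_kmalloc_96)
{
	int alloc_idx = model_cache_idx(size);
	int free_idx = model_cache_idx(model_ksize(size, has_kmalloc_96));

	return alloc_idx != free_idx;
}
```

In this model, model_cache_mismatch(96, 0) is nonzero, which corresponds
to the "got size 96 ... exp size 128" debug output, while
model_cache_mismatch(96, 1) is zero. Making the unit sizes follow the
buckets that actually exist removes the mismatch.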

>>
>>>
>>>
>>> Björn
>>
>> .
> 

