All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Hou Tao <houtao@huaweicloud.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Tonghao Zhang <xiangxia.m.yue@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>
Cc: netdev@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>,
	bpf <bpf@vger.kernel.org>, Hao Luo <haoluo@google.com>,
	"houtao1@huawei.com" <houtao1@huawei.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [net-next] bpf: avoid hashtab deadlock with try_lock
Date: Tue, 29 Nov 2022 11:06:51 -0500	[thread overview]
Message-ID: <541aa740-dcf3-35f5-9f9b-e411978eaa06@redhat.com> (raw)
In-Reply-To: <59fc54b7-c276-2918-6741-804634337881@huaweicloud.com>

On 11/29/22 07:45, Hou Tao wrote:
> Hi,
>
> On 11/29/2022 2:06 PM, Tonghao Zhang wrote:
>> On Tue, Nov 29, 2022 at 12:32 PM Hou Tao <houtao1@huawei.com> wrote:
>>> Hi,
>>>
>>> On 11/29/2022 5:55 AM, Hao Luo wrote:
>>>> On Sun, Nov 27, 2022 at 7:15 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>>> Hi Tonghao,
>>>>
>>>> With a quick look at the htab_lock_bucket() and your problem
>>>> statement, I agree with Hou Tao that using hash &
>>>> min(HASHTAB_MAP_LOCK_MASK, n_bucket - 1) to index in map_locked seems
>>>> to fix the potential deadlock. Can you actually send your changes as
>>>> v2 so we can take a look and better help you? Also, can you explain
>>>> your solution in your commit message? Right now, your commit message
>>>> has only a problem statement and is not very clear. Please include
>>>> more details on what you do to fix the issue.
>>>>
>>>> Hao
>>> It would be better if the test case below can be rewritten as a bpf selftests.
>>> Please see comments below on how to improve it and reproduce the deadlock.
>>>>> Hi
>>>>> only a warning from lockdep.
>>> Thanks for your details instruction.  I can reproduce the warning by using your
>>> setup. I am not a lockdep expert, it seems that fixing such warning needs to set
>>> different lockdep class to the different bucket. Because we use map_locked to
>>> protect the acquisition of bucket lock, so I think we can define  lock_class_key
>>> array in bpf_htab (e.g., lockdep_key[HASHTAB_MAP_LOCK_COUNT]) and initialize the
>>> bucket lock accordingly.
> The proposed lockdep solution doesn't work. Still got lockdep warning after
> that, so cc +locking expert +lkml.org for lockdep help.
>
> Hi lockdep experts,
>
> We are trying to fix the following lockdep warning from bpf subsystem:
>
> [   36.092222] ================================
> [   36.092230] WARNING: inconsistent lock state
> [   36.092234] 6.1.0-rc5+ #81 Tainted: G            E
> [   36.092236] --------------------------------
> [   36.092237] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> [   36.092238] perf/1515 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [   36.092242] ffff888341acd1a0 (&htab->lockdep_key){....}-{2:2}, at:
> htab_lock_bucket+0x4d/0x58
> [   36.092253] {INITIAL USE} state was registered at:
> [   36.092255]   mark_usage+0x1d/0x11d
> [   36.092262]   __lock_acquire+0x3c9/0x6ed
> [   36.092266]   lock_acquire+0x23d/0x29a
> [   36.092270]   _raw_spin_lock_irqsave+0x43/0x7f
> [   36.092274]   htab_lock_bucket+0x4d/0x58
> [   36.092276]   htab_map_delete_elem+0x82/0xfb
> [   36.092278]   map_delete_elem+0x156/0x1ac
> [   36.092282]   __sys_bpf+0x138/0xb71
> [   36.092285]   __do_sys_bpf+0xd/0x15
> [   36.092288]   do_syscall_64+0x6d/0x84
> [   36.092291]   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [   36.092295] irq event stamp: 120346
> [   36.092296] hardirqs last  enabled at (120345): [<ffffffff8180b97f>]
> _raw_spin_unlock_irq+0x24/0x39
> [   36.092299] hardirqs last disabled at (120346): [<ffffffff81169e85>]
> generic_exec_single+0x40/0xb9
> [   36.092303] softirqs last  enabled at (120268): [<ffffffff81c00347>]
> __do_softirq+0x347/0x387
> [   36.092307] softirqs last disabled at (120133): [<ffffffff810ba4f0>]
> __irq_exit_rcu+0x67/0xc6
> [   36.092311]
> [   36.092311] other info that might help us debug this:
> [   36.092312]  Possible unsafe locking scenario:
> [   36.092312]
> [   36.092313]        CPU0
> [   36.092313]        ----
> [   36.092314]   lock(&htab->lockdep_key);
> [   36.092315]   <Interrupt>
> [   36.092316]     lock(&htab->lockdep_key);
> [   36.092318]
> [   36.092318]  *** DEADLOCK ***
> [   36.092318]
> [   36.092318] 3 locks held by perf/1515:
> [   36.092320]  #0: ffff8881b9805cc0 (&cpuctx_mutex){+.+.}-{4:4}, at:
> perf_event_ctx_lock_nested+0x8e/0xba
> [   36.092327]  #1: ffff8881075ecc20 (&event->child_mutex){+.+.}-{4:4}, at:
> perf_event_for_each_child+0x35/0x76
> [   36.092332]  #2: ffff8881b9805c20 (&cpuctx_lock){-.-.}-{2:2}, at:
> perf_ctx_lock+0x12/0x27
> [   36.092339]
> [   36.092339] stack backtrace:
> [   36.092341] CPU: 0 PID: 1515 Comm: perf Tainted: G            E
> 6.1.0-rc5+ #81
> [   36.092344] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [   36.092349] Call Trace:
> [   36.092351]  <NMI>
> [   36.092354]  dump_stack_lvl+0x57/0x81
> [   36.092359]  lock_acquire+0x1f4/0x29a
> [   36.092363]  ? handle_pmi_common+0x13f/0x1f0
> [   36.092366]  ? htab_lock_bucket+0x4d/0x58
> [   36.092371]  _raw_spin_lock_irqsave+0x43/0x7f
> [   36.092374]  ? htab_lock_bucket+0x4d/0x58
> [   36.092377]  htab_lock_bucket+0x4d/0x58
> [   36.092379]  htab_map_update_elem+0x11e/0x220
> [   36.092386]  bpf_prog_f3a535ca81a8128a_bpf_prog2+0x3e/0x42
> [   36.092392]  trace_call_bpf+0x177/0x215
> [   36.092398]  perf_trace_run_bpf_submit+0x52/0xaa
> [   36.092403]  ? x86_pmu_stop+0x97/0x97
> [   36.092407]  perf_trace_nmi_handler+0xb7/0xe0
> [   36.092415]  nmi_handle+0x116/0x254
> [   36.092418]  ? x86_pmu_stop+0x97/0x97
> [   36.092423]  default_do_nmi+0x3d/0xf6
> [   36.092428]  exc_nmi+0xa1/0x109
> [   36.092432]  end_repeat_nmi+0x16/0x67
> [   36.092436] RIP: 0010:wrmsrl+0xd/0x1b

So the lock is really taken in a NMI context. In general, we advise 
again using lock in a NMI context unless it is a lock that is used only 
in that context. Otherwise, deadlock is certainly a possibility as there 
is no way to mask off again NMI.

Cheers,
Longman


  reply	other threads:[~2022-11-29 16:08 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-21 10:05 [net-next] bpf: avoid the multi checking xiangxia.m.yue
2022-11-21 10:05 ` [net-next] bpf: avoid hashtab deadlock with try_lock xiangxia.m.yue
2022-11-21 20:19   ` Jakub Kicinski
2022-11-22  1:15   ` Hou Tao
2022-11-22  3:12     ` Tonghao Zhang
2022-11-22  4:01       ` Hou Tao
2022-11-22  4:06         ` Hou Tao
2022-11-24 12:57           ` Tonghao Zhang
2022-11-24 14:13             ` Hou Tao
2022-11-28  3:15               ` Tonghao Zhang
2022-11-28 21:55                 ` Hao Luo
2022-11-29  4:32                   ` Hou Tao
2022-11-29  6:06                     ` Tonghao Zhang
2022-11-29  7:56                       ` Hou Tao
2022-11-29 12:45                       ` Hou Tao
2022-11-29 16:06                         ` Waiman Long [this message]
2022-11-29 17:23                           ` Boqun Feng
2022-11-29 17:32                             ` Boqun Feng
2022-11-29 19:36                               ` Hao Luo
2022-11-29 21:13                                 ` Waiman Long
2022-11-30  1:50                                 ` Hou Tao
2022-11-30  2:47                                   ` Tonghao Zhang
2022-11-30  3:06                                     ` Waiman Long
2022-11-30  3:32                                       ` Tonghao Zhang
2022-11-30  4:07                                         ` Waiman Long
2022-11-30  4:13                                     ` Hou Tao
2022-11-30  5:02                                       ` Hao Luo
2022-11-30  5:56                                         ` Tonghao Zhang
2022-11-30  5:55                                       ` Tonghao Zhang
2022-12-01  2:53                                         ` Hou Tao
2022-11-30  1:37                             ` Hou Tao
2022-11-22 22:16 ` [net-next] bpf: avoid the multi checking Daniel Borkmann
  -- strict thread matches above, loose matches on Subject: below --
2022-11-23  0:06 [net-next] bpf: avoid hashtab deadlock with try_lock kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=541aa740-dcf3-35f5-9f9b-e411978eaa06@redhat.com \
    --to=longman@redhat.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=houtao1@huawei.com \
    --cc=houtao@huaweicloud.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=will@kernel.org \
    --cc=xiangxia.m.yue@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.