From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81D68C83F17 for ; Mon, 14 Jul 2025 17:52:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 203DD8D000C; Mon, 14 Jul 2025 13:52:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1DBC08D0001; Mon, 14 Jul 2025 13:52:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F1918D000C; Mon, 14 Jul 2025 13:52:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id F03428D0001 for ; Mon, 14 Jul 2025 13:52:29 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 9B7A41D89B7 for ; Mon, 14 Jul 2025 17:52:29 +0000 (UTC) X-FDA: 83663614818.14.5462085 Received: from mail-wr1-f45.google.com (mail-wr1-f45.google.com [209.85.221.45]) by imf10.hostedemail.com (Postfix) with ESMTP id 9A8FEC0005 for ; Mon, 14 Jul 2025 17:52:27 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=F2O2MS4g; spf=pass (imf10.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.221.45 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1752515547; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=CrXOR3em4+gjn8PtaxMxqX8DlqyBrsVytx3gz0g+Udg=; b=R77gBrXdlAMIhIpjEojL9cOpwYdBMhU/8gQmFFpeLpHax0o1Bop+ZU3oTf07i9AyZiUWY5 V0lHcXCeWv+ZV5DHlqMOmhiokBKaJcmsqZTgYiMdJ8zDvC2oN3+cWasg2KbdTOHfdBIXxs R3cdaYguR1Tw2ugd4z36aKadPbXDZ30= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1752515547; a=rsa-sha256; cv=none; b=2noxEiPnkv8PvbO4F/19Pp0e16s+dDBXckefbz0gAfA43OUAZk9bH7rXmWt5RLxp1Vy9SJ 0uEZZMguhXtw8YSFpKwFaOqhHXRpUZy6YRilH8Xp1+Ss3TTlMm7tbJSVUqL1y5+wDUEZ/f UawoLe/GPm+QUpbAu7wm2F3BON2mzLg= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=F2O2MS4g; spf=pass (imf10.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.221.45 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-wr1-f45.google.com with SMTP id ffacd0b85a97d-3a54700a46eso2816407f8f.1 for ; Mon, 14 Jul 2025 10:52:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1752515546; x=1753120346; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=CrXOR3em4+gjn8PtaxMxqX8DlqyBrsVytx3gz0g+Udg=; b=F2O2MS4gY+b6E3tIwyanoTieKbr7FPzfC4PRuzq3ukEIazvI+JR79dT9UqbtVcFxs7 sdUWzDo9CaPAgzYZPwnePOLlcszPsYujsjkww2kJzgcf5Gc/78Zme9yTlihyZGGwwNBk f6A4F9qLNEWAZm7DBg4B5Q/952wtu/A1OdAzsNKXOdMdm2zofVgttvLEObPUmYYV6JaW dbJd3Ef0CuNO6vnlIuivGzhZePpNNkexrI+aaVdiFw65zutwD1Nqbj4t4Sfa+MBK6W0j civJglkjE7QkFaitrYFAeLVPcyk1oJr76FDdILVoR4At0lL7EY5HAJQYUPf0EbT1L8sL G8Ng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752515546; x=1753120346; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CrXOR3em4+gjn8PtaxMxqX8DlqyBrsVytx3gz0g+Udg=; b=BkW39c7hfZs3Wczwzb437IoDp22H9GV8ggjvTK02tVPB+dfE2gNL07m3L0GPRMomdO jM8pKw3cBRxWxQ1ohRW4baFwnDPHUqNvptpwHoAaMVDhWHGEtp/UUKmsC7LCrgCNHCJb t2DJfmURz6x3qnvXKGnmX8bd/fF1PGW/pBRu6304vmgn1ak9lOb/Pp3yKhvmDpKygdfW KJVqftO0xkZiGvPM4hR2Nt2bZSKHr84RfUpqKNiEkhtywyBjsx9/cGXaTOjS2cURrLye LwhQQ7weH0sr27Yyt7CGA0CArD+/DqAj5FgjpM9oYJ71QlzfeL0ocO2sG6BuJoUlVczf +0oA== X-Forwarded-Encrypted: i=1; AJvYcCXgJIhIjBdpuK2mEuO8UoFdH+Znox0Od3vosbXhVvHiR2KwTOvl14aC8HMv6vuBjaf8+WDBMEuYrg==@kvack.org X-Gm-Message-State: AOJu0Yx9aR001uE7sKmcy9TrGun8joCWOKXkEL6RGJu94aQz+/4s8vGh l2KDFwCAZV/waa6FR04qOwPp0IX+geZuIDVPehgNBnNDWjd9MRxW8blkPnN59JTvV+a5YU/Hdyt GaJsS3QSxIzz5Lw03dfa5DDM66OrA07w= X-Gm-Gg: ASbGncuVqgovTYadtl5tKVbhQ+OVyVNbMNYy/pgwDKS1EcgQGKF8AvNDq93uhukuFwX 5EKD+LllsAt4KWP2t/kF5x+ii1DhqkHzWZw/OBd/umAfKaMx2NQk/LdV7t1RZV9yVBBmfRZQDtt sOfJXpNpuEHkXqevir4LfiJjkirVoGp/qG5RTDR3gQOpMhAZZv4EJytiZKocKWlOFgE+WMv1tS2 Oxr+Wb+UwwdqDYY0VT4h/Y= X-Google-Smtp-Source: AGHT+IGX1x/rLh6T62gvVj2EWhzF8UHfmjEJsyqMGiPwxRo/2hggBPZEdkSV/4xL+6Tf1c2M/aWyB9xO7EGuKGSK52A= X-Received: by 2002:adf:e186:0:b0:3a5:2653:7308 with SMTP id ffacd0b85a97d-3b5f18debd5mr13480127f8f.57.1752515545708; Mon, 14 Jul 2025 10:52:25 -0700 (PDT) MIME-Version: 1.0 References: <20250709015303.8107-1-alexei.starovoitov@gmail.com> <20250709015303.8107-4-alexei.starovoitov@gmail.com> <20250711075001.fnlMZfk6@linutronix.de> <1adbee35-6131-49de-835b-2c93aacfdd1e@suse.cz> <20250711151730.rz_TY1Qq@linutronix.de> <20250714110639.uOaKJEfL@linutronix.de> In-Reply-To: <20250714110639.uOaKJEfL@linutronix.de> From: Alexei Starovoitov Date: Mon, 14 Jul 2025 10:52:10 -0700 X-Gm-Features: Ac12FXzoWo31JpXtCiKZcrl4QbhAUBZ3qtzSYF9FUHB5ZK9i7dwpYq3d49cxqVQ Message-ID: Subject: Re: [PATCH v2 3/6] locking/local_lock: Introduce local_lock_lockdep_start/end() To: Sebastian Andrzej Siewior Cc: Vlastimil Babka , bpf , linux-mm , Harry Yoo , Shakeel Butt , Michal Hocko , Andrii Nakryiko , Kumar Kartikeya Dwivedi , Andrew Morton , Peter Zijlstra , Steven Rostedt , Johannes Weiner Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 9A8FEC0005 X-Stat-Signature: 6d3it4z8bhojc83ytyrxb9t59bn7nxeh X-Rspam-User: X-HE-Tag: 1752515547-296042 X-HE-Meta: U2FsdGVkX18Tog6VuezDYzpHXOtIuSW9uywZnrPj3TtAON75mIIj2P6i+W4X4Ni70wGOjInyC2D2XsfjBmUo59Rr8RqAYwnkZUn3MpnKeGn+pmv0KgWhlOnf2R1ODdY1NemLz+2ZZjtS0c3MxLUcC4HB8EUOKpmswL4ePk8o+YkFRpCWWCHbdzlZgOXdb6NQBw8FBefjYfEUPhB0D7oGbQ5L0CZNzxqN68BSiiVIIjXS3e/79BxqX8Jjl3ljq250DJBSY7X19PqJz6B37l9ytqFlpe2AhRi45l+9ziiBiAZE317ngGa+bLn57O/30KjtCMv/JNcnXjMfgwZQvP6xZLCIZ3Cwru775Cy55q1jeDMzf+uQ6ih3Ve0vI4deJrS0wksTNZvcgbvhL7UF6foGiqWMgEnKNV0wEc5L3YnYL7djPzLsOInrdnHk4mWgDrDmVVF3R2aB6Svq3lOnMkoO+Oo8BJWFakAD906TKuX8xCt8y6Paw0+axD9ppRvbotwZulCjHEq/Rk/x0zjhaUblgr1xap+xG5OEVR1YEEoy5dgSGtyqJ4EH3rH3JOi+K63QLASb/IWGqPLwXBgc1a5DeJUHdZxx+5lJIvXCbKrsSVOrMw8hOpCHTA/FaISeyEsRVhmvutlFI9qc8cPkkYSdndsO1bWreaQCBDwOVlxiYRHAfIu1f/rhexAwddzhDYgVK1r/ul9DynzIHI9s7I0r5RWxoImdV6cpxuZyOxgYO8KlOwfaoHgj3a5ruje4uw121stG2GRgNTFmCaiItolEofDU0y9KOEMBzlsJEvw8JZEBT5yZiQ52Tkp+IDNgxtdgBFxZUdfU7pSayFntHUz8FLwjGRRnVDt+QIXLwAbaV+8q87a8B8fdOWGf+nP2PqUTdw1+Vyy/FrFFewcTxQA4XZmq8xmJrK7QPw7hkpDQ7NI7k0fBRmLPcEWkT/z8JL1rjOP5z25BKbdXKG7zpGa Vjd/hdHA tRCRE9CYebFDzEzPya6KG2FLqFwC7saj1VzeapnDlWe5zElz493Jep7e33EnSmSyXeUPBAW0e1u/PCMpyDnT08GAewNGWtXRV3OOX/Nk7UkLwhos6wWJhwZKgcUedSwPsiIIya+2b/MZAvfCA09Eu3mas5jL0YzurBrRZcfOwfh2HOBz3uGLNEwvzocxIEArOfoRG3eqRMB1Pu0QAo5mu2HbvYaUBibybYwCdatxQYfAAj7/XlLPhM7Slu+oq1KkDd3Nm/7ONctJE3p8lDoDirZxVyNoI7i58tXZWI8eoIHzsDOnzC+DDH7SqKedp6AFVGSgehvon5WPWnwobXmG7OCTPhL902ZG+OVr5lkUr34IVKyl3DKEhC06V10ggy9xtQRGdUKEDlpX6eW3I/5/jD3yBvmXdN1UpKev5fUNPQD3sF9z3+KD9uw1C4MuO+I/npcxCkeh5lmCukf8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Jul 14, 2025 at 4:06=E2=80=AFAM Sebastian Andrzej Siewior wrote: > > On 2025-07-11 19:19:26 [-0700], Alexei Starovoitov wrote: > > > If there is no parent check then we could do "normal lock" on both > > > sides. > > > > How would ___slab_alloc() know whether there was a parent check or not? > > > > imo keeping local_lock_irqsave() as-is is cleaner, > > since if there is no parent check lockdep will rightfully complain. > > what about this: > > diff --git a/mm/slub.c b/mm/slub.c > index 7e2ffe1d46c6c..3520d1c25c205 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -3693,6 +3693,34 @@ static inline void *freeze_slab(struct kmem_cache = *s, struct slab *slab) > return freelist; > } > > +static void local_lock_cpu_slab(struct kmem_cache *s, const gfp_t gfp_fl= ags, > + unsigned long *flags) > +{ > + bool allow_spin =3D gfpflags_allow_spinning(gfp_flags); > + > + /* > + * ___slab_alloc()'s caller is supposed to check if kmem_cache::k= mem_cache_cpu::lock > + * can be acquired without a deadlock before invoking the functio= n. > + * > + * On PREEMPT_RT an invocation is not possible from IRQ-off or pr= eempt > + * disabled context. The lock will always be acquired and if need= ed it > + * block and sleep until the lock is available. > + * > + * On !PREEMPT_RT allocations from any context but NMI are safe. = The lock > + * is always acquired with disabled interrupts meaning it is alwa= ys > + * possible to it. > + * In NMI context it is needed to check if the lock is acquired. = If it is not, > + * it is safe to acquire it. The trylock semantic is used to tell= lockdep > + * that we don't spin. The BUG_ON() will not trigger if it is saf= e to acquire > + * the lock. > + * > + */ > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && !allow_spin) > + BUG_ON(!local_trylock_irqsave(&s->cpu_slab->lock, *flags)= ); > + else > + local_lock_irqsave(&s->cpu_slab->lock, *flags); > +} the patch misses these two: diff --git a/mm/slub.c b/mm/slub.c index 36779519b02c..2f30b85fbf68 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3260,7 +3260,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain) unsigned long flags; int slabs =3D 0; - local_lock_irqsave(&s->cpu_slab->lock, flags); + local_lock_cpu_slab(s, 0, &flags); oldslab =3D this_cpu_read(s->cpu_slab->partial); @@ -4889,8 +4889,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s, goto redo; } } else { + long flags; /* Update the free list under the local lock */ - local_lock(&s->cpu_slab->lock); + local_lock_cpu_slab(s, 0, &flags); c =3D this_cpu_ptr(s->cpu_slab); if (unlikely(slab !=3D c->slab)) { local_unlock(&s->cpu_slab->lock); I realized that the latter one was missing local_lock_lockdep_start/end() in my patch as well, but that's secondary. So with above it works on !RT, but on RT lockdep complains as I explained earlier. With yours and above hunks applied here is full lockdep splat: [ 39.819636] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D [ 39.819638] WARNING: possible recursive locking detected [ 39.819641] 6.16.0-rc5-00342-gc8aca7837440-dirty #54 Tainted: G = O [ 39.819645] -------------------------------------------- [ 39.819646] page_alloc_kthr/2306 is trying to acquire lock: [ 39.819650] ff110001f5cbea88 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc+0xb7/0xec0 [ 39.819667] [ 39.819667] but task is already holding lock: [ 39.819668] ff110001f5cbfe88 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc+0xb7/0xec0 [ 39.819677] [ 39.819677] other info that might help us debug this: [ 39.819678] Possible unsafe locking scenario: [ 39.819678] [ 39.819679] CPU0 [ 39.819680] ---- [ 39.819681] lock((&c->lock)); [ 39.819684] lock((&c->lock)); [ 39.819687] [ 39.819687] *** DEADLOCK *** [ 39.819687] [ 39.819687] May be due to missing lock nesting notation [ 39.819687] [ 39.819689] 2 locks held by page_alloc_kthr/2306: [ 39.819691] #0: ff110001f5cbfe88 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc+0xb7/0xec0 [ 39.819700] #1: ffffffff8588f3a0 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x197/0x250 [ 39.819710] [ 39.819710] stack backtrace: [ 39.819714] CPU: 1 UID: 0 PID: 2306 Comm: page_alloc_kthr Tainted: G O 6.16.0-rc5-00342-gc8aca7837440-dirty #54 PREEMPT_RT [ 39.819721] Tainted: [O]=3DOOT_MODULE [ 39.819723] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 39.819726] Call Trace: [ 39.819729] [ 39.819734] dump_stack_lvl+0x5b/0x80 [ 39.819740] print_deadlock_bug.cold+0xbd/0xca [ 39.819747] __lock_acquire+0x12ad/0x2590 [ 39.819753] ? __lock_acquire+0x42b/0x2590 [ 39.819758] lock_acquire+0x133/0x2d0 [ 39.819763] ? ___slab_alloc+0xb7/0xec0 [ 39.819769] ? try_to_take_rt_mutex+0x624/0xfc0 [ 39.819773] ? __lock_acquire+0x42b/0x2590 [ 39.819778] rt_spin_lock+0x6f/0x250 [ 39.819783] ? ___slab_alloc+0xb7/0xec0 [ 39.819788] ? rtlock_slowlock_locked+0x5c60/0x5c60 [ 39.819792] ? rtlock_slowlock_locked+0xc3/0x5c60 [ 39.819798] ___slab_alloc+0xb7/0xec0 [ 39.819803] ? __lock_acquire+0x42b/0x2590 [ 39.819809] ? my_debug_callback+0x20e/0x390 [bpf_testmod] [ 39.819826] ? __lock_acquire+0x42b/0x2590 [ 39.819830] ? rt_read_unlock+0x2f0/0x2f0 [ 39.819835] ? my_debug_callback+0x20e/0x390 [bpf_testmod] [ 39.819844] ? kmalloc_nolock_noprof+0x15a/0x430 [ 39.819849] kmalloc_nolock_noprof+0x15a/0x430 [ 39.819857] my_debug_callback+0x20e/0x390 [bpf_testmod] [ 39.819867] ? page_alloc_kthread+0x320/0x320 [bpf_testmod] [ 39.819875] ? lock_is_held_type+0x85/0xe0 [ 39.819881] ___slab_alloc+0x256/0xec0 [ 39.819898] ? lock_acquire+0x133/0x2d0 [ 39.819927] ? __kmalloc_cache_noprof+0xd6/0x3b0 [ 39.819932] __kmalloc_cache_noprof+0xd6/0x3b0 As I said earlier lockdep _has_ to be tricked. We cannot unconditionally call local_lock_irqsave() on RT. lockdep doesn't understand per-cpu local_lock. And it doesn't understand this "if !locked_by_current_task -> go and lock" concept. lockdep has to be taught about safe lock region (call it tricking lockdep, but it has to be an external signal to lockdep).