* [PATCH 1/2] mm/slab: skip get_from_any_partial() if !allow_spin
       [not found] <20260206171348.35886-1-harry.yoo@oracle.com>
@ 2026-02-06 17:13 ` Harry Yoo
  2026-02-06 18:10   ` Vlastimil Babka
  0 siblings, 1 reply; 5+ messages in thread
From: Harry Yoo @ 2026-02-06 17:13 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
	Alexei Starovoitov, Hao Li, linux-mm, stable

Lockdep complains when get_from_any_partial() is called in an NMI
context, because current->mems_allowed_seq is seqcount_spinlock_t and
not NMI-safe:

 ================================
 WARNING: inconsistent lock state
 6.19.0-rc5-kfree-rcu+ #315 Tainted: G                 N
 --------------------------------
 inconsistent {INITIAL USE} -> {IN-NMI} usage.
 kunit_try_catch/9989 [HC1[1]:SC0[0]:HE0:SE1] takes:
 ffff889085799820 (&____s->seqcount#3){.-.-}-{0:0}, at: ___slab_alloc+0x58f/0xc00
 {INITIAL USE} state was registered at:
   lock_acquire+0x185/0x320
   kernel_init_freeable+0x391/0x1150
   kernel_init+0x1f/0x220
   ret_from_fork+0x736/0x8f0
   ret_from_fork_asm+0x1a/0x30
 irq event stamp: 56
 hardirqs last enabled at (55): [<ffffffff850a68d7>] _raw_spin_unlock_irq+0x27/0x70
 hardirqs last disabled at (56): [<ffffffff850858ca>] __schedule+0x2a8a/0x6630
 softirqs last enabled at (0): [<ffffffff81536711>] copy_process+0x1dc1/0x6a10
 softirqs last disabled at (0): [<0000000000000000>] 0x0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&____s->seqcount#3);
   <Interrupt>
     lock(&____s->seqcount#3);

  *** DEADLOCK ***

According to Documentation/locking/seqlock.rst, seqcount_t is not
NMI-safe and seqcount_latch_t should be used when the read path can
interrupt the write-side critical section. In this case, return NULL
and fall back to slab allocation if !allow_spin.

Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Cc: stable@vger.kernel.org
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/slub.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 102fb47ae013..d46464654c15 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3789,6 +3789,14 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
 	unsigned int cpuset_mems_cookie;
 
+	/*
+	 * read_mems_allowed_begin() accesses current->mems_allowed_seq,
+	 * a seqcount_spinlock_t that is not NMI-safe. Skip allocation
+	 * when GFP flags indicate spinning is not allowed.
+	 */
+	if (!gfpflags_allow_spinning(pc->flags))
+		return NULL;
+
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
 	 * inter node defragmentation and node local allocations. A lower
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] mm/slab: skip get_from_any_partial() if !allow_spin
  2026-02-06 17:13 ` [PATCH 1/2] mm/slab: skip get_from_any_partial() if !allow_spin Harry Yoo
@ 2026-02-06 18:10   ` Vlastimil Babka
  2026-02-06 19:19     ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Vlastimil Babka @ 2026-02-06 18:10 UTC (permalink / raw)
  To: Harry Yoo, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, Hao Li, linux-mm, stable

On 2/6/26 18:13, Harry Yoo wrote:
> Lockdep complains when get_from_any_partial() is called in an NMI
> context, because current->mems_allowed_seq is seqcount_spinlock_t and
> not NMI-safe:
> 
>  ================================
>  WARNING: inconsistent lock state
>  6.19.0-rc5-kfree-rcu+ #315 Tainted: G                 N
>  --------------------------------
>  inconsistent {INITIAL USE} -> {IN-NMI} usage.
>  kunit_try_catch/9989 [HC1[1]:SC0[0]:HE0:SE1] takes:
>  ffff889085799820 (&____s->seqcount#3){.-.-}-{0:0}, at: ___slab_alloc+0x58f/0xc00
>  {INITIAL USE} state was registered at:
>    lock_acquire+0x185/0x320
>    kernel_init_freeable+0x391/0x1150
>    kernel_init+0x1f/0x220
>    ret_from_fork+0x736/0x8f0
>    ret_from_fork_asm+0x1a/0x30
>  irq event stamp: 56
>  hardirqs last enabled at (55): [<ffffffff850a68d7>] _raw_spin_unlock_irq+0x27/0x70
>  hardirqs last disabled at (56): [<ffffffff850858ca>] __schedule+0x2a8a/0x6630
>  softirqs last enabled at (0): [<ffffffff81536711>] copy_process+0x1dc1/0x6a10
>  softirqs last disabled at (0): [<0000000000000000>] 0x0
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&____s->seqcount#3);
>    <Interrupt>
>      lock(&____s->seqcount#3);
> 
>   *** DEADLOCK ***
> 
> According to Documentation/locking/seqlock.rst, seqcount_t is not
> NMI-safe and seqcount_latch_t should be used when the read path can
> interrupt the write-side critical section. In this case, return NULL
> and fall back to slab allocation if !allow_spin.
> 
> Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> Cc: stable@vger.kernel.org
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>  mm/slub.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 102fb47ae013..d46464654c15 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3789,6 +3789,14 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
>  	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
>  	unsigned int cpuset_mems_cookie;
>  
> +	/*
> +	 * read_mems_allowed_begin() accesses current->mems_allowed_seq,
> +	 * a seqcount_spinlock_t that is not NMI-safe. Skip allocation
> +	 * when GFP flags indicate spinning is not allowed.
> +	 */
> +	if (!gfpflags_allow_spinning(pc->flags))
> +		return NULL;

I think it would be less restrictive to just continue, but skip the
read_mems_allowed_retry() part in the do-while loop, so just make it one
iteration for !allow_spin. If lockdep doesn't like even the
read_mems_allowed_begin() (not clear to me), skip it too?

> +
>  	/*
>  	 * The defrag ratio allows a configuration of the tradeoffs between
>  	 * inter node defragmentation and node local allocations. A lower

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] mm/slab: skip get_from_any_partial() if !allow_spin
  2026-02-06 18:10   ` Vlastimil Babka
@ 2026-02-06 19:19     ` Alexei Starovoitov
  2026-02-09  3:18       ` Harry Yoo
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2026-02-06 19:19 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Harry Yoo, Andrew Morton, Christoph Lameter, David Rientjes,
	Roman Gushchin, Alexei Starovoitov, Hao Li, linux-mm, stable

On Fri, Feb 6, 2026 at 10:10 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/6/26 18:13, Harry Yoo wrote:
> > Lockdep complains when get_from_any_partial() is called in an NMI
> > context, because current->mems_allowed_seq is seqcount_spinlock_t and
> > not NMI-safe:
> >
> >  ================================
> >  WARNING: inconsistent lock state
> >  6.19.0-rc5-kfree-rcu+ #315 Tainted: G                 N
> >  --------------------------------
> >  inconsistent {INITIAL USE} -> {IN-NMI} usage.
> >  kunit_try_catch/9989 [HC1[1]:SC0[0]:HE0:SE1] takes:
> >  ffff889085799820 (&____s->seqcount#3){.-.-}-{0:0}, at: ___slab_alloc+0x58f/0xc00
> >  {INITIAL USE} state was registered at:
> >    lock_acquire+0x185/0x320
> >    kernel_init_freeable+0x391/0x1150
> >    kernel_init+0x1f/0x220
> >    ret_from_fork+0x736/0x8f0
> >    ret_from_fork_asm+0x1a/0x30
> >  irq event stamp: 56
> >  hardirqs last enabled at (55): [<ffffffff850a68d7>] _raw_spin_unlock_irq+0x27/0x70
> >  hardirqs last disabled at (56): [<ffffffff850858ca>] __schedule+0x2a8a/0x6630
> >  softirqs last enabled at (0): [<ffffffff81536711>] copy_process+0x1dc1/0x6a10
> >  softirqs last disabled at (0): [<0000000000000000>] 0x0
> >
> >  other info that might help us debug this:
> >   Possible unsafe locking scenario:
> >
> >         CPU0
> >         ----
> >    lock(&____s->seqcount#3);
> >    <Interrupt>
> >      lock(&____s->seqcount#3);
> >
> >   *** DEADLOCK ***
> >
> > According to Documentation/locking/seqlock.rst, seqcount_t is not
> > NMI-safe and seqcount_latch_t should be used when the read path can
> > interrupt the write-side critical section. In this case, return NULL
> > and fall back to slab allocation if !allow_spin.
> >
> > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> >  mm/slub.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 102fb47ae013..d46464654c15 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3789,6 +3789,14 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
> >  	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
> >  	unsigned int cpuset_mems_cookie;
> >
> > +	/*
> > +	 * read_mems_allowed_begin() accesses current->mems_allowed_seq,
> > +	 * a seqcount_spinlock_t that is not NMI-safe. Skip allocation
> > +	 * when GFP flags indicate spinning is not allowed.
> > +	 */
> > +	if (!gfpflags_allow_spinning(pc->flags))
> > +		return NULL;
>
> I think it would be less restrictive to just continue, but skip the
> read_mems_allowed_retry() part in the do-while loop, so just make it one
> iteration for !allow_spin. If lockdep doesn't like even the
> read_mems_allowed_begin() (not clear to me), skip it too?

+1
Just unconditional return NULL seems too restrictive.

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] mm/slab: skip get_from_any_partial() if !allow_spin
  2026-02-06 19:19     ` Alexei Starovoitov
@ 2026-02-09  3:18       ` Harry Yoo
  2026-02-09 19:03         ` Vlastimil Babka
  0 siblings, 1 reply; 5+ messages in thread
From: Harry Yoo @ 2026-02-09  3:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter, David Rientjes,
	Roman Gushchin, Alexei Starovoitov, Hao Li, linux-mm, stable

On Fri, Feb 06, 2026 at 11:19:01AM -0800, Alexei Starovoitov wrote:
> On Fri, Feb 6, 2026 at 10:10 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 2/6/26 18:13, Harry Yoo wrote:
> > > Lockdep complains when get_from_any_partial() is called in an NMI
> > > context, because current->mems_allowed_seq is seqcount_spinlock_t and
> > > not NMI-safe:
> > >
> > >  ================================
> > >  WARNING: inconsistent lock state
> > >  6.19.0-rc5-kfree-rcu+ #315 Tainted: G                 N
> > >  --------------------------------
> > >  inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > >  kunit_try_catch/9989 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > >  ffff889085799820 (&____s->seqcount#3){.-.-}-{0:0}, at: ___slab_alloc+0x58f/0xc00
> > >  {INITIAL USE} state was registered at:
> > >    lock_acquire+0x185/0x320
> > >    kernel_init_freeable+0x391/0x1150
> > >    kernel_init+0x1f/0x220
> > >    ret_from_fork+0x736/0x8f0
> > >    ret_from_fork_asm+0x1a/0x30
> > >  irq event stamp: 56
> > >  hardirqs last enabled at (55): [<ffffffff850a68d7>] _raw_spin_unlock_irq+0x27/0x70
> > >  hardirqs last disabled at (56): [<ffffffff850858ca>] __schedule+0x2a8a/0x6630
> > >  softirqs last enabled at (0): [<ffffffff81536711>] copy_process+0x1dc1/0x6a10
> > >  softirqs last disabled at (0): [<0000000000000000>] 0x0
> > >
> > >  other info that might help us debug this:
> > >   Possible unsafe locking scenario:
> > >
> > >         CPU0
> > >         ----
> > >    lock(&____s->seqcount#3);
> > >    <Interrupt>
> > >      lock(&____s->seqcount#3);
> > >
> > >   *** DEADLOCK ***
> > >
> > > According to Documentation/locking/seqlock.rst, seqcount_t is not
> > > NMI-safe and seqcount_latch_t should be used when the read path can
> > > interrupt the write-side critical section. In this case, return NULL
> > > and fall back to slab allocation if !allow_spin.
> > >
> > > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > ---
> > >  mm/slub.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 102fb47ae013..d46464654c15 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3789,6 +3789,14 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
> > >  	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
> > >  	unsigned int cpuset_mems_cookie;
> > >
> > > +	/*
> > > +	 * read_mems_allowed_begin() accesses current->mems_allowed_seq,
> > > +	 * a seqcount_spinlock_t that is not NMI-safe. Skip allocation
> > > +	 * when GFP flags indicate spinning is not allowed.
> > > +	 */
> > > +	if (!gfpflags_allow_spinning(pc->flags))
> > > +		return NULL;
> >
> > I think it would be less restrictive to just continue,

Ack.

> > but skip the
> > read_mems_allowed_retry() part in the do-while loop, so just make it one
> > iteration for !allow_spin.

Makes sense.

> > If lockdep doesn't like even the
> > read_mems_allowed_begin() (not clear to me), skip it too?

Yes, lockdep doesn't like read_mems_allowed_begin(), and thus we should
skip both.

> +1
> Just unconditional return NULL seems too restrictive.

Ack.

I'll do something like this:

diff --git a/mm/slub.c b/mm/slub.c
index 102fb47ae013..cc686ab929fe 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3788,6 +3788,7 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 	struct zone *zone;
 	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
 	unsigned int cpuset_mems_cookie;
+	bool allow_spin = gfpflags_allow_spinning(pc->flags);
 
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -3812,7 +3813,15 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 		return NULL;
 
 	do {
-		cpuset_mems_cookie = read_mems_allowed_begin();
+		/*
+		 * read_mems_allowed_begin() accesses current->mems_allowed_seq,
+		 * a seqcount_spinlock_t that is not NMI-safe. Do not access
+		 * current->mems_allowed_seq and avoid retry when GFP flags
+		 * indicate spinning is not allowed.
+		 */
+		if (allow_spin)
+			cpuset_mems_cookie = read_mems_allowed_begin();
+
 		zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
 		for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
 			struct kmem_cache_node *n;
@@ -3836,7 +3845,7 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 				}
 			}
 		}
-	} while (read_mems_allowed_retry(cpuset_mems_cookie));
+	} while (allow_spin && read_mems_allowed_retry(cpuset_mems_cookie));
 #endif	/* CONFIG_NUMA */
 	return NULL;
 }

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] mm/slab: skip get_from_any_partial() if !allow_spin
  2026-02-09  3:18       ` Harry Yoo
@ 2026-02-09 19:03         ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2026-02-09 19:03 UTC (permalink / raw)
  To: Harry Yoo, Alexei Starovoitov
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, Hao Li, linux-mm, stable

On 2/9/26 04:18, Harry Yoo wrote:
> On Fri, Feb 06, 2026 at 11:19:01AM -0800, Alexei Starovoitov wrote:
>> On Fri, Feb 6, 2026 at 10:10 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>> >
>> > On 2/6/26 18:13, Harry Yoo wrote:
>> > > Lockdep complains when get_from_any_partial() is called in an NMI
>> > > context, because current->mems_allowed_seq is seqcount_spinlock_t and
>> > > not NMI-safe:
>> > >
>> > >  ================================
>> > >  WARNING: inconsistent lock state
>> > >  6.19.0-rc5-kfree-rcu+ #315 Tainted: G                 N
>> > >  --------------------------------
>> > >  inconsistent {INITIAL USE} -> {IN-NMI} usage.
>> > >  kunit_try_catch/9989 [HC1[1]:SC0[0]:HE0:SE1] takes:
>> > >  ffff889085799820 (&____s->seqcount#3){.-.-}-{0:0}, at: ___slab_alloc+0x58f/0xc00
>> > >  {INITIAL USE} state was registered at:
>> > >    lock_acquire+0x185/0x320
>> > >    kernel_init_freeable+0x391/0x1150
>> > >    kernel_init+0x1f/0x220
>> > >    ret_from_fork+0x736/0x8f0
>> > >    ret_from_fork_asm+0x1a/0x30
>> > >  irq event stamp: 56
>> > >  hardirqs last enabled at (55): [<ffffffff850a68d7>] _raw_spin_unlock_irq+0x27/0x70
>> > >  hardirqs last disabled at (56): [<ffffffff850858ca>] __schedule+0x2a8a/0x6630
>> > >  softirqs last enabled at (0): [<ffffffff81536711>] copy_process+0x1dc1/0x6a10
>> > >  softirqs last disabled at (0): [<0000000000000000>] 0x0
>> > >
>> > >  other info that might help us debug this:
>> > >   Possible unsafe locking scenario:
>> > >
>> > >         CPU0
>> > >         ----
>> > >    lock(&____s->seqcount#3);
>> > >    <Interrupt>
>> > >      lock(&____s->seqcount#3);
>> > >
>> > >   *** DEADLOCK ***
>> > >
>> > > According to Documentation/locking/seqlock.rst, seqcount_t is not
>> > > NMI-safe and seqcount_latch_t should be used when the read path can
>> > > interrupt the write-side critical section. In this case, return NULL
>> > > and fall back to slab allocation if !allow_spin.
>> > >
>> > > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
>> > > Cc: stable@vger.kernel.org
>> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> > > ---
>> > >  mm/slub.c | 8 ++++++++
>> > >  1 file changed, 8 insertions(+)
>> > >
>> > > diff --git a/mm/slub.c b/mm/slub.c
>> > > index 102fb47ae013..d46464654c15 100644
>> > > --- a/mm/slub.c
>> > > +++ b/mm/slub.c
>> > > @@ -3789,6 +3789,14 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
>> > >  	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
>> > >  	unsigned int cpuset_mems_cookie;
>> > >
>> > > +	/*
>> > > +	 * read_mems_allowed_begin() accesses current->mems_allowed_seq,
>> > > +	 * a seqcount_spinlock_t that is not NMI-safe. Skip allocation
>> > > +	 * when GFP flags indicate spinning is not allowed.
>> > > +	 */
>> > > +	if (!gfpflags_allow_spinning(pc->flags))
>> > > +		return NULL;
>> >
>> > I think it would be less restrictive to just continue,
>
> Ack.
>
>> > but skip the
>> > read_mems_allowed_retry() part in the do-while loop, so just make it one
>> > iteration for !allow_spin.
>
> Makes sense.
>
>> > If lockdep doesn't like even the
>> > read_mems_allowed_begin() (not clear to me), skip it too?
>
> Yes, lockdep doesn't like read_mems_allowed_begin(), and thus we should
> skip both.
>
>> +1
>> Just unconditional return NULL seems too restrictive.
>
> Ack.
>
> I'll do something like this:

Looks good!

> diff --git a/mm/slub.c b/mm/slub.c
> index 102fb47ae013..cc686ab929fe 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3788,6 +3788,7 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
>  	struct zone *zone;
>  	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
>  	unsigned int cpuset_mems_cookie;
> +	bool allow_spin = gfpflags_allow_spinning(pc->flags);
>  
>  	/*
>  	 * The defrag ratio allows a configuration of the tradeoffs between
> @@ -3812,7 +3813,15 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
>  		return NULL;
>  
>  	do {
> -		cpuset_mems_cookie = read_mems_allowed_begin();
> +		/*
> +		 * read_mems_allowed_begin() accesses current->mems_allowed_seq,
> +		 * a seqcount_spinlock_t that is not NMI-safe. Do not access
> +		 * current->mems_allowed_seq and avoid retry when GFP flags
> +		 * indicate spinning is not allowed.
> +		 */
> +		if (allow_spin)
> +			cpuset_mems_cookie = read_mems_allowed_begin();
> +
>  		zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
>  		for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
>  			struct kmem_cache_node *n;
> @@ -3836,7 +3845,7 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
>  				}
>  			}
>  		}
> -	} while (read_mems_allowed_retry(cpuset_mems_cookie));
> +	} while (allow_spin && read_mems_allowed_retry(cpuset_mems_cookie));
>  #endif	/* CONFIG_NUMA */
>  	return NULL;
>  }

^ permalink raw reply	[flat|nested] 5+ messages in thread