From: Andrea Righi <arighi@nvidia.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Yury Norov <yury.norov@gmail.com>, Tejun Heo <tj@kernel.org>,
David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Ian May <ianm@nvidia.com>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/6] sched_ext: idle: Per-node idle cpumasks
Date: Tue, 11 Feb 2025 19:05:16 +0100
Message-ID: <Z6uRXG7mGMfH6fAV@gpd3>
In-Reply-To: <20250211113827.302fd066@gandalf.local.home>

On Tue, Feb 11, 2025 at 11:38:27AM -0500, Steven Rostedt wrote:
> On Tue, 11 Feb 2025 15:45:15 +0100
> Andrea Righi <arighi@nvidia.com> wrote:
>
> > ...which is basically this (with GFP_ATOMIC):
> >
> > [ 11.829079] =============================
> > [ 11.829109] [ BUG: Invalid wait context ]
> > [ 11.829146] 6.13.0-virtme #51 Not tainted
> > [ 11.829185] -----------------------------
> > [ 11.829243] fish/344 is trying to lock:
> > [ 11.829285] ffff9659bec450b0 (&c->lock){..-.}-{3:3}, at: ___slab_alloc+0x66/0x1510
> > [ 11.829380] other info that might help us debug this:
> > [ 11.829450] context-{5:5}
> > [ 11.829494] 8 locks held by fish/344:
> > [ 11.829534] #0: ffff965a409c70a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x28/0x60
> > [ 11.829643] #1: ffff965a409c7130 (&tty->atomic_write_lock){+.+.}-{4:4}, at: file_tty_write.isra.0+0xa1/0x330
> > [ 11.829765] #2: ffff965a409c72e8 (&tty->termios_rwsem/1){++++}-{4:4}, at: n_tty_write+0x9e/0x510
> > [ 11.829871] #3: ffffbc6d01433380 (&ldata->output_lock){+.+.}-{4:4}, at: n_tty_write+0x1f1/0x510
> > [ 11.829979] #4: ffffffffb556b5c0 (rcu_read_lock){....}-{1:3}, at: __queue_work+0x59/0x680
> > [ 11.830173] #5: ffff9659800f0018 (&pool->lock){-.-.}-{2:2}, at: __queue_work+0xd7/0x680
> > [ 11.830286] #6: ffff9659801bcf60 (&p->pi_lock){-.-.}-{2:2}, at: try_to_wake_up+0x56/0x920
> > [ 11.830396] #7: ffffffffb556b5c0 (rcu_read_lock){....}-{1:3}, at: scx_select_cpu_dfl+0x56/0x460
> >
> > And I think that's because:
> >
> > * %GFP_ATOMIC users can not sleep and need the allocation to succeed. A lower
> > * watermark is applied to allow access to "atomic reserves".
> > * The current implementation doesn't support NMI and few other strict
> > * non-preemptive contexts (e.g. raw_spin_lock). The same applies to %GFP_NOWAIT.
> >
> > So I guess the only viable option is to preallocate nodemask_t and
> > protect it somehow, hoping that it doesn't add too much overhead...
>
> I believe it's because you have p->pi_lock which is a raw_spin_lock() and
> you are trying to take a lock in ___slab_alloc() which I bet is a normal
> spin_lock(). In PREEMPT_RT() that turns into a mutex, and you can not take
> a spin_lock while holding a raw_spin_lock.

Exactly that, thanks Steve. I'll run some tests using a per-CPU nodemask_t.
Since this is called with p->pi_lock held most of the time, it should be
safe and shouldn't introduce any overhead.

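To make the idea concrete, here is a rough userspace model of a preallocated
per-CPU scratch mask, so nothing is allocated while p->pi_lock is held. This
is only a sketch under assumptions, not the actual patch: NR_CPUS, MAX_NODES,
nodemask_model_t, scratch_nodemask() and pick_idle_node() are all made up for
illustration, and the round-robin walk stands in for a real distance-ordered
node search. In the kernel this would presumably be a DEFINE_PER_CPU()
variable accessed via this_cpu_ptr() instead of a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS   8	/* hypothetical sizes for the model */
#define MAX_NODES 4

/* Stand-in for nodemask_t: one bit per NUMA node. */
typedef struct { unsigned long bits; } nodemask_model_t;

/* Preallocated once, one slot per CPU: no allocation in the hot path. */
static nodemask_model_t scratch_nodes[NR_CPUS];

/*
 * Return this CPU's scratch mask, cleared for reuse. The caller is assumed
 * to be non-preemptible (e.g. holding a raw spinlock), so the slot cannot
 * be shared with another task running on the same CPU.
 */
static nodemask_model_t *scratch_nodemask(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	memset(&scratch_nodes[cpu], 0, sizeof(scratch_nodes[cpu]));
	return &scratch_nodes[cpu];
}

/*
 * Walk the nodes starting from @start, recording visited nodes in the
 * per-CPU scratch mask, and return the first node with an idle CPU,
 * or -1 if there is none.
 */
static int pick_idle_node(int cpu, int start, const bool node_has_idle[MAX_NODES])
{
	nodemask_model_t *visited = scratch_nodemask(cpu);

	for (int i = 0; i < MAX_NODES; i++) {
		int node = (start + i) % MAX_NODES;

		visited->bits |= 1UL << node;
		if (node_has_idle[node])
			return node;
	}
	return -1;
}
```

The only cost on the hot path is clearing the mask, and no lock is needed as
long as every caller stays on its CPU while using the slot.
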
-Andrea