From: Steven Rostedt <rostedt@goodmis.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yongliang Gao <leonylgao@gmail.com>,
mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
frankjpliu@tencent.com, Yongliang Gao <leonylgao@tencent.com>,
Huang Cun <cunhuang@tencent.com>
Subject: Re: [PATCH v3] trace/pid_list: optimize pid_list->lock contention
Date: Thu, 13 Nov 2025 10:35:12 -0500
Message-ID: <20251113103512.18e7bb03@gandalf.local.home>
In-Reply-To: <20251113073420.yko6jYcI@linutronix.de>
On Thu, 13 Nov 2025 08:34:20 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > +	do {
> > +		seq = read_seqcount_begin(&pid_list->seqcount);
> > +		ret = false;
> > +		upper_chunk = pid_list->upper[upper1];
> > +		if (upper_chunk) {
> > +			lower_chunk = upper_chunk->data[upper2];
> > +			if (lower_chunk)
> > +				ret = test_bit(lower, lower_chunk->data);
> > +		}
> > +	} while (read_seqcount_retry(&pid_list->seqcount, seq));
>
> How is this better? Any numbers?
> If the write side is busy and the lock is handed over from one CPU to
> another then it is possible that the reader spins here and does several
> loops, right?

I think the chances of that are very slim. The writes happen at fork and
exit, and when someone manually writes to one of the set_*_pid files.

The readers run at every sched_switch. Currently we just use
raw_spin_locks. But that forces a serialization of every sched_switch!
On big machines that could cause a huge latency.

This approach allows multiple sched_switches to happen at the same time.

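To make the reader/writer asymmetry concrete, here is a minimal userspace
sketch in C11 of a sequence-counted lookup. It is not the patch's code: the
names (pid_list_sketch, pid_list_test, pid_list_set), the 8/8/14 bit split,
and the use of default (sequentially consistent) atomics in place of the
kernel's seqcount barriers are all assumptions made for illustration.
Writers are assumed to be serialized by the caller, and chunks are never
freed here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed bit widths, for illustration only. */
#define UPPER_BITS	8
#define LOWER_BITS	14
#define UPPER_SIZE	(1u << UPPER_BITS)
#define UPPER_MASK	(UPPER_SIZE - 1)
#define LOWER_MASK	((1u << LOWER_BITS) - 1)
#define LOWER_WORDS	((1u << LOWER_BITS) / 64)
#define MAX_PID_SKETCH	(1u << (2 * UPPER_BITS + LOWER_BITS))

struct lower_chunk { _Atomic uint64_t data[LOWER_WORDS]; };
struct upper_chunk { struct lower_chunk *_Atomic data[UPPER_SIZE]; };

struct pid_list_sketch {
	atomic_uint seq;	/* odd while a write is in progress */
	struct upper_chunk *_Atomic upper[UPPER_SIZE];
};

/* Split a pid into the three indices; fails for out-of-range pids. */
static bool pid_split(unsigned int pid, unsigned int *upper1,
		      unsigned int *upper2, unsigned int *lower)
{
	if (pid >= MAX_PID_SKETCH)
		return false;
	*upper1 = (pid >> (UPPER_BITS + LOWER_BITS)) & UPPER_MASK;
	*upper2 = (pid >> LOWER_BITS) & UPPER_MASK;
	*lower = pid & LOWER_MASK;
	return true;
}

/* Reader: only loads shared memory, so readers never block each other. */
static bool pid_list_test(struct pid_list_sketch *l, unsigned int pid)
{
	unsigned int upper1, upper2, lower, seq;
	struct upper_chunk *uc;
	struct lower_chunk *lc;
	bool ret;

	if (!pid_split(pid, &upper1, &upper2, &lower))
		return false;
	do {
		/* Wait for an even count, i.e. no write in progress. */
		while ((seq = atomic_load(&l->seq)) & 1)
			;
		ret = false;
		uc = atomic_load(&l->upper[upper1]);
		if (uc) {
			lc = atomic_load(&uc->data[upper2]);
			if (lc)
				ret = (atomic_load(&lc->data[lower / 64]) >>
				       (lower % 64)) & 1;
		}
		/* Retry only if a writer ran during the lookup. */
	} while (atomic_load(&l->seq) != seq);
	return ret;
}

/* Writer: the caller must serialize writers (e.g. with a lock). */
static bool pid_list_set(struct pid_list_sketch *l, unsigned int pid)
{
	unsigned int upper1, upper2, lower;
	struct upper_chunk *uc;
	struct lower_chunk *lc = NULL;

	if (!pid_split(pid, &upper1, &upper2, &lower))
		return false;
	atomic_fetch_add(&l->seq, 1);		/* odd: write begins */
	/* calloc'd (zeroed) chunks are treated as empty; an assumption
	 * that holds on mainstream compilers. */
	uc = atomic_load(&l->upper[upper1]);
	if (!uc && (uc = calloc(1, sizeof(*uc))))
		atomic_store(&l->upper[upper1], uc);
	if (uc) {
		lc = atomic_load(&uc->data[upper2]);
		if (!lc && (lc = calloc(1, sizeof(*lc))))
			atomic_store(&uc->data[upper2], lc);
	}
	if (lc)
		atomic_fetch_or(&lc->data[lower / 64],
				UINT64_C(1) << (lower % 64));
	atomic_fetch_add(&l->seq, 1);		/* even: write done */
	return lc != NULL;
}
```

Any number of callers can run pid_list_test() at the same time without
writing to shared memory. A reader only goes around the loop again if a
pid_list_set() overlapped its lookup.
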
> And in this case, how accurate would it be? I mean the result could
> change right after the sequence here is completed because the write side
> got active again. How bad would it be if there would be no locking and
> RCU ensures that the chunks (and data) don't disappear while looking at
> it?

As I mentioned, for this use case it is very accurate. That's because
the writers are updating the pid bits for themselves. If you are checking
for pid 123, that means task 123 is about to run. If bit 123 is being added
or removed, it would only be done by task 123 or its parent.

The exception to this rule is if a user manually adds or removes a pid via
one of the set_*_pid files. But that has other races that we don't really
care about. It's known that a change made there may take some milliseconds
to take effect.
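
The fork/exit rule above can be sketched in a few lines of userspace C. This
is only an illustration under stated assumptions: it uses a flat bitmap
instead of the chunked pid list, it assumes a child is added at fork only
when its parent is already in the filter, and every name in it
(flat_pid_filter, on_fork, on_exit_task, ...) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAT_MAX_PID	4096	/* small, assumed limit for the example */

struct flat_pid_filter { uint64_t bits[FLAT_MAX_PID / 64]; };

/* Reader side: what a sched_switch check would ask for one pid. */
static bool filter_test(const struct flat_pid_filter *f, unsigned int pid)
{
	return pid < FLAT_MAX_PID && ((f->bits[pid / 64] >> (pid % 64)) & 1);
}

/* Set or clear a single pid bit; also models a manual set_*_pid write. */
static void filter_assign(struct flat_pid_filter *f, unsigned int pid, bool on)
{
	if (pid >= FLAT_MAX_PID)
		return;
	if (on)
		f->bits[pid / 64] |= UINT64_C(1) << (pid % 64);
	else
		f->bits[pid / 64] &= ~(UINT64_C(1) << (pid % 64));
}

/* Fork: the parent sets the child's bit, and only if it is filtered itself. */
static void on_fork(struct flat_pid_filter *f, unsigned int parent,
		    unsigned int child)
{
	if (filter_test(f, parent))
		filter_assign(f, child, true);
}

/* Exit: the exiting task clears its own bit. */
static void on_exit_task(struct flat_pid_filter *f, unsigned int pid)
{
	filter_assign(f, pid, false);
}
```

In this model the bit for a pid is written only by that task or by its
parent at fork, before the child has run, so a check made while switching to
that task does not overlap a fork or exit update of its own bit.
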
-- Steve