Re: [PATCH] trace/pid_list: optimize pid_list->lock contention

linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Steven Rostedt <rostedt@goodmis.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yongliang Gao <leonylgao@gmail.com>,
	mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	Yongliang Gao <leonylgao@tencent.com>,
	Huang Cun <cunhuang@tencent.com>
Subject: Re: [PATCH] trace/pid_list: optimize pid_list->lock contention
Date: Tue, 11 Nov 2025 10:27:39 -0500	[thread overview]
Message-ID: <20251111102739.2a0a64cf@gandalf.local.home> (raw)
In-Reply-To: <20251111081314.j8CFfAD6@linutronix.de>

On Tue, 11 Nov 2025 09:13:14 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Nope, no read-write lock that can be used in atomic sections. Well,
> there is RCU.

Well, it can't simply be replaced by RCU as the write side is also a
critical path. It happens when new tasks are spawned.

Now we could possibly do some RCU like magic, and remove the lock in the
read, but it would need some care with the writes.

Something like this (untested):

bool trace_pid_list_is_set(struct trace_pid_list *pid_list, unsigned int pid)
{
	union upper_chunk *upper_chunk;
	union lower_chunk *lower_chunk;
	unsigned long flags;
	unsigned int upper1;
	unsigned int upper2;
	unsigned int lower;
	bool ret = false;

	if (!pid_list)
		return false;

	if (pid_split(pid, &upper1, &upper2, &lower) < 0)
		return false;

	upper_chunk = READ_ONCE(pid_list->upper[upper1]);
	if (upper_chunk) {
		lower_chunk = READ_ONCE(upper_chunk->data[upper2]);
		if (lower_chunk)
			ret = test_bit(lower, lower_chunk->data);
	}

	return ret;
}

Now when all the bits of a chunk is cleared, it goes to a free-list. And
when a new chunk is needed, it acquires it from that free-list. We need to
make sure that the chunk acquired in the read hasn't gone through the
free-list.

Now we could have an atomic counter in the pid_list and make this more of a
seqcount? That is, have the counter updated when a chunk goes to the free
list and also when it is taken from the free list. We could then make this:

 again:
	counter = atomic_read(&pid_list->counter);
	smp_rmb();
	upper_chunk = READ_ONCE(pid_list->upper[upper1]);
	if (upper_chunk) {
		lower_chunk = READ_ONCE(upper_chunk->data[upper2]);
		if (lower_chunk) {
			ret = test_bit(lower, lower_chunk->data);
			smp_rmb();
			if (unlikely(counter != atomic_read(&pid_list->counter))) {
				ret = false;
				goto again;
			}
		}
	}

And in the set we need:

	upper_chunk = pid_list->upper[upper1];
	if (!upper_chunk) {
		upper_chunk = get_upper_chunk(pid_list);
		if (!upper_chunk) {
			ret = -ENOMEM;
			goto out;
		}
		atomic_inc(&pid_list->counter);
		smp_wmb();
		WRITE_ONCE(pid_list->upper[upper1], upper_chunk);
	}
	lower_chunk = upper_chunk->data[upper2];
	if (!lower_chunk) {
		lower_chunk = get_lower_chunk(pid_list);
		if (!lower_chunk) {
			ret = -ENOMEM;
			goto out;
		}
		atomic_inc(&pid_list->counter);
		smp_wmb();
		WRITE_ONCE(upper_chunk->data[upper2], lower_chunk);
	}

and in the clear:

	if (find_first_bit(lower_chunk->data, LOWER_MAX) >= LOWER_MAX) {
		put_lower_chunk(pid_list, lower_chunk);
		WRITE_ONCE(upper_chunk->data[upper2], NULL);
		smp_wmb();
		atomic_inc(&pid_list->counter);
		if (upper_empty(upper_chunk)) {
			put_upper_chunk(pid_list, upper_chunk);
			WRITE_ONCE(pid_list->upper[upper1], NULL);
			smp_wmb();
			atomic_inc(&pid_list->counter);
		}
	}

That is, the counter gets updated after setting the chunk to NULL and
before assigning it a new value. And reading it, the counter is read before
looking at any of the chunks, and tested after getting the result. If the
value is the same, then the chunks are for the correct PID and haven't
swapped in a free/alloc swap where it's looking at a chunk for a different
PID.

This would allow for the read to not take any locks.

-- Steve

next prev parent reply	other threads:[~2025-11-11 15:27 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-15 11:49 [PATCH] trace/pid_list: optimize pid_list->lock contention Yongliang Gao
2025-11-10 23:38 ` Steven Rostedt
2025-11-11  8:13   ` Sebastian Andrzej Siewior
2025-11-11 10:38     ` Yongliang Gao
2025-11-11 15:27     ` Steven Rostedt [this message]
2025-11-12  5:27       ` Yongliang Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251111102739.2a0a64cf@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=bigeasy@linutronix.de \
    --cc=cunhuang@tencent.com \
    --cc=leonylgao@gmail.com \
    --cc=leonylgao@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).