Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in percpu_up_write

From: Benjamin Segall <bsegall@google.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	 Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	 Waiman Long <longman@redhat.com>,
	 Boqun Feng <boqun.feng@gmail.com>,
	 linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in percpu_up_write
Date: Wed, 24 Jan 2024 14:10:43 -0800
Message-ID: <xm26v87imlgc.fsf@bsegall-linux.svl.corp.google.com>
In-Reply-To: <20240123150541.1508-1-hdanton@sina.com> (Hillf Danton's message of "Tue, 23 Jan 2024 23:05:41 +0800")

Hillf Danton <hdanton@sina.com> writes:

> On Mon, 22 Jan 2024 14:59:14 -0800 Benjamin Segall <bsegall@google.com>
>> The waitq wakeup in percpu_up_write necessarily runs the wake function
>> immediately in the current thread. With it calling
>> __percpu_rwsem_trylock on behalf of the thread being woken, the lock is
>> extremely fair and FIFO, with the window for unfairness merely being the
>> time between the release of sem->block and the completion of a relevant
>> trylock.
>> 
>> However, the woken threads that now hold the lock may not be chosen to
>> run for a long time, and it would be useful to have more of this window
>> available for a currently running thread to unfairly take the lock
>> immediately and use it.
>
> It makes no sense for lock acquirer to probe owner's activity except for
> spinning on owner. Nor for owner to guess if any acquirer comes soon.

The code is not doing that; this text is just describing why we might
choose a less fair heuristic for which thread gets the lock.

>
>> This can result in priority-inversion issues
>> with high contention or things like CFS_BANDWIDTH quotas.
>
> Given mutex could not avoid PI (priority-inversion) and deadlock, why is
> percpu-rwsem special wrt PI?

I was going to say that mutex/rwsem have SPIN_ON_OWNER, which dodges
this somewhat (and percpu-rwsem cannot do that). However, switching
cgroup_threadgroup_rwsem to an actual rwsem, and even disabling
read-side RWSEM_FLAG_HANDOFF, doesn't noticeably help my artificial
benchmark, so the test may not be as representative as I hoped.

The most obvious possibility is that, for the real problem, fixing (or
not causing) the internal contention issues was sufficient, and that
also attacking it from the percpu-rwsem angle was overkill. That fix
wasn't sufficient for the artificial test, but cranking up the load to
get a reliable test could easily have blown past the point where it
would have been enough.

>> 
>> Signed-off-by: Ben Segall <bsegall@google.com>
>> 
>> ---
>> 
>> So the actual problem we saw was that one job had severe slowdowns
>> during startup with certain other jobs on the machine, and the slowdowns
>> turned out to be some cgroup moves it did during startup. The antagonist
>> jobs were spawning huge numbers of threads and some other internal bugs
>> were exacerbating their contention. The lock handoff meant that a batch
>> of antagonist threads would receive the read lock of
>> cgroup_threadgroup_rwsem and at least some of those threads would take a
>> long time to be scheduled.
>
> If you want to avoid starved lock waiter, take a look at RWSEM_FLAG_HANDOFF
> in rwsem_down_read_slowpath().

rwsem's HANDOFF flag is the exact opposite of what this patch is doing.
Percpu-rwsem's current code has perfect handoff for read->write, and a very
short window for write->read (or write->write) to be beaten by a new writer.
