All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Matthew Wilcox <willy@infradead.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Waiman Long <longman@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	linux-kernel@vger.kernel.org
Subject: Re: Wait for mutex to become unlocked
Date: Thu, 05 May 2022 02:22:30 +0200	[thread overview]
Message-ID: <87pmksj0ah.ffs@tglx> (raw)
In-Reply-To: <YnLzrGlBNCmCPLmS@casper.infradead.org>

On Wed, May 04 2022 at 22:44, Matthew Wilcox wrote:
> The customer has a low priority task which wants to read /proc/pid/smaps
> of a higher priority task.  Today, everything is awful; smaps acquires
> mmap_sem read-only, is preempted, then the high-pri task calls mmap()
> and the down_write(mmap_sem) blocks on the low-pri task.  Then all the
> other threads in the high-pri task block on the mmap_sem as they take
> page faults because we don't want writers to starve.

Welcome to the wonderful world of priority inversion.

> So this is Good.  For the vast majority of cases, we avoid taking the
> mmap read lock and the problem will appear much less often.  But we can
> do Better with a new API.  You see, for this case, we don't actually
> want to acquire the mmap_sem; we're happy to spin a bit, but there's no
> point in spinning waiting for the writer to finish when we can sleep.
> I'd like to write this code:
>
> again:
> 	rcu_read_lock();
> 	vma = vma_lookup();
> 	if (down_read_trylock(&vma->sem)) {
> 		rcu_read_unlock();
> 	} else {
> 		rcu_read_unlock();
> 		rwsem_wait_read(&mm->mmap_sem);
> 		goto again;
> 	}
>
> That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> and wakes it up without giving it the lock.  Now this thread will never
> be able to block any thread that tries to acquire mmap_sem for write.

Never?

 	if (down_read_trylock(&vma->sem)) {

---> preemption by writer

The writer will still block and depending on the rest of the runnable
threads it can take quite a while until the low prio reader comes back
on a CPU.

I grant you that the propability will decrease, but 'never' is just
wishful thinking.

> Similarly, it may make sense to add rwsem_wait_write() and mutex_wait().
> Perhaps also mutex_wait_killable() and mutex_wait_interruptible()
> (the combinatoric explosion is a bit messy; I don't know that it makes
> sense to do the _nested, _io variants).
>
> Does any of this make sense?

TBH, no.

If we start opening this can of worms, then we'll see tons of "customer
want's a pony" problems being solved by half baken "solutions" which
will exactly cause the combinatoric explosion you are worried about.

If there is a legitimate requirement to retrieve such information, then
we are better off thinking about a general approach of introspection,
which makes such information available as unreliable snapshots
retrievable with RCU protection.

The information gathered from /proc/pid/smaps is unreliable at the point
where the lock is dropped already today. So it does not make a
difference whether the VMAs have a 'read me if you really think it's
useful' sideband information which gets updated when the VMA changes and
allows to do:

 	rcu_read_lock();
 	vma = vma_lookup();
        if (vma)
                dump(vma->info);
        rcu_read_unlock();

You still need to decide, whether you want to provide that information
or not for a particular interface, but that's way more sane than the
'make locking more complex for questionable value' approach.

But looking at the stuff which gets recomputed and reevaluated in that
proc/smaps code this makes a lot of sense, because most if not all of
this information is already known at the point where the VMA is modified
while holding mmap_sem for useful reasons, no?

So no, we don't want to add more magic locking functions which pretend
to solve unsolvable problems. We rather go and make use of information
which is already available by providing a less archaic access mechanism.

What you are trying to do here is just adding to technical debt IMNSHO.

Thanks,

        tglx

  reply	other threads:[~2022-05-05  0:22 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-04 21:44 Wait for mutex to become unlocked Matthew Wilcox
2022-05-05  0:22 ` Thomas Gleixner [this message]
2022-05-05  0:38   ` Matthew Wilcox
2022-05-05  1:14     ` Thomas Gleixner
2022-05-05  5:04       ` Paul E. McKenney
2022-05-05  5:21         ` Paul E. McKenney
2022-05-05  1:11 ` Waiman Long
     [not found] ` <20220505015223.5132-1-hdanton@sina.com>
2022-05-05  4:12   ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pmksj0ah.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=Liam.Howlett@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.