From: David Howells <dhowells@redhat.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Olof Johansson <olof@austin.ibm.com>,
Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
jamie@shareable.org, rusty@rustcorp.com.au
Subject: Re: [PATCH/RFC] Futex mmap_sem deadlock
Date: Wed, 23 Feb 2005 11:24:49 +0000 [thread overview]
Message-ID: <5109.1109157889@redhat.com> (raw)
In-Reply-To: <Pine.LNX.4.58.0502221123540.2378@ppc970.osdl.org>
Linus Torvalds <torvalds@osdl.org> wrote:
> It shouldn't be. If one read writer is active, another should be able to
> come in, regardless of any pending writer trying to access it. At least
> that's always been the rule for the rw-spinlocks _exactly_ for this
> reaseon.
But not with rw-semaphores. I've designed them to be as fair as I can possibly
make them. This means holding reads up if there's a write pending.
> The rwsem code tries to be fairer, maybe it has broken this case in the
> name of fairness. I personally think fairness is overrated, and would
> rather have the rwsem implementation let readers come in.
I've seen writer starvation happening due to a continuous flow of overlapping
reads... That's one of the reasons I reimplemented rwsems.
Also, fair rwsems are easier to provide an assembly optimised form for that
doesn't involve the "thundering-herd" approach. Obviously, with the spinlock
approach, you can implement any semantics you desire once you're holding the
spinlock. With the optimised form as implemented, you can't determine how many
active readers there are. This information isn't stored anywhere.
> We have a notion of "atomic get_user()" calls, which is a lot more
> efficient, and is already used for other cases where we have _real_
> deadlocks (inode semaphore for reads into shared memory mappigns).
That's okay, provided the data you're trying to access isn't lurking on a disk
somewhere.
> > Auditing other read-takers of mmap_sem, I found one more exposure that
> > could be solved by just moving code around.
>
> DavidH - what's the word on nested read-semaphores like this? Are they
> supposed to work (like nested read-spinlocks), or do we need to do the
> things Olof does?
Nesting rwsems like this is definitely on the not-recommended list.
For the special case of the mmap semaphore, I'd advocate following something
like Jamie Lokier's solution, and note when a task holds its own mmap_sem
semaphore read-locked such that page-fault can avoid taking it again.
Something like:
struct task_struct {
...
unsigned mmsem_nest;
...
};
static inline void lock_mm_for_read(struct mm_struct *mm) {
if (current->mm == mm &&
current->mmsem_nest++ != 0)
;
else
down_read(&mm->mmap_sem);
}
static inline void unlock_mm_for_read(struct mm_struct *mm) {
if (current->mm == mm &&
--current->mmsem_nest != 0)
;
else
up_read(&mm->mmap_sem);
}
static inline void lock_mm_for_write(struct mm_struct *mm) {
down_write(&mm->mmap_sem);
}
static inline void unlock_mm_for_write(struct mm_struct *mm) {
up_write(&mm->mmap_sem);
}
Though I'd be tempted to say that faulting is the only case in which recursion
is permitted, and change the first two functions to reflect this and add an
extra pair specially for page-fault:
static inline void lock_mm_for_read(struct mm_struct *mm) {
down_read(&mm->mmap_sem);
if (current->mm == mm &&
current->flags |= PF_MMAP_SEM)
}
static inline void unlock_mm_for_read(struct mm_struct *mm) {
if (current->mm == mm &&
current->flags &= ~PF_MMAP_SEM)
up_read(&mm->mmap_sem);
}
static inline void lock_mm_for_fault(struct mm_struct *mm) {
if (!(current->flags & PF_MMAP_SEM))
down_read(&mm->mmap_sem);
}
static inline void unlock_mm_for_fault(struct mm_struct *mm) {
if (!(current->flags & PF_MMAP_SEM))
up_read(&mm->mmap_sem);
}
If current->mm is passed as the argument, the compiler's optimiser should
discard the first comparison in the if-statement.
Then futex.c would hold:
lock_mm_for_read(¤t->mm);
get_futex_key(...) etc.
queue_me(...) etc.
ret = get_user(...);
/* the rest */
unlock_mm_for_read(¤t->mm);
And do_page_fault() would hold:
lock_mm_for_fault(¤t->mm);
...
unlock_mm_for_fault(¤t->mm);
David
next prev parent reply other threads:[~2005-02-23 11:25 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-02-22 19:06 [PATCH/RFC] Futex mmap_sem deadlock Olof Johansson
2005-02-22 19:36 ` Linus Torvalds
2005-02-22 21:16 ` Benjamin Herrenschmidt
2005-02-22 21:19 ` Benjamin Herrenschmidt
2005-02-22 21:31 ` Linus Torvalds
2005-02-22 21:42 ` Benjamin Herrenschmidt
2005-02-22 22:10 ` Linus Torvalds
2005-02-22 22:24 ` Benjamin Herrenschmidt
2005-02-22 23:08 ` Greg KH
2005-02-23 11:24 ` David Howells [this message]
2005-02-22 19:55 ` Andrew Morton
2005-02-22 21:07 ` Jamie Lokier
2005-02-22 21:19 ` Olof Johansson
2005-02-22 22:09 ` Jamie Lokier
2005-02-22 21:19 ` Chris Friesen
2005-02-22 21:27 ` Jamie Lokier
2005-02-22 21:30 ` Linus Torvalds
2005-02-22 22:34 ` Jamie Lokier
2005-02-22 22:42 ` Olof Johansson
2005-02-22 23:20 ` Andrew Morton
2005-02-22 23:23 ` Olof Johansson
2005-02-23 11:39 ` David Howells
2005-02-23 16:22 ` Olof Johansson
2005-02-23 18:44 ` David Howells
2005-02-23 14:49 ` Joe Korty
2005-02-23 15:54 ` Linus Torvalds
2005-02-23 17:10 ` Olof Johansson
2005-02-23 17:37 ` Arjan van de Ven
2005-02-23 18:22 ` Jamie Lokier
2005-02-23 18:34 ` Linus Torvalds
2005-02-23 18:49 ` Jamie Lokier
2005-02-23 19:12 ` Olof Johansson
2005-02-23 22:00 ` Linus Torvalds
2005-02-24 0:00 ` Jamie Lokier
2005-02-23 18:37 ` Olof Johansson
2005-02-22 21:40 ` Andrew Morton
2005-02-22 21:59 ` Linus Torvalds
2005-02-23 11:42 ` David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5109.1109157889@redhat.com \
--to=dhowells@redhat.com \
--cc=akpm@osdl.org \
--cc=jamie@shareable.org \
--cc=linux-kernel@vger.kernel.org \
--cc=olof@austin.ibm.com \
--cc=rusty@rustcorp.com.au \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox