From: Andrea Arcangeli <andrea@suse.de>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: linux-kernel@vger.kernel.org, jdike@karaya.com
Subject: Re: expand_stack fix [was Re: 2.4.9aa3]
Date: Sat, 8 Sep 2001 18:04:16 +0200 [thread overview]
Message-ID: <20010908180416.Z11329@athlon.random> (raw)
In-Reply-To: <20010903172445.N699@athlon.random> <Pine.LNX.4.33.0109071142060.10472-100000@penguin.transmeta.com>
In-Reply-To: <Pine.LNX.4.33.0109071142060.10472-100000@penguin.transmeta.com>; from torvalds@transmeta.com on Fri, Sep 07, 2001 at 11:47:09AM -0700
On Fri, Sep 07, 2001 at 11:47:09AM -0700, Linus Torvalds wrote:
>
> On Mon, 3 Sep 2001, Andrea Arcangeli wrote:
> >
> > Linus please include the attached patch to the next kernel, expand_stack
> > is totally broken at the moment, we cannot mess with the mm vma layout
> > if we don't hold the mmap_sem in write mode.
>
> I disagree with the diagnosis..
>
> expand_stack() has _never_ messed with the vma layout, and never should.
> As such, from a vma list integrity standpoint it is fine.
>
> Do we mess with the contents? Yes. But I'd much rather see a much more
> minimal approach to the problem, on the order of:
> - make sure we only accept GROWSDOWN for anonymous areas (which don't
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> care about the offset)
> - make the vm_start update atomic (possibly by just getting the pagetable
> spinlock).
We just take the pagetable spinlock there, the race is all about the
pgoff.
In short you agree that the current locking is broken but you propose to
limit the usability of GROWSDOWN and GROWSUP solely to the anonymous
vmas instead of fixing the pgoff race with proper locking as I did.
My fix for the race doesn't drop the usability of GROWSDOWN that could
otherwise break userspace programs. I guess at least uml uses growsdown
vma file backed. Jeff?
Assuming it's acceptable to break GROWSDOWN on file backed vmas as you
suggested, the new locking rules with your approch would be:
1) pgoff is garbage as usual for any anon vma so don't even try to
mess with it in expand_stack
2) with only the read semaphore acquired the vm_start/vm_end of the
vmas can expand under us but the vma tree not
3) with only the read semaphore acquired to get a consistent
vm_start/vm_end of all vmas in the tree (like we need to in
expand_stack) also the page_table_lock of the mm must be
acquired
4) with the write lock vm_start/vm_end are consistent always as
well as tree so no change here
> > I considered implementing a read->write semaphore upgrade primitive but
> > it cannot be reliable
>
> There is no such thing. Never has been. It's a fundamentally impossible
> operation. We may, at some point, decide to have a "read_for_write()" and
Yes, of course with "it cannot be reliable" I meant it cannot work out
always, it would be an optimization for the common case:
spin_lock(&rwsem->lock)
if (only one reader)
gain write privilegies
err = success
else
err = faliure
spin_unlock(&rwsem->lock)
return err
it would still require the fail path in case of the faliure (multiple
readers potentially all trying to upgrade the lock) so I ignored the
optimization (expand_stack isn't a very fast path anyways).
> However, having a finer-granularity spinlock _inside_ the semaphore (see
> above suggestion) is a perfectly valid approach.
if you are 100% sure it's acceptable to break the kernel API for the
GROWSDOWN file backed vmas (which I don't think it's the case) I can
switch to your suggested fix (otherwise I prefer to keep upgrading the
semaphore in expand_stack rather to have to spinlocking at every nopage
private/shared page fault).
Andrea
next prev parent reply other threads:[~2001-09-08 16:03 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2001-08-19 6:07 2.4.9aa3 Andrea Arcangeli
2001-09-03 15:24 ` expand_stack fix [was Re: 2.4.9aa3] Andrea Arcangeli
2001-09-07 18:47 ` Linus Torvalds
2001-09-08 16:04 ` Andrea Arcangeli [this message]
2001-09-08 16:28 ` Linus Torvalds
2001-09-09 4:23 ` Jeff Dike
2001-09-09 3:50 ` Andrea Arcangeli
2001-09-09 5:42 ` Jeff Dike
2001-09-11 11:24 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20010908180416.Z11329@athlon.random \
--to=andrea@suse.de \
--cc=jdike@karaya.com \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox