All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Ben LaHaise <bcrl@redhat.com>
Cc: torvalds@transmeta.com, alan@redhat.com, linux-mm@kvack.org,
	Chris Blizzard <blizzard@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: resend Re: [PATCH] final merging patch -- significant mozilla speedup.
Date: Sun, 19 Aug 2001 05:53:11 +0200	[thread overview]
Message-ID: <20010819055311.A8700@athlon.random> (raw)
In-Reply-To: <20010819012713.N1719@athlon.random> <Pine.LNX.4.33.0108182005590.3026-100000@touchme.toronto.redhat.com> <20010819023548.P1719@athlon.random> <20010819025314.R1719@athlon.random> <20010819032544.X1719@athlon.random> <20010819034050.Z1719@athlon.random> <20010819045906.E1719@athlon.random>
In-Reply-To: <20010819045906.E1719@athlon.random>; from andrea@suse.de on Sun, Aug 19, 2001 at 04:59:06AM +0200

On Sun, Aug 19, 2001 at 04:59:06AM +0200, Andrea Arcangeli wrote:
> @@ -587,7 +591,21 @@
>  		return -ENOMEM;
>  	spin_lock(&vma->vm_mm->page_table_lock);
>  	vma->vm_start = address;
> +
> +	/*
> +	 * vm_pgoff locking is a bit subtle: everybody but expand_stack is
> +	 * playing with the vm_pgoff with the write semaphore acquired. The
> +	 * only one playing with vm_pgoff with only the read semaphore
> +	 * acquired is expand_stack and it serializes against itself with the
> +	 * spinlock.
> +	 *
> +	 * More in general this means that it is not enough to grab the mmap_sem
> +	 * in read mode to avoid vm_pgoff to change under you. You either
> +	 * need the write semaphore acquired, or the read semaphore plus
> +	 * the spinlock.
> +	 */
>  	vma->vm_pgoff -= grow;
> +
>  	vma->vm_mm->total_vm += grow;
>  	if (vma->vm_flags & VM_LOCKED)
>  		vma->vm_mm->locked_vm += grow;

unfortunately I was way too optimistic about this and I also misread
part of the code while writing the above. Looking more closely
expand_stack is a race condition in itself.

Nobody is allowed to change vm_pgoff or vm_start without holding _both_
the mm sem in _write_ mode _and_ the spinlock.

expand_stack holds the mm sem in _read_ mode and the spinlock so it is
totally broken.

All the readers thinks that only holding only the read semaphore is
enough to get coherent data but expand_stack is breaking this rule and
so all the readers can race.

To fix this problem we simply need to convert all the callers of
expand_stack to hold the write semaphore instead of the read semaphore
(this will have to be propagated to all architectures). I just checked
all the callers and they're all convertible without any real pain (we
just need to do a second lookup after upgrading the lock because we
don't have a primitive to upgrade the lock from "read" to "write"
atomically without having to release it for some time in the middle, but
expand_stack is a slow path so it's not a showstopper).

Andrea

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <andrea@suse.de>
To: Ben LaHaise <bcrl@redhat.com>
Cc: torvalds@transmeta.com, alan@redhat.com, linux-mm@kvack.org,
	Chris Blizzard <blizzard@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: resend Re: [PATCH] final merging patch -- significant mozilla speedup.
Date: Sun, 19 Aug 2001 05:53:11 +0200	[thread overview]
Message-ID: <20010819055311.A8700@athlon.random> (raw)
In-Reply-To: <20010819045906.E1719@athlon.random>; from andrea@suse.de on Sun, Aug 19, 2001 at 04:59:06AM +0200

On Sun, Aug 19, 2001 at 04:59:06AM +0200, Andrea Arcangeli wrote:
> @@ -587,7 +591,21 @@
>  		return -ENOMEM;
>  	spin_lock(&vma->vm_mm->page_table_lock);
>  	vma->vm_start = address;
> +
> +	/*
> +	 * vm_pgoff locking is a bit subtle: everybody but expand_stack is
> +	 * playing with the vm_pgoff with the write semaphore acquired. The
> +	 * only one playing with vm_pgoff with only the read semaphore
> +	 * acquired is expand_stack and it serializes against itself with the
> +	 * spinlock.
> +	 *
> +	 * More in general this means that it is not enough to grab the mmap_sem
> +	 * in read mode to avoid vm_pgoff to change under you. You either
> +	 * need the write semaphore acquired, or the read semaphore plus
> +	 * the spinlock.
> +	 */
>  	vma->vm_pgoff -= grow;
> +
>  	vma->vm_mm->total_vm += grow;
>  	if (vma->vm_flags & VM_LOCKED)
>  		vma->vm_mm->locked_vm += grow;

unfortunately I was way too optimistic about this and I also misread
part of the code while writing the above. Looking more closely
expand_stack is a race condition in itself.

Nobody is allowed to change vm_pgoff or vm_start without holding _both_
the mm sem in _write_ mode _and_ the spinlock.

expand_stack holds the mm sem in _read_ mode and the spinlock so it is
totally broken.

All the readers thinks that only holding only the read semaphore is
enough to get coherent data but expand_stack is breaking this rule and
so all the readers can race.

To fix this problem we simply need to convert all the callers of
expand_stack to hold the write semaphore instead of the read semaphore
(this will have to be propagated to all architectures). I just checked
all the callers and they're all convertible without any real pain (we
just need to do a second lookup after upgrading the lock because we
don't have a primitive to upgrade the lock from "read" to "write"
atomically without having to release it for some time in the middle, but
expand_stack is a slow path so it's not a showstopper).

Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

  reply	other threads:[~2001-08-19  3:53 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-08-16 21:02 [PATCH] final merging patch -- significant mozilla speedup Ben LaHaise
2001-08-18 18:22 ` resend " Ben LaHaise
2001-08-18 23:27   ` Andrea Arcangeli
2001-08-19  0:10     ` Ben LaHaise
2001-08-19  0:35       ` Andrea Arcangeli
2001-08-19  0:50         ` Rik van Riel
2001-08-19  0:55           ` Andrea Arcangeli
2001-08-19  1:17             ` Andrea Arcangeli
2001-08-19  0:53         ` Andrea Arcangeli
2001-08-19  1:02           ` Andrea Arcangeli
2001-08-19  1:25           ` Andrea Arcangeli
2001-08-19  1:40             ` Andrea Arcangeli
2001-08-19  2:59               ` Andrea Arcangeli
2001-08-19  3:53                 ` Andrea Arcangeli [this message]
2001-08-19  3:53                   ` Andrea Arcangeli
2001-08-19  5:11                   ` Andrea Arcangeli
2001-08-19  5:11                     ` Andrea Arcangeli
2001-08-19  0:54         ` Rik van Riel
2001-08-19  1:00           ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20010819055311.A8700@athlon.random \
    --to=andrea@suse.de \
    --cc=alan@redhat.com \
    --cc=bcrl@redhat.com \
    --cc=blizzard@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.