Re: downgrade_write replacement in remap_file_pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Howells <dhowells@redhat.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Andrea Arcangeli <andrea@suse.de>,
	torvalds@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: downgrade_write replacement in remap_file_pages
Date: Tue, 08 Jun 2004 20:04:53 +0100	[thread overview]
Message-ID: <2303.1086721493@redhat.com> (raw)
In-Reply-To: <20040608093111.01a910e9.akpm@osdl.org>

[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]

> 
> So ho-hum.  As a first step, David, could you please take a look into
> what's up with downgrade_write()?

Yes. Patch attached. The critical bit is in the hunk beginning at line 95.

It appears my rwsem testsuite didn't have a downgrade write test. I added one
and the bug became obvious.

> (Then again, we need to have a serious think about the overflow bug.  It's
> fatal.  Should we fix it?  If so, the current rwsem implementation is
> probably unsalvageable).

The optimised rwsem implementation is probably okay, provided you don't have a
32-bit machine with silly amounts of memory, for which the 32767 process limit
is probably reasonable. Maybe make it contingent on CONFIG_HIGHMEM on X86?

If you don't want to use an optimised version, there _is_ a spinlock version
as well.

For 64-bit archs like x86_64 and ppc64 this algorithm can easily made to use a
64-bit counter (lib/rwsem.c wants a signed long counter), so it's just a
matter of using XADDQ, LDARX/STDCX or equivalents (I have patches for those
two). I've also got MIPS64 patches somewhere too, though they're horribly out
of date.

Alpha and S390 already do the 64-bit counter thing. And if sparc64 has a
64-bit CAS, that can be used. IA64's rwsems should be 64-bitable too.

Using a 64-bit counter gives you a limit of 2 billion processes.

David



[-- Attachment #2: Type: text/plain, Size: 4087 bytes --]


Signed-Off-By: David Howells <dhowells@redhat.com>
D: Stop downgrade_write() from under-adjusting the rwsem counter in optimised
D: rw-semaphores.

diff -urp linux-2.6.6/lib/rwsem.c linux-2.6.6-keys/lib/rwsem.c
--- linux-2.6.6/lib/rwsem.c	2004-05-11 11:28:57.000000000 +0100
+++ linux-2.6.6-keys/lib/rwsem.c	2004-06-08 19:31:35.550653817 +0100
@@ -29,15 +29,15 @@ void rwsemtrace(struct rw_semaphore *sem
 
 /*
  * handle the lock being released whilst there are processes blocked on it that can now run
- * - if we come here, then:
- *   - the 'active part' of the count (&0x0000ffff) reached zero but has been re-incremented
+ * - if we come here from up_xxxx(), then:
+ *   - the 'active part' of the count (&0x0000ffff) had reached zero (but may have changed)
  *   - the 'waiting part' of the count (&0xffff0000) is negative (and will still be so)
  *   - there must be someone on the queue
  * - the spinlock must be held by the caller
  * - woken process blocks are discarded from the list after having task zeroed
- * - writers are only woken if wakewrite is non-zero
+ * - writers are only woken if downgrading is false
  */
-static inline struct rw_semaphore *__rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
+static inline struct rw_semaphore *__rwsem_do_wake(struct rw_semaphore *sem, int downgrading)
 {
 	struct rwsem_waiter *waiter;
 	struct task_struct *tsk;
@@ -46,10 +46,12 @@ static inline struct rw_semaphore *__rws
 
 	rwsemtrace(sem,"Entering __rwsem_do_wake");
 
-	if (!wakewrite)
+	if (downgrading)
 		goto dont_wake_writers;
 
-	/* only wake someone up if we can transition the active part of the count from 0 -> 1 */
+	/* if we came through an up_xxxx() call, we only only wake someone up
+	 * if we can transition the active part of the count from 0 -> 1
+	 */
  try_again:
 	oldcount = rwsem_atomic_update(RWSEM_ACTIVE_BIAS,sem) - RWSEM_ACTIVE_BIAS;
 	if (oldcount & RWSEM_ACTIVE_MASK)
@@ -78,9 +80,10 @@ static inline struct rw_semaphore *__rws
 	if (waiter->flags & RWSEM_WAITING_FOR_WRITE)
 		goto out;
 
-	/* grant an infinite number of read locks to the readers at the front of the queue
-	 * - note we increment the 'active part' of the count by the number of readers (less one
-	 *   for the activity decrement we've already done) before waking any processes up
+	/* grant an infinite number of read locks to the readers at the front
+	 * of the queue
+	 * - note we increment the 'active part' of the count by the number of
+	 *   readers before waking any processes up
 	 */
  readers_only:
 	woken = 0;
@@ -95,8 +98,10 @@ static inline struct rw_semaphore *__rws
 	} while (waiter->flags & RWSEM_WAITING_FOR_READ);
 
 	loop = woken;
-	woken *= RWSEM_ACTIVE_BIAS-RWSEM_WAITING_BIAS;
-	woken -= RWSEM_ACTIVE_BIAS;
+	woken *= RWSEM_ACTIVE_BIAS - RWSEM_WAITING_BIAS;
+	if (!downgrading)
+		woken -= RWSEM_ACTIVE_BIAS; /* we'd already done one increment
+					     * earlier */
 	rwsem_atomic_add(woken,sem);
 
 	next = sem->wait_list.next;
@@ -150,7 +155,7 @@ static inline struct rw_semaphore *rwsem
 	 * - it might even be this process, since the waker takes a more active part
 	 */
 	if (!(count & RWSEM_ACTIVE_MASK))
-		sem = __rwsem_do_wake(sem,1);
+		sem = __rwsem_do_wake(sem, 0);
 
 	spin_unlock(&sem->wait_lock);
 
@@ -201,7 +206,7 @@ struct rw_semaphore fastcall __sched *rw
 
 /*
  * handle waking up a waiter on the semaphore
- * - up_read has decremented the active part of the count if we come here
+ * - up_read/up_write has decremented the active part of the count if we come here
  */
 struct rw_semaphore fastcall *rwsem_wake(struct rw_semaphore *sem)
 {
@@ -211,7 +216,7 @@ struct rw_semaphore fastcall *rwsem_wake
 
 	/* do nothing if list empty */
 	if (!list_empty(&sem->wait_list))
-		sem = __rwsem_do_wake(sem,1);
+		sem = __rwsem_do_wake(sem, 0);
 
 	spin_unlock(&sem->wait_lock);
 
@@ -233,7 +238,7 @@ struct rw_semaphore fastcall *rwsem_down
 
 	/* do nothing if list empty */
 	if (!list_empty(&sem->wait_list))
-		sem = __rwsem_do_wake(sem,0);
+		sem = __rwsem_do_wake(sem, 1);
 
 	spin_unlock(&sem->wait_lock);

next prev parent reply	other threads:[~2004-06-08 19:05 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-06-08 15:44 downgrade_write replacement in remap_file_pages Andrea Arcangeli
2004-06-08 16:31 ` Andrew Morton
2004-06-08 16:39   ` Linus Torvalds
2004-06-08 19:04   ` David Howells [this message]
2004-06-08 17:05 ` David Howells
2004-06-08 22:33   ` Andrea Arcangeli
2004-06-08 19:36 ` William Lee Irwin III
2004-06-08 22:52   ` Andrea Arcangeli
2004-06-09 12:19   ` [PATCH] A generic_file_sendpage() Alexander Nyberg
2004-06-10 19:49     ` Pavel Machek
2004-06-25 19:19     ` Jörn Engel
2004-06-25 19:46       ` viro
2004-06-25 20:03         ` Jörn Engel
2004-06-26  0:53           ` Trond Myklebust
2004-06-28 11:41             ` Jörn Engel
2004-06-25 20:05       ` Andreas Dilger
2004-06-25 20:09         ` Jörn Engel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2303.1086721493@redhat.com \
    --to=dhowells@redhat.com \
    --cc=akpm@osdl.org \
    --cc=andrea@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.