public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Andrea Arcangeli <andrea@suse.de>,
	torvalds@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: downgrade_write replacement in remap_file_pages
Date: Tue, 08 Jun 2004 20:04:53 +0100	[thread overview]
Message-ID: <2303.1086721493@redhat.com> (raw)
In-Reply-To: <20040608093111.01a910e9.akpm@osdl.org>

[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]

> 
> So ho-hum.  As a first step, David, could you please take a look into
> what's up with downgrade_write()?

Yes. Patch attached. The critical bit is in the hunk beginning at line 95.

It appears my rwsem testsuite didn't have a downgrade write test. I added one
and the bug became obvious.

> (Then again, we need to have a serious think about the overflow bug.  It's
> fatal.  Should we fix it?  If so, the current rwsem implementation is
> probably unsalvageable).

The optimised rwsem implementation is probably okay, provided you don't have a
32-bit machine with silly amounts of memory, for which the 32767 process limit
is probably reasonable. Maybe make it contingent on CONFIG_HIGHMEM on X86?

If you don't want to use an optimised version, there _is_ a spinlock version
as well.

For 64-bit archs like x86_64 and ppc64 this algorithm can easily made to use a
64-bit counter (lib/rwsem.c wants a signed long counter), so it's just a
matter of using XADDQ, LDARX/STDCX or equivalents (I have patches for those
two). I've also got MIPS64 patches somewhere too, though they're horribly out
of date.

Alpha and S390 already do the 64-bit counter thing. And if sparc64 has a
64-bit CAS, that can be used. IA64's rwsems should be 64-bitable too.

Using a 64-bit counter gives you a limit of 2 billion processes.

David



[-- Attachment #2: Type: text/plain, Size: 4087 bytes --]


Signed-Off-By: David Howells <dhowells@redhat.com>
D: Stop downgrade_write() from under-adjusting the rwsem counter in optimised
D: rw-semaphores.

diff -urp linux-2.6.6/lib/rwsem.c linux-2.6.6-keys/lib/rwsem.c
--- linux-2.6.6/lib/rwsem.c	2004-05-11 11:28:57.000000000 +0100
+++ linux-2.6.6-keys/lib/rwsem.c	2004-06-08 19:31:35.550653817 +0100
@@ -29,15 +29,15 @@ void rwsemtrace(struct rw_semaphore *sem
 
 /*
  * handle the lock being released whilst there are processes blocked on it that can now run
- * - if we come here, then:
- *   - the 'active part' of the count (&0x0000ffff) reached zero but has been re-incremented
+ * - if we come here from up_xxxx(), then:
+ *   - the 'active part' of the count (&0x0000ffff) had reached zero (but may have changed)
  *   - the 'waiting part' of the count (&0xffff0000) is negative (and will still be so)
  *   - there must be someone on the queue
  * - the spinlock must be held by the caller
  * - woken process blocks are discarded from the list after having task zeroed
- * - writers are only woken if wakewrite is non-zero
+ * - writers are only woken if downgrading is false
  */
-static inline struct rw_semaphore *__rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
+static inline struct rw_semaphore *__rwsem_do_wake(struct rw_semaphore *sem, int downgrading)
 {
 	struct rwsem_waiter *waiter;
 	struct task_struct *tsk;
@@ -46,10 +46,12 @@ static inline struct rw_semaphore *__rws
 
 	rwsemtrace(sem,"Entering __rwsem_do_wake");
 
-	if (!wakewrite)
+	if (downgrading)
 		goto dont_wake_writers;
 
-	/* only wake someone up if we can transition the active part of the count from 0 -> 1 */
+	/* if we came through an up_xxxx() call, we only only wake someone up
+	 * if we can transition the active part of the count from 0 -> 1
+	 */
  try_again:
 	oldcount = rwsem_atomic_update(RWSEM_ACTIVE_BIAS,sem) - RWSEM_ACTIVE_BIAS;
 	if (oldcount & RWSEM_ACTIVE_MASK)
@@ -78,9 +80,10 @@ static inline struct rw_semaphore *__rws
 	if (waiter->flags & RWSEM_WAITING_FOR_WRITE)
 		goto out;
 
-	/* grant an infinite number of read locks to the readers at the front of the queue
-	 * - note we increment the 'active part' of the count by the number of readers (less one
-	 *   for the activity decrement we've already done) before waking any processes up
+	/* grant an infinite number of read locks to the readers at the front
+	 * of the queue
+	 * - note we increment the 'active part' of the count by the number of
+	 *   readers before waking any processes up
 	 */
  readers_only:
 	woken = 0;
@@ -95,8 +98,10 @@ static inline struct rw_semaphore *__rws
 	} while (waiter->flags & RWSEM_WAITING_FOR_READ);
 
 	loop = woken;
-	woken *= RWSEM_ACTIVE_BIAS-RWSEM_WAITING_BIAS;
-	woken -= RWSEM_ACTIVE_BIAS;
+	woken *= RWSEM_ACTIVE_BIAS - RWSEM_WAITING_BIAS;
+	if (!downgrading)
+		woken -= RWSEM_ACTIVE_BIAS; /* we'd already done one increment
+					     * earlier */
 	rwsem_atomic_add(woken,sem);
 
 	next = sem->wait_list.next;
@@ -150,7 +155,7 @@ static inline struct rw_semaphore *rwsem
 	 * - it might even be this process, since the waker takes a more active part
 	 */
 	if (!(count & RWSEM_ACTIVE_MASK))
-		sem = __rwsem_do_wake(sem,1);
+		sem = __rwsem_do_wake(sem, 0);
 
 	spin_unlock(&sem->wait_lock);
 
@@ -201,7 +206,7 @@ struct rw_semaphore fastcall __sched *rw
 
 /*
  * handle waking up a waiter on the semaphore
- * - up_read has decremented the active part of the count if we come here
+ * - up_read/up_write has decremented the active part of the count if we come here
  */
 struct rw_semaphore fastcall *rwsem_wake(struct rw_semaphore *sem)
 {
@@ -211,7 +216,7 @@ struct rw_semaphore fastcall *rwsem_wake
 
 	/* do nothing if list empty */
 	if (!list_empty(&sem->wait_list))
-		sem = __rwsem_do_wake(sem,1);
+		sem = __rwsem_do_wake(sem, 0);
 
 	spin_unlock(&sem->wait_lock);
 
@@ -233,7 +238,7 @@ struct rw_semaphore fastcall *rwsem_down
 
 	/* do nothing if list empty */
 	if (!list_empty(&sem->wait_list))
-		sem = __rwsem_do_wake(sem,0);
+		sem = __rwsem_do_wake(sem, 1);
 
 	spin_unlock(&sem->wait_lock);
 

  parent reply	other threads:[~2004-06-08 19:05 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-06-08 15:44 downgrade_write replacement in remap_file_pages Andrea Arcangeli
2004-06-08 16:31 ` Andrew Morton
2004-06-08 16:39   ` Linus Torvalds
2004-06-08 19:04   ` David Howells [this message]
2004-06-08 17:05 ` David Howells
2004-06-08 22:33   ` Andrea Arcangeli
2004-06-08 19:36 ` William Lee Irwin III
2004-06-08 22:52   ` Andrea Arcangeli
2004-06-09 12:19   ` [PATCH] A generic_file_sendpage() Alexander Nyberg
2004-06-10 19:49     ` Pavel Machek
2004-06-25 19:19     ` Jörn Engel
2004-06-25 19:46       ` viro
2004-06-25 20:03         ` Jörn Engel
2004-06-26  0:53           ` Trond Myklebust
2004-06-28 11:41             ` Jörn Engel
2004-06-25 20:05       ` Andreas Dilger
2004-06-25 20:09         ` Jörn Engel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2303.1086721493@redhat.com \
    --to=dhowells@redhat.com \
    --cc=akpm@osdl.org \
    --cc=andrea@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox