From: David Howells <dhowells@redhat.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Andrea Arcangeli <andrea@suse.de>,
torvalds@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: downgrade_write replacement in remap_file_pages
Date: Tue, 08 Jun 2004 20:04:53 +0100 [thread overview]
Message-ID: <2303.1086721493@redhat.com> (raw)
In-Reply-To: <20040608093111.01a910e9.akpm@osdl.org>
[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]
>
> So ho-hum. As a first step, David, could you please take a look into
> what's up with downgrade_write()?
Yes. Patch attached. The critical bit is in the hunk beginning at line 95.
It appears my rwsem testsuite didn't have a downgrade write test. I added one
and the bug became obvious.
> (Then again, we need to have a serious think about the overflow bug. It's
> fatal. Should we fix it? If so, the current rwsem implementation is
> probably unsalvageable).
The optimised rwsem implementation is probably okay, provided you don't have a
32-bit machine with silly amounts of memory, for which the 32767 process limit
is probably reasonable. Maybe make it contingent on CONFIG_HIGHMEM on X86?
If you don't want to use an optimised version, there _is_ a spinlock version
as well.
For 64-bit archs like x86_64 and ppc64 this algorithm can easily made to use a
64-bit counter (lib/rwsem.c wants a signed long counter), so it's just a
matter of using XADDQ, LDARX/STDCX or equivalents (I have patches for those
two). I've also got MIPS64 patches somewhere too, though they're horribly out
of date.
Alpha and S390 already do the 64-bit counter thing. And if sparc64 has a
64-bit CAS, that can be used. IA64's rwsems should be 64-bitable too.
Using a 64-bit counter gives you a limit of 2 billion processes.
David
[-- Attachment #2: Type: text/plain, Size: 4087 bytes --]
Signed-Off-By: David Howells <dhowells@redhat.com>
D: Stop downgrade_write() from under-adjusting the rwsem counter in optimised
D: rw-semaphores.
diff -urp linux-2.6.6/lib/rwsem.c linux-2.6.6-keys/lib/rwsem.c
--- linux-2.6.6/lib/rwsem.c 2004-05-11 11:28:57.000000000 +0100
+++ linux-2.6.6-keys/lib/rwsem.c 2004-06-08 19:31:35.550653817 +0100
@@ -29,15 +29,15 @@ void rwsemtrace(struct rw_semaphore *sem
/*
* handle the lock being released whilst there are processes blocked on it that can now run
- * - if we come here, then:
- * - the 'active part' of the count (&0x0000ffff) reached zero but has been re-incremented
+ * - if we come here from up_xxxx(), then:
+ * - the 'active part' of the count (&0x0000ffff) had reached zero (but may have changed)
* - the 'waiting part' of the count (&0xffff0000) is negative (and will still be so)
* - there must be someone on the queue
* - the spinlock must be held by the caller
* - woken process blocks are discarded from the list after having task zeroed
- * - writers are only woken if wakewrite is non-zero
+ * - writers are only woken if downgrading is false
*/
-static inline struct rw_semaphore *__rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
+static inline struct rw_semaphore *__rwsem_do_wake(struct rw_semaphore *sem, int downgrading)
{
struct rwsem_waiter *waiter;
struct task_struct *tsk;
@@ -46,10 +46,12 @@ static inline struct rw_semaphore *__rws
rwsemtrace(sem,"Entering __rwsem_do_wake");
- if (!wakewrite)
+ if (downgrading)
goto dont_wake_writers;
- /* only wake someone up if we can transition the active part of the count from 0 -> 1 */
+ /* if we came through an up_xxxx() call, we only only wake someone up
+ * if we can transition the active part of the count from 0 -> 1
+ */
try_again:
oldcount = rwsem_atomic_update(RWSEM_ACTIVE_BIAS,sem) - RWSEM_ACTIVE_BIAS;
if (oldcount & RWSEM_ACTIVE_MASK)
@@ -78,9 +80,10 @@ static inline struct rw_semaphore *__rws
if (waiter->flags & RWSEM_WAITING_FOR_WRITE)
goto out;
- /* grant an infinite number of read locks to the readers at the front of the queue
- * - note we increment the 'active part' of the count by the number of readers (less one
- * for the activity decrement we've already done) before waking any processes up
+ /* grant an infinite number of read locks to the readers at the front
+ * of the queue
+ * - note we increment the 'active part' of the count by the number of
+ * readers before waking any processes up
*/
readers_only:
woken = 0;
@@ -95,8 +98,10 @@ static inline struct rw_semaphore *__rws
} while (waiter->flags & RWSEM_WAITING_FOR_READ);
loop = woken;
- woken *= RWSEM_ACTIVE_BIAS-RWSEM_WAITING_BIAS;
- woken -= RWSEM_ACTIVE_BIAS;
+ woken *= RWSEM_ACTIVE_BIAS - RWSEM_WAITING_BIAS;
+ if (!downgrading)
+ woken -= RWSEM_ACTIVE_BIAS; /* we'd already done one increment
+ * earlier */
rwsem_atomic_add(woken,sem);
next = sem->wait_list.next;
@@ -150,7 +155,7 @@ static inline struct rw_semaphore *rwsem
* - it might even be this process, since the waker takes a more active part
*/
if (!(count & RWSEM_ACTIVE_MASK))
- sem = __rwsem_do_wake(sem,1);
+ sem = __rwsem_do_wake(sem, 0);
spin_unlock(&sem->wait_lock);
@@ -201,7 +206,7 @@ struct rw_semaphore fastcall __sched *rw
/*
* handle waking up a waiter on the semaphore
- * - up_read has decremented the active part of the count if we come here
+ * - up_read/up_write has decremented the active part of the count if we come here
*/
struct rw_semaphore fastcall *rwsem_wake(struct rw_semaphore *sem)
{
@@ -211,7 +216,7 @@ struct rw_semaphore fastcall *rwsem_wake
/* do nothing if list empty */
if (!list_empty(&sem->wait_list))
- sem = __rwsem_do_wake(sem,1);
+ sem = __rwsem_do_wake(sem, 0);
spin_unlock(&sem->wait_lock);
@@ -233,7 +238,7 @@ struct rw_semaphore fastcall *rwsem_down
/* do nothing if list empty */
if (!list_empty(&sem->wait_list))
- sem = __rwsem_do_wake(sem,0);
+ sem = __rwsem_do_wake(sem, 1);
spin_unlock(&sem->wait_lock);
next prev parent reply other threads:[~2004-06-08 19:05 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-06-08 15:44 downgrade_write replacement in remap_file_pages Andrea Arcangeli
2004-06-08 16:31 ` Andrew Morton
2004-06-08 16:39 ` Linus Torvalds
2004-06-08 19:04 ` David Howells [this message]
2004-06-08 17:05 ` David Howells
2004-06-08 22:33 ` Andrea Arcangeli
2004-06-08 19:36 ` William Lee Irwin III
2004-06-08 22:52 ` Andrea Arcangeli
2004-06-09 12:19 ` [PATCH] A generic_file_sendpage() Alexander Nyberg
2004-06-10 19:49 ` Pavel Machek
2004-06-25 19:19 ` Jörn Engel
2004-06-25 19:46 ` viro
2004-06-25 20:03 ` Jörn Engel
2004-06-26 0:53 ` Trond Myklebust
2004-06-28 11:41 ` Jörn Engel
2004-06-25 20:05 ` Andreas Dilger
2004-06-25 20:09 ` Jörn Engel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2303.1086721493@redhat.com \
--to=dhowells@redhat.com \
--cc=akpm@osdl.org \
--cc=andrea@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox