public inbox for linux-kernel@vger.kernel.org
* [Patch v3] rwsem: fix rwsem_is_locked() bugs
@ 2009-10-06  6:55 Amerigo Wang
  2009-10-07  9:41 ` Amerigo Wang
  2009-10-07 12:19 ` David Howells
  0 siblings, 2 replies; 4+ messages in thread
From: Amerigo Wang @ 2009-10-06  6:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ben Woodard, David Howells, akpm, Brian Behlendorf, Amerigo Wang


rwsem_is_locked() tests ->activity without taking any locks, so we should
always keep ->activity consistent. However, the code in __rwsem_do_wake()
breaks this rule: it updates ->activity only after _all_ readers have been
woken up, so a newly woken reader may see a stale ->activity value, which
makes rwsem_is_locked() behave incorrectly.

Quote from Andrew:

"
- we have one or more processes sleeping in down_read(), waiting for access.

- we wake one or more processes up without altering ->activity

- they start to run and they do rwsem_is_locked().  This incorrectly
  returns "false", because the waker process is still crunching away in
  __rwsem_do_wake().

- the waker now alters ->activity, but it was too late.

And the patch fixes this by updating ->activity prior to waking the
sleeping processes.  So when they run, they'll see a non-zero value of
->activity.
"

Also, we have more problems, as pointed out by David:

"... the case where the active readers run out, but there's a
writer on the queue (see __up_read()), nor the case where the active writer
ends, but there's a waiter on the queue (see __up_write()).  In both cases,
the lock is still held, though sem->activity is 0."

This patch fixes this too.
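
For reference, __up_read() in lib/rwsem-spinlock.c is roughly the following
(again a simplified sketch from memory, not pasted verbatim):

void __up_read(struct rw_semaphore *sem)
{
	unsigned long flags;

	spin_lock_irqsave(&sem->wait_lock, flags);

	/*
	 * Here ->activity can drop to 0 while a writer still sits on
	 * ->wait_list, i.e. the lock is in fact still held.
	 */
	if (--sem->activity == 0 && !list_empty(&sem->wait_list))
		sem = __rwsem_wake_one_writer(sem);

	spin_unlock_irqrestore(&sem->wait_lock, flags);
}

That window only exists while ->wait_lock is held, so with this patch
rwsem_is_locked() either fails the trylock (and reports the semaphore as
locked) or takes the lock and sees the non-empty ->wait_list.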

David also said we may have "the potential to cause more cacheline ping-pong
under contention", but "this change shouldn't cause a significant slowdown."

With this patch applied, I can't trigger that bug any more.

Reported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Cc: Ben Woodard <bwoodard@llnl.gov>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: WANG Cong <amwang@redhat.com>

---
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index 6c3c0f6..1a65776 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -71,7 +71,14 @@ extern void __downgrade_write(struct rw_semaphore *sem);
 
 static inline int rwsem_is_locked(struct rw_semaphore *sem)
 {
-	return (sem->activity != 0);
+	int ret;
+
+	if (spin_trylock_irq(&sem->wait_lock)) {
+		ret = !(list_empty(&sem->wait_list) && sem->activity == 0);
+		spin_unlock_irq(&sem->wait_lock);
+		return ret;
+	}
+	return 1;
 }
 
 #endif /* __KERNEL__ */
diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
index 9df3ca5..234d83f 100644
--- a/lib/rwsem-spinlock.c
+++ b/lib/rwsem-spinlock.c
@@ -78,7 +78,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
 
 	/* grant an infinite number of read locks to the front of the queue */
  dont_wake_writers:
-	woken = 0;
+	/*
+	 * We increase ->activity here just to make rwsem_is_locked()
+	 * happy; to avoid potential cache line ping-pong, we don't
+	 * do it inside the loop below.
+	 */
+	woken = sem->activity++;
 	while (waiter->flags & RWSEM_WAITING_FOR_READ) {
 		struct list_head *next = waiter->list.next;
 
@@ -94,7 +99,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
 		waiter = list_entry(next, struct rwsem_waiter, list);
 	}
 
-	sem->activity += woken;
+	sem->activity = woken;
 
  out:
 	return sem;


* Re: [Patch v3] rwsem: fix rwsem_is_locked() bugs
  2009-10-06  6:55 [Patch v3] rwsem: fix rwsem_is_locked() bugs Amerigo Wang
@ 2009-10-07  9:41 ` Amerigo Wang
  2009-10-07 12:19 ` David Howells
  1 sibling, 0 replies; 4+ messages in thread
From: Amerigo Wang @ 2009-10-07  9:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ben Woodard, David Howells, akpm, Brian Behlendorf


David, any comments on this version? :)

Thanks.


Amerigo Wang wrote:
> rwsem_is_locked() tests ->activity without taking any locks, so we should
> always keep ->activity consistent. However, the code in __rwsem_do_wake()
> breaks this rule: it updates ->activity only after _all_ readers have been
> woken up, so a newly woken reader may see a stale ->activity value, which
> makes rwsem_is_locked() behave incorrectly.
> 
> Quote from Andrew:
> 
> "
> - we have one or more processes sleeping in down_read(), waiting for access.
> 
> - we wake one or more processes up without altering ->activity
> 
> - they start to run and they do rwsem_is_locked().  This incorrectly
>   returns "false", because the waker process is still crunching away in
>   __rwsem_do_wake().
> 
> - the waker now alters ->activity, but it was too late.
> 
> And the patch fixes this by updating ->activity prior to waking the
> sleeping processes.  So when they run, they'll see a non-zero value of
> ->activity.
> "
> 
> Also, we have more problems, as pointed out by David:
> 
> "... the case where the active readers run out, but there's a
> writer on the queue (see __up_read()), nor the case where the active writer
> ends, but there's a waiter on the queue (see __up_write()).  In both cases,
> the lock is still held, though sem->activity is 0."
> 
> This patch fixes this too.
> 
> David also said we may have "the potential to cause more cacheline ping-pong
> under contention", but "this change shouldn't cause a significant slowdown."
> 
> With this patch applied, I can't trigger that bug any more.
> 
> Reported-by: Brian Behlendorf <behlendorf1@llnl.gov>
> Cc: Ben Woodard <bwoodard@llnl.gov>
> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: WANG Cong <amwang@redhat.com>
> 
> ---
> diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
> index 6c3c0f6..1a65776 100644
> --- a/include/linux/rwsem-spinlock.h
> +++ b/include/linux/rwsem-spinlock.h
> @@ -71,7 +71,14 @@ extern void __downgrade_write(struct rw_semaphore *sem);
>  
>  static inline int rwsem_is_locked(struct rw_semaphore *sem)
>  {
> -	return (sem->activity != 0);
> +	int ret;
> +
> +	if (spin_trylock_irq(&sem->wait_lock)) {
> +		ret = !(list_empty(&sem->wait_list) && sem->activity == 0);
> +		spin_unlock_irq(&sem->wait_lock);
> +		return ret;
> +	}
> +	return 1;
>  }
>  
>  #endif /* __KERNEL__ */
> diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
> index 9df3ca5..234d83f 100644
> --- a/lib/rwsem-spinlock.c
> +++ b/lib/rwsem-spinlock.c
> @@ -78,7 +78,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
>  
>  	/* grant an infinite number of read locks to the front of the queue */
>   dont_wake_writers:
> -	woken = 0;
> +	/*
> +	 * We increase ->activity here just to make rwsem_is_locked()
> +	 * happy; to avoid potential cache line ping-pong, we don't
> +	 * do it inside the loop below.
> +	 */
> +	woken = sem->activity++;
>  	while (waiter->flags & RWSEM_WAITING_FOR_READ) {
>  		struct list_head *next = waiter->list.next;
>  
> @@ -94,7 +99,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
>  		waiter = list_entry(next, struct rwsem_waiter, list);
>  	}
>  
> -	sem->activity += woken;
> +	sem->activity = woken;
>  
>   out:
>  	return sem;



* Re: [Patch v3] rwsem: fix rwsem_is_locked() bugs
  2009-10-06  6:55 [Patch v3] rwsem: fix rwsem_is_locked() bugs Amerigo Wang
  2009-10-07  9:41 ` Amerigo Wang
@ 2009-10-07 12:19 ` David Howells
  2009-10-08  9:15   ` Amerigo Wang
  1 sibling, 1 reply; 4+ messages in thread
From: David Howells @ 2009-10-07 12:19 UTC (permalink / raw)
  To: Amerigo Wang; +Cc: dhowells, linux-kernel, Ben Woodard, akpm, Brian Behlendorf

Amerigo Wang <amwang@redhat.com> wrote:

>  static inline int rwsem_is_locked(struct rw_semaphore *sem)
>  {
> -	return (sem->activity != 0);
> +	int ret;
> +
> +	if (spin_trylock_irq(&sem->wait_lock)) {
> +		ret = !(list_empty(&sem->wait_list) && sem->activity == 0);
> +		spin_unlock_irq(&sem->wait_lock);
> +		return ret;
> +	}
> +	return 1;
>  }

Yep...  This seems a reasonable approach, though I contend that if you're
holding the spinlock, then sem->wait_list _must_ be empty if sem->activity is
0 - so that half of the test is redundant.

sem->activity == 0 and sem->wait_list not being empty is a transitional state
that can only occur in ups and downgrades whilst they hold the spinlock.
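
Something along the lines of this untested sketch would then be enough:

static inline int rwsem_is_locked(struct rw_semaphore *sem)
{
	int ret = 1;

	if (spin_trylock_irq(&sem->wait_lock)) {
		ret = (sem->activity != 0);
		spin_unlock_irq(&sem->wait_lock);
	}
	return ret;
}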

> diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
> index 9df3ca5..234d83f 100644
> --- a/lib/rwsem-spinlock.c
> +++ b/lib/rwsem-spinlock.c
> @@ -78,7 +78,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
>  
>  	/* grant an infinite number of read locks to the front of the queue */
>   dont_wake_writers:
> -	woken = 0;
> +	/*
> +	 * We increase ->activity here just to make rwsem_is_locked()
> +	 * happy; to avoid potential cache line ping-pong, we don't
> +	 * do it inside the loop below.
> +	 */
> +	woken = sem->activity++;
>  	while (waiter->flags & RWSEM_WAITING_FOR_READ) {
>  		struct list_head *next = waiter->list.next;
>  
> @@ -94,7 +99,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
>  		waiter = list_entry(next, struct rwsem_waiter, list);
>  	}
>  
> -	sem->activity += woken;
> +	sem->activity = woken;
>  
>   out:
>  	return sem;

This change to __rwsem_do_wake() is all unnecessary - you're defending against
the test of sem->activity by rwsem_is_locked() - but that now happens with the
spinlock held.

David


* Re: [Patch v3] rwsem: fix rwsem_is_locked() bugs
  2009-10-07 12:19 ` David Howells
@ 2009-10-08  9:15   ` Amerigo Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Amerigo Wang @ 2009-10-08  9:15 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, Ben Woodard, akpm, Brian Behlendorf

David Howells wrote:
> Amerigo Wang <amwang@redhat.com> wrote:
> 
>>  static inline int rwsem_is_locked(struct rw_semaphore *sem)
>>  {
>> -	return (sem->activity != 0);
>> +	int ret;
>> +
>> +	if (spin_trylock_irq(&sem->wait_lock)) {
>> +		ret = !(list_empty(&sem->wait_list) && sem->activity == 0);
>> +		spin_unlock_irq(&sem->wait_lock);
>> +		return ret;
>> +	}
>> +	return 1;
>>  }
> 
> Yep...  This seems a reasonable approach, though I contend that if you're
> holding the spinlock, then sem->wait_list _must_ be empty if sem->activity is
> 0 - so that half of the test is redundant.
> 
> sem->activity == 0 and sem->wait_list not being empty is a transitional state
> that can only occur in ups and downgrades whilst they hold the spinlock.
> 


Hmm, yeah...

>> diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
>> index 9df3ca5..234d83f 100644
>> --- a/lib/rwsem-spinlock.c
>> +++ b/lib/rwsem-spinlock.c
>> @@ -78,7 +78,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
>>  
>>  	/* grant an infinite number of read locks to the front of the queue */
>>   dont_wake_writers:
>> -	woken = 0;
>> +	/*
>> +	 * We increase ->activity here just to make rwsem_is_locked()
>> +	 * happy; to avoid potential cache line ping-pong, we don't
>> +	 * do it inside the loop below.
>> +	 */
>> +	woken = sem->activity++;
>>  	while (waiter->flags & RWSEM_WAITING_FOR_READ) {
>>  		struct list_head *next = waiter->list.next;
>>  
>> @@ -94,7 +99,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
>>  		waiter = list_entry(next, struct rwsem_waiter, list);
>>  	}
>>  
>> -	sem->activity += woken;
>> +	sem->activity = woken;
>>  
>>   out:
>>  	return sem;
> 
> This change to __rwsem_do_wake() is all unnecessary - you're defending against
> the test of sem->activity by rwsem_is_locked() - but that now happens with the
> spinlock held.

Ah, yes, I knew this; I kept it just for completeness.
I will remove this part then. :)

Thanks!



Thread overview: 4+ messages
2009-10-06  6:55 [Patch v3] rwsem: fix rwsem_is_locked() bugs Amerigo Wang
2009-10-07  9:41 ` Amerigo Wang
2009-10-07 12:19 ` David Howells
2009-10-08  9:15   ` Amerigo Wang
