public inbox for linux-kernel@vger.kernel.org
From: Chris Metcalf <cmetcalf@tilera.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	"Américo Wang" <xiyou.wangcong@gmail.com>,
	netdev <netdev@vger.kernel.org>, "Cypher Wu" <cypher.w@gmail.com>
Subject: Re: [PATCH] arch/tile: fix rwlock so would-be write lockers don't block new readers
Date: Mon, 15 Nov 2010 10:10:55 -0500	[thread overview]
Message-ID: <4CE14D7F.1010307@tilera.com> (raw)
In-Reply-To: <1289832730.2607.87.camel@edumazet-laptop>

On 11/15/2010 9:52 AM, Eric Dumazet wrote:
> On Monday, 15 November 2010 at 09:18 -0500, Chris Metcalf wrote:
>> This avoids a deadlock in the IGMP code where one core gets a read
>> lock, another core starts trying to get a write lock (thus blocking
>> new readers), and then the first core tries to recursively re-acquire
>> the read lock.
>>
>> We still try to preserve some degree of balance by giving priority
>> to additional write lockers that come along while the lock is held
>> for write, so they can all complete quickly and return the lock to
>> the readers.
>>
>> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
>> ---
>> This should apply relatively cleanly to 2.6.36 source code too.
>>
>>  arch/tile/lib/spinlock_32.c |   29 ++++++++++++++++++-----------
>>  1 files changed, 18 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/tile/lib/spinlock_32.c b/arch/tile/lib/spinlock_32.c
>> index 485e24d..5cd1c40 100644
>> --- a/arch/tile/lib/spinlock_32.c
>> +++ b/arch/tile/lib/spinlock_32.c
>> @@ -167,23 +167,30 @@ void arch_write_lock_slow(arch_rwlock_t *rwlock, u32 val)
>>  	 * when we compare them.
>>  	 */
>>  	u32 my_ticket_;
>> +	u32 iterations = 0;
>>  
>> -	/* Take out the next ticket; this will also stop would-be readers. */
>> -	if (val & 1)
>> -		val = get_rwlock(rwlock);
>> -	rwlock->lock = __insn_addb(val, 1 << WR_NEXT_SHIFT);
>> +	/*
>> +	 * Wait until there are no readers, then bump up the next
>> +	 * field and capture the ticket value.
>> +	 */
>> +	for (;;) {
>> +		if (!(val & 1)) {
>> +			if ((val >> RD_COUNT_SHIFT) == 0)
>> +				break;
>> +			rwlock->lock = val;
>> +		}
>> +		delay_backoff(iterations++);
> Are you sure a writer should have a growing delay_backoff() ?

We always do this bounded exponential backoff on locking operations that
require memory-network traffic.  With 64 cores, the cores can otherwise
hammer the lock so aggressively that acquisition performance ends up worse
than it would be with backoff.  In any case this path is unlikely to run
many times, since it only triggers when two cores both try to pull a ticket
at the same time; it doesn't correspond to writers actually waiting once
they have their ticket, which is handled later in this function.
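For the idiom itself, a minimal model of bounded exponential backoff looks
like the following (the names and constants here are illustrative, not the
actual Tile delay_backoff() implementation):

```c
/* Toy model of bounded exponential backoff: the wait grows with each
 * failed attempt but is capped, so no core ever backs off indefinitely.
 * BACKOFF_START and BACKOFF_CAP are made-up values for illustration. */
#define BACKOFF_START 32u     /* cycles to wait after the first failure */
#define BACKOFF_CAP   4096u   /* upper bound on the wait */

static unsigned int backoff_cycles(unsigned int iterations)
{
	unsigned int delay;

	if (iterations >= 20)             /* avoid shifting into the weeds */
		return BACKOFF_CAP;
	delay = BACKOFF_START << iterations;
	if (delay > BACKOFF_CAP)
		delay = BACKOFF_CAP;
	return delay;
}
```

The cap is what keeps a briefly contended tns loop from turning into a
long stall once contention clears.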

> It seems to me this only allows new readers to come in (adding more
> unfairness to the rwlock, which already favors readers over writers)

Well, that is apparently the required semantic.  Once there is one reader,
you must allow new readers, to handle the case of recursive re-acquisition.
In principle you could imagine keeping a bitmask of cores that hold the
read lock (instead of a count), and only allowing recursive re-acquisition
while a write lock request is pending, but that would make the lock
structure bigger, and I'm not sure it's worth it.  x86 certainly doesn't
bother.
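To make the required semantic concrete, here is a toy model (not the
kernel's arch_rwlock_t, and the field names are invented): a pending
writer may only block readers when the lock is idle, so a reader that
recursively re-acquires can never deadlock against a queued writer.

```c
/* Toy rwlock state for illustrating the read-acquisition policy. */
struct toy_rwlock {
	unsigned int readers;          /* count of active read holders */
	unsigned int writers_waiting;  /* writers that have taken a ticket */
};

/* New readers are only blocked by a waiting writer when the lock is
 * currently idle; once any reader holds the lock, further read
 * acquisitions (including recursive ones) must succeed. */
static int toy_read_trylock(struct toy_rwlock *l)
{
	if (l->readers == 0 && l->writers_waiting > 0)
		return 0;   /* writer gets priority on an idle lock */
	l->readers++;
	return 1;
}
```

The buggy behavior being fixed is the opposite policy: blocking new
readers as soon as a writer queues, which deadlocks the IGMP-style
recursive read case.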

> Maybe allow one CPU to spin, and have the other 'writers' be queued?

Other than the brief spin to acquire the ticket, writers don't actually
spin on the lock.  They just wait for their ticket to come up.  This does
require spinning on memory reads, but those are satisfied out of the local
core's cache, with the exception that each time a writer completes, the
cache line is invalidated on the readers, and they have to re-fetch it from
the home cache.
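The "wait for their ticket to come up" test in the patch is just modular
arithmetic on a small field.  A standalone sketch of that comparison,
assuming 8-bit ticket fields (consistent with the byte-wise __insn_addb in
the diff; WR_MASK's real value is in the Tile headers):

```c
#define WR_MASK 0xffu   /* assumed 8-bit next/current ticket fields */

/* Distance from the currently-served ticket to mine, modulo the field
 * width.  Zero means it is my turn; the mask makes wraparound of the
 * 8-bit ticket counter harmless. */
static unsigned int ticket_delta(unsigned int my_ticket, unsigned int curr)
{
	return (my_ticket - curr) & WR_MASK;
}
```

A writer simply re-reads the lock word and loops until this delta hits
zero, which is why the spin is satisfied from the local cache between
invalidations.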

>> +		val = __insn_tns((int *)&rwlock->lock);
>> +	}
>>  
>> -	/* Extract my ticket value from the original word. */
>> +	/* Take out the next ticket and extract my ticket value. */
>> +	rwlock->lock = __insn_addb(val, 1 << WR_NEXT_SHIFT);
>>  	my_ticket_ = val >> WR_NEXT_SHIFT;
>>  
>> -	/*
>> -	 * Wait until the "current" field matches our ticket, and
>> -	 * there are no remaining readers.
>> -	 */
>> +	/* Wait until the "current" field matches our ticket. */
>>  	for (;;) {
>>  		u32 curr_ = val >> WR_CURR_SHIFT;
>> -		u32 readers = val >> RD_COUNT_SHIFT;
>> -		u32 delta = ((my_ticket_ - curr_) & WR_MASK) + !!readers;
>> +		u32 delta = ((my_ticket_ - curr_) & WR_MASK);
>>  		if (likely(delta == 0))
>>  			break;
>>  
>

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com



Thread overview: 42+ messages
2010-11-11 13:49 Kernel rwlock design, Multicore and IGMP Cypher Wu
2010-11-11 15:23 ` Eric Dumazet
2010-11-11 15:32   ` Eric Dumazet
2010-11-12  3:32   ` Cypher Wu
2010-11-12  6:28     ` Américo Wang
2010-11-12  7:13     ` Américo Wang
2010-11-12  7:27       ` Eric Dumazet
2010-11-12  8:19         ` Américo Wang
2010-11-12  9:09           ` Yong Zhang
2010-11-12  9:18             ` Américo Wang
2010-11-12 11:06               ` Cypher Wu
2010-11-13  6:35                 ` Américo Wang
2010-11-12 13:00               ` Yong Zhang
2010-11-13  6:28                 ` Américo Wang
2010-11-12  9:22           ` Eric Dumazet
2010-11-12  9:33             ` Américo Wang
2010-11-12 13:34             ` [PATCH net-next-2.6] igmp: RCU conversion of in_dev->mc_list Eric Dumazet
2010-11-12 14:26               ` Eric Dumazet
2010-11-12 15:46                 ` [PATCH net-next-2.6 V2] " Eric Dumazet
2010-11-12 21:19                   ` David Miller
2010-11-13  6:44                   ` Américo Wang
2010-11-13 22:54           ` Kernel rwlock design, Multicore and IGMP Peter Zijlstra
2010-11-12 11:10         ` Cypher Wu
2010-11-12 11:25           ` Eric Dumazet
2010-11-13 22:53     ` Peter Zijlstra
     [not found]     ` <ZXmP8hjgLHA.4648@exchange1.tad.internal.tilera.com>
2010-11-13 23:03       ` Chris Metcalf
2010-11-15  7:22         ` Cypher Wu
2010-11-15 11:18           ` Cypher Wu
2010-11-15 11:31             ` Eric Dumazet
2010-11-17  1:30               ` Cypher Wu
2010-11-17  4:43                 ` Eric Dumazet
2010-11-15 14:18           ` [PATCH] arch/tile: fix rwlock so would-be write lockers don't block new readers Chris Metcalf
2010-11-15 14:52             ` Eric Dumazet
2010-11-15 15:10               ` Chris Metcalf [this message]
2010-11-22  5:39             ` Cypher Wu
2010-11-22 13:35               ` Chris Metcalf
2010-11-23  1:36                 ` Cypher Wu
2010-11-23 21:02                   ` Chris Metcalf
2010-11-24  2:53                     ` Cypher Wu
2010-11-24 14:09                       ` Chris Metcalf
2010-11-24 16:37                         ` Cypher Wu
2010-11-13 22:52 ` Kernel rwlock design, Multicore and IGMP Peter Zijlstra
