Re: [RFC] Implement Batched (group) ticket lock

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Rik van Riel <riel@redhat.com>
Cc: jeremy@goop.org, kvm@vger.kernel.org, peterz@infradead.org,
	virtualization@lists.linux-foundation.org,
	paul.gortmaker@windriver.com, hpa@zytor.com, ak@linux.intel.com,
	gleb@redhat.com, x86@kernel.org, mingo@redhat.com,
	xen-devel@lists.xenproject.org, paulmck@linux.vnet.ibm.com,
	konrad.wilk@oracle.com, oleg@redhat.com, davej@redhat.com,
	tglx@linutronix.de, fernando_b1@lab.ntt.co.jp,
	chegu_vinod@hp.com, waiman.long@hp.com,
	linux-kernel@vger.kernel.org, pbonzini@redhat.com,
	torvalds@linux-foundation.org
Subject: Re: [RFC] Implement Batched (group) ticket lock
Date: Thu, 29 May 2014 15:14:40 +0530
Message-ID: <53870188.5060209@linux.vnet.ibm.com>
In-Reply-To: <53865B53.7050809@redhat.com>

On 05/29/2014 03:25 AM, Rik van Riel wrote:
> On 05/28/2014 08:16 AM, Raghavendra K T wrote:
>
> This patch looks very promising.

Thank you Rik.

[...]
>>
>> - My kernbench/ebizzy test on baremetal (32 cpu +ht sandybridge) did not seem to
>>    show the impact of extra cmpxchg. but there should be effect of extra cmpxchg.
>
> Canceled out by better NUMA locality?

Yes, perhaps. It was even slightly better.

[...]
>> - we can further add dynamically changing batch_size implementation (inspiration and
>>    hint by Paul McKenney) as necessary.
>
> I could see a larger batch size being beneficial.
>
> Currently the maximum wait time for a spinlock on a system
> with N CPUs is N times the length of the largest critical
> section.
>
> Having the batch size set equal to the number of CPUs would only
> double that, and better locality (CPUs local to the current
> lock holder winning the spinlock operation) might speed things
> up enough to cancel that part of that out again...

Having batch size = number of CPUs would definitely help contended
cases, especially on larger machines (in my experience from testing on
a 4-node, 32-core machine). The +ht case should make it even more
beneficial.

My only concern was the overhead in undercommit cases because of the
extra cmpxchg.
So maybe batch_size = total CPUs / number of NUMA nodes would be optimal?
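
A minimal sketch of that per-node sizing idea, for illustration only. The
helper names round_down_pow2() and ticket_batch_for() are hypothetical and
are not part of the patch; rounding down to a power of two is an assumption
made here so that a mask-based batch check would keep working.

```c
#include <assert.h>

/* Hypothetical helper, not in the patch: largest power of two <= n. */
static unsigned int round_down_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n > 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical helper: batch size = CPUs per NUMA node, rounded down to
 * a power of two. Falls back to 1 (a plain fair ticket lock) when there
 * are fewer CPUs than nodes.
 */
static unsigned int ticket_batch_for(unsigned int total_cpus,
				     unsigned int numa_nodes)
{
	unsigned int per_node;

	assert(total_cpus > 0 && numa_nodes > 0);
	per_node = total_cpus / numa_nodes;
	return round_down_pow2(per_node ? per_node : 1);
}
```

For the 4-node, 32-core test machine mentioned above, ticket_batch_for(32, 4)
gives a batch of 8.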

[...]
>> +#define TICKET_LOCK_INC_SHIFT 1
>> +#define __TICKET_LOCK_TAIL_INC (1<<TICKET_LOCK_INC_SHIFT)
>> +
>>   #ifdef CONFIG_PARAVIRT_SPINLOCKS
>> -#define __TICKET_LOCK_INC	2
>>   #define TICKET_SLOWPATH_FLAG	((__ticket_t)1)
>>   #else
>> -#define __TICKET_LOCK_INC	1
>>   #define TICKET_SLOWPATH_FLAG	((__ticket_t)0)
>>   #endif
>
> For the !CONFIG_PARAVIRT case, TICKET_LOCK_INC_SHIFT used to be 0,
> now you are making it one. Probably not an issue, since even people
> who compile with 128 < CONFIG_NR_CPUS <= 256 will likely have their
> spinlocks padded out to 32 or 64 bits anyway in most data structures.

Yes..

[...]
>> +#define TICKET_BATCH    0x4 /* 4 waiters can contend simultaneously */
>> +#define TICKET_LOCK_BATCH_MASK (~(TICKET_BATCH<<TICKET_LOCK_INC_SHIFT) + \
>> +				  TICKET_LOCK_TAIL_INC - 1)
>
> I do not see the value in having TICKET_BATCH declared with a
> hexadecimal number,

Yes. It only helped me make the idea readable to myself; I can get rid
of it if needed.

> and it may be worth making sure the code
> does not compile if someone tried a TICKET_BATCH value that
> is not a power of 2.

I agree. I will add a BUILD_BUG for non-power-of-2 values in the next
version. It also reminds me that I wanted to have TICKET_BATCH = 1 for
!CONFIG_PARAVIRT so that we continue to have the original fair lock
version. Does that make sense? I dropped the idea after thinking about
the same kernel running on host and guest, which would have
CONFIG_PARAVIRT on anyway.
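
A userspace sketch of the compile-time check and of what the quoted mask
evaluates to, for illustration only. Assumptions: ticket_t is a stand-in for
the kernel's __ticket_t (8 bits here), batch_start() is a hypothetical
helper, and C11 static_assert stands in for the kernel's
BUILD_BUG_ON_NOT_POWER_OF_2(). How the patch itself uses the mask is not
shown in the quoted hunks.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

typedef uint8_t ticket_t;	/* stand-in for the kernel's __ticket_t */

#define TICKET_LOCK_INC_SHIFT	1
#define TICKET_LOCK_TAIL_INC	((ticket_t)(1 << TICKET_LOCK_INC_SHIFT))
#define TICKET_BATCH		4
#define TICKET_LOCK_BATCH_MASK \
	((ticket_t)(~(TICKET_BATCH << TICKET_LOCK_INC_SHIFT) + \
		    TICKET_LOCK_TAIL_INC - 1))

/* Refuse to compile if TICKET_BATCH is not a power of 2. */
static_assert(TICKET_BATCH > 0 && (TICKET_BATCH & (TICKET_BATCH - 1)) == 0,
	      "TICKET_BATCH must be a power of 2");

/*
 * Hypothetical helper: first ticket value of the batch that 'ticket'
 * falls into. Tickets advance by TICKET_LOCK_TAIL_INC (2), so a batch
 * of 4 spans 8 ticket units and the mask clears the low three bits
 * (the slowpath flag plus the position within the batch).
 */
static ticket_t batch_start(ticket_t ticket)
{
	return ticket & TICKET_LOCK_BATCH_MASK;
}
```

With these values the mask is 0xf8, so tickets 0, 2, 4 and 6 map to batch
start 0 and tickets 8 to 14 map to batch start 8. Changing TICKET_BATCH to 3
makes the build fail at the static_assert.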
