From mboxrd@z Thu Jan 1 00:00:00 1970 From: Waiman Long Subject: Re: [PATCH v10 03/19] qspinlock: Add pending bit Date: Mon, 19 May 2014 16:17:23 -0400 Message-ID: <537A66D3.8070607@hp.com> References: <1399474907-22206-1-git-send-email-Waiman.Long@hp.com> <1399474907-22206-4-git-send-email-Waiman.Long@hp.com> <20140512152208.GA12309@potion.brq.redhat.com> <537276B4.10209@hp.com> <20140514165121.GA21370@potion.redhat.com> <20140514170016.GW30445@twins.programming.kicks-ass.net> <20140514191339.GA22813@potion.brq.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8"; Format="flowed" Content-Transfer-Encoding: base64 Return-path: In-Reply-To: <20140514191339.GA22813@potion.brq.redhat.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: virtualization-bounces@lists.linux-foundation.org Errors-To: virtualization-bounces@lists.linux-foundation.org To: =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= Cc: x86@kernel.org, Gleb Natapov , Peter Zijlstra , linux-kernel@vger.kernel.org, "H. Peter Anvin" , Boris Ostrovsky , linux-arch@vger.kernel.org, kvm@vger.kernel.org, Raghavendra K T , Ingo Molnar , xen-devel@lists.xenproject.org, "Paul E. McKenney" , Rik van Riel , Konrad Rzeszutek Wilk , Scott J Norton , Paolo Bonzini , Thomas Gleixner , virtualization@lists.linux-foundation.org, Chegu Vinod , Oleg Nesterov , David Vrabel , Linus Torvalds List-Id: linux-arch.vger.kernel.org T24gMDUvMTQvMjAxNCAwMzoxMyBQTSwgUmFkaW0gS3LEjW3DocWZIHdyb3RlOgo+IDIwMTQtMDUt MTQgMTk6MDArMDIwMCwgUGV0ZXIgWmlqbHN0cmE6Cj4+IE9uIFdlZCwgTWF5IDE0LCAyMDE0IGF0 IDA2OjUxOjI0UE0gKzAyMDAsIFJhZGltIEtyxI1tw6HFmSB3cm90ZToKPj4+IE9rLgo+Pj4gSSd2 ZSBzZWVuIG1lcml0IGluIHB2cXNwaW5sb2NrIGV2ZW4gd2l0aCBzbGlnaHRseSBzbG93ZXIgZmly c3Qtd2FpdGVyLAo+Pj4gc28gSSB3b3VsZCBoYXZlIGhhcHBpbHkgc2FjcmlmaWNlZCB0aG9zZSBo b3JyaWJsZSBicmFuY2hlcy4KPj4+IChJIHByZWZlciBlbGVnYW50IHRvIG9wdGltaXplZCBjb2Rl LCBidXQgSSBjYW4gc2VlIHdoeSB3ZSB3YW50IHRvIGJlCj4+PiAgIHN0cmljdGx5IGJldHRlciB0 aGFuIHRpY2tldGxvY2suKQo+Pj4gUGV0ZXIgbWVudGlvbmVkIHRoYXQgd2UgYXJlIGZvY3VzaW5n IG9uIGJhcmUtbWV0YWwgcGF0Y2hlcywgc28gSSdsbAo+Pj4gd2l0aG9sZCBteSBvdGhlciBwYXJh dmlydCByYW50cyB1bnRpbCB0aGV5IGFyZSBwb2xpc2hlZC4KPiAoSXQgd2FzIGFuIGFtYmlndW91 cyBzZW50ZW5jZSwgSSBoYXZlIGNvbW1lbnRzIGZvciBsYXRlciBwYXRjaGVzLikKPgo+PiBXZWxs LCBwYXJhdmlydCBtdXN0IGhhcHBlbiB0b28sIGJ1dCBjb21lcyBsYXRlciBpbiB0aGlzIHNlcmll cywgcGF0Y2ggMwo+PiB3aGljaCB3ZSdyZSByZXBseWluZyB0byBpcyBzdGlsbCB2ZXJ5IG11Y2gg aW4gdGhlIGJhcmUgbWV0YWwgcGFydCBvZiB0aGUKPj4gc2VyaWVzLgo+IChJIHRoaW5rIHRoYXQg YmFyZSBtZXRhbCBzcGFucyB0aGUgZmlyc3QgNyBwYXRjaGVzLikKPgo+PiBJJ3ZlIG5vdCBoYWQg dGltZSB5ZXQgdG8gZGVjb2RlIGFsbCB0aGF0IFdhaW1hbiBoYXMgZG9uZSB0byBtYWtlCj4+IHBh cmF2aXJ0IHdvcmsuCj4+Cj4+IEJ1dCBhcyBhIGdlbmVyYWwgcnVsZSBJIGxpa2UgcGF0Y2hlcyB0 aGF0IHN0YXJ0IHdpdGggc29tZXRoaW5nIHNpbXBsZQo+PiBhbmQgd29ya2luZyBhbmQgdGhlbiBv cHRpbWl6ZSBpdCwgdGhpcyBzZXJpZXMgZG9lc24ndCBzZWVtIHRvIHF1aXRlCj4+IGdyYXNwIHRo YXQuCj4+Cj4+PiBBbmQgdG8gZm9yY2VmdWxseSBicmluZyB0aGlzIHRocmVhZCBhIGxpdHRsZSBi aXQgb24tdG9waWM6Cj4+Pgo+Pj4gUGVuZGluZy1iaXQgaXMgZWZmZWN0aXZlbHkgYSBsb2NrIGlu IGEgbG9jaywgc28gSSB3YXMgd29uZGVyaW5nIHdoeQo+Pj4gZG9uJ3Qgd2UgdXNlIG1vcmUgcGVu ZGluZyBiaXRzOyBhZHZhbnRhZ2VzIGFyZSB0aGUgc2FtZSwganVzdCBkaW1pbmlzaGVkCj4+PiBi eSB0aGUgcHJvYmFiaWxpdHkgb2YgaGF2aW5nIGFuIGlkZWFsbHkgY29udGVuZGVkIGxvY2s6Cj4+ PiAgIC0gd2FpdGVyIHdvbid0IGJlIGJsb2NrZWQgb24gUkFNIGFjY2VzcyBpZiBjcml0aWNhbCBz ZWN0aW9uIChvciBtb3JlKQo+Pj4gICAgIGVuZHMgc29vbmVyCj4+PiAgIC0gc29tZSB1bmx1Y2t5 IGNhY2hlbGluZSBpcyBub3QgZm9yZ290dGVuCj4+PiAgIC0gZmFzdGVyIHVubG9jayAobm8gbmVl ZCBmb3IgdGFpbCBvcGVyYXRpb25zKQo+Pj4gKC0gPykKPj4+IGRpc2FkdmFudGFnZXMgYXJlIG1h Z25pZmllZDoKPj4+ICAgLSBpbmNyZWFzZWQgY29tcGxleGl0eQo+Pj4gICAtIGludGVuc2UgY2Fj aGVsaW5lIHNoYXJpbmcKPj4+ICAgICAoSSB0aG91Z2h0IHRoYXQgdGhpcyBpcyB0aGUgbWFpbiBk aXNhZHZhbnRhZ2Ugb2YgdGlja2V0bG9jay4pCj4+PiAoLSA/KQo+Pj4KPj4+IE9uZSBiaXQgc3Rp bGwgaW1wcm92ZWQgcGVyZm9ybWFuY2UsIGlzIGl0IHRoZSBiZXN0IHdlIGdvdD8KPj4gU28sIHRo ZSBhZHZhbnRhZ2Ugb2Ygb25lIGJpdCBpcyB0aGF0IGlmIHdlIHVzZSBhIHdob2xlIGJ5dGUgZm9y IDEgYml0IHdlCj4+IGNhbiBhdm9pZCBzb21lIGF0b21pYyBvcHMuCj4+Cj4+IFRoZSBlbnRpcmUg cmVhc29uIGZvciB0aGlzIGluLXdvcmQgc3Bpbm5lciBpcyB0byBhbW9ydGl6ZSB0aGUgY29zdCBv Zgo+PiBoaXR0aW5nIHRoZSBleHRlcm5hbCBub2RlIGNhY2hlbGluZS4KPiBFdmVyeSBwZW5kaW5n IENQVSByZW1vdmVzIG9uZSBsZW5ndGggb2YgdGhlIGNyaXRpY2FsIHNlY3Rpb24gZnJvbSB0aGUK PiBkZWxheSBjYXVzZWQgYnkgY2FjaGVsaW5lIHByb3BhZ2F0aW9uIGFuZCByZWFsbHkgY29sZCBj YWNoZSBpcwo+IGh1bmRyZWRzKD8pIG9mIGN5Y2xlcywgc28gd2UgY291bGQgYnVybiBzb21lIHRv IGVuc3VyZSBjb3JyZWN0bmVzcyBhbmQKPiBzdGlsbCBiZSB3YWl0aW5nIHdoZW4gdGhlIGZpcnN0 IHBlbmRpbmcgQ1BVIHVubG9ja3MuCgpBc3N1bWluZyB0aGF0IHRha2luZyBhIHNwaW5sb2NrIGlz IGZhaXJseSBmcmVxdWVudCBpbiB0aGUga2VybmVsLCB0aGUgCm5vZGUgc3RydWN0dXJlIGNhY2hl bGluZSB3b24ndCBiZSBzbyBjb2xkIGFmdGVyIGFsbC4KCj4+IFNvIHRyYWRpdGlvbmFsIGxvY2tz IGxpa2UgdGVzdC1hbmQtdGVzdCBhbmQgdGhlIHRpY2tldCBsb2NrIG9ubHkgZXZlcgo+PiBhY2Nl c3MgdGhlIHNwaW5sb2NrIHdvcmQgaXRzZWYsIHRoaXMgTUNTIHN0eWxlIHF1ZXVlaW5nIGxvY2sg aGFzIGEKPj4gc2Vjb25kIChhbmQsIHNlZSBteSBvdGhlciByYW50cyBpbiB0aGlzIHRocmVhZCwg d2hlbiBkb25lIHdyb25nIG1vcmUKPj4gdGhhbiAyKSBjYWNoZWxpbmUgdG8gdG91Y2guCj4+Cj4+ IFRoYXQgc2FpZCwgYWxsIG91ciBiZW5jaG1hcmtpbmcgaXMgcHJldHR5IG11Y2ggZm9yIHRoZSBj YWNoZS1ob3QgY2FzZSwKPj4gc28gSSdtIG5vdCBlbnRpcmVseSBjb252aW5jZWQgeWV0IHRoYXQg dGhlIG9uZSBwZW5kaW5nIGJpdCBtYWtlcyB1cCBmb3IKPj4gaXQsIGl0IGRvZXMgaW4gdGhlIGNh Y2hlLWhvdCBjYXNlLgo+IFllYWgsIHdlIHByb2JhYmx5IHVzZSB0aGUgZmFzdGVyIHByZS1sb2Nr IHF1aXRlIGEgbG90Lgo+IENvdmVyIGxldHRlciBzdGF0ZXMgdGhhdCBxdWV1ZSBkZXB0aCAxLTMg aXMgYSBiaXQgc2xvd2VyIHRoYW4gdGlja2V0Cj4gc3BpbmxvY2ssIHNvIHdlIG1pZ2h0IG5vdCBi ZSBsb3NpbmcgaWYgd2UgaW1wbGVtZW50ZWQgYSBmYXN0ZXIKPiBpbi13b3JkLWxvY2sgb2YgdGhp cyBjYXBhY2l0eS4gIChOb3QgdGhhdCBJJ20gYSBmYW4gb2YgdGhlIGh5YnJpZCBsb2NrLikKCkkg aGFkIHRyaWVkIGFuIGV4cGVyaW1lbnRhbCBwYXRjaCB3aXRoIDIgcGVuZGluZyBiaXRzLiBIb3dl dmVyLCB0aGUgCmJlbmNobWFyayB0ZXN0IHRoYXQgSSB1c2VkIHNob3cgdGhlIHBlcmZvcm1hbmNl IGlzIGV2ZW4gd29yc2UgdGhhbiAKd2l0aG91dCBhbnkgcGVuZGluZyBiaXQuIEkgcHJvYmFibHkg bmVlZCB0byByZXZpc2l0IHRoYXQgbGF0ZXIgYXMgdG8gd2h5IAp0aGlzIGlzIHRoZSBjYXNlLiBB cyBmb3Igbm93LCBJIHdpbGwgZm9jdXMgb24ganVzdCBoYXZpbmcgb25lIHBlbmRpbmcgCmJpdC4g SWYgd2UgY291bGQgZmluZCBhIHdheSB0byBnZXQgYmV0dGVyIHBlcmZvcm1hbmNlIG91dCBvZiBt b3JlIHRoYW4gMSAKcGVuZGluZyBiaXQgbGF0ZXIgb24sIHdlIGNvdWxkIGFsd2F5cyBzdWJtaXQg YW5vdGhlciBwYXRjaCB0byBkbyB0aGF0LgoKPj4gQnV0Li4uIHdyaXRpbmcgY2FjaGUtY29sZCBi ZW5jaG1hcmtzIGlzIF9oYXJkXyA6Lwo+IFdvdWxkbid0IGNsZmx1c2ggb2YgdGhlIHNlY29uZCBj YWNoZWxpbmUgYmVmb3JlIHRyeWluZyBmb3IgdGhlIGxvY2sgZ2l2ZQo+IHVzIGEgcm91Z2ggZXN0 aW1hdGU/CgpjbGZsdXNoIGlzIGEgdmVyeSBleHBlbnNpdmUgb3BlcmF0aW9uIGFuZCBJIGRvdWJ0 IHRoYXQgaXQgd2lsbCBiZSAKaW5kaWNhdGl2ZSBvZiByZWFsIGxpZmUgcGVyZm9ybWFuY2UgYXQg YWxsLiBCVFcsIHRoZXJlIGlzIG5vIHdheSB0byAKd3JpdGUgYSBjYWNoZS1jb2xkIGJlbmNobWFy ayBmb3IgdGhhdCAybmQgY2FjaGVsaW5lIGFzIGFueSBjYWxsIHRvIApzcGluX2xvY2sgd2lsbCBs aWtlbHkgdG8gYWNjZXNzIGl0IGlmIHRoZXJlIGlzIGVub3VnaCBjb250ZW50aW9uLgoKLUxvbmdt YW4KCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fClZpcnR1 YWxpemF0aW9uIG1haWxpbmcgbGlzdApWaXJ0dWFsaXphdGlvbkBsaXN0cy5saW51eC1mb3VuZGF0 aW9uLm9yZwpodHRwczovL2xpc3RzLmxpbnV4Zm91bmRhdGlvbi5vcmcvbWFpbG1hbi9saXN0aW5m by92aXJ0dWFsaXphdGlvbg== From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from g4t3427.houston.hp.com ([15.201.208.55]:19045 "EHLO g4t3427.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932511AbaESURb (ORCPT ); Mon, 19 May 2014 16:17:31 -0400 Message-ID: <537A66D3.8070607@hp.com> Date: Mon, 19 May 2014 16:17:23 -0400 From: Waiman Long MIME-Version: 1.0 Subject: Re: [PATCH v10 03/19] qspinlock: Add pending bit References: <1399474907-22206-1-git-send-email-Waiman.Long@hp.com> <1399474907-22206-4-git-send-email-Waiman.Long@hp.com> <20140512152208.GA12309@potion.brq.redhat.com> <537276B4.10209@hp.com> <20140514165121.GA21370@potion.redhat.com> <20140514170016.GW30445@twins.programming.kicks-ass.net> <20140514191339.GA22813@potion.brq.redhat.com> In-Reply-To: <20140514191339.GA22813@potion.brq.redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: linux-arch-owner@vger.kernel.org List-ID: To: =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= Cc: Peter Zijlstra , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , linux-arch@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, Paolo Bonzini , Konrad Rzeszutek Wilk , Boris Ostrovsky , "Paul E. McKenney" , Rik van Riel , Linus Torvalds , Raghavendra K T , David Vrabel , Oleg Nesterov , Gleb Natapov , Scott J Norton , Chegu Vinod Message-ID: <20140519201723.PfMagbhP7zGVxDuwdl_ZczzoDnjzEO7XczDP9yCojXw@z> On 05/14/2014 03:13 PM, Radim Krčmář wrote: > 2014-05-14 19:00+0200, Peter Zijlstra: >> On Wed, May 14, 2014 at 06:51:24PM +0200, Radim Krčmář wrote: >>> Ok. >>> I've seen merit in pvqspinlock even with slightly slower first-waiter, >>> so I would have happily sacrificed those horrible branches. >>> (I prefer elegant to optimized code, but I can see why we want to be >>> strictly better than ticketlock.) >>> Peter mentioned that we are focusing on bare-metal patches, so I'll >>> withold my other paravirt rants until they are polished. > (It was an ambiguous sentence, I have comments for later patches.) > >> Well, paravirt must happen too, but comes later in this series, patch 3 >> which we're replying to is still very much in the bare metal part of the >> series. > (I think that bare metal spans the first 7 patches.) > >> I've not had time yet to decode all that Waiman has done to make >> paravirt work. >> >> But as a general rule I like patches that start with something simple >> and working and then optimize it, this series doesn't seem to quite >> grasp that. >> >>> And to forcefully bring this thread a little bit on-topic: >>> >>> Pending-bit is effectively a lock in a lock, so I was wondering why >>> don't we use more pending bits; advantages are the same, just diminished >>> by the probability of having an ideally contended lock: >>> - waiter won't be blocked on RAM access if critical section (or more) >>> ends sooner >>> - some unlucky cacheline is not forgotten >>> - faster unlock (no need for tail operations) >>> (- ?) >>> disadvantages are magnified: >>> - increased complexity >>> - intense cacheline sharing >>> (I thought that this is the main disadvantage of ticketlock.) >>> (- ?) >>> >>> One bit still improved performance, is it the best we got? >> So, the advantage of one bit is that if we use a whole byte for 1 bit we >> can avoid some atomic ops. >> >> The entire reason for this in-word spinner is to amortize the cost of >> hitting the external node cacheline. > Every pending CPU removes one length of the critical section from the > delay caused by cacheline propagation and really cold cache is > hundreds(?) of cycles, so we could burn some to ensure correctness and > still be waiting when the first pending CPU unlocks. Assuming that taking a spinlock is fairly frequent in the kernel, the node structure cacheline won't be so cold after all. >> So traditional locks like test-and-test and the ticket lock only ever >> access the spinlock word itsef, this MCS style queueing lock has a >> second (and, see my other rants in this thread, when done wrong more >> than 2) cacheline to touch. >> >> That said, all our benchmarking is pretty much for the cache-hot case, >> so I'm not entirely convinced yet that the one pending bit makes up for >> it, it does in the cache-hot case. > Yeah, we probably use the faster pre-lock quite a lot. > Cover letter states that queue depth 1-3 is a bit slower than ticket > spinlock, so we might not be losing if we implemented a faster > in-word-lock of this capacity. (Not that I'm a fan of the hybrid lock.) I had tried an experimental patch with 2 pending bits. However, the benchmark test that I used show the performance is even worse than without any pending bit. I probably need to revisit that later as to why this is the case. As for now, I will focus on just having one pending bit. If we could find a way to get better performance out of more than 1 pending bit later on, we could always submit another patch to do that. >> But... writing cache-cold benchmarks is _hard_ :/ > Wouldn't clflush of the second cacheline before trying for the lock give > us a rough estimate? clflush is a very expensive operation and I doubt that it will be indicative of real life performance at all. BTW, there is no way to write a cache-cold benchmark for that 2nd cacheline as any call to spin_lock will likely to access it if there is enough contention. -Longman