From mboxrd@z Thu Jan 1 00:00:00 1970
From: Radim Krčmář
Subject: Re: [RFC 08/07] qspinlock: integrate pending bit into queue
Date: Wed, 21 May 2014 19:02:27 +0200
Message-ID: <20140521170226.GA9241@potion.brq.redhat.com>
References: <1399474907-22206-1-git-send-email-Waiman.Long@hp.com>
 <1399474907-22206-4-git-send-email-Waiman.Long@hp.com>
 <20140512152208.GA12309@potion.brq.redhat.com>
 <537276B4.10209@hp.com>
 <20140514165121.GA21370@potion.redhat.com>
 <20140514170016.GW30445@twins.programming.kicks-ass.net>
 <20140514191339.GA22813@potion.brq.redhat.com>
 <537A66D3.8070607@hp.com>
 <20140521164930.GA26199@potion.brq.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
In-Reply-To: <20140521164930.GA26199@potion.brq.redhat.com>
To: Waiman Long
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
 linux-arch@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org,
 virtualization@lists.linux-foundation.org, xen-devel@lists.xenproject.org,
 kvm@vger.kernel.org, Paolo Bonzini, Konrad Rzeszutek Wilk, Boris Ostrovsky,
 "Paul E. McKenney", Rik van Riel, Linus Torvalds, Raghavendra K T,
 David Vrabel, Oleg Nesterov, Gleb Natapov, Scott J Norton, Chegu Vinod
List-Id: linux-arch.vger.kernel.org

2014-05-21 18:49+0200, Radim Krčmář:
> 2014-05-19 16:17-0400, Waiman Long:
> >       As for now, I will focus on just having one pending bit.
>
> I'll throw some ideas at it,

One of the ideas follows; it seems sound, but I haven't benchmarked it
thoroughly.  (Wasted a lot of time by writing/playing with various tools
and loads.)

Dbench on ext4 ramdisk, hackbench and ebizzy have shown a small
improvement in performance, but my main drive was the weird design of
Pending Bit.
Does your setup yield improvements too?
(A minor code swap noted in the patch might help things.)

It is meant to be applied on top of the first 7 patches, because the virt
stuff would just get in the way.
I have preserved a lot of dead code and made some questionable decisions
just to keep the diff short and in one patch, sorry about that.

(It is work in progress; double-slashed lines mark points of interest.)

---8<---
Pending Bit wasn't used if we already had a node queue with one cpu,
which meant that we suffered from these drawbacks again:
 - unlock path was more complicated
   (last queued CPU had to clear the tail)
 - cold node cacheline was just one critical section away

With this patch, Pending Bit is used as an additional step in the queue.
Waiting for lock is the same: we try Pending Bit and if it is taken, we
append to Node Queue.
Unlock is different: pending CPU moves into critical section and first
CPU from Node Queue takes Pending Bit and notifies next in line or
clears the tail.

This allows the pending CPU to take the lock as fast as possible,
because all bookkeeping was done when entering Pending Queue.
Node Queue operations can also be slower without affecting the
performance, because we have an additional buffer of one critical
section.
---
 kernel/locking/qspinlock.c | 180 +++++++++++++++++++++++++++++++++------------
 1 file changed, 135 insertions(+), 45 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 0ee1a23..76cafb0 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -98,7 +98,10 @@ struct __qspinlock {
 	union {
 		atomic_t val;
 #ifdef __LITTLE_ENDIAN
-		u8	 locked;
+		struct {
+			u8	locked;
+			u8	pending;
+		};
 		struct {
 			u16	locked_pending;
 			u16	tail;
@@ -109,7 +112,8 @@ struct __qspinlock {
 			u16	locked_pending;
 		};
 		struct {
-			u8	reserved[3];
+			u8	reserved[2];
+			u8	pending;
 			u8	locked;
 		};
 #endif
@@ -314,6 +318,59 @@ static inline int trylock_pending(struct qspinlock *lock, u32 *pval)
 	return 1;
 }
 
+// nice comment here
+static inline bool trylock(struct qspinlock *lock, u32 *val) {
+	if (!(*val = atomic_read(&lock->val)) &&
+	   (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) == 0)) {
+		*val = _Q_LOCKED_VAL;
+		return 1;
+	}
+	return 0;
+}
+
+// here
+static inline bool trypending(struct qspinlock *lock, u32 *pval) {
+	u32 old, val = *pval;
+	// optimizer might produce the same code if we use *pval directly
+
+	// we could use 'if' and a xchg that touches only the pending bit to
+	// save some cycles at the price of a longer line cutting window
+	// (and I think it would bug without changing the rest)
+	while (!(val & (_Q_PENDING_MASK | _Q_TAIL_MASK))) {
+		old = atomic_cmpxchg(&lock->val, val, val | _Q_PENDING_MASK);
+		if (old == val) {
+			*pval = val | _Q_PENDING_MASK;
+			return 1;
+		}
+		val = old;
+	}
+	*pval = val;
+	return 0;
+}
+
+// here
+static inline void set_pending(struct qspinlock *lock, u8 pending)
+{
+	struct __qspinlock *l = (void *)lock;
+
+	// take a look if this is necessary, and if we don't have an
+	// abstraction already
+	barrier();
+	ACCESS_ONCE(l->pending) = pending;
+	barrier();
+}
+
+// and here
+static inline u32 cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 newtail)
+// API-incompatible with set_pending and the shifting is ugly, so I'd rather
+// refactor this one, xchg_tail() and encode_tail() ... another day
+{
+	struct __qspinlock *l = (void *)lock;
+
+	return (u32)cmpxchg(&l->tail, tail >> _Q_TAIL_OFFSET,
+	                    newtail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
+}
+
 /**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
@@ -324,21 +381,21 @@ static inline int trylock_pending(struct qspinlock *lock, u32 *pval)
  *              fast     :    slow                                  :    unlock
  *                       :                                          :
  * uncontended  (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0)
- *                       :       | ^--------.------.             /  :
- *                       :       v           \      \            |  :
- * pending               :    (0,1,1) +--> (0,1,0)   \           |  :
- *                       :       | ^--'              |           |  :
- *                       :       v                   |           |  :
- * uncontended           :    (n,x,y) +--> (n,0,0) --'           |  :
+ *                       :       | ^--------.                    /  :
+ *                       :       v           \                   |  :
+ * pending               :    (0,1,1) +--> (0,1,0)               |  :
+ *                       :       | ^--'         ^----------.     |  :
+ *                       :       v                         |     |  :
+ * uncontended           :    (n,x,y) +--> (n,0,y) ---> (0,1,y)  |  :
  *   queue               :       | ^--'                          |  :
  *                       :       v                               |  :
- * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
- *   queue               :         ^--'                             :
- *
- * The pending bit processing is in the trylock_pending() function
- * whereas the uncontended and contended queue processing is in the
- * queue_spin_lock_slowpath() function.
+ * contended             :    (*,x,y) +--> (*,0,y)      (*,0,1) -'  :
+ *   queue               :         ^--'       |            ^        :
+ *                       :                    v            |        :
+ *                       :                 (*,1,y) ---> (*,1,0)     :
+ * // diagram might be wrong (and definitely isn't obvious)
  *
+ * // give some insight about the hybrid locking
  */
 void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 {
@@ -348,8 +405,20 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
 	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
-	if (trylock_pending(lock, &val))
-		return;	/* Lock acquired */
+	/*
+	 * Check if nothing changed while we were calling this function.
+	 * (Cold code cacheline could have delayed us.)
+	 */
+	// this should go into a separate patch with micro-optimizations
+	if (trylock(lock, &val))
+		return;
+	/*
+	 * The lock is still held, wait without touching the node unless there
+	 * is at least one cpu waiting before us.
+	 */
+	// create structured code out of this mess
+	if (trypending(lock, &val))
+		goto pending;
 
 	node = this_cpu_ptr(&mcs_nodes[0]);
 	idx = node->count++;
@@ -364,15 +433,18 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * attempt the trylock once more in the hope someone let go while we
 	 * weren't watching.
 	 */
-	if (queue_spin_trylock(lock))
+	// is some of the re-checking counterproductive?
+	if (trylock(lock, &val)) {
+		this_cpu_dec(mcs_nodes[0].count); // ugly
+		return;
+	}
+	if (trypending(lock, &val))
 		goto release;
 
 	/*
-	 * we already touched the queueing cacheline; don't bother with pending
-	 * stuff.
-	 *
 	 * p,*,* -> n,*,*
 	 */
+	// racing for pending/queue till here; safe
 	old = xchg_tail(lock, tail, &val);
 
 	/*
@@ -386,41 +458,45 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	}
 
 	/*
-	 * we're at the head of the waitqueue, wait for the owner & pending to
-	 * go away.
-	 * Load-acquired is used here because the get_qlock()
-	 * function below may not be a full memory barrier.
-	 *
-	 * *,x,y -> *,0,0
+	 * We are now waiting for the pending bit to get cleared.
 	 */
-	while ((val = smp_load_acquire(&lock->val.counter))
-			       & _Q_LOCKED_PENDING_MASK)
+	// make a get_pending(lock, &val) helper
+	while ((val = smp_load_acquire(&lock->val.counter)) & _Q_PENDING_MASK)
+		// would longer body ease cacheline contention?
+		// would it be better to use monitor/mwait instead?
+		// (we can tolerate some delay because we aren't pending ...)
 		arch_mutex_cpu_relax();
 
 	/*
-	 * claim the lock:
+	 * The pending bit is free, take it.
 	 *
-	 * n,0,0 -> 0,0,1 : lock, uncontended
-	 * *,0,0 -> *,0,1 : lock, contended
+	 * *,0,* -> *,1,*
+	 */
+	// might add &val param and do |= _Q_PENDING_VAL when refactoring ...
+	set_pending(lock, 1);
+
+	/*
+	 * Clear the tail if noone queued after us.
 	 *
-	 * If the queue head is the only one in the queue (lock value == tail),
-	 * clear the tail code and grab the lock. Otherwise, we only need
-	 * to grab the lock.
+	 * n,1,y -> 0,1,y
 	 */
-	for (;;) {
-		if (val != tail) {
-			get_qlock(lock);
-			break;
-		}
-		old = atomic_cmpxchg(&lock->val, val, _Q_LOCKED_VAL);
-		if (old == val)
-			goto release;	/* No contention */
+	if ((val & _Q_TAIL_MASK) == tail &&
+	    cmpxchg_tail(lock, tail, 0) == tail)
+		goto release;
+	// negate the condition and obliterate the goto with braces
 
-		val = old;
-	}
+	// fun fact:
+	//  if ((val & _Q_TAIL_MASK) == tail) {
+	//  	val = cmpxchg_tail(&lock, tail, 0);
+	//  	if ((val & _Q_TAIL_MASK) == tail)
+	//  		goto release;
+	// produced significantly faster code in my benchmarks ...
+	// (I haven't looked why, seems like a fluke.)
+	// swap the code if you want performance at any cost
 
 	/*
-	 * contended path; wait for next, release.
+	 * Tell the next node that we are pending, so it can start spinning to
+	 * replace us in the future.
 	 */
 	while (!(next = ACCESS_ONCE(node->next)))
 		arch_mutex_cpu_relax();
@@ -432,5 +508,19 @@ release:
 	 * release the node
 	 */
 	this_cpu_dec(mcs_nodes[0].count);
+pending:
+	/*
+	 * we're at the head of the waitqueue, wait for the owner to go away.
+	 * Flip pending and locked bit then.
+	 *
+	 * *,1,0 -> *,0,1
+	 */
+	while ((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK)
+		arch_mutex_cpu_relax();
+	clear_pending_set_locked(lock, val);
+
+	/*
+	 * We have the lock.
+	 */
 }
 EXPORT_SYMBOL(queue_spin_lock_slowpath);
-- 
1.9.0