From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from userp2120.oracle.com ([156.151.31.85]:33578 "EHLO
	userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S2390659AbfJPEcv (ORCPT );
	Wed, 16 Oct 2019 00:32:51 -0400
From: Alex Kogan
Subject: [PATCH v5 0/5] Add NUMA-awareness to qspinlock
Date: Wed, 16 Oct 2019 00:28:58 -0400
Message-ID: <20191016042903.61081-1-alex.kogan@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-arch-owner@vger.kernel.org
List-ID:
To: linux@armlinux.org.uk, peterz@infradead.org, mingo@redhat.com,
	will.deacon@arm.com, arnd@arndb.de, longman@redhat.com,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de, bp@alien8.de,
	hpa@zytor.com, x86@kernel.org, guohanjun@huawei.com,
	jglauber@marvell.com
Cc: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
	alex.kogan@oracle.com, dave.dice@oracle.com,
	rahul.x.yadav@oracle.com
Message-ID: <20191016042858.4dXU5kfdCjWXYzAD_9Ku27vJ9I2oixcwNtLLkPmlUTU@z>

Changes from v4:
----------------

- Switch to a deterministic bound on the number of intra-node handoffs,
as suggested by Longman.

- Scan the main queue after acquiring the MCS lock and before acquiring
the spinlock (pre-scan), as suggested by Longman.
If no thread is found in pre-scan, try again after acquiring the
spinlock, resuming from the same place where pre-scan stopped.

- Convert the secondary queue to a cyclic list such that the tail’s @next
points to the head of the queue. Store the pointer to the secondary queue
tail (rather than head) in @locked. This eliminates the need for the @tail
field in CNA nodes, making space for fields required by the two changes
above.

- Change arch_mcs_spin_lock_contended() to arch_mcs_spin_lock(), and
fix misuse of old macro names, as suggested by Hanjun.


Summary
-------

Lock throughput can be increased by handing a lock to a waiter on the
same NUMA node as the lock holder, provided care is taken to avoid
starvation of waiters on other NUMA nodes. This patch introduces CNA
(compact NUMA-aware lock) as the slow path for qspinlock. It is
enabled through a configuration option (NUMA_AWARE_SPINLOCKS).

CNA is a NUMA-aware version of the MCS lock. Spinning threads are
organized in two queues, a main queue for threads running on the same
node as the current lock holder, and a secondary queue for threads
running on other nodes. Threads store the ID of the node on which
they are running in their queue nodes. After acquiring the MCS lock and
before acquiring the spinlock, the lock holder scans the main queue
looking for a thread running on the same node (pre-scan). If found (call
it thread T), all threads in the main queue between the current lock
holder and T are moved to the end of the secondary queue. If such T
is not found, we make another scan of the main queue after acquiring
the spinlock when unlocking the MCS lock (post-scan), starting at the
node where pre-scan stopped. If both scans fail to find such T, the
MCS lock is passed to the first thread in the secondary queue. If the
secondary queue is empty, the MCS lock is passed to the next thread in the
main queue.
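
For illustration only, here is a minimal user-space sketch of the scan
and splice step described above. It is not the code from this series:
struct qnode, find_successor() and the sec_tail argument are
hypothetical names, and all atomics, memory ordering, the pre-scan /
post-scan split and the encoding of the secondary queue tail in @locked
are left out. It only shows how the skipped waiters end up on a cyclic
secondary queue that is tracked by its tail.

```c
#include <stddef.h>

/* Hypothetical, simplified queue node; the real CNA node differs. */
struct qnode {
	int numa_node;		/* node the waiter runs on */
	struct qnode *next;	/* next waiter in the queue */
};

/*
 * Scan the main queue behind @holder for the first waiter on the same
 * NUMA node. If one is found, move the waiters that were skipped to the
 * end of the cyclic secondary queue (*sec_tail is its tail, and the
 * tail's ->next is its head) and return the waiter that was found.
 * Return NULL and leave both queues untouched if there is no such waiter.
 */
struct qnode *find_successor(struct qnode *holder, struct qnode **sec_tail)
{
	struct qnode *first = holder->next;
	struct qnode *prev = holder;
	struct qnode *cur = first;

	while (cur && cur->numa_node != holder->numa_node) {
		prev = cur;
		cur = cur->next;
	}
	if (!cur)
		return NULL;

	if (cur != first) {
		/* Waiters first..prev run on other nodes; splice them out. */
		if (*sec_tail) {
			prev->next = (*sec_tail)->next;	/* new tail -> old head */
			(*sec_tail)->next = first;	/* old tail -> moved run */
		} else {
			prev->next = first;		/* new cyclic list */
		}
		*sec_tail = prev;
		holder->next = cur;
	}
	return cur;
}
```

With a main queue of holder(node 0) -> A(1) -> B(2) -> C(0) -> D(1) and
an empty secondary queue, this sketch returns C, leaves holder -> C -> D
in the main queue, and leaves A -> B on the secondary queue with
*sec_tail pointing at B and B's next pointing back at A. Keeping only
the tail is enough because the head is always reachable as tail->next.
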
To avoid starvation of threads in the secondary queue, those
threads are moved back to the head of the main queue after a certain
number of intra-node lock hand-offs.

More details are available at https://arxiv.org/abs/1810.05600.

We have done some performance evaluation with the locktorture module
as well as with several benchmarks from the will-it-scale repo.
The following locktorture results are from an Oracle X5-4 server
(four Intel Xeon E7-8895 v3 @ 2.60GHz sockets with 18 hyperthreaded
cores each). Each number represents an average (over 25 runs) of the
total number of ops (x10^7) reported at the end of each run. The
standard deviation is also reported in (), and in general is about 3%
of the average. The 'stock' kernel is v5.4.0-rc1,
commit d90f2df63c5c, compiled in the default configuration.
'patch-CNA' is the modified kernel with NUMA_AWARE_SPINLOCKS set;
the speedup is calculated by dividing 'patch-CNA' by 'stock'.

#thr     stock        patch-CNA    speedup (patch-CNA/stock)
  1  2.674 (0.118)  2.736 (0.119)  1.023
  2  2.588 (0.141)  2.603 (0.108)  1.006
  4  4.230 (0.120)  4.220 (0.127)  0.998
  8  5.362 (0.181)  6.679 (0.182)  1.246
 16  6.639 (0.133)  8.050 (0.200)  1.213
 32  7.359 (0.149)  8.792 (0.168)  1.195
 36  7.443 (0.142)  8.873 (0.230)  1.192
 72  6.554 (0.147)  9.317 (0.158)  1.421
108  6.156 (0.093)  9.404 (0.191)  1.528
142  5.659 (0.093)  9.361 (0.184)  1.654

The following tables contain throughput results (ops/us) from the same
setup for will-it-scale/open1_threads:

#thr     stock        patch-CNA    speedup (patch-CNA/stock)
  1  0.532 (0.002)  0.532 (0.003)  1.000
  2  0.785 (0.024)  0.779 (0.025)  0.992
  4  1.426 (0.018)  1.409 (0.021)  0.988
  8  1.779 (0.101)  1.711 (0.127)  0.962
 16  1.761 (0.093)  1.671 (0.104)  0.949
 32  0.935 (0.063)  1.619 (0.093)  1.731
 36  0.936 (0.082)  1.591 (0.086)  1.699
 72  0.839 (0.043)  1.667 (0.097)  1.988
108  0.842 (0.035)  1.701 (0.091)  2.021
142  0.830 (0.037)  1.714 (0.098)  2.066

and will-it-scale/lock2_threads:

#thr     stock        patch-CNA    speedup (patch-CNA/stock)
  1  1.555 (0.009)  1.577 (0.002)  1.014
  2  2.644 (0.060)  2.682 (0.062)  1.014
  4  5.159 (0.205)  5.197 (0.231)  1.007
  8  4.302 (0.221)  4.279 (0.318)  0.995
 16  4.259 (0.111)  4.087 (0.163)  0.960
 32  2.583 (0.112)  4.077 (0.120)  1.578
 36  2.499 (0.106)  4.076 (0.106)  1.631
 72  1.979 (0.085)  4.077 (0.123)  2.061
108  2.096 (0.090)  4.043 (0.130)  1.929
142  1.913 (0.109)  3.984 (0.108)  2.082

Our evaluation shows that CNA also improves performance of user
applications that have hot pthread mutexes. Those mutexes are
blocking, and waiting threads park and unpark via the futex
mechanism in the kernel. Given that kernel futex chains, which
are hashed by the mutex address, are each protected by a
chain-specific spinlock, the contention on a user-mode mutex
translates into contention on a kernel-level spinlock.

Here are the results for the leveldb ‘readrandom’ benchmark:

#thr     stock        patch-CNA    speedup (patch-CNA/stock)
  1  0.532 (0.007)  0.535 (0.015)  1.006
  2  0.665 (0.030)  0.673 (0.034)  1.011
  4  0.715 (0.023)  0.716 (0.026)  1.002
  8  0.686 (0.023)  0.686 (0.024)  1.001
 16  0.719 (0.030)  0.737 (0.025)  1.025
 32  0.740 (0.034)  0.959 (0.105)  1.296
 36  0.730 (0.024)  1.079 (0.112)  1.478
 72  0.652 (0.018)  1.160 (0.024)  1.778
108  0.622 (0.016)  1.157 (0.028)  1.860
142  0.600 (0.015)  1.145 (0.035)  1.908

Additional performance numbers are available in previous revisions
of the series.

Further comments are welcome and appreciated.

Alex Kogan (5):
  locking/qspinlock: Rename mcs lock/unlock macros and make them more
    generic
  locking/qspinlock: Refactor the qspinlock slow path
  locking/qspinlock: Introduce CNA into the slow path of qspinlock
  locking/qspinlock: Introduce starvation avoidance into CNA
  locking/qspinlock: Introduce the shuffle reduction optimization into
    CNA

 arch/arm/include/asm/mcs_spinlock.h |   6 +-
 arch/x86/Kconfig                    |  19 +++
 arch/x86/include/asm/qspinlock.h    |   4 +
 arch/x86/kernel/alternative.c       |  41 +++++
 include/asm-generic/mcs_spinlock.h  |   4 +-
 kernel/locking/mcs_spinlock.h       |  20 +--
 kernel/locking/qspinlock.c          |  77 ++++++++-
 kernel/locking/qspinlock_cna.h      | 312 ++++++++++++++++++++++++++++++++++++
 kernel/locking/qspinlock_paravirt.h |   2 +-
 9 files changed, 462 insertions(+), 23 deletions(-)
 create mode 100644 kernel/locking/qspinlock_cna.h

-- 
2.11.0 (Apple Git-81)