From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Martin
Subject: [PATCH v3 14/28] arm64/sve: Backend logic for setting the vector length
Date: Tue, 10 Oct 2017 19:38:31 +0100
Message-ID: <1507660725-7986-15-git-send-email-Dave.Martin@arm.com>
In-Reply-To: <1507660725-7986-1-git-send-email-Dave.Martin@arm.com>
References: <1507660725-7986-1-git-send-email-Dave.Martin@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
To: linux-arm-kernel@lists.infradead.org
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Alex Bennée,
 Szabolcs Nagy, Richard Sandiford, Okamoto Takayuki,
 kvmarm@lists.cs.columbia.edu, libc-alpha@sourceware.org,
 linux-arch@vger.kernel.org

This patch implements the core logic for changing a task's vector
length on request from userspace.  This will be used by the ptrace
and prctl frontends that are implemented in later patches.

The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two.  To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>

---

Changes since v2
----------------

Bug fixes:

 * sve_set_vector_length() sets or clears TIF_SVE_VL_INHERIT based
   on the incoming flags, but it is erroneously always set/cleared
   for current, instead of for the requested task.

   Fixed these operations to operate on the target task.

   Without this fix, a PTRACE_SETREGSET for NT_ARM_SVE will change
   the vector length inheritance mode of the caller instead of that
   of the target task.

 * Fixed sve_set_vector_length() to guard against softirq instead of just
   preemption.  This is now done by sve_set_vector_length() itself
   instead of its caller, not least because sve_free() should probably
   not be called from atomic context.

   (Bug detected by the extra WARN_ON()s in task_fpsimd_{load,save}().)

Miscellaneous:

 * Add comments explaining the intent, purpose and basic constraints
   for fpsimd.c helpers.
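
Illustration only (not part of the patch): the lookup described in the
commit message, i.e. picking the largest available vector length that
does not exceed the request from a possibly sparse set, can be modelled
in userspace roughly as below.  This is a sketch under assumptions, not
the kernel code: VQ_MAX is assumed to be 512 (mirroring SVE_VQ_MAX), a
plain bool array stands in for sve_vq_map, next_set() stands in for
find_next_bit(), one VQ is taken to be 16 bytes of vector length, and
mark_vq_available() and pick_vl() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define VQ_MAX    512u	/* assumption: mirrors SVE_VQ_MAX */
#define VL_PER_VQ 16u	/* assumption: one VQ is 128 bits, i.e. 16 bytes */

/* Stand-in for sve_vq_map: entry vq_to_bit(vq) is true if vq is available. */
static bool vq_map[VQ_MAX];

/* Larger VQs get smaller indices, so a forward scan sees them first. */
static unsigned int vq_to_bit(unsigned int vq)
{
	return VQ_MAX - vq;
}

static unsigned int bit_to_vq(unsigned int bit)
{
	return VQ_MAX - bit;
}

/* Hypothetical helper: record that a given VQ is available. */
static void mark_vq_available(unsigned int vq)
{
	assert(vq >= 1 && vq <= VQ_MAX);
	vq_map[vq_to_bit(vq)] = true;
}

/* Naive stand-in for find_next_bit(): first set index >= start, else VQ_MAX. */
static unsigned int next_set(unsigned int start)
{
	unsigned int i;

	for (i = start; i < VQ_MAX; i++)
		if (vq_map[i])
			return i;

	return VQ_MAX;
}

/* Largest available VL (in bytes) not exceeding the requested vl. */
static unsigned int pick_vl(unsigned int vl)
{
	unsigned int vq = vl / VL_PER_VQ;
	unsigned int bit;

	assert(vq >= 1 && vq <= VQ_MAX);

	bit = next_set(vq_to_bit(vq));
	if (bit >= VQ_MAX)		/* nothing found: fall back to VQ 1 */
		bit = VQ_MAX - 1;

	return bit_to_vq(bit) * VL_PER_VQ;
}
```

With only VQ 1, 2 and 4 marked available (VL 16, 32 and 64 bytes),
pick_vl(48) gives 32 and pick_vl(256) gives 64.  Storing the VQs in
reverse order is what lets a forward scan land on the largest
candidate first.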
---
 arch/arm64/include/asm/fpsimd.h |   8 +++
 arch/arm64/kernel/fpsimd.c      | 137 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/prctl.h      |   5 ++
 3 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 52e01c5..7dd3939 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -20,6 +20,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/cache.h>
 #include <linux/stddef.h>
 
 /*
@@ -71,17 +72,24 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 
+/* Maximum VL that SVE VL-agnostic software can transparently support */
+#define SVE_VL_ARCH_MAX 0x100
+
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
+extern int __ro_after_init sve_max_vl;
+
 #ifdef CONFIG_ARM64_SVE
 
 extern size_t sve_state_size(struct task_struct const *task);
 
 extern void sve_alloc(struct task_struct *task);
 extern void fpsimd_release_thread(struct task_struct *task);
+extern int sve_set_vector_length(struct task_struct *task,
+				 unsigned long vl, unsigned long flags);
 
 #else /* ! CONFIG_ARM64_SVE */
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index fa4ed34..324c112 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -17,8 +17,10 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/bitmap.h>
 #include <linux/bottom_half.h>
 #include <linux/bug.h>
+#include <linux/cache.h>
 #include <linux/compat.h>
 #include <linux/cpu.h>
 #include <linux/cpu_pm.h>
@@ -27,6 +29,7 @@
 #include <linux/init.h>
 #include <linux/percpu.h>
 #include <linux/preempt.h>
+#include <linux/prctl.h>
 #include <linux/ptrace.h>
 #include <linux/sched/signal.h>
 #include <linux/signal.h>
@@ -112,6 +115,20 @@ static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_last_state);
 /* Default VL for tasks that don't set it explicitly: */
 static int sve_default_vl = SVE_VL_MIN;
 
+#ifdef CONFIG_ARM64_SVE
+
+/* Maximum supported vector length across all CPUs (initially poisoned) */
+int __ro_after_init sve_max_vl = -1;
+/* Set of available vector lengths, as vq_to_bit(vq): */
+static DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+
+#else /* ! CONFIG_ARM64_SVE */
+
+/* Dummy declaration for code that will be optimised out: */
+extern DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+
+#endif /* ! CONFIG_ARM64_SVE */
+
 static void sve_free(struct task_struct *task)
 {
 	kfree(task->thread.sve_state);
@@ -281,6 +298,50 @@ static void task_fpsimd_save(void)
 	__task_fpsimd_save(false);
 }
 
+/*
+ * Helpers to translate bit indices in sve_vq_map to VQ values (and
+ * vice versa).  This allows find_next_bit() to be used to find the
+ * _maximum_ VQ not exceeding a certain value.
+ */
+
+static unsigned int vq_to_bit(unsigned int vq)
+{
+	return SVE_VQ_MAX - vq;
+}
+
+static unsigned int bit_to_vq(unsigned int bit)
+{
+	if (WARN_ON(bit >= SVE_VQ_MAX))
+		bit = SVE_VQ_MAX - 1;
+
+	return SVE_VQ_MAX - bit;
+}
+
+/*
+ * All vector length selection from userspace comes through here.
+ * We're on a slow path, so some sanity-checks are included.
+ * If things go wrong there's a bug somewhere, but try to fall back to a
+ * safe choice.
+ */
+static unsigned int find_supported_vector_length(unsigned int vl)
+{
+	int bit;
+	int max_vl = sve_max_vl;
+
+	if (WARN_ON(!sve_vl_valid(vl)))
+		vl = SVE_VL_MIN;
+
+	if (WARN_ON(!sve_vl_valid(max_vl)))
+		max_vl = SVE_VL_MIN;
+
+	if (vl > max_vl)
+		vl = max_vl;
+
+	bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
+			    vq_to_bit(sve_vq_from_vl(vl)));
+	return sve_vl_from_vq(bit_to_vq(bit));
+}
+
 #define ZREG(sve_state, vq, n) ((char *)(sve_state) +		\
 	(SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET))
 
@@ -375,6 +436,76 @@ void sve_alloc(struct task_struct *task)
 	BUG_ON(!task->thread.sve_state);
 }
 
+int sve_set_vector_length(struct task_struct *task,
+			  unsigned long vl, unsigned long flags)
+{
+	if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT |
+				     PR_SVE_SET_VL_ONEXEC))
+		return -EINVAL;
+
+	if (!sve_vl_valid(vl))
+		return -EINVAL;
+
+	/*
+	 * Clamp to the maximum vector length that VL-agnostic SVE code can
+	 * work with.  A flag may be assigned in the future to allow setting
+	 * of larger vector lengths without confusing older software.
+	 */
+	if (vl > SVE_VL_ARCH_MAX)
+		vl = SVE_VL_ARCH_MAX;
+
+	vl = find_supported_vector_length(vl);
+
+	if (flags & (PR_SVE_VL_INHERIT |
+		     PR_SVE_SET_VL_ONEXEC))
+		task->thread.sve_vl_onexec = vl;
+	else
+		/* Reset VL to system default on next exec: */
+		task->thread.sve_vl_onexec = 0;
+
+	/* Only actually set the VL if not deferred: */
+	if (flags & PR_SVE_SET_VL_ONEXEC)
+		goto out;
+
+	if (vl == task->thread.sve_vl)
+		goto out;
+
+	/*
+	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
+	 * write any live register state back to task_struct, and convert to a
+	 * non-SVE thread.
+	 */
+	if (task == current) {
+		local_bh_disable();
+
+		task_fpsimd_save();
+		set_thread_flag(TIF_FOREIGN_FPSTATE);
+	}
+
+	fpsimd_flush_task_state(task);
+	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+		sve_to_fpsimd(task);
+
+	if (task == current)
+		local_bh_enable();
+
+	/*
+	 * Force reallocation of task SVE state to the correct size
+	 * on next use:
+	 */
+	sve_free(task);
+
+	task->thread.sve_vl = vl;
+
+out:
+	if (flags & PR_SVE_VL_INHERIT)
+		set_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
+	else
+		clear_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
+
+	return 0;
+}
+
 void fpsimd_release_thread(struct task_struct *dead_task)
 {
 	sve_free(dead_task);
@@ -487,7 +618,7 @@ void fpsimd_thread_switch(struct task_struct *next)
 
 void fpsimd_flush_thread(void)
 {
-	int vl;
+	int vl, supported_vl;
 
 	if (!system_supports_fpsimd())
 		return;
@@ -515,6 +646,10 @@ void fpsimd_flush_thread(void)
 		if (WARN_ON(!sve_vl_valid(vl)))
 			vl = SVE_VL_MIN;
 
+		supported_vl = find_supported_vector_length(vl);
+		if (WARN_ON(supported_vl != vl))
+			vl = supported_vl;
+
 		current->thread.sve_vl = vl;
 
 		/*
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..1b64901 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,9 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* arm64 Scalable Vector Extension controls */
+# define PR_SVE_SET_VL_ONEXEC		(1 << 18) /* defer effect until exec */
+# define PR_SVE_VL_LEN_MASK		0xffff
+# define PR_SVE_VL_INHERIT		(1 << 17) /* inherit across exec */
+
 #endif /* _LINUX_PRCTL_H */
-- 
2.1.4