From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Bennée
Subject: Re: [PATCH 14/27] arm64/sve: Backend logic for setting the vector length
Date: Wed, 23 Aug 2017 16:33:18 +0100
Message-ID: <87shgi9u4h.fsf@linaro.org>
References: <1502280338-23002-1-git-send-email-Dave.Martin@arm.com>
 <1502280338-23002-15-git-send-email-Dave.Martin@arm.com>
In-reply-to: <1502280338-23002-15-git-send-email-Dave.Martin@arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
To: Dave Martin
Cc: linux-arch@vger.kernel.org, libc-alpha@sourceware.org,
 Ard Biesheuvel, Szabolcs Nagy, gdb@sourceware.org, Yao Qi,
 Alan Hayward, Will Deacon, Richard Sandiford, Catalin Marinas,
 kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org
List-Id: linux-arch.vger.kernel.org


Dave Martin <Dave.Martin@arm.com> writes:

> This patch implements the core logic for changing a task's vector
> length on request from userspace.  This will be used by the ptrace
> and prctl frontends that are implemented in later patches.
>
> The SVE architecture permits, but does not require, implementations
> to support vector lengths that are not a power of two.  To handle
> this, logic is added to check a requested vector length against a
> possibly sparse bitmap of available vector lengths at runtime, so
> that the best supported value can be chosen.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> ---
>  arch/arm64/include/asm/fpsimd.h |   6 +++
>  arch/arm64/kernel/fpsimd.c      | 116 ++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/prctl.h      |   5 ++
>  3 files changed, 127 insertions(+)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 7efd04e..39b26d2 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -70,11 +70,15 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
>
>  extern void fpsimd_flush_task_state(struct task_struct *target);
>
> +#define SVE_VL_ARCH_MAX 0x100
> +

Hmm this isn't the same as SVE_VL_MAX. Why aren't we using that?

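For reference, the two clamps and the bitmap search can be modelled outside the kernel. What follows is only a user-space sketch of the selection path as quoted in this patch (clamp to SVE_VL_ARCH_MAX, clamp to sve_max_vl, then search the bitmap). The MODEL_* and model_* names, the bool array standing in for the kernel bitmap, and the value 512 used for SVE_VQ_MAX are assumptions made for illustration and are not taken from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-ins for the kernel definitions; values are illustrative. */
#define MODEL_VQ_BYTES		16u	/* bytes per quadword, cf. "16 * bit_to_vq(bit)" */
#define MODEL_VQ_MAX		512u	/* hypothetical value for SVE_VQ_MAX */
#define MODEL_VL_ARCH_MAX	0x100u	/* SVE_VL_ARCH_MAX as defined in the patch */

/* One bool per bit keeps the model simple; the kernel uses DECLARE_BITMAP(). */
static bool model_vq_map[MODEL_VQ_MAX];
static unsigned int model_max_vl;	/* stand-in for sve_max_vl, in bytes */

/* Same mapping as the patch: a larger vq maps to a lower bit index. */
static unsigned int vq_to_bit(unsigned int vq) { return MODEL_VQ_MAX - vq; }
static unsigned int bit_to_vq(unsigned int bit) { return MODEL_VQ_MAX - bit; }

/* Hypothetical helper: mark one vector length (in quadwords) as available. */
static void model_add_vq(unsigned int vq)
{
	assert(vq >= 1 && vq <= MODEL_VQ_MAX);

	model_vq_map[vq_to_bit(vq)] = true;
	if (vq * MODEL_VQ_BYTES > model_max_vl)
		model_max_vl = vq * MODEL_VQ_BYTES;
}

/*
 * Pick the largest available VL (in bytes) that does not exceed the
 * request.  Returns 0 if nothing at or below the request is available.
 * The caller is assumed to have validated vl (a multiple of 16).
 */
static unsigned int model_pick_vl(unsigned int vl)
{
	unsigned int bit;

	if (vl > MODEL_VL_ARCH_MAX)
		vl = MODEL_VL_ARCH_MAX;
	if (vl > model_max_vl)
		vl = model_max_vl;

	/* Open-coded find_next_bit(): scan upward from the requested bit. */
	for (bit = vq_to_bit(vl / MODEL_VQ_BYTES); bit < MODEL_VQ_MAX; bit++)
		if (model_vq_map[bit])
			return MODEL_VQ_BYTES * bit_to_vq(bit);

	return 0;
}
```

With vq = 1, 2 and 4 marked available, model_pick_vl(48) gives 32 and model_pick_vl(8192) gives 64. In this model the SVE_VL_ARCH_MAX clamp only changes the result when a length above 0x100 is present in the map.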
>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  			   unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>
> +extern int sve_max_vl;
> +
>  #ifdef CONFIG_ARM64_SVE
>
>  extern size_t sve_state_size(struct task_struct const *task);
> @@ -83,6 +87,8 @@ extern void sve_alloc(struct task_struct *task);
>  extern void fpsimd_release_thread(struct task_struct *task);
>  extern void fpsimd_dup_sve(struct task_struct *dst,
>  			   struct task_struct const *src);
> +extern int sve_set_vector_length(struct task_struct *task,
> +				 unsigned long vl, unsigned long flags);
>
>  #else /* ! CONFIG_ARM64_SVE */
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index e8674f6..bce95de 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -18,12 +18,14 @@
>   */
>
>  #include <linux/bottom_half.h>
> +#include <linux/bitmap.h>
>  #include <linux/cpu.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/kernel.h>
>  #include <linux/init.h>
>  #include <linux/percpu.h>
>  #include <linux/preempt.h>
> +#include <linux/prctl.h>
>  #include <linux/ptrace.h>
>  #include <linux/sched/signal.h>
>  #include <linux/signal.h>
> @@ -111,6 +113,20 @@ static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_last_state);
>  /* Default VL for tasks that don't set it explicitly: */
>  static int sve_default_vl = -1;
>
> +#ifdef CONFIG_ARM64_SVE
> +
> +/* Maximum supported vector length across all CPUs (initially poisoned) */
> +int sve_max_vl = -1;
> +/* Set of available vector lengths, as vq_to_bit(vq): */
> +static DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
> +
> +#else /* ! CONFIG_ARM64_SVE */
> +
> +/* Dummy declaration for code that will be optimised out: */
> +extern DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
> +
> +#endif /* ! CONFIG_ARM64_SVE */
> +
>  static void sve_free(struct task_struct *task)
>  {
>  	kfree(task->thread.sve_state);
> @@ -148,6 +164,37 @@ static void change_cpacr(u64 old, u64 new)
>  		write_sysreg(new, CPACR_EL1);
>  }
>
> +static unsigned int vq_to_bit(unsigned int vq)
> +{
> +	BUILD_BUG_ON(vq < 1 || vq > SVE_VQ_MAX);
> +
> +	return SVE_VQ_MAX - vq;
> +}
> +
> +static unsigned int bit_to_vq(unsigned int bit)
> +{
> +	BUILD_BUG_ON(bit >= SVE_VQ_MAX);
> +
> +	return SVE_VQ_MAX - bit;
> +}
> +
> +static unsigned int find_supported_vector_length(unsigned int vl)
> +{
> +	int bit;
> +
> +	BUG_ON(!sve_vl_valid(vl));
> +
> +	BUG_ON(!sve_vl_valid(sve_max_vl));
> +	if (vl > sve_max_vl)
> +		vl = sve_max_vl;
> +
> +	bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
> +			    vq_to_bit(sve_vq_from_vl(vl)));
> +	BUG_ON(bit < 0 || bit >= SVE_VQ_MAX);
> +
> +	return 16 * bit_to_vq(bit);
> +}
> +
>  #define ZREG(sve_state, vq, n) ((char *)(sve_state) +		\
>  	(SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET))
>
> @@ -235,6 +282,73 @@ void fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src)
>  	}
>  }
>
> +int sve_set_vector_length(struct task_struct *task,
> +			  unsigned long vl, unsigned long flags)
> +{
> +	BUG_ON(task == current && preemptible());
> +
> +	if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT |
> +				     PR_SVE_SET_VL_ONEXEC))
> +		return -EINVAL;
> +
> +	if (!sve_vl_valid(vl))
> +		return -EINVAL;
> +
> +	/*
> +	 * Clamp to the maximum vector length that VL-agnostic SVE code can
> +	 * work with.  A flag may be assigned in the future to allow setting
> +	 * of larger vector lengths without confusing older software.
> +	 */
> +	if (vl > SVE_VL_ARCH_MAX)
> +		vl = SVE_VL_ARCH_MAX;
> +
> +	vl = find_supported_vector_length(vl);
> +
> +	if (flags & (PR_SVE_VL_INHERIT |
> +		     PR_SVE_SET_VL_ONEXEC))
> +		task->thread.sve_vl_onexec = vl;
> +	else
> +		/* Reset VL to system default on next exec: */
> +		task->thread.sve_vl_onexec = 0;
> +
> +	/* Only actually set the VL if not deferred: */
> +	if (flags & PR_SVE_SET_VL_ONEXEC)
> +		goto out;
> +
> +	/*
> +	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
> +	 * write any live register state back to task_struct, and convert to a
> +	 * non-SVE thread.
> +	 */
> +	if (vl != task->thread.sve_vl) {
> +		if (task == current) {
> +			task_fpsimd_save();
> +			set_thread_flag(TIF_FOREIGN_FPSTATE);
> +		}
> +
> +		if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
> +			sve_to_fpsimd(task);
> +
> +		/*
> +		 * Force reallocation of task SVE state to the correct size
> +		 * on next use:
> +		 */
> +		sve_free(task);
> +	}
> +
> +	task->thread.sve_vl = vl;
> +
> +	fpsimd_flush_task_state(task);
> +
> +out:
> +	if (flags & PR_SVE_VL_INHERIT)
> +		set_thread_flag(TIF_SVE_VL_INHERIT);
> +	else
> +		clear_thread_flag(TIF_SVE_VL_INHERIT);
> +
> +	return 0;
> +}
> +
>  void fpsimd_release_thread(struct task_struct *dead_task)
>  {
>  	sve_free(dead_task);
> @@ -407,6 +521,8 @@ void fpsimd_flush_thread(void)
>  		 * If not, something went badly wrong.
>  		 */
>  		BUG_ON(!sve_vl_valid(current->thread.sve_vl));
> +		BUG_ON(find_supported_vector_length(current->thread.sve_vl) !=
> +		       current->thread.sve_vl);
>
>  		/*
>  		 * If the task is not set to inherit, ensure that the vector
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index a8d0759..1b64901 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -197,4 +197,9 @@ struct prctl_mm_map {
>  # define PR_CAP_AMBIENT_LOWER		3
>  # define PR_CAP_AMBIENT_CLEAR_ALL	4
>
> +/* arm64 Scalable Vector Extension controls */
> +# define PR_SVE_SET_VL_ONEXEC		(1 << 18) /* defer effect until exec */
> +# define PR_SVE_VL_LEN_MASK		0xffff
> +# define PR_SVE_VL_INHERIT		(1 << 17) /* inherit across exec */
> +
>  #endif /* _LINUX_PRCTL_H */


--
Alex Bennée