From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexey Brodkin Subject: Re: [PATCH] ARC: Improve cmpxchng syscall implementation Date: Wed, 21 Mar 2018 11:54:35 +0000 Message-ID: <1521633274.9805.30.camel@synopsys.com> References: <20180319110002.27419-1-abrodkin@synopsys.com> <5bc39838-b1c5-ef65-f97d-8777ed33bda0@synopsys.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: In-Reply-To: <5bc39838-b1c5-ef65-f97d-8777ed33bda0@synopsys.com> Content-Language: en-US Content-ID: Sender: linux-kernel-owner@vger.kernel.org To: Vineet Gupta Cc: "wbx@uclibc-ng.org" , "linux-kernel@vger.kernel.org" , "peterz@infradead.org" , "jcmvbkbc@gmail.com" , "linux-arch@vger.kernel.org" , "linux-snps-arc@lists.infradead.org" List-Id: linux-arch.vger.kernel.org SGkgVmluZWV0LA0KDQpPbiBNb24sIDIwMTgtMDMtMTkgYXQgMTE6MjkgLTA3MDAsIFZpbmVldCBH dXB0YSB3cm90ZToNCj4gT24gMDMvMTkvMjAxOCAwNDowMCBBTSwgQWxleGV5IEJyb2RraW4gd3Jv dGU6DQo+ID4gYXJjX3Vzcl9jbXB4Y2hnIHN5c2NhbGwgaXMgc3VwcG9zZWQgdG8gYmUgdXNlZCBv biBwbGF0Zm9ybXMNCj4gPiB0aGF0IGxhY2sgc3VwcG9ydCBvZiBMb2FkLUxvY2tlZC9TdG9yZS1D b25kaXRpb25hbCBpbnN0cnVjdGlvbnMNCj4gPiBpbiBoYXJkd2FyZS4gQW5kIGluIHRoYXQgY2Fz ZSB3ZSBtaW1pYyBtaXNzaW5nIGhhcmR3YXJlIGZlYXR1cmVzDQo+ID4gd2l0aCBoZWxwIG9mIGtl cm5lbCdzIHN5Y2FsbCB0aGF0ICJhdG9taWNhbGx5IiBjaGVja3MgY3VycmVudA0KPiA+IHZhbHVl IGluIG1lbW9yeSBhbmQgdGhlbiBpZiBpdCBtYXRjaGVzIGNhbGxlciBleHBlY3RhdGlvbiBuZXcN Cj4gPiB2YWx1ZSBpcyB3cml0dGVuIHRvIHRoYXQgc2FtZSBsb2NhdGlvbi4NCj4gPiANCj4gDQo+ IC4uLg0KPiAuLi4NCj4gDQo+ID4gDQo+ID4gMi4gV2hhdCdzIHdvcnNlIGlmIHdlJ3JlIGRlYWxp bmcgd2l0aCBkYXRhIGZyb20gbm90IHlldCBhbGxvY2F0ZWQNCj4gPiAgICAgcGFnZSAodGhpbmsg b2YgcHJlLWNvcHktb24td3JpdGUgc3RhdGUpIHdlJ2xsIHN1Y2Nlc3NmdWxseQ0KPiA+ICAgICBy ZWFkIGRhdGEgYnV0IG9uIHdyaXRlIHdlJ2xsIHNpbGVudGx5IHJldHVybiB0byB1c2VyLXNwYWNl DQo+ID4gICAgIHdpdGggY29ycmVjdCByZXN1bHQgDQo+IA0KPiBUaGlzIGlzIHRlY2huaWNhbGx5 IGluY29ycmVjdCwgZXZlbiBmb3IgcmVhZGluZywgeW91IG5lZWQgYSBwYWdlLCB3aGljaCBjb3Vs ZCBiZSANCj4gY29tbW9uIHplcm8gcGFnZSBpbiBjZXJ0YWluIGNhc2VzLg0KDQpPayBJJ2xsIHJl d29yZCBpdCBsaWtlLg0KDQo+IA0KPiAod2hpY2ggd2UgcmVhbGx5IHJlYWQganVzdCBiZWZvcmUp LiBUaGF0IGxlYWRzDQo+ID4gICAgIHRvIHZlcnkgc3RyYW5nZSBwcm9ibGVtcyBpbiB1c2VyLXNw YWNlIGFwcCBmdXJ0aGVyIGRvd24gdGhlIGxpbmUNCj4gPiAgICAgYmVjYXVzZSBuZXcgdmFsdWUg d2FzIG5ldmVyIHdyaXR0ZW4gdG8gdGhlIGRlc3RpbmF0aW9uLg0KPiA+IA0KPiA+IDMuIFJlZ2Fy ZGxlc3Mgb2Ygd2hhdCB3ZW50IHdyb25nIHdlJ2xsIHJldHVybiBmcm9tIHN5c2NhbGwNCj4gPiAg ICAgYW5kIHVzZXItc3BhY2UgYXBwbGljYXRpb24gd2lsbCBjb250aW51ZSB0byBleGVjdXRlLg0K PiA+ICAgICBFdmVuIGlmIHVzZXIncyBwb2ludGVyIHdhcyBjb21wbGV0ZWx5IGJvZ3VzLg0KPiAN Cj4gQWdhaW4gd2UgYXJlIGV4YWdnZXJhdGluZyAoZnJvbSB0ZWNobmljYWwgY29ycmVjdG5lc3Mg UE9WKSAtIGlmIHVzZXIgcG9pbnRlciB3YXMgDQo+IGJvZ3MsIHRoZSByZWFkIHdvdWxkIG5vdCBo YXZlIHdvcmtlZCBpbiBmaXJzdCBwbGFjZSBldGMuIFNvIGxldHMgdG9uZSBkb3duIHRoZSANCj4g cmhldG9yaWMuDQoNCk9rIGhlcmUgSSBtYXkgcmVwaHJhc2UgaXQgbGlrZSB0aGF0Og0KLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLT44LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0N CjMuIFJlZ2FyZGxlc3Mgb2Ygd2hhdCB3ZW50IHdyb25nIHdlJ2xsIHJldHVybiBmcm9tIHN5c2Nh bGwNCiAgIGFuZCB1c2VyLXNwYWNlIGFwcGxpY2F0aW9uIHdpbGwgY29udGludWUgdG8gZXhlY3V0 ZS4NCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0+OC0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tDQoNCj4gDQo+ID4gICAgIEluIGNhc2Ugb2YgaGFyZHdhcmUgTEwvU0MgdGhhdCBh cHAgd291bGQgaGF2ZSBiZWVuIGtpbGxlZA0KPiA+ICAgICBieSB0aGUga2VybmVsLg0KPiA+IA0K PiA+IFdpdGggdGhhdCBjaGFuZ2Ugd2UgYXR0ZW1wdCB0byBpbXJvdmUgb24gYWxsIDMgaXRlbXMg YWJvdmU6DQo+ID4gDQo+ID4gMS4gV2Ugc3RpbGwgZGlzYWJsZSBwcmVlbXB0aW9uIGFyb3VuZCBy ZWFkLWFuZC13cml0ZSBvZg0KPiA+ICAgICB1c2VyJ3MgZGF0YSBidXQgaWYgd2UgaGFwcGVuIHRv IGZhaWwgd2l0aCBlaXRoZXIgb2YgdGhlbQ0KPiA+ICAgICB3ZSdyZSBlbmFibGluZyBwcmVlbXB0 aW9uIGFuZCB0cnkgdG8gZm9yY2UgcGFnZSBmYXVsdCBzbw0KPiA+ICAgICB0aGF0IHdlIGhhdmUg YSBjb3JyZWN0IG1hcHBpbmcgaW4gdGhlIFRMQi4gVGhlbiByZS10cnkNCj4gPiAgICAgYWdhaW4g aW4gImF0b21pYyIgY29udGV4dC4NCj4gPiANCj4gPiAyLiBJZiByZWFsIHBhZ2UgZmF1bHQgZmFp bHMgb3IgZXZlbiBhY2Nlc3Nfb2soKSByZXR1cm5zIGZhbHNlDQo+ID4gICAgIHdlIHNlbmQgU0lH U0VHViB0byB0aGUgdXNlci1zcGFjZSBwcm9jZXNzIHNvIGlmIHNvbWV0aGluZyBnb2VzDQo+ID4g ICAgIHNlcmlvdXNseSB3cm9uZyB3ZSdsbCBrbm93IGFib3V0IGl0IG11Y2ggZWFybGllci4NCj4g PiANCj4gDQo+IA0KPiA+ICAgDQo+ID4gICAJLyoNCj4gPiAgIAkgKiBUaGlzIGlzIG9ubHkgZm9y IG9sZCBjb3JlcyBsYWNraW5nIExMT0NLL1NDT05ELCB3aGljaCBieSBkZWZpbnRpb24NCj4gPiBA QCAtNjAsMjMgKzYyLDQ4IEBAIFNZU0NBTExfREVGSU5FMyhhcmNfdXNyX2NtcHhjaGcsIGludCAq LCB1YWRkciwgaW50LCBleHBlY3RlZCwgaW50LCBuZXcpDQo+ID4gICAJLyogWiBpbmRpY2F0ZXMg dG8gdXNlcnNwYWNlIGlmIG9wZXJhdGlvbiBzdWNjZWRlZCAqLw0KPiA+ICAgCXJlZ3MtPnN0YXR1 czMyICY9IH5TVEFUVVNfWl9NQVNLOw0KPiA+ICAgDQo+ID4gLQlpZiAoIWFjY2Vzc19vayhWRVJJ RllfV1JJVEUsIHVhZGRyLCBzaXplb2YoaW50KSkpDQo+ID4gLQkJcmV0dXJuIC1FRkFVTFQ7DQo+ ID4gKwlyZXQgPSBhY2Nlc3Nfb2soVkVSSUZZX1dSSVRFLCB1YWRkciwgc2l6ZW9mKCp1YWRkcikp Ow0KPiA+ICsJaWYgKCFyZXQpDQo+ID4gKwkJZ290byBmYWlsOw0KPiA+ICAgDQo+ID4gK2FnYWlu Og0KPiA+ICAgCXByZWVtcHRfZGlzYWJsZSgpOw0KPiA+ICAgDQo+ID4gLQlpZiAoX19nZXRfdXNl cih1dmFsLCB1YWRkcikpDQo+ID4gLQkJZ290byBkb25lOw0KPiA+IC0NCj4gPiAtCWlmICh1dmFs ID09IGV4cGVjdGVkKSB7DQo+ID4gLQkJaWYgKCFfX3B1dF91c2VyKG5ldywgdWFkZHIpKQ0KPiA+ ICsJcmV0ID0gX19nZXRfdXNlcih2YWwsIHVhZGRyKTsNCj4gPiArCWlmIChyZXQgPT0gLUVGQVVM VCkgew0KPiANCj4gDQo+IExldHMgc2VlIGlmIHRoaXMgd2FycmFudHMgYWRkaW5nIGNvbXBsZXhp dHkgISBUaGlzIGltcGxpZXMgdGhhdCBUTEIgZW50cnkgd2l0aCANCj4gUmVhZCBwZXJtaXNzaW9u cyBkaWRuJ3QgZXhpc3QgZm9yIHJlYWRpbmcgdGhlIHZhciBhbmQgcGFnZSBmYXVsdCBoYW5kbGVy IGNvdWxkIG5vdCANCj4gd2lyZSB1cCBldmVuIGEgemVybyBwYWdlIGR1ZSB0byBwcmVlbXB0X2Rp c2FibGUsIG1lYW5pbmcgaXQgd2FzIHNvbWV0aGluZyBub3QgDQo+IHRvdWNoZWQgYnkgdXNlcnNw YWNlIGFscmVhZHkgLSBzb3J0IG9mIHVuaW5pdGlhbGl6ZWQgdmFyaWFibGUgaW4gdXNlciBjb2Rl Lg0KDQpPayBJIGNvbXBsZXRlbHkgbWlzc2VkIHRoZSBmYWN0IHRoYXQgZmFzdCBwYXRoIFRMQiBt aXNzIGhhbmRsZXIgaXMgYmVpbmcNCmV4ZWN1dGVkIGV2ZW4gaWYgd2UgaGF2ZSBwcmVlbXB0aW9u IGRpc2FibGVkLiBTbyBnaXZlbiB0aGUgbWFwcGluZyBleGlzdA0Kd2UgZG8gbm90IG5lZWQgdG8g cmV0cnkgd2l0aCBlbmFibGVkIHByZWVtcHRpb24uDQoNClN0aWxsIG1heWJlIEknbSBhIGJpdCBw YXJhbm9pZCBoZXJlIGJ1dCBJTUhPIGl0J3MgZ29vZCB0byBiZSByZWFkeSBmb3IgYSBjb3JuZXIt Y2FzZQ0Kd2hlbiB0aGUgcG9pbnRlciBpcyBjb21wbGV0ZWx5IGJvZ3VzIGFuZCB0aGVyZSdzIG5v IG1hcHBpbmcgZm9yIGhpbS4NCkkgdW5kZXJzdGFuZCB0aGF0IHRvZGF5IHdlIG9ubHkgZXhwZWN0 IHRoaXMgc3lzY2FsbCB0byBiZSB1c2VkIGZyb20gbGliYydzDQppbnRlcm5hbHMgYnV0IGFzIGxv bmcgYXMgc3lzY2FsbCBleGlzdHMgbm9ib2R5IHN0b3BzIGFueWJvZHkgZnJvbSB1c2luZyBpdA0K ZGlyZWN0bHkgd2l0aG91dCBsaWJjLiBTbyBtYXliZSBpbnN0ZWFkIG9mIGRvaW5nIGdldF91c2Vy X3BhZ2VzX2Zhc3QoKSBqdXN0DQpzZW5kIGEgU0lHU0VHViB0byB0aGUgcHJvY2Vzcz8gQXQgbGVh c3QgdXNlciB3aWxsIHJlYWxpemUgdGhlcmUncyBzb21lIHByb2JsZW0NCmF0IGVhcmxpZXIgc3Rh Z2UuDQoNCj4gT3RoZXJ3aXNlIGl0IGlzIGV4dHJlbWVseSB1bmxpa2VseSB0byBzdGFydCB3aXRo IGEgVExCIGVudHJ5IHdpdGggUmVhZCANCj4gcGVybWlzc2lvbnMsIGZvbGxvd2VkIGJ5IHN5c2Nh bGwgVHJhcCBvbmx5IHRvIGZpbmQgdGhlIGVudHJ5IG1pc3NpbmcsIHVubGVzcyBhIA0KPiBnbG9i YWwgVExCIGZsdXNoIGNhbWUgZnJvbSBvdGhlciBjb3JlcywgcmlnaHQgaW4gdGhlIG1pZGRsZS4g QnV0IHRoaXMgc3lzY2FsbCBpcyANCj4gbm90IGd1YXJhbnRlZWQgdG8gd29yayB3aXRoIFNNUCBh bnl3YXlzLCBzbyBsZXRzIGlnbm9yZSBhbnkgU01QIG1pc2RvaW5ncyBoZXJlLg0KDQpXZWxsIGJ1 dCB0aGF0J3MgZXhhY3RseSB0aGUgc2l0dWF0aW9uIEkgd2FzIGRlYnVnZ2luZzogd2Ugc3RhcnQg ZnJvbSBkYXRhIGZyb20gcmVhZC1vbmx5DQpwYWdlIGFuZCBvbiBhdHRlbXB0IHRvIHdyaXRlIGJh Y2sgbW9kaWZpZWQgdmFsdWUgQ09XIG1hY2hpbmVyeSBnZXRzIGludm9sdmVkLg0KDQpUaGF0IHdh cyBvbiBVUCBwbGF0Zm9ybS4NCg0KPiBOb3cgaW4gY2FzZSBpdCB3YXMgKmFuKiB1bmluaXRpYWxp emVkIHZhciwgZG8gd2UgaGF2ZSB0byBndWFyYW50ZWUgYW55IHdlbGwgDQo+IGRlZmluZWQgc2Vt YW50aWNzIGZvciB0aGUga2VybmVsIGVtdWxhdGlvbiBvZiBjbXB4Y2hnID8gSU1PIGl0IHNob3Vs ZCBiZSBmaW5lIHRvIA0KPiByZXR1cm4gMCBvciAtRUZBVUxUIGV0Yy4gSW5mYWN0IC1FRkFVTFQg aXMgYmV0dGVyIGFzIGl0IHdpbGwgZm9yY2UgYSByZXRyeSBsb29wIG9uIA0KPiB1c2VyIHNpZGUs IGdpdmVuIHRoZSB0eXBpY2FsIGNtcHhjaGcgdXNhZ2UgcGF0dGVybi4NCg0KVGhlIHByb2JsZW0g aXMgbGliYyBvbmx5IGV4cGVjdHMgdG8gZ2V0IGEgdmFsdWUgcmVhZCBmcm9tIG1lbW9yeS4NCkFu ZCBpbiB0aGVvcnkgZXhwZWN0ZWQgdmFsdWUgbWlnaHQgYmUgLTE0IHdoaWNoIGlzIGJhc2ljYWxs eSAtRUZBVUxULg0KSSdtIG5vdCB0YWxraW5nIGFib3V0IDAgYXQgYWxsIGJlY2F1c2UgaW4gc29t ZSBjYXNlcyB0aGF0J3MgZXhhY3RseSB3aGF0DQp1c2VyLXNwYWNlIGV4cGVjdHMuDQoNClNvIGlm IHdlIHJlYWQgdW5leHBlY3RlZCB2YWx1ZSB0aGVuIHdlJ2xsIGp1c3QgcmV0dXJuIGl0IHdpdGhv dXQgZXZlbiBhdHRlbXB0aW5nDQp0byB3cml0ZS4NCg0KSWYgd2UgcmVhZCBleHBlY3RlZCBkYXRh IGJ1dCBmYWlsIHRvIHdyaXRlIHRoZW4gd2UnbGwgc2VuZCBhIFNJR1NFR1YgYW5kDQpyZXR1cm4g d2hhdGV2ZXIuLi4gbGV0IGl0IGJlIC1FRkFVTFQgLSBhbnl3YXlzIHRoZSBhcHAgd2lsbCBiZSBr aWxsZWQgb24gZXhpdCBmcm9tDQp0aGlzIHN5c2NhbGwuDQoNCi1BbGV4ZXk= From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexey.Brodkin@synopsys.com (Alexey Brodkin) Date: Wed, 21 Mar 2018 11:54:35 +0000 Subject: [PATCH] ARC: Improve cmpxchng syscall implementation In-Reply-To: <5bc39838-b1c5-ef65-f97d-8777ed33bda0@synopsys.com> References: <20180319110002.27419-1-abrodkin@synopsys.com> <5bc39838-b1c5-ef65-f97d-8777ed33bda0@synopsys.com> List-ID: Message-ID: <1521633274.9805.30.camel@synopsys.com> To: linux-snps-arc@lists.infradead.org Hi Vineet, On Mon, 2018-03-19@11:29 -0700, Vineet Gupta wrote: > On 03/19/2018 04:00 AM, Alexey Brodkin wrote: > > arc_usr_cmpxchg syscall is supposed to be used on platforms > > that lack support of Load-Locked/Store-Conditional instructions > > in hardware. And in that case we mimic missing hardware features > > with help of kernel's sycall that "atomically" checks current > > value in memory and then if it matches caller expectation new > > value is written to that same location. > > > > ... > ... > > > > > 2. What's worse if we're dealing with data from not yet allocated > > page (think of pre-copy-on-write state) we'll successfully > > read data but on write we'll silently return to user-space > > with correct result > > This is technically incorrect, even for reading, you need a page, which could be > common zero page in certain cases. Ok I'll reword it like. > > (which we really read just before). That leads > > to very strange problems in user-space app further down the line > > because new value was never written to the destination. > > > > 3. Regardless of what went wrong we'll return from syscall > > and user-space application will continue to execute. > > Even if user's pointer was completely bogus. > > Again we are exaggerating (from technical correctness POV) - if user pointer was > bogs, the read would not have worked in first place etc. So lets tone down the > rhetoric. Ok here I may rephrase it like that: ------------------------------->8----------------------------- 3. Regardless of what went wrong we'll return from syscall and user-space application will continue to execute. ------------------------------->8----------------------------- > > > In case of hardware LL/SC that app would have been killed > > by the kernel. > > > > With that change we attempt to imrove on all 3 items above: > > > > 1. We still disable preemption around read-and-write of > > user's data but if we happen to fail with either of them > > we're enabling preemption and try to force page fault so > > that we have a correct mapping in the TLB. Then re-try > > again in "atomic" context. > > > > 2. If real page fault fails or even access_ok() returns false > > we send SIGSEGV to the user-space process so if something goes > > seriously wrong we'll know about it much earlier. > > > > > > > > /* > > * This is only for old cores lacking LLOCK/SCOND, which by defintion > > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) > > /* Z indicates to userspace if operation succeded */ > > regs->status32 &= ~STATUS_Z_MASK; > > > > - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) > > - return -EFAULT; > > + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); > > + if (!ret) > > + goto fail; > > > > +again: > > preempt_disable(); > > > > - if (__get_user(uval, uaddr)) > > - goto done; > > - > > - if (uval == expected) { > > - if (!__put_user(new, uaddr)) > > + ret = __get_user(val, uaddr); > > + if (ret == -EFAULT) { > > > Lets see if this warrants adding complexity ! This implies that TLB entry with > Read permissions didn't exist for reading the var and page fault handler could not > wire up even a zero page due to preempt_disable, meaning it was something not > touched by userspace already - sort of uninitialized variable in user code. Ok I completely missed the fact that fast path TLB miss handler is being executed even if we have preemption disabled. So given the mapping exist we do not need to retry with enabled preemption. Still maybe I'm a bit paranoid here but IMHO it's good to be ready for a corner-case when the pointer is completely bogus and there's no mapping for him. I understand that today we only expect this syscall to be used from libc's internals but as long as syscall exists nobody stops anybody from using it directly without libc. So maybe instead of doing get_user_pages_fast() just send a SIGSEGV to the process? At least user will realize there's some problem at earlier stage. > Otherwise it is extremely unlikely to start with a TLB entry with Read > permissions, followed by syscall Trap only to find the entry missing, unless a > global TLB flush came from other cores, right in the middle. But this syscall is > not guaranteed to work with SMP anyways, so lets ignore any SMP misdoings here. Well but that's exactly the situation I was debugging: we start from data from read-only page and on attempt to write back modified value COW machinery gets involved. That was on UP platform. > Now in case it was *an* uninitialized var, do we have to guarantee any well > defined semantics for the kernel emulation of cmpxchg ? IMO it should be fine to > return 0 or -EFAULT etc. Infact -EFAULT is better as it will force a retry loop on > user side, given the typical cmpxchg usage pattern. The problem is libc only expects to get a value read from memory. And in theory expected value might be -14 which is basically -EFAULT. I'm not talking about 0 at all because in some cases that's exactly what user-space expects. So if we read unexpected value then we'll just return it without even attempting to write. If we read expected data but fail to write then we'll send a SIGSEGV and return whatever... let it be -EFAULT - anyways the app will be killed on exit from this syscall. -Alexey