From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Gunthorpe
Subject: Re: [PATCH hmm 02/15] mm/mmu_notifier: add an interval tree notifier
Date: Mon, 21 Oct 2019 18:54:25 +0000
Message-ID: <20191021185421.GG6285@mellanox.com>
References: <20191015181242.8343-1-jgg@ziepe.ca>
 <20191015181242.8343-3-jgg@ziepe.ca>
 <20191021183056.GA3177@redhat.com>
In-Reply-To: <20191021183056.GA3177@redhat.com>
To: Jerome Glisse
Cc: Andrea Arcangeli, Ralph Campbell, "linux-rdma@vger.kernel.org",
 John Hubbard, "Felix.Kuehling@amd.com", "dri-devel@lists.freedesktop.org",
 Michal Hocko, "linux-mm@kvack.org", "amd-gfx@lists.freedesktop.org",
 Ben Skeggs
List-Id: amd-gfx.lists.freedesktop.org

On Mon, Oct 21, 2019 at 02:30:56PM -0400, Jerome Glisse wrote:

> > +/**
> > + * mmu_range_read_retry - End a read side critical section against a VA range
> > + * mrn: The range under lock
> > + * seq: The return of the paired mmu_range_read_begin()
> > + *
> > + * This MUST be called under a user provided lock that is also held
> > + * unconditionally by op->invalidate(). That lock provides the required SMP
> > + * barrier for handling invalidate_seq.
> > + *
> > + * Each call should be paired with a single mmu_range_read_begin() and
> > + * should be used to conclude the read side.
> > + *
> > + * Returns true if an invalidation collided with this critical section, and
> > + * the caller should retry.
> > + */
> > +static inline bool mmu_range_read_retry(struct mmu_range_notifier *mrn,
> > +					unsigned long seq)
> > +{
> > +	return READ_ONCE(mrn->invalidate_seq) != seq;
> > +}
>
> What about calling this mmu_range_read_end() instead ? To match
> with the mmu_range_read_begin().

_end makes some sense too, but I picked _retry for symmetry with the
seqcount_* family of functions, which use retry.

I think retry makes it clearer that it is expected to fail and that a
retry is required.

> > +	/*
> > +	 * The inv_end incorporates a deferred mechanism like rtnl. Adds and
>
> The rtnl reference is lost on people unfamiliar with the network :)
> code maybe like rtnl_lock()/rtnl_unlock() so people have a chance to
> grep the right function. Assuming i am myself getting the right
> reference :)

Yep, you got it. I will update it.

> > +	/*
> > +	 * mrn->invalidate_seq is always set to an odd value. This ensures
> > +	 * that if seq does wrap we will always clear the below sleep in some
> > +	 * reasonable time as mmn_mm->invalidate_seq is even in the idle
> > +	 * state.
>
> I think this comment should be with the struct mmu_range_notifier
> definition and you should just point to it from here as the same
> comment would be useful down below.

I had it here because it is critical to understanding the wait_event
and why it doesn't just block indefinitely, but yes, this property
comes up below too, and that comment refers back here.

Fundamentally, this wait_event is why the approach of keeping an odd
value in the mrn is used.

> > -int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> > +static int mn_itree_invalidate(struct mmu_notifier_mm *mmn_mm,
> > +				     const struct mmu_notifier_range *range)
> > +{
> > +	struct mmu_range_notifier *mrn;
> > +	unsigned long cur_seq;
> > +
> > +	for (mrn = mn_itree_inv_start_range(mmn_mm, range, &cur_seq); mrn;
> > +	     mrn = mn_itree_inv_next(mrn, range)) {
> > +		bool ret;
> > +
> > +		WRITE_ONCE(mrn->invalidate_seq, cur_seq);
> > +		ret = mrn->ops->invalidate(mrn, range);
> > +		if (!ret && !WARN_ON(mmu_notifier_range_blockable(range)))
>
> Isn't the logic wrong here ? We want to warn if the range
> was mark as blockable and invalidate returned false. Also
> we went to backoff no matter what if the invalidate return
> false ie:

If invalidate returned false and the caller is blockable, then we do
not want to return; we must continue processing the other ranges to
try to cope with the defective driver.

Callers in blocking mode ignore the return value and go ahead and
invalidate.

Would it be clearer as

if (!ret) {
   if (WARN_ON(mmu_notifier_range_blockable(range)))
       continue;
   goto out_would_block;
}

?

> > @@ -284,21 +589,22 @@ int __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
> >  		 * the write side of the mmap_sem.
> >  		 */
> >  		mmu_notifier_mm =
> > -			kmalloc(sizeof(struct mmu_notifier_mm), GFP_KERNEL);
> > +			kzalloc(sizeof(struct mmu_notifier_mm), GFP_KERNEL);
> >  		if (!mmu_notifier_mm)
> >  			return -ENOMEM;
> >
> >  		INIT_HLIST_HEAD(&mmu_notifier_mm->list);
> >  		spin_lock_init(&mmu_notifier_mm->lock);
> > +		mmu_notifier_mm->invalidate_seq = 2;
>
> Why starting at 2 ?

Good question. If everything is coded properly, the starting value
doesn't matter.

I left it like this because it makes debugging a tiny bit simpler,
i.e. if you print the seq number, the first mmu_range_notifiers will
get 1 as their initial seq (see __mmu_range_notifier_insert) instead
of ULONG_MAX.

> > +		mmu_notifier_mm->itree = RB_ROOT_CACHED;
> > +		init_waitqueue_head(&mmu_notifier_mm->wq);
> > +		INIT_HLIST_HEAD(&mmu_notifier_mm->deferred_list);
> >  	}
> >
> >  	ret = mm_take_all_locks(mm);
> >  	if (unlikely(ret))
> >  		goto out_clean;
> >
> > -	/* Pairs with the mmdrop in mmu_notifier_unregister_* */
> > -	mmgrab(mm);
> > -
> >  	/*
> >  	 * Serialize the update against mmu_notifier_unregister. A
> >  	 * side note: mmu_notifier_release can't run concurrently with
> > @@ -306,13 +612,26 @@ int __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
> >  	 * current->mm or explicitly with get_task_mm() or similar).
> >  	 * We can't race against any other mmu notifier method either
> >  	 * thanks to mm_take_all_locks().
> > +	 *
> > +	 * release semantics are provided for users not inside a lock covered
> > +	 * by mm_take_all_locks(). acquire can only be used while holding the
> > +	 * mmgrab or mmget, and is safe because once created the
> > +	 * mmu_notififer_mm is not freed until the mm is destroyed.
> >  	 */
> >  	if (mmu_notifier_mm)
> > -		mm->mmu_notifier_mm = mmu_notifier_mm;
> > +		smp_store_release(&mm->mmu_notifier_mm, mmu_notifier_mm);
>
> I do not understand why you need the release semantics here, we
> are under the mmap_sem in write mode when we release it the lock
> barrier will make sure anyone else sees the new mmu_notifier_mm

It pairs with the smp_load_acquire() in mmu_range_notifier_insert(),
which is not called with the mmap_sem held.

Since that reader is not locked, we need release semantics here to
ensure the unlocked reader sees a fully initialized mmu_notifier_mm
structure when it observes the pointer.

> > +/**
> > + * mmu_range_notifier_insert - Insert a range notifier
> > + * @mrn: Range notifier to register
> > + * @start: Starting virtual address to monitor
> > + * @length: Length of the range to monitor
> > + * @mm : mm_struct to attach to
> > + *
> > + * This function subscribes the range notifier for notifications from the mm.
> > + * Upon return the ops related to mmu_range_notifier will be called whenever
> > + * an event that intersects with the given range occurs.
> > + *
> > + * Upon return the range_notifier may not be present in the interval tree yet.
> > + * The caller must use the normal range notifier locking flow via
> > + * mmu_range_read_begin() to establish SPTEs for this range.
> > + */
> > +int mmu_range_notifier_insert(struct mmu_range_notifier *mrn,
> > +			      unsigned long start, unsigned long length,
> > +			      struct mm_struct *mm)
> > +{
> > +	struct mmu_notifier_mm *mmn_mm;
> > +	int ret;
> > +
> > +	might_lock(&mm->mmap_sem);
> > +
> > +	mmn_mm = smp_load_acquire(&mm->mmu_notifier_mm);

Right here we don't hold the mmap_sem, so this load is unlocked.

If we observe a non-NULL mmn_mm, we must also observe all the stores
done to set it up.
I.e. we have to observe the spin_lock_init, RB_ROOT_CACHED, etc.

> > +	if (!mmn_mm || !mmn_mm->has_interval) {
> > +		ret = mmu_notifier_register(NULL, mm);
> > +		if (ret)
> > +			return ret;
> > +		mmn_mm = mm->mmu_notifier_mm;
> > +	}
> > +	return __mmu_range_notifier_insert(mrn, start, length, mmn_mm, mm);
> > +}
> > +EXPORT_SYMBOL_GPL(mmu_range_notifier_insert);
> > +
> > +int mmu_range_notifier_insert_locked(struct mmu_range_notifier *mrn,
> > +				     unsigned long start, unsigned long length,
> > +				     struct mm_struct *mm)
> > +{
> > +	struct mmu_notifier_mm *mmn_mm;
> > +	int ret;
> > +
> > +	lockdep_assert_held_write(&mm->mmap_sem);
> > +
> > +	mmn_mm = mm->mmu_notifier_mm;
>
> Shouldn't you be using smp_load_acquire() ?

This function is called while holding the mmap_sem. As you noted
above, all writers to mm->mmu_notifier_mm hold the write side of
mmap_sem, so here the read side is fully locked and doesn't need the
acquire.

Note the lockdep annotations marking the expected locking environment
for the two functions.

Thanks,
Jason
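The read_begin/read_retry pairing and the odd/even invalidate_seq argument in the reply can be modelled in a few lines of userspace C. This is a minimal sketch, not the patch: every model_* name is hypothetical, there is no locking or sleeping, and it assumes (inferred from the "starting at 2" answer, not from code quoted in this thread) that a newly inserted range takes the idle mm sequence minus one as its initial seq. The mm counter is even when idle and odd during an invalidation, while a range's counter is always odd, so once the mm is idle the two can never be equal, even after the counter wraps.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mmu_notifier_mm. */
struct model_mm {
	unsigned long invalidate_seq;	/* even when idle, odd while invalidating */
};

/* Hypothetical stand-in for struct mmu_range_notifier. */
struct model_range {
	unsigned long invalidate_seq;	/* always odd */
};

/* Mirrors "mmu_notifier_mm->invalidate_seq = 2" from the patch. */
void model_mm_init(struct model_mm *mm, unsigned long start)
{
	mm->invalidate_seq = start;
}

/* Assumption: a new range starts at the idle (even) mm seq minus one. */
void model_range_insert(const struct model_mm *mm, struct model_range *r)
{
	assert((mm->invalidate_seq & 1) == 0);
	r->invalidate_seq = mm->invalidate_seq - 1;
}

/* Reader: sample the range's seq before building SPTEs. */
unsigned long model_read_begin(const struct model_range *r)
{
	return r->invalidate_seq;
}

/* Same comparison as mmu_range_read_retry(), minus READ_ONCE(). */
bool model_read_retry(const struct model_range *r, unsigned long seq)
{
	return r->invalidate_seq != seq;
}

/* Invalidation start: mm seq becomes odd and is stamped into the range. */
void model_invalidate_start(struct model_mm *mm, struct model_range *r)
{
	mm->invalidate_seq |= 1;
	r->invalidate_seq = mm->invalidate_seq;
}

/* Invalidation end: mm seq moves back to an even (idle) value. */
void model_invalidate_end(struct model_mm *mm)
{
	mm->invalidate_seq++;
}

/*
 * Condition a reader would sleep on: true only while the mm is in the
 * invalidation that stamped this range. An even (idle) mm seq can never
 * equal the odd range seq, so the wait always ends.
 */
bool model_must_wait(const struct model_mm *mm, const struct model_range *r)
{
	return mm->invalidate_seq == r->invalidate_seq;
}
```

For example, after model_mm_init(&mm, 2) and model_range_insert(&mm, &r), r.invalidate_seq is 1; starting the mm counter at 0 instead gives 0UL - 1, which is ULONG_MAX. Both values are odd, which is why the starting value does not affect correctness and only changes what a debug print shows.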
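The smp_store_release()/smp_load_acquire() pairing defended in the reply is the usual "initialise the object, then publish the pointer" pattern. Below is a rough userspace analogue using C11 <stdatomic.h> in place of the kernel primitives; the struct and function names are hypothetical, and neither mmap_sem nor a second thread is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mmu_notifier_mm. */
struct model_notifier_mm {
	unsigned long invalidate_seq;
	bool itree_ready;	/* stands in for RB_ROOT_CACHED, spin_lock_init, ... */
};

/* Hypothetical stand-in for the mm->mmu_notifier_mm pointer. */
struct model_mm_struct {
	_Atomic(struct model_notifier_mm *) notifier_mm;
};

/*
 * Writer: fully initialise the object first, then publish the pointer
 * with release ordering (userspace analogue of smp_store_release()).
 */
int model_publish(struct model_mm_struct *mm)
{
	struct model_notifier_mm *p = calloc(1, sizeof(*p));

	if (!p)
		return -1;
	p->invalidate_seq = 2;
	p->itree_ready = true;
	atomic_store_explicit(&mm->notifier_mm, p, memory_order_release);
	return 0;
}

/*
 * Unlocked reader: acquire load (analogue of smp_load_acquire()). If it
 * returns non-NULL, every store made before the release store above is
 * guaranteed to be visible to this thread.
 */
struct model_notifier_mm *model_lookup(struct model_mm_struct *mm)
{
	return atomic_load_explicit(&mm->notifier_mm, memory_order_acquire);
}
```

A reader that instead holds the same lock as every writer (the mmu_range_notifier_insert_locked() case, with mmap_sem) is already ordered by that lock, which is why the reply says a plain load is enough there.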
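The restructured error handling proposed for mn_itree_invalidate() can be checked in isolation. In this hypothetical sketch, callback results come from an array instead of mrn->ops->invalidate(), model_warn_on() imitates WARN_ON() by printing and returning its condition, and any cleanup the real out_would_block path performs is omitted. It behaves the same as the original "if (!ret && !WARN_ON(...)) goto out_would_block;" form; only the readability differs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical WARN_ON() stand-in: report, then return the condition. */
bool model_warn_on(bool cond, int idx)
{
	if (cond)
		fprintf(stderr, "WARNING: range %d failed in blockable mode\n", idx);
	return cond;
}

/*
 * cb_results[i] stands in for the return of the i-th invalidate callback.
 * *called counts how many callbacks were invoked.
 */
int model_itree_invalidate(const bool *cb_results, int n, bool blockable,
			   int *called)
{
	*called = 0;
	for (int i = 0; i < n; i++) {
		bool ret = cb_results[i];

		(*called)++;
		if (!ret) {
			/* Blockable callers must not fail: warn, keep going. */
			if (model_warn_on(blockable, i))
				continue;
			/* Non-blocking callers back off. */
			goto out_would_block;
		}
	}
	return 0;

out_would_block:
	return -EAGAIN;
}
```

With results {true, false, true}, a blockable call warns once, invokes all three callbacks and returns 0, while a non-blockable call stops after the second callback and returns -EAGAIN.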