From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread.
Date: Mon, 2 Oct 2017 06:59:12 +0300
Message-ID: <20171002065801-mutt-send-email-mst@kernel.org>
References: <201709111927.IDD00574.tFVJHLOSOOMQFF@I-love.SAKURA.ne.jp> <20170929065654-mutt-send-email-mst@kernel.org> <201709291344.FID60965.VHtMQFFJFSLOOO@I-love.SAKURA.ne.jp> <201710011444.IBD05725.VJSFHOOMOFtLQF@I-love.SAKURA.ne.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7DE286E21B for ; Mon, 2 Oct 2017 03:59:21 +0000 (UTC)
Content-Disposition: inline
In-Reply-To: <201710011444.IBD05725.VJSFHOOMOFtLQF@I-love.SAKURA.ne.jp>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
To: Tetsuo Handa
Cc: airlied@linux.ie, jasowang@redhat.com, jiangshanlai@gmail.com, josh@joshtriplett.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org, mathieu.desnoyers@efficios.com, rostedt@goodmis.org, rodrigo.vivi@intel.com, paulmck@linux.vnet.ibm.com, intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-qk0-f197.google.com (mail-qk0-f197.google.com [209.85.220.197]) by kanga.kvack.org (Postfix) with
ESMTP id D2E0C6B0033 for ; Sun, 1 Oct 2017 23:59:21 -0400 (EDT)
Received: by mail-qk0-f197.google.com with SMTP id i12so3748621qka.15 for ; Sun, 01 Oct 2017 20:59:21 -0700 (PDT)
Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id a27si2878550qtd.410.2017.10.01.20.59.20 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 01 Oct 2017 20:59:20 -0700 (PDT)
Date: Mon, 2 Oct 2017 06:59:12 +0300
From: "Michael S. Tsirkin"
Subject: Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
Message-ID: <20171002065801-mutt-send-email-mst@kernel.org>
References: <201709111927.IDD00574.tFVJHLOSOOMQFF@I-love.SAKURA.ne.jp> <20170929065654-mutt-send-email-mst@kernel.org> <201709291344.FID60965.VHtMQFFJFSLOOO@I-love.SAKURA.ne.jp> <201710011444.IBD05725.VJSFHOOMOFtLQF@I-love.SAKURA.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201710011444.IBD05725.VJSFHOOMOFtLQF@I-love.SAKURA.ne.jp>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Tetsuo Handa
Cc: jasowang@redhat.com, jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com, rodrigo.vivi@intel.com, airlied@linux.ie, paulmck@linux.vnet.ibm.com, josh@joshtriplett.org, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, virtualization@lists.linux-foundation.org, intel-gfx@lists.freedesktop.org, linux-mm@kvack.org

On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Michael S. Tsirkin wrote:
> > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > Hello.
> > > >
> > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > >
> > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > at leak_balloon().
> > >
> > > That would be tricky to fix. I guess we'll need to drop the lock
> > > while allocating memory - not an easy fix.
> > >
> > > > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> > > > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> > > > request from leak_balloon() from virtballoon_oom_notify() from
> > > > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> > > > because oom_lock mutex is already held before calling out_of_memory().
> > >
> > > I guess we should just do
> > >
> > > GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?
> >
> > Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for
> > __GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might
> > significantly increases possibility of memory allocation failure.
> >
> > >
> > >
> > > >
> > > > OOM notifier callback should not (directly or indirectly) depend on
> > > > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
> > >
> >
> > Another idea would be to use a kernel thread (or workqueue) so that
> > virtballoon_oom_notify() can wait with timeout.
> >
> > We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed)
> > call to a kernel thread (or workqueue) with timeout if MM folks agree.
> >
>
> Below is a patch which offloads blocking_notifier_call_chain() call. What do you think?
> ----------------------------------------
> [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
>
> Since oom_notify_list is traversed via blocking_notifier_call_chain(),
> it is legal to sleep inside OOM notifier callback function.
>
> However, since oom_notify_list is traversed with oom_lock held,
> __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation attempt cannot
> fail when traversing oom_notify_list entries. Therefore, OOM notifier
> callback function should not (directly or indirectly) depend on
> __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation attempt.
>
> Currently there are 5 register_oom_notifier() users in the mainline kernel.
>
>   arch/powerpc/platforms/pseries/cmm.c
>   arch/s390/mm/cmm.c
>   drivers/gpu/drm/i915/i915_gem_shrinker.c
>   drivers/virtio/virtio_balloon.c
>   kernel/rcu/tree_plugin.h
>
> Among these users, at least virtio_balloon.c has possibility of OOM lockup
> because it is using mutex which can depend on GFP_KERNEL memory allocations.
> (Both cmm.c seem to be safe as they use spinlocks. I'm not sure about
> tree_plugin.h and i915_gem_shrinker.c . Please check.)
>
> But converting such allocations to use GFP_NOWAIT is not only prone to
> allocation failures under memory pressure but also difficult to audit
> whether all locations are converted to use GFP_NOWAIT.
>
> Therefore, this patch offloads blocking_notifier_call_chain() call to a
> dedicated kernel thread and wait for completion with timeout of 5 seconds
> so that we can completely forget about possibility of OOM lockup due to
> OOM notifier callback function.
>
> (5 seconds is chosen from my guess that blocking_notifier_call_chain()
> should not take long, for we are using mutex_trylock(&oom_lock) at
> __alloc_pages_may_oom() based on an assumption that out_of_memory() should
> reclaim memory shortly.)
>
> The kernel thread is created upon first register_oom_notifier() call.
> Thus, those environments which do not use register_oom_notifier() will
> not waste resource for the dedicated kernel thread.
>
> Signed-off-by: Tetsuo Handa
> ---
>  mm/oom_kill.c | 40 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index dee0f75..d9744f7 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -981,9 +981,37 @@ static void check_panic_on_oom(struct oom_control *oc,
>  }
>  
>  static BLOCKING_NOTIFIER_HEAD(oom_notify_list);
> +static bool oom_notifier_requested;
> +static unsigned long oom_notifier_freed;
> +static struct task_struct *oom_notifier_th;
> +static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_request_wait);
> +static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_response_wait);
> +
> +static int oom_notifier(void *unused)
> +{
> +	while (true) {
> +		wait_event_freezable(oom_notifier_request_wait,
> +				     oom_notifier_requested);
> +		blocking_notifier_call_chain(&oom_notify_list, 0,
> +					     &oom_notifier_freed);
> +		oom_notifier_requested = false;
> +		wake_up(&oom_notifier_response_wait);
> +	}
> +	return 0;
> +}
>  
>  int register_oom_notifier(struct notifier_block *nb)
>  {
> +	if (!oom_notifier_th) {
> +		struct task_struct *th = kthread_run(oom_notifier, NULL,
> +						     "oom_notifier");
> +
> +		if (IS_ERR(th)) {
> +			pr_err("Unable to start OOM notifier thread.\n");
> +			return (int) PTR_ERR(th);
> +		}
> +		oom_notifier_th = th;
> +	}
>  	return blocking_notifier_chain_register(&oom_notify_list, nb);
>  }
>  EXPORT_SYMBOL_GPL(register_oom_notifier);
> @@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
>   */
>  bool out_of_memory(struct oom_control *oc)
>  {
> -	unsigned long freed = 0;
>  	enum oom_constraint constraint = CONSTRAINT_NONE;
>  
>  	if (oom_killer_disabled)
>  		return false;
>  
> -	if (!is_memcg_oom(oc)) {
> -		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> -		if (freed > 0)
> +	if (!is_memcg_oom(oc) && oom_notifier_th) {
> +		oom_notifier_requested = true;
> +		wake_up(&oom_notifier_request_wait);
> +		wait_event_timeout(oom_notifier_response_wait,
> +				   !oom_notifier_requested, 5 * HZ);

I guess this means what was earlier a deadlock will free up after 5
seconds, but a 5 sec downtime is still a lot, isn't it?

> +		if (oom_notifier_freed) {
> +			oom_notifier_freed = 0;
>  			/* Got some memory back in the last second. */
>  			return true;
> +		}
>  	}
>  
>  	/*
> -- 
> 1.8.3.1
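
For reference, a minimal userspace sketch of the mask arithmetic behind the "GFP_KERNEL & ~__GFP_DIRECT_RECLAIM" suggestion quoted above. The bit values are placeholders picked for illustration (the real ___GFP_* values differ between kernel versions), and oom_notifier_gfp(), may_direct_reclaim() and differs_only_in_io_fs() are hypothetical helper names, not kernel APIs. Only the way the masks are composed is meant to mirror include/linux/gfp.h from kernels of this era, as far as I recall it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace illustration only. The bit values are placeholders chosen for
 * this sketch, NOT the kernel's real ___GFP_* values. Only the composition
 * of GFP_KERNEL and GFP_NOWAIT is intended to match the kernel headers.
 */
typedef unsigned int gfp_t;

#define __GFP_IO             0x01u
#define __GFP_FS             0x02u
#define __GFP_DIRECT_RECLAIM 0x04u
#define __GFP_KSWAPD_RECLAIM 0x08u

#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOWAIT    (__GFP_KSWAPD_RECLAIM)

/* True if an allocation with this mask may enter direct reclaim,
 * which is the path that can end up in the OOM killer. */
static inline bool may_direct_reclaim(gfp_t gfp)
{
	return (gfp & __GFP_DIRECT_RECLAIM) != 0;
}

/* Hypothetical helper: the mask suggested for allocations made from
 * inside an OOM notifier callback. */
static inline gfp_t oom_notifier_gfp(void)
{
	gfp_t gfp = GFP_KERNEL & ~__GFP_DIRECT_RECLAIM;

	/* The whole point of the mask: never wait on direct reclaim. */
	assert(!may_direct_reclaim(gfp));
	return gfp;
}

/* True if the two masks differ in nothing except __GFP_IO / __GFP_FS. */
static inline bool differs_only_in_io_fs(gfp_t a, gfp_t b)
{
	return ((a ^ b) & ~(__GFP_IO | __GFP_FS)) == 0;
}
```

Under these assumptions, oom_notifier_gfp() never allows direct reclaim, and it differs from GFP_NOWAIT only in __GFP_IO and __GFP_FS. That is the trade-off raised in the quoted reply: the allocation can no longer recurse into the OOM path, but under memory pressure it is about as likely to fail as a GFP_NOWAIT one.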