From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 2 Oct 2017 17:11:55 +0300
From: "Michael S. Tsirkin"
Subject: Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
Message-ID: <20171002170642-mutt-send-email-mst@kernel.org>
References: <20170929065654-mutt-send-email-mst@kernel.org> <201709291344.FID60965.VHtMQFFJFSLOOO@I-love.SAKURA.ne.jp> <201710011444.IBD05725.VJSFHOOMOFtLQF@I-love.SAKURA.ne.jp> <20171002065801-mutt-send-email-mst@kernel.org> <20171002090627.547gkmzvutrsamex@dhcp22.suse.cz> <201710022033.GFE82801.HLOVOFFJtSFQMO@I-love.SAKURA.ne.jp> <20171002115035.7sph6ul6hsszdwa4@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171002115035.7sph6ul6hsszdwa4@dhcp22.suse.cz>
Sender: owner-linux-mm@kvack.org
To: Michal Hocko
Cc: Tetsuo Handa , jasowang@redhat.com, jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com, rodrigo.vivi@intel.com, airlied@linux.ie, paulmck@linux.vnet.ibm.com, josh@joshtriplett.org, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, virtualization@lists.linux-foundation.org, intel-gfx@lists.freedesktop.org, linux-mm@kvack.org

On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > [Hmm, I do not see the original patch which this has been a reply to]
> >
> > urbl.hostedemail.com and b.barracudacentral.org blocked my IP address,
> > and the rest are "Recipient address rejected: Greylisted" or
> > "Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.",
> > and after all dropped at the servers. Sad...
> >
> > >
> > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > Tetsuo Handa wrote:
> > > > > > Michael S. Tsirkin wrote:
> > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > Hello.
> > > > > > > >
> > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > >
> > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > at leak_balloon().
> > >
> > > This is really nasty! And I would argue that this is an abuse of the oom
> > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > on its own but all its users have to be really careful to not depend on
> > > any allocation request because that is a straight deadlock situation.
> >
> > Please describe such warning at
> > "int register_oom_notifier(struct notifier_block *nb)" definition.
>
> Yes, we can and should do that. Although I would prefer to simply
> document this API as deprecated. Care to send a patch? I am quite busy
> with other stuff.
>
> > > I do not think that making oom notifier API more complex is the way to
> > > go. Can we simply change the lock to try_lock?
> >
> > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > which will fail to make progress due to oom_lock already held. Therefore,
> > virtballoon_oom_notify() needs to guarantee that all allocation attempts use
> > GFP_NOWAIT when called from virtballoon_oom_notify().
>
> Ohh, I missed your point and thought the dependency is indirect

I do think this is the case. See below.


> and some
> other call path is allocating while holding the lock. But you seem to be
> right and
> leak_balloon
>   tell_host
>     virtqueue_add_outbuf
>       virtqueue_add
>
> can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> try to allocate while we are in the OOM path. Michael, is there any way
> to drop this?

Yes - in practice it won't ever allocate - that path is never taken
with add_outbuf - it is for add_sgs only.

IMHO the issue is balloon inflation which needs to allocate
memory. It does it under a mutex, and oom handler tries to take the
same mutex.


> --
> Michal Hocko
> SUSE Labs
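
The try_lock idea raised in the thread can be modeled in userspace as a
rough sketch. This is not the virtio_balloon code: struct balloon and
oom_notify_trylock() are hypothetical names, and a pthread mutex stands in
for vb->balloon_lock. It only shows the notifier backing off instead of
sleeping on a lock that the inflation path already holds. It does not
address the separate concern about allocations made while the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct virtio_balloon. */
struct balloon {
    pthread_mutex_t lock;   /* models vb->balloon_lock */
    unsigned int num_pages; /* pages currently held by the balloon */
};

/*
 * Models the OOM-notifier side: release up to `want` pages, but never
 * sleep on the lock. Returns the number of pages released, or 0 if the
 * lock is busy (for example, inflation is in progress).
 */
static unsigned int oom_notify_trylock(struct balloon *b, unsigned int want)
{
    unsigned int freed;

    assert(b != NULL);

    /* Back off instead of blocking when the lock is already held. */
    if (pthread_mutex_trylock(&b->lock) != 0)
        return 0;

    freed = want < b->num_pages ? want : b->num_pages;
    b->num_pages -= freed;

    pthread_mutex_unlock(&b->lock);
    return freed;
}
```

Calling oom_notify_trylock() while the lock is held returns 0 immediately
and leaves the page count untouched. Once the lock is released, the same
call frees pages up to the requested amount.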