From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932402AbdJ3Jqu (ORCPT ); Mon, 30 Oct 2017 05:46:50 -0400
Received: from mga09.intel.com ([134.134.136.24]:55859 "EHLO mga09.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932294AbdJ3Jqt (ORCPT ); Mon, 30 Oct 2017 05:46:49 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.44,319,1505804400"; d="scan'208";a="329517368"
Message-ID: <59F6F58A.2030000@intel.com>
Date: Mon, 30 Oct 2017 17:48:58 +0800
From: Wei Wang
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: "Michael S. Tsirkin" , linux-kernel@vger.kernel.org
CC: Tetsuo Handa , Michal Hocko , Jason Wang ,
	virtualization@lists.linux-foundation.org, linux-mm@kvack.org
Subject: Re: [PATCH] virtio_balloon: fix deadlock on OOM
References: <1507900754-32239-1-git-send-email-mst@redhat.com>
In-Reply-To: <1507900754-32239-1-git-send-email-mst@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/13/2017 09:21 PM, Michael S. Tsirkin wrote:
> fill_balloon doing memory allocations under balloon_lock
> can cause a deadlock when leak_balloon is called from
> virtballoon_oom_notify and tries to take same lock.
>
> To fix, split page allocation and enqueue and do allocations outside the lock.
>
> Here's a detailed analysis of the deadlock by Tetsuo Handa:
>
> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> is specified, this allocation attempt might indirectly depend on somebody
> else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect
> __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
> virtballoon_oom_notify() via blocking_notifier_call_chain() callback via
> out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock
> mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it
> will cause OOM lockup. Thus, do not wait for vb->balloon_lock mutex if
> leak_balloon() is called from out_of_memory().
>
>   Thread1                                     Thread2
>     fill_balloon()
>       takes a balloon_lock
>       balloon_page_enqueue()
>         alloc_page(GFP_HIGHUSER_MOVABLE)
>           direct reclaim (__GFP_FS context)     takes a fs lock
>             waits for that fs lock                alloc_page(GFP_NOFS)
>                                                     __alloc_pages_may_oom()
>                                                       takes the oom_lock
>                                                       out_of_memory()
>                                                         blocking_notifier_call_chain()
>                                                           leak_balloon()
>                                                             tries to take that balloon_lock and deadlocks
>
> Reported-by: Tetsuo Handa
> Cc: Michal Hocko
> Cc: Wei Wang
> ---

The "virtio-balloon enhancement" series has a dependency on this patch.
Could you send out a new version soon? Or I can include it in the series
if you want.

Best,
Wei
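
For readers following the thread: the fix summarized in the quoted commit message ("split page allocation and enqueue and do allocations outside the lock") can be illustrated with a small userspace sketch. This is not the kernel patch, which is not shown in this message. It assumes a pthread mutex standing in for vb->balloon_lock and malloc() standing in for alloc_page(); the names struct balloon, alloc_batch(), fill_balloon_sketch() and leak_balloon_sketch() are hypothetical.

```c
/* Userspace sketch only. A pthread mutex stands in for vb->balloon_lock and
 * malloc() stands in for alloc_page(). All names below are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define PAGE_SZ 4096

struct page_node {
    struct page_node *next;
    void *mem;
};

struct balloon {
    pthread_mutex_t lock;     /* stands in for vb->balloon_lock */
    struct page_node *pages;  /* pages currently held by the balloon */
    size_t num_pages;
};

/* Step 1: allocate with no lock held, so an allocation that stalls cannot
 * block another thread that is waiting for the balloon lock. */
static struct page_node *alloc_batch(size_t want, size_t *got)
{
    struct page_node *head = NULL;

    *got = 0;
    while (*got < want) {
        struct page_node *n = malloc(sizeof(*n));
        if (!n)
            break;
        n->mem = malloc(PAGE_SZ);
        if (!n->mem) {
            free(n);
            break;
        }
        n->next = head;
        head = n;
        (*got)++;
    }
    return head;
}

/* Step 2: take the lock only to splice the finished batch into the balloon. */
static size_t fill_balloon_sketch(struct balloon *b, size_t want)
{
    size_t got;
    struct page_node *batch, *tail;

    assert(b != NULL);
    batch = alloc_batch(want, &got);  /* lock is NOT held here */
    if (!batch)
        return 0;
    for (tail = batch; tail->next; tail = tail->next)
        ;

    pthread_mutex_lock(&b->lock);
    tail->next = b->pages;
    b->pages = batch;
    b->num_pages += got;
    pthread_mutex_unlock(&b->lock);
    return got;
}

/* The release path may now take the lock without waiting on an allocator:
 * detach pages under the lock, free them after dropping it. */
static size_t leak_balloon_sketch(struct balloon *b, size_t want)
{
    struct page_node *out = NULL;
    size_t freed = 0;

    assert(b != NULL);
    pthread_mutex_lock(&b->lock);
    while (freed < want && b->pages) {
        struct page_node *n = b->pages;
        b->pages = n->next;
        n->next = out;
        out = n;
        b->num_pages--;
        freed++;
    }
    pthread_mutex_unlock(&b->lock);

    while (out) {
        struct page_node *n = out;
        out = n->next;
        free(n->mem);
        free(n);
    }
    return freed;
}
```

A caller would initialize the structure with PTHREAD_MUTEX_INITIALIZER and link with -pthread. The point of the split is that the lock holder never sleeps inside the allocator, so a release path reached from an out-of-memory notification can always acquire the lock promptly.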