From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 335BDC3ABB2 for ; Wed, 28 May 2025 09:27:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C653A6B0082; Wed, 28 May 2025 05:27:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C3CC26B008A; Wed, 28 May 2025 05:27:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B53456B0095; Wed, 28 May 2025 05:27:54 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 95E566B0082 for ; Wed, 28 May 2025 05:27:54 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 3A9701D0D3C for ; Wed, 28 May 2025 09:27:54 +0000 (UTC) X-FDA: 83491789668.18.6C304B9 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by imf28.hostedemail.com (Postfix) with ESMTP id 09419C0002 for ; Wed, 28 May 2025 09:27:51 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=n+W2arJC; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=PLgbUBee; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=n+W2arJC; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=PLgbUBee; spf=pass (imf28.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.130 as permitted sender) smtp.mailfrom=osalvador@suse.de; dmarc=pass (policy=none) header.from=suse.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1748424472; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=B4mwwW9r0OHxt1yikCbUfXebtF5UgdxJ9FNgvzHDIfE=; b=3s0r1cpvCwl5k9gv3T4GXG5iGx3bVR7zPrPu0D7PrCdZ1YZoKRGOipL8R9jvF6eDmOmG3B h3UFlDYoLb54ub/m0mOBrW5sRx/xNLf1qpP4UjT00uah99xP1voQ0gwJRny5lW8lC32C11 JfQIqod1L1wEsm+W/CdkWvZh5pHsWnM= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=n+W2arJC; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=PLgbUBee; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=n+W2arJC; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=PLgbUBee; spf=pass (imf28.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.130 as permitted sender) smtp.mailfrom=osalvador@suse.de; dmarc=pass (policy=none) header.from=suse.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1748424472; a=rsa-sha256; cv=none; b=ythCBhL/UucdN7OoMytA9Leb8n94TL1Uu3oz0bMvwR0kvRNwXyrWauBGon4dq7cVxsWzV2 gTrh10mAa2ND7aIUQ6TKqkY0XjscArSbfv6Qdb7lhqylf+CikokXpQWCQmHqrcpoV34XSp qobBJI6Yqm2bidbOFHTAbp+tsDbU4VA= Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 3959221BFC; Wed, 28 May 2025 09:27:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1748424470; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=B4mwwW9r0OHxt1yikCbUfXebtF5UgdxJ9FNgvzHDIfE=; b=n+W2arJC9K4LnK8d5fiG8AWGKKHDpg2CEWV2F7309g7HyeUKZS8KPrbC70sqLPwSwr2jZu F94CF4I9wfrIvAUic+im00Q29l49NvTK33VFQksiJuk+WfP27KHDdJ6S4RM/oqIiAOrbdE 9KMGpN56EZlNIg3Quv9Zdg4WW5/vXSk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1748424470; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=B4mwwW9r0OHxt1yikCbUfXebtF5UgdxJ9FNgvzHDIfE=; b=PLgbUBeeeE9X8XNwWP958G8cg9vwCEdd90usoWLBmPLRdkns9WWz6AyA7Mdc+WIgALZnUr f75JSjED5VJg2YAg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1748424470; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=B4mwwW9r0OHxt1yikCbUfXebtF5UgdxJ9FNgvzHDIfE=; b=n+W2arJC9K4LnK8d5fiG8AWGKKHDpg2CEWV2F7309g7HyeUKZS8KPrbC70sqLPwSwr2jZu F94CF4I9wfrIvAUic+im00Q29l49NvTK33VFQksiJuk+WfP27KHDdJ6S4RM/oqIiAOrbdE 9KMGpN56EZlNIg3Quv9Zdg4WW5/vXSk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1748424470; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=B4mwwW9r0OHxt1yikCbUfXebtF5UgdxJ9FNgvzHDIfE=; b=PLgbUBeeeE9X8XNwWP958G8cg9vwCEdd90usoWLBmPLRdkns9WWz6AyA7Mdc+WIgALZnUr f75JSjED5VJg2YAg== Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 8073C136E3; Wed, 28 May 2025 09:27:49 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id BQmlHBXXNmgyawAAD6G6ig (envelope-from ); Wed, 28 May 2025 09:27:49 +0000 Date: Wed, 28 May 2025 11:27:46 +0200 From: Oscar Salvador To: Gavin Guo Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, akpm@linux-foundation.org, mike.kravetz@oracle.com, kernel-dev@igalia.com, stable@vger.kernel.org, Hugh Dickins , Florent Revest , Gavin Shan , David Hildenbrand , Peter Xu Subject: Re: [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table Message-ID: References: <20250528023326.3499204-1-gavinguo@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250528023326.3499204-1-gavinguo@igalia.com> X-Stat-Signature: snu8md7h951pxyi1iwbpaoct9mdiotxj X-Rspamd-Queue-Id: 09419C0002 X-Rspam-User: X-Rspamd-Server: rspam02 X-HE-Tag: 1748424471-728747 X-HE-Meta: U2FsdGVkX1+rpMxsP0mz2S6wKnS9oyq1z5TgtFK7XfHtBN8xUil0tSF0Fax4kyaveETZRqQUGtuQd9f5SZof8UEwOnx1WykVCd5kjHqsti5MUEbRWJElSfD1257kMgJjISRMeF+koWHZOXLiP5GbcgHPuEC0+bd3ZPT3Fn1yXnhD1eZNU4OxHMyC0/sg/XMoa5r5Pq65KyCr31yosmnZh7YApEapJhg+cDIGzz/0sm1s47nV2LCI3R0bDJmQ0vNZpjXOPgWVhQNEt2sbRfe1Fw1AsWSYeTQpRG5DIF/lOESuhT85Q4Sk9QsGxwi57pVwf925/kFZhnn2rleO9h/mbBjdoU6aDoJ+g+wdystah36brfI6+RI9LSizmpcK/AtJRiVOrKDisSMsCawlO10eJG55e+sXBKlYbtCHMBAwUbOOiAzjXzzPblIj01G8GOxt1nem0Rsf49BG88xWYdrZtZ1vHT0QF0iyluG9xeVGg+3fvW2ocHL3Bt4TulYtLMS8zidkYVKoJ4w9KnSSKHwpBw+15QRTTXVD5rd0CMvwqWyBE2+OF0e4GgQQPSXVxUwqNWQHAZTt3GoWMK0TGFcX/eC744lRyOyBWVxQoNJpZeb5NbFVp7FkTRCwDoS3vPfhnUPtIUtsSEFxBZcLnqwif82Es0tpQjP55QROeGiTKs1RyU4Hc7Q+00xmifwaDxKeKXROX2ZLbRUbYZCV93eYgeehpYO28euXW+Eu/q6hzTCn/9PzdCU4xSGxY3Rq5pXFAeEIJ4SXXRnXtzL7U5NQzKLC6cGLTlp/Upw9/bXqL5qzIC9V+3xmHbVGyPHqTEmh5vjKzJCBOLaJBBlFZGMFrXowAVTWv1uSnMbYJZFztl8bBCzMhKxSI/2xtZg5sV/qWPa77pxRS6epcMoGJCA0U9xS4m6ujdLcLtx1uLTv1o/PW86G6e/7IC6qTKpBj4x3tTLysb7fvjX09t+UjZU Qcp1j4CU NqLLsnWqKIqbfBbtFyBIcC4MBF7wGu8z74MkBSOKCFKYVFcTaUSnB/ZGgIUFYoozQK5rfAWslnJ5YKcjueVu2/Kjif5veJk4Y2KzmplR/piU3vWbE36RLJH7C4QfJ7QW6QmyByo1Qusse55E5FA087d0HEHbryW68S19haA0P6C7NqaorVAyT3KAu7uqF7mlq+a0qqIFNWhy12iqHUz8dTg2RLuoIwcnMvPs1W24NFgz1qxYjq30ynLn4B1M3Jmj6FRf9qofuJXXMMS7XI+/+alZwg/xoP0dJsDMYV64Y39SLsUAiL3QT/XZu1qRFNdwnMCUQ13POKBjKBCZ/LOOQP0BHzrCbXvmgof+a0iXs14ii4Po3jo5CU8cJFCJ+iXDZ4VDPryYnhBbkMjwotJQGZDkLsisPsZc2VPNIla8r6mBJUDBHb5Yj+Y+R+7s/CzQHK5h6/u0Kop8aaD91reU9Ba5Eeg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, May 28, 2025 at 10:33:26AM +0800, Gavin Guo wrote: > There is ABBA dead locking scenario happening between hugetlb_fault() > and hugetlb_wp() on the pagecache folio's lock and hugetlb global mutex, > which is reproducible with syzkaller [1]. As below stack traces reveal, > process-1 tries to take the hugetlb global mutex (A3), but with the > pagecache folio's lock hold. Process-2 took the hugetlb global mutex but > tries to take the pagecache folio's lock. > > Process-1 Process-2 > ========= ========= > hugetlb_fault > mutex_lock (A1) > filemap_lock_hugetlb_folio (B1) > hugetlb_wp > alloc_hugetlb_folio #error > mutex_unlock (A2) > hugetlb_fault > mutex_lock (A4) > filemap_lock_hugetlb_folio (B4) > unmap_ref_private > mutex_lock (A3) > > Fix it by releasing the pagecache folio's lock at (A2) of process-1 so > that pagecache folio's lock is available to process-2 at (B4), to avoid > the deadlock. In process-1, a new variable is added to track if the > pagecache folio's lock has been released by its child function > hugetlb_wp() to avoid double releases on the lock in hugetlb_fault(). > The similar changes are applied to hugetlb_no_page(). > > Link: https://drive.google.com/file/d/1DVRnIW-vSayU5J1re9Ct_br3jJQU6Vpb/view?usp=drive_link [1] > Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization") > Cc: > Cc: Hugh Dickins > Cc: Florent Revest > Reviewed-by: Gavin Shan > Signed-off-by: Gavin Guo ... > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 6a3cf7935c14..560b9b35262a 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -6137,7 +6137,8 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma, > * Keep the pte_same checks anyway to make transition from the mutex easier. > */ > static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, > - struct vm_fault *vmf) > + struct vm_fault *vmf, > + bool *pagecache_folio_locked) > { > struct vm_area_struct *vma = vmf->vma; > struct mm_struct *mm = vma->vm_mm; > @@ -6234,6 +6235,18 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, > u32 hash; > > folio_put(old_folio); > + /* > + * The pagecache_folio has to be unlocked to avoid > + * deadlock and we won't re-lock it in hugetlb_wp(). The > + * pagecache_folio could be truncated after being > + * unlocked. So its state should not be reliable > + * subsequently. > + */ > + if (pagecache_folio) { > + folio_unlock(pagecache_folio); > + if (pagecache_folio_locked) > + *pagecache_folio_locked = false; > + } I am having a problem with this patch as I think it keeps carrying on an assumption that it is not true. I was discussing this matter yesterday with Peter Xu (CCed now), who has also some experience in this field. Exactly against what pagecache_folio's lock protects us when pagecache_folio != old_folio? There are two cases here: 1) pagecache_folio = old_folio (original page in the pagecache) 2) pagecache_folio != old_folio (original page has already been mapped privately and CoWed, old_folio contains the new folio) For case 1), we need to hold the lock because we are copying old_folio to the new one in hugetlb_wp(). That is clear. But for case 2), unless I am missing something, we do not really need the pagecache_folio's lock at all, do we? (only old_folio's one) The only reason pagecache_folio gets looked up in the pagecache is to check whether the current task has mapped and faulted in the file privately, which means that a reservation has been consumed (a new folio was allocated). That is what the whole dance about "old_folio != pagecache_folio && HPAGE_RESV_OWNER" in hugetlb_wp() is about. And the original mapping cannot really go away either from under us, as remove_inode_hugepages() needs to take the mutex in order to evict it, which would be the only reason counters like resv_huge_pages (adjusted in remove_inode_hugepages()->hugetlb_unreserve_pages()) would interfere with alloc_hugetlb_folio() from hugetlb_wp(). So, again, unless I am missing something there is no need for the pagecache_folio lock when pagecache_folio != old_folio, let alone the need to hold it throughout hugetlb_wp(). I think we could just look up the cache, and unlock it right away. So, the current situation (previous to this patch) is already misleading for case 2). And comments like: /* * The pagecache_folio has to be unlocked to avoid * deadlock and we won't re-lock it in hugetlb_wp(). The * pagecache_folio could be truncated after being * unlocked. So its state should not be reliable * subsequently. */ Keep carrying on the assumption that we need the lock. Now, if the above is true, I would much rather see this reworked (I have some ideas I discussed with Peter yesterday), than keep it as is. Let me also CC David who tends to have a good overview in this. -- Oscar Salvador SUSE Labs