Date: Mon, 25 Mar 2024 03:22:30 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Zhaoyang Huang
Cc: Zhaoyang Huang, Andrew Morton, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, Steve Kang
Subject: Re: summarize all information again at bottom//reply: reply: [PATCH] mm: fix a race scenario in folio_isolate_lru

On Sun, Mar 24, 2024 at 07:14:27PM +0800, Zhaoyang Huang wrote:
> ok. It seems like madvise is robust enough to leave no BUGs. I presume
> another two scenarios which call folio_isolate_lru via paths other
> than the PTE. Besides, scenario 2 reminds me of a previous bug I
> reported, where find_get_entry entered a livelock: the folio's
> refcnt == 0 but it remained in the xarray, which made the reset->retry
> loop spin forever. I would like to reply in that thread for more
> details.
>
> Scenario 1:
> 0. Thread_bad gets the folio by find_get_entry and is preempted before
>    folio_lock (could be the second-round scan of
>    truncate_inode_pages_range)
>    refcnt == 2 (page_cache, fbatch_bad), PG_lru == true, PG_lock == false
>        folio = find_get_entry
>        folio_try_get_rcu
>
>        folio_try_lock
>
> 1. Thread_truncate gets the folio via
>    truncate_inode_pages_range->find_lock_entries
>    refcnt == 3 (page_cache, fbatch_bad, fbatch_truncate), PG_lru ==
>    true, PG_lock == true

Hang on, you can't have two threads in truncate_inode_pages_range()
at the same time.  I appreciate that we don't have any documentation
of that, but if it were possible, we'd see other crashes.  Removing
the folio from the page cache sets folio->mapping to NULL.  And
__filemap_remove_folio() uses folio->mapping in
filemap_unaccount_folio() and page_cache_delete(), so we'd get NULL
pointer dereferences.
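To make that concrete, here is a minimal user-space sketch, not the
kernel code: the struct fields and function bodies are stand-ins
assumed for illustration, and only the ordering (account against
->mapping, then clear it) follows __filemap_remove_folio():

        #include <stddef.h>

        struct address_space { long nrpages; };
        struct folio { struct address_space *mapping; };

        static void filemap_unaccount_folio(struct folio *folio)
        {
                folio->mapping->nrpages--;      /* NULL deref if already removed */
        }

        static void page_cache_delete(struct folio *folio)
        {
                folio->mapping = NULL;          /* folio no longer in the cache */
        }

        static void __filemap_remove_folio(struct folio *folio)
        {
                filemap_unaccount_folio(folio); /* uses folio->mapping ... */
                page_cache_delete(folio);       /* ... then clears it */
        }

        int main(void)
        {
                struct address_space as = { .nrpages = 1 };
                struct folio f = { .mapping = &as };

                __filemap_remove_folio(&f);     /* first remover: fine */
                __filemap_remove_folio(&f);     /* hypothetical second
                                                   truncator: NULL deref here */
                return 0;
        }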
I see a hint in the DAX code that it's an fs-dependent lock:

        /*
         * This gets called from truncate / punch_hole path. As such, the caller
         * must hold locks protecting against concurrent modifications of the
         * page cache (usually fs-private i_mmap_sem for writing). Since the
         * caller has seen a DAX entry for this index, we better find it
         * at that index as well...
         */

so maybe that's why there's no lockdep_assert() in
truncate_inode_pages_range(), but there should be a comment.

> Scenario 2:
> 0. Thread_bad gets the folio by find_get_entry and is preempted before
>    folio_lock (could be the second-round scan of
>    truncate_inode_pages_range)
>    refcnt == 2 (page_cache, fbatch_bad), PG_lru == true, PG_lock == false
>        folio = find_get_entry
>        folio_try_get_rcu
>
>        folio_try_lock
>
> 1. Thread_readahead removes the folio from the page cache and drops one
>    refcnt by filemap_remove_folio (getting rid of the folios which
>    failed to launch I/O during readahead)
>    refcnt == 1 (fbatch_bad), PG_lru == true, PG_lock == true

So readahead inserts the folio locked, and then calls
filemap_remove_folio() without having unlocked it.
filemap_remove_folio() sets folio->mapping to NULL in
page_cache_delete().  When "Thread_bad" wakes up, it gets the folio
lock, calls truncate_inode_folio() and sees that folio->mapping !=
mapping, so it doesn't call filemap_remove_folio() (a sketch of this
guard follows the quoted trace below).

> 4. Thread_bad schedules back from step 0 and wrongly drops one refcnt
>    when doing truncate_inode_folio->filemap_remove_folio, as it takes
>    this refcnt to be the page-cache one
>    refcnt == 1' (thread_isolate), PG_lru == false, PG_lock == false
>        find_get_entries
>            folio = find_get_entry
>            folio_try_get_rcu
>        folio_lock
>            <no check of mapping != mapping as find_lock_entries does>
>        truncate_inode_folio
>            filemap_remove_folio
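As referenced above, a compilable stand-in for that guard.  The types
and helpers are simplified assumptions; the real check lives in
truncate_inode_folio() in mm/truncate.c.  The point is that a late
caller whose folio has already been removed from this mapping bails
out instead of dropping the page-cache refcount a second time:

        #include <stdio.h>

        struct address_space { int unused; };
        struct folio {
                struct address_space *mapping;
                int refcnt;
        };

        static int truncate_inode_folio(struct address_space *mapping,
                                        struct folio *folio)
        {
                if (folio->mapping != mapping)  /* already removed elsewhere */
                        return -1;              /* do not drop the refcount again */
                folio->mapping = NULL;          /* filemap_remove_folio() ... */
                folio->refcnt--;                /* ... drops the cache reference */
                return 0;
        }

        int main(void)
        {
                struct address_space as;
                struct folio f = { .mapping = &as, .refcnt = 2 };

                truncate_inode_folio(&as, &f);  /* readahead-style removal: 2 -> 1 */
                truncate_inode_folio(&as, &f);  /* late Thread_bad: check fails,
                                                   nothing is dropped */
                printf("refcnt = %d\n", f.refcnt);      /* prints 1, not 0 */
                return 0;
        }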