From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 17 Mar 2024 20:46:35 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Zhaoyang Huang
Cc: "zhaoyang.huang", Andrew Morton, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, steve.kang@unisoc.com
Subject: Re: [PATCH] mm: fix a race scenario in folio_isolate_lru
References: <20240314083921.1146937-1-zhaoyang.huang@unisoc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Sun, Mar 17, 2024 at 12:07:40PM +0800, Zhaoyang Huang wrote:
> Could it be this scenario, where folio comes from pte(thread 0), local
> fbatch(thread 1) and page cache(thread 2) concurrently and proceed
> intermixed without lock's protection? Actually, IMO, thread 1 also
> could see the folio with refcnt==1 since it doesn't care if the page
> is on the page cache or not.
>
> madivise_cold_and_pageout does no explicit folio_get thing since the
> folio comes from pte which implies it has one refcnt from pagecache

Mmm, no.  It's implicit, but madvise_cold_or_pageout_pte_range()
does guarantee that the folio has at least one refcount.

Since we get the folio from vm_normal_folio(vma, addr, ptent), we know
that there is at least one mapcount on the folio.
The refcount is always >= mapcount.  Since we hold pte_offset_map_lock(),
we know that the mapcount (and therefore the refcount) cannot be
decremented until we call pte_unmap_unlock(), which we don't do until we
have called folio_isolate_lru().

Good try though; it took me a few minutes of looking at it to convince
myself that it was safe.

Something to bear in mind is that if the race you outline were real,
failing to hold a refcount on the folio would leave the caller
susceptible to the VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio)
if the other thread calls folio_put().

I can't understand any of the scenarios you outline below.  Please try
again without relying on indentation.

> #thread 0(madivise_cold_and_pageout)  #1
> (lru_add_drain->fbatch_release_pages)
> #2(read_pages->filemap_remove_folios)
> refcnt == 1(represent page cache)
>
> refcnt==2(another one represent LRU)
> folio comes from page cache
> folio_isolate_lru
> release_pages
> filemap_free_folio
>
> refcnt==1(decrease the one of page cache)
>
> folio_put_testzero == true
>
> list_add(folio->lru, pages_to_free) //current folio will break LRU's
> integrity since it has not been deleted
>
> In case of gmail's wrap, split above chart to two parts
>
> #thread 0(madivise_cold_and_pageout)  #1
> (lru_add_drain->fbatch_release_pages)
> refcnt == 1(represent page cache)
>
> refcnt==2(another one represent LRU)
> folio_isolate_lru  release_pages
>
> folio_put_testzero == true
>
> list_add(folio->lru, pages_to_free)
> //current folio will break LRU's integrity since it has not been
> deleted
>
> #1 (lru_add_drain->fbatch_release_pages)
> #2(read_pages->filemap_remove_folios)
> refcnt==2(another one represent LRU)
> folio comes from page cache
> release_pages
> filemap_free_folio
>
> refcnt==1(decrease the one of page cache)
> folio_put_testzero == true
>
> list_add(folio->lru, pages_to_free)
> //current folio will break LRU's integrity since it has not been deleted
>
> #0 folio_isolate_lru  #1 release_pages
>
> BUG_ON(!folio_refcnt)
>
> if (folio_put_testzero())
>
> folio_get(folio)
>
> if (folio_test_clear_lru())