From: Lance Yang
To: akpm@linux-foundation.org, david@redhat.com, 21cnbao@gmail.com
Cc: baolin.wang@linux.alibaba.com, chrisl@kernel.org, ioworker0@gmail.com,
    kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
    ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
    huang.ying.caritas@gmail.com, zhengtangquan@oppo.com, riel@surriel.com,
    Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
    mingzhe.yang@ly.com, stable@vger.kernel.org, Barry Song , Lance Yang
Subject: [PATCH v2 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
Date: Fri, 27 Jun 2025 14:23:19 +0800
Message-ID: <20250627062319.84936-1-lance.yang@linux.dev>
X-Mailer: git-send-email 2.49.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Lance Yang

As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
can read past the end of a PTE table if a large folio is mapped starting at
the last entry of that table. This would be quite rare in practice, as
MADV_FREE typically splits the large folio ;)

So let's fix the potential out-of-bounds read by refactoring the logic into
a new helper, folio_unmap_pte_batch(). The new helper correctly calculates
the safe number of pages to scan by limiting the operation to the
boundaries of both the current VMA and the current PTE table.

In addition, the "all-or-nothing" batching restriction is removed to
support partial batches. The reference counting is also cleaned up to use
folio_put_refs().
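
For reviewers who want to see the boundary math in isolation: below is a
standalone userspace sketch (not kernel code, and not part of this patch)
that models what the pmd_addr_end()-based clamp in folio_unmap_pte_batch()
achieves. The constants, example addresses, and the helper names
model_pmd_addr_end()/model_max_batch() are made up purely for illustration,
assuming 4K pages and a 2M PMD region.

#include <stdio.h>

#define PAGE_SHIFT	12
#define PMD_SIZE	(1UL << (PAGE_SHIFT + 9))	/* 2 MiB: one PTE table covers 512 pages */
#define PMD_MASK	(~(PMD_SIZE - 1))

/* Model of pmd_addr_end(): end of the current PTE table, capped at the VMA end. */
static unsigned long model_pmd_addr_end(unsigned long addr, unsigned long vma_end)
{
	unsigned long end = (addr & PMD_MASK) + PMD_SIZE;

	return end < vma_end ? end : vma_end;
}

/* Largest batch that stays inside both the current VMA and the current PTE table. */
static unsigned int model_max_batch(unsigned long addr, unsigned long vma_end)
{
	return (model_pmd_addr_end(addr, vma_end) - addr) >> PAGE_SHIFT;
}

int main(void)
{
	unsigned long vma_end = 0x400000;	/* hypothetical VMA ending at 4 MiB */

	/* Folio mapped at the last PTE of a table: only 1 page may be scanned. */
	printf("%u\n", model_max_batch(0x1ff000, vma_end));	/* prints 1 */

	/* Folio mapped at the start of a table: up to 512 pages may be scanned. */
	printf("%u\n", model_max_batch(0x200000, vma_end));	/* prints 512 */

	return 0;
}

Passing the old, unclamped folio_nr_pages(folio) in the first case is
exactly the out-of-bounds scan this patch prevents.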

[1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com

Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Cc:
Suggested-by: David Hildenbrand
Suggested-by: Barry Song
Signed-off-by: Lance Yang
---
v1 -> v2:
 - Update subject and changelog (per Barry)
 - https://lore.kernel.org/linux-mm/20250627025214.30887-1-lance.yang@linux.dev

 mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index fb63d9256f09..1320b88fab74 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
 #endif
 }
 
-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
-			struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+			struct page_vma_mapped_walk *pvmw,
+			enum ttu_flags flags, pte_t pte)
 {
 	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
-	int max_nr = folio_nr_pages(folio);
-	pte_t pte = ptep_get(ptep);
+	unsigned long end_addr, addr = pvmw->address;
+	struct vm_area_struct *vma = pvmw->vma;
+	unsigned int max_nr;
+
+	if (flags & TTU_HWPOISON)
+		return 1;
+	if (!folio_test_large(folio))
+		return 1;
 
+	/* We may only batch within a single VMA and a single page table. */
+	end_addr = pmd_addr_end(addr, vma->vm_end);
+	max_nr = (end_addr - addr) >> PAGE_SHIFT;
+
+	/* We only support lazyfree batching for now ... */
 	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
-		return false;
+		return 1;
 	if (pte_unused(pte))
-		return false;
-	if (pte_pfn(pte) != folio_pfn(folio))
-		return false;
+		return 1;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
-			       NULL, NULL) == max_nr;
+	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+			       NULL, NULL, NULL);
 }
 
 /*
@@ -2024,9 +2033,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			if (pte_dirty(pteval))
 				folio_mark_dirty(folio);
 		} else if (likely(pte_present(pteval))) {
-			if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
-			    can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
-				nr_pages = folio_nr_pages(folio);
+			nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
 			end_addr = address + nr_pages * PAGE_SIZE;
 			flush_cache_range(vma, address, end_addr);
 
@@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			hugetlb_remove_rmap(folio);
 		} else {
 			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
-			folio_ref_sub(folio, nr_pages - 1);
 		}
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
-		folio_put(folio);
-		/* We have already batched the entire folio */
-		if (nr_pages > 1)
+		folio_put_refs(folio, nr_pages);
+
+		/*
+		 * If we are sure that we batched the entire folio and cleared
+		 * all PTEs, we can just optimize and stop right here.
+		 */
+		if (nr_pages == folio_nr_pages(folio))
 			goto walk_done;
 		continue;
 walk_abort:
-- 
2.49.0