From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 909DF3BFACE for ; Mon, 29 Jun 2026 08:10:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782720641; cv=none; b=ZQbckiXahf8aAm56goQ0nqZf7SgeY8IhDN/bApFHXnkJ4IIlmWq96xnFsB9vTf7PcScR9ep6/yileJH2FfnACn00ht0aSQco0djojqYEGW16Mr+fmbo1iZD/V8WNNvM+wqDLEtpPt/Tz1qqi1sdGF6wLUsGBFtGI+056MNMPfTU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782720641; c=relaxed/simple; bh=+LByNfskc+x8hqHIhkCSzUPCEP23i5n6yTX+LXEdGyI=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=r3eqJR6211BTh1Vzj3SgKiBiDdZe2NFTlhhpV5Z/2+rBVCurL3tM8lWyC4qyEKT1Gqhy/BMpxo5sZdFA/fXBHbpBbp4c96ydFOMU2hfIgoJaumqLFIm6YmyNr/W3wk1LNZzjAk3q5BMW+AD31Lq0P16VPtk8jDE1BCxUPwr2pq8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=MMX/5Qza; dkim=pass (2048-bit key) header.d=redhat.com header.i=@redhat.com header.b=hzfE9sHD; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="MMX/5Qza"; dkim=pass (2048-bit key) header.d=redhat.com header.i=@redhat.com header.b="hzfE9sHD" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1782720636; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=YfZ+eLRL94nE3BApW8xdPyWC54V3q52Zz68GpVEJjh8=; b=MMX/5QzauUULnhcGIrhD7j33jyswOE5J62VVZL3qcdOIcs6YIkJpOKtZRBcEq5bfIiIBo6 wsaOXfuPudxpiTnr6mWIw1cc/RlHzZiHCvbxiY18Ms5Po3ryr+6DRzfdkqjQWmrKaPBwq3 lz/oF/vmTVRKNJuNq83YjonUEO8PsvU= Received: from mail-pj1-f70.google.com (mail-pj1-f70.google.com [209.85.216.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-557-3aE9GsQzPBiQkVxUN4g75g-1; Mon, 29 Jun 2026 04:10:35 -0400 X-MC-Unique: 3aE9GsQzPBiQkVxUN4g75g-1 X-Mimecast-MFC-AGG-ID: 3aE9GsQzPBiQkVxUN4g75g_1782720634 Received: by mail-pj1-f70.google.com with SMTP id 98e67ed59e1d1-37c9127e316so2023509a91.1 for ; Mon, 29 Jun 2026 01:10:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=google; t=1782720634; x=1783325434; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=YfZ+eLRL94nE3BApW8xdPyWC54V3q52Zz68GpVEJjh8=; b=hzfE9sHDnihj4wUT7D+hxLXZPn92tQ6zgbNXg1xnVMM+ya+3iIpjK3lC4o/4AMhIXK OewUPicLnXM2KH/z6T3FTVZGu3eokffSCOymQtXu1LfywRbnXB8PXbrGS7YXXpe38HAj LJpo3ZCiLbw+TRtNMTDp8nsp+T82FsYbzoh4x95L7azxI4Cnq1gY9ts/xGYtzj6A5suW y5jfSdmG9Oi8Z656Ama0EoFTQQyZlkii9sZQI36soj93DgSBTs0UgW9e9R7SNyya3Bu0 HjRfG+IOFgz5w+idtpELFWunne4dJup3M9VQvP1Eqlu0dE9TuQx4arJ1NCzui8PfE1m7 hbSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1782720634; x=1783325434; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YfZ+eLRL94nE3BApW8xdPyWC54V3q52Zz68GpVEJjh8=; b=mVTGOSv2JWM66hOuwu4bZsaExDYsRyTjByjajcNE0KDZiw9WEiIPi73bdH95uZRlVb YKJ4bT3jT+RYipsY9btND7hUHoZvdO377fpHJEDjKVp5FEUcEXbkIhQRWpyQYON8QKrb OVssadXTpZX1d81WpqS0WmBQNJWXyrQ/t5S22cihmUK86ldV3ZyOibLIwH5YOPdoOOcJ 6xLh5k1s31h+fRhbMM29r5Dvgh519vLjeLpF0J3i7HQr9fJGWX8aczSFcohnkNkgaQxp eR+2bGEBQ98ceO+ww1eQus9hmXkVZbVXMHyDXIxivpe/AL7ANO95PqbKEFzEiJAFYZ5Y wFoA== X-Gm-Message-State: AOJu0Ywt6/ajg12iQxRVdlQ+6+usIx2RzdetHQSB8GuecUJ/KYHqx1+Y M1oryAqYsmwWJPc4FCZ8z4LefV8Ks3kduuP6L9nX/oHS70+IYohgA1LepoIlSUPnx/Os4nshxgC 0i88z0QLvV+NpFWLN30DazLEn5FCBOMv83wJmP2viDrO8hR5slzeRQMnVewiyER5mdg== X-Gm-Gg: AfdE7cntQVW07szxn9L9auoKDO/3obVyGyt6b7F6IqSo2uLKdeNS+fvLKJsfjfxV/+/ l1IeyVI8KZV/hMrBC3zJehr+MSvxj9MTSBS1JwgPzXRF6N0WURkdfO5motr4LMLnQ7ZUqTj3qlj C8SU3fx2dXt804HC2rWQzpBCZl8Tb3dX65KEZDWePIX/O1n9z6t7SALNLZW2YrGft6/R/QMwRUJ LPxzEVFD+nrW2QALp2V7zviJgE6EvVGw8wq5QeJnRDDbLaRy0MdXIF0rc6L0bt28iCmQWMBNBCL M2BSGk3GIo1XhgYo0lLOYu4kbYUZI3+eYMlVpLeuS2vyq8iQ47q4GDp+3oUvgxHNW5e/O9BTCBL 7QyMUUuu7Nv4plKpkfNOwjulzDwfHMljP X-Received: by 2002:a17:90b:1a8b:b0:369:7f25:cec0 with SMTP id 98e67ed59e1d1-37df9d55f51mr14591052a91.0.1782720633632; Mon, 29 Jun 2026 01:10:33 -0700 (PDT) X-Received: by 2002:a17:90b:1a8b:b0:369:7f25:cec0 with SMTP id 98e67ed59e1d1-37df9d55f51mr14591002a91.0.1782720633020; Mon, 29 Jun 2026 01:10:33 -0700 (PDT) Received: from redhat.com (IGLD-80-230-85-71.inter.net.il. [80.230.85.71]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-37ec8b2fea5sm2343101a91.0.2026.06.29.01.10.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jun 2026 01:10:32 -0700 (PDT) Date: Mon, 29 Jun 2026 04:10:06 -0400 From: "Michael S. Tsirkin" To: Andi Kleen Cc: linux-kernel@vger.kernel.org, David Hildenbrand , Miaohe Lin , Naoya Horiguchi , Andrew Morton , Oscar Salvador , Hidehiro Kawai , Rik van Riel , Vlastimil Babka , Lorenzo Stoakes , "Liam R. Howlett" , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Brendan Jackman , Johannes Weiner , Zi Yan , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Lance Yang , Christoph Lameter , David Rientjes , Roman Gushchin , Harry Yoo , Hao Li , Kiryl Shutsemau , Byungchul Park , linux-mm@kvack.org, linux-cxl@vger.kernel.org, David Hildenbrand Subject: Re: [PATCH 0/2] mm: memory-failure: fix HWPoison flag race with non-atomic page flag ops Message-ID: <20260629033608-mutt-send-email-mst@kernel.org> References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Sun, Jun 28, 2026 at 07:11:58PM -0700, Andi Kleen wrote: > On Sun, Jun 28, 2026 at 05:45:22PM -0400, Michael S. Tsirkin wrote: > > This series fixes the race by: > > > > 1. Having memory_failure() call synchronize_rcu() + retry after > > setting HWPoison, so that any in-flight non-atomic RMW that > > read the old flags value completes before we proceed. > > > > 2. Wrapping all non-atomic page flag operations in > > rcu_read_lock/rcu_read_unlock (CONFIG_MEMORY_FAILURE only), > > so that synchronize_rcu() actually drains them. > > It wouldn't surprise me if your underlying performance assumptions > -- an non contended atomic is cheaper than a rcu_read_lock/unlock -- > are not true in various CPU/kernel configuration combinations. > > Modern CPUs have fast atomics when uncontended in normal circumstances. > But it probably doesn't matter much either way because the difference > shouldn't be very much. Hmm. It's a bit silly that I didn't try. Seemed clear to me, but, on this old xeon... insns/iter cycles/iter ------------------------------------------------------- base 12238 +/- 1.0 17889 +/- 97.9 rcu_read_lock 12251 +/- 7.3 17991 +/-191.6 atomic ops 12233 +/- 1.9 17733 +/-136.5 The diff in the noise. And old, slow CPUs maybe don't have MF at all. So maybe just atomics instead of all this mess. > It seems very complicated for something that > could be much simpler. > > But I guess it's fine. > > -Andi Indeed. David already said he's gonnu look at this himself, but here's what I tested, maybe helpful for whoever wants to try this approach: Signed-off-by: Michael S. Tsirkin diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 7223f6f4e2b4..8f0940cf2f29 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -404,6 +404,44 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n) #define FOLIO_HEAD_PAGE 0 #define FOLIO_SECOND_PAGE 1 +/* + * When CONFIG_MEMORY_FAILURE is enabled, non-atomic page flag operations + * must be atomic to prevent clobbering concurrent TestSetPageHWPoison() + * in memory_failure(). Otherwise, use the cheaper non-atomic versions. + */ +#ifdef CONFIG_MEMORY_FAILURE +#define __pf_set_bit set_bit +#define __pf_clear_bit clear_bit + +static __always_inline void page_flags_clear(unsigned long *flags, + unsigned long mask) +{ + atomic_long_andnot(mask, (atomic_long_t *)flags); +} + +static __always_inline void page_flags_set(unsigned long *flags, + unsigned long mask) +{ + atomic_long_or(mask, (atomic_long_t *)flags); +} + +#else /* !CONFIG_MEMORY_FAILURE */ +#define __pf_set_bit __set_bit +#define __pf_clear_bit __clear_bit + +static __always_inline void page_flags_clear(unsigned long *flags, + unsigned long mask) +{ + *flags &= ~mask; +} + +static __always_inline void page_flags_set(unsigned long *flags, + unsigned long mask) +{ + *flags |= mask; +} +#endif /* CONFIG_MEMORY_FAILURE */ + /* * Macros to create function definitions for page flags */ @@ -421,11 +459,11 @@ static __always_inline void folio_clear_##name(struct folio *folio) \ #define __FOLIO_SET_FLAG(name, page) \ static __always_inline void __folio_set_##name(struct folio *folio) \ -{ __set_bit(PG_##name, folio_flags(folio, page)); } +{ __pf_set_bit(PG_##name, folio_flags(folio, page)); } #define __FOLIO_CLEAR_FLAG(name, page) \ static __always_inline void __folio_clear_##name(struct folio *folio) \ -{ __clear_bit(PG_##name, folio_flags(folio, page)); } +{ __pf_clear_bit(PG_##name, folio_flags(folio, page)); } #define FOLIO_TEST_SET_FLAG(name, page) \ static __always_inline bool folio_test_set_##name(struct folio *folio) \ @@ -458,12 +496,12 @@ static __always_inline void ClearPage##uname(struct page *page) \ #define __SETPAGEFLAG(uname, lname, policy) \ __FOLIO_SET_FLAG(lname, FOLIO_##policy) \ static __always_inline void __SetPage##uname(struct page *page) \ -{ __set_bit(PG_##lname, &policy(page, 1)->flags.f); } +{ __pf_set_bit(PG_##lname, &policy(page, 1)->flags.f); } #define __CLEARPAGEFLAG(uname, lname, policy) \ __FOLIO_CLEAR_FLAG(lname, FOLIO_##policy) \ static __always_inline void __ClearPage##uname(struct page *page) \ -{ __clear_bit(PG_##lname, &policy(page, 1)->flags.f); } +{ __pf_clear_bit(PG_##lname, &policy(page, 1)->flags.f); } #define TESTSETFLAG(uname, lname, policy) \ FOLIO_TEST_SET_FLAG(lname, FOLIO_##policy) \ @@ -806,7 +844,7 @@ static inline bool PageUptodate(const struct page *page) static __always_inline void __folio_mark_uptodate(struct folio *folio) { smp_wmb(); - __set_bit(PG_uptodate, folio_flags(folio, 0)); + __pf_set_bit(PG_uptodate, folio_flags(folio, 0)); } static __always_inline void folio_mark_uptodate(struct folio *folio) @@ -1162,14 +1200,14 @@ static __always_inline void ClearPageAnonExclusive(struct page *page) { VM_BUG_ON_PGFLAGS(!PageAnonNotKsm(page), page); VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page); - clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f); + __pf_clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f); } static __always_inline void __ClearPageAnonExclusive(struct page *page) { VM_BUG_ON_PGFLAGS(!PageAnon(page), page); VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page); - __clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f); + __pf_clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f); } #ifdef CONFIG_MMU diff --git a/include/linux/mm.h b/include/linux/mm.h index 06bbe9eba636..931dfc84319f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2343,7 +2343,7 @@ int folio_xchg_last_cpupid(struct folio *folio, int cpupid); static inline void page_cpupid_reset_last(struct page *page) { - page->flags.f |= LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT; + page_flags_set(&page->flags.f, LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT); } #endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */ @@ -2503,8 +2503,8 @@ static inline struct zone *folio_zone(const struct folio *folio) #ifdef SECTION_IN_PAGE_FLAGS static inline void set_page_section(struct page *page, unsigned long section) { - page->flags.f &= ~(SECTIONS_MASK << SECTIONS_PGSHIFT); - page->flags.f |= (section & SECTIONS_MASK) << SECTIONS_PGSHIFT; + page_flags_clear(&page->flags.f, SECTIONS_MASK << SECTIONS_PGSHIFT); + page_flags_set(&page->flags.f, (section & SECTIONS_MASK) << SECTIONS_PGSHIFT); } static inline unsigned long memdesc_section(memdesc_flags_t mdf) @@ -2719,14 +2719,14 @@ static inline bool folio_is_longterm_pinnable(struct folio *folio) static inline void set_page_zone(struct page *page, enum zone_type zone) { - page->flags.f &= ~(ZONES_MASK << ZONES_PGSHIFT); - page->flags.f |= (zone & ZONES_MASK) << ZONES_PGSHIFT; + page_flags_clear(&page->flags.f, ZONES_MASK << ZONES_PGSHIFT); + page_flags_set(&page->flags.f, (zone & ZONES_MASK) << ZONES_PGSHIFT); } static inline void set_page_node(struct page *page, unsigned long node) { - page->flags.f &= ~(NODES_MASK << NODES_PGSHIFT); - page->flags.f |= (node & NODES_MASK) << NODES_PGSHIFT; + page_flags_clear(&page->flags.f, NODES_MASK << NODES_PGSHIFT); + page_flags_set(&page->flags.f, (node & NODES_MASK) << NODES_PGSHIFT); } static inline void set_page_links(struct page *page, enum zone_type zone, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 970e077019b7..100226f59490 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3624,8 +3624,8 @@ static void __split_folio_to_order(struct folio *folio, int old_order, * unreferenced sub-pages of an anonymous THP: we can simply drop * PG_anon_exclusive (-> PG_mappedtodisk) for these here. */ - new_folio->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; - new_folio->flags.f |= (folio->flags.f & + page_flags_clear(&new_folio->flags.f, PAGE_FLAGS_CHECK_AT_PREP); + page_flags_set(&new_folio->flags.f, folio->flags.f & ((1L << PG_referenced) | (1L << PG_swapbacked) | (1L << PG_swapcache) | diff --git a/mm/memremap.c b/mm/memremap.c index 053842d45cb1..194ca2f04a87 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -494,7 +494,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, * blindly clear bits which could have set my order field here, * including page head. */ - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ + page_flags_clear(&new_page->flags.f, 0xffUL); /* Clear possible order, page head */ #ifdef NR_PAGES_IN_LARGE_FOLIO /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d49c254174da..a3447124a725 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1359,7 +1359,7 @@ __always_inline bool __free_pages_prepare(struct page *page, int i; if (compound) { - page[1].flags.f &= ~PAGE_FLAGS_SECOND; + page_flags_clear(&page[1].flags.f, PAGE_FLAGS_SECOND); #ifdef NR_PAGES_IN_LARGE_FOLIO folio->_nr_pages = 0; #endif @@ -1373,7 +1373,7 @@ __always_inline bool __free_pages_prepare(struct page *page, continue; } } - (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; + page_flags_clear(&(page + i)->flags.f, PAGE_FLAGS_CHECK_AT_PREP); } } if (folio_test_anon(folio)) { @@ -1392,7 +1392,7 @@ __always_inline bool __free_pages_prepare(struct page *page, } page_cpupid_reset_last(page); - page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; + page_flags_clear(&page->flags.f, PAGE_FLAGS_CHECK_AT_PREP); page->private = 0; reset_page_owner(page, order); page_table_check_free(page, order); diff --git a/mm/slub.c b/mm/slub.c index a2bf3756ca7d..a55199f642d3 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -617,7 +617,7 @@ static inline void slab_set_pfmemalloc(struct slab *slab) static inline void __slab_clear_pfmemalloc(struct slab *slab) { - __clear_bit(SL_pfmemalloc, &slab->flags.f); + __pf_clear_bit(SL_pfmemalloc, &slab->flags.f); } /*