Date: Wed, 10 Jun 2026 03:35:37 -0400
From: "Michael S. Tsirkin"
To: Miaohe Lin
Cc: Zi Yan, "David Hildenbrand (Arm)", Andrew Morton,
	linux-kernel@vger.kernel.org, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Muchun Song, Oscar Salvador, Lorenzo Stoakes, "Liam R. Howlett",
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price,
	Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes,
	Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
	virtualization@lists.linux.dev, linux-mm@kvack.org, Andrea Arcangeli,
	Naoya Horiguchi
Subject: Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
Message-ID: <20260610033441-mutt-send-email-mst@kernel.org>
References: <20260609111020.e88f51a7b6ebc37360d66fdc@linux-foundation.org>
	<8c1f468e-b50a-487a-a267-8d1ea5a61c87@kernel.org>
	<38C84F23-E881-4DB2-86BA-93F39D44AE1B@nvidia.com>
	<20260609162437-mutt-send-email-mst@kernel.org>
	<4BA276D9-9EB9-4E2A-8A05-657ACACFF227@nvidia.com>
	<20260609165829-mutt-send-email-mst@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote:
> On 2026/6/10 5:00, Michael S. Tsirkin wrote:
> > On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote:
> >> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote:
> >>
> >>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote:
> >>>> On 9 Jun 2026, at 14:39, Zi Yan wrote:
> >>>>
> >>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote:
> >>>>>
> >>>>>> On 6/9/26 20:10, Andrew Morton wrote:
> >>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" wrote:
> >>>>>>>
> >>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic
> >>>>>>>> update to page->flags can race with non-atomic flag operations
> >>>>>>>> that run under zone->lock in the buddy allocator.
> >>>>>>>>
> >>>>>>>> In particular, __free_pages_prepare() does:
> >>>>>>>>
> >>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
> >>>>>>>>
> >>>>>>>> This non-atomic read-modify-write, while correctly excluding
> >>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent
> >>>>>>>> TestSetPageHWPoison if the read happens before the poison bit
> >>>>>>>> is set and the write happens after. Will only get worse if/when
> >>>>>>>> we add more non-atomic flag operations.
> >>>>>>>>
> >>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and
> >>>>>>>> around ClearPageHWPoison in the retry path. This
> >>>>>>>> serializes with all buddy flag manipulation. The cost is
> >>>>>>>> negligible: one lock/unlock in an extremely rare path
> >>>>>>>> (hardware memory errors).
> >>>>>>>>
> >>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
> >>>>>>>> in this file operate on pages already removed from the buddy
> >>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not
> >>>>>>>> need zone->lock protection.
> >>>>>>>
> >>>>>>> Sashiko is saying this doesn't do anything "Because
> >>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof?
> >>>>>>>
> >>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com
> >>>>>>
> >>>>>> Battle of the bots: it's right.
> >>>>>
> >>>>> Yep, __free_pages_prepare() changes the page flag without holding
> >>>>> zone->lock.
> >>>>
> >>>> __free_pages_prepare() works on frozen pages and assumes no one else
> >>>> touches the input page. To avoid this race, memory_failure() might
> >>>> want to try_get_page() before TestClearPageHWPoison(), but I am not
> >>>> sure if that works along with memory failure flow.
> >>>>
> >>>> Best Regards,
> >>>> Yan, Zi
> >>>
> >>>
> >>>
> >>> Actually memory failure already plays with this down the road no?
> >>>
> >>> So maybe it's enough to just SetPageHWPoison afterwards again?
> >>>
> >>>
> >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>> index ee42d4361309..4758fea94a96 100644
> >>> --- a/mm/memory-failure.c
> >>> +++ b/mm/memory-failure.c
> >>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags)
> >>>  	if (!res) {
> >>>  		if (is_free_buddy_page(p)) {
> >>>  			if (take_page_off_buddy(p)) {
> >>> +				SetPageHWPoison(p);
> >>>  				page_ref_inc(p);
> >>>  				res = MF_RECOVERED;
> >>>  			} else {
> >>>
> >>>
> >>> and maybe in a bunch of other places in there?
> >>
> >> You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(),
> >> just set it again here?
> >
> > Yea.
> >
> >> Why not do it after get_hwpoison_page(), since that
> >> is the expected page flag?
> >
> > It's still in the buddy at that point right? I'm worried buddy might
> > poke at flags.
>
> Since __free_pages_prepare() executes entirely locklessly, the only way to ensure
> HWPoison flag won't be lost might be only set hwpoison flag iff we can make sure
> pages are not on the way to buddy...
>
> Thanks.
> .

Right so here after take_page_off_buddy it's ok, for example.

-- 
MST
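
For anyone following along, here is a minimal userspace sketch of the lost
update discussed above, replayed on a single thread so the interleaving is
deterministic. It is not kernel code: struct demo_page, the DEMO_* bit values
and the demo_* helpers are hypothetical names made up for illustration, and
C11 atomics only stand in for the kernel's page flag helpers. It assumes the
freeing path does a plain load, mask, store while the memory-failure path
does an atomic test-and-set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout for illustration only; not the kernel's values. */
#define DEMO_PG_HWPOISON   (1UL << 0)
#define DEMO_PG_CHECK_PREP (1UL << 1) /* stands in for PAGE_FLAGS_CHECK_AT_PREP */

struct demo_page {
	_Atomic unsigned long flags;
};

/* Atomic test-and-set, analogous in spirit to TestSetPageHWPoison(). */
static bool demo_test_set_hwpoison(struct demo_page *p)
{
	unsigned long old;

	assert(p);
	old = atomic_fetch_or(&p->flags, DEMO_PG_HWPOISON);
	return (old & DEMO_PG_HWPOISON) != 0;
}

/*
 * Replay the bad interleaving:
 *   1. the freeing path loads flags while the poison bit is still clear
 *   2. the memory-failure path atomically sets the poison bit
 *   3. the freeing path stores its stale value with the prep mask cleared
 * Returns true if the poison bit survived step 3.
 */
static bool demo_replay_race(struct demo_page *p)
{
	unsigned long stale;

	assert(p);
	stale = atomic_load(&p->flags);                       /* step 1 */
	demo_test_set_hwpoison(p);                            /* step 2 */
	atomic_store(&p->flags, stale & ~DEMO_PG_CHECK_PREP); /* step 3 */
	return (atomic_load(&p->flags) & DEMO_PG_HWPOISON) != 0;
}

/*
 * Set the bit again once nothing on the freeing path can still write the
 * flags word, which is the idea behind setting it after the page has been
 * taken off the buddy.
 */
static bool demo_set_after_takeoff(struct demo_page *p)
{
	assert(p);
	atomic_fetch_or(&p->flags, DEMO_PG_HWPOISON);
	return (atomic_load(&p->flags) & DEMO_PG_HWPOISON) != 0;
}
```

Calling demo_replay_race() on a page whose flags start as DEMO_PG_CHECK_PREP
shows the poison bit being overwritten even though the mask never names it;
demo_set_after_takeoff() then restores it. Whether the second set is enough
in the real flow depends on there being no remaining lockless writer at that
point, which is exactly the question in this thread.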