Date: Wed, 1 Jul 2026 04:18:08 -0400
From: "Michael S. Tsirkin"
To: "David Hildenbrand (Arm)"
Cc: linux-kernel@vger.kernel.org, Miaohe Lin, Naoya Horiguchi, Andrew Morton, Oscar Salvador, Andi Kleen, Hidehiro Kawai, Rik van Riel, Vlastimil Babka, Lorenzo Stoakes, "Liam R. Howlett", Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Hao Li, Kiryl Shutsemau, Byungchul Park, linux-mm@kvack.org, linux-cxl@vger.kernel.org
Subject: Re: [PATCH 0/2] mm: memory-failure: fix HWPoison flag race with non-atomic page flag ops
Message-ID: <20260701041112-mutt-send-email-mst@kernel.org>
In-Reply-To: <2f884bfa-3cd5-4fba-8aa4-c2e68890ab64@kernel.org>

On Wed, Jul 01, 2026 at 10:08:45AM +0200, David Hildenbrand (Arm) wrote:
> >> Cheers,
> >>
> >> David
> >
> > Yay. I did that + dropped the extra lock/unlock and now it's in the noise in
> > my testing. needs much more testing of course.
>
> Cool. I'd expect that latency-sensitive workloads (PREEMPT_RT) would not want to
> have hwpoison handling either way, so using the no_resched variants at these
> places might be doable.
>
> >
> > If you want me to post (including addressing your other feedback) let me
> > know.
>
> Let's first discuss the options. We essentially have the following one so far:
>
> 1) Ignore the problem
>
> It's been there forever ... but I am not quite happy about that.
>
> 2) Use atomics everywhere
>
> The easiest+cleanest, but as measured, the performance hit is real.
>
> 3) Keep retrying for a couple of times
>
> The big problem is "how long". A CPU in a hypervisor might be stalled for quite
> a while (20s? can be longer).

So, on this idea: how long it takes might not matter. What I had in mind is:

1. Run the current logic.
2. Add the page to a list of pages to check, then invoke e.g. call_rcu_tasks()
   (or maybe call_rcu_tasks_rude()).
3. In the callback, recheck; if the poison bit was cleared, go back to 1.
4. Otherwise, everyone will see the bit set: remove the page from the list
   and we are done.

It does not seem to regress anything, and in the rare racy case we still
set the bit eventually.

> 4) Disable preemption around non-atomic updates + synchronize_rcu() loop
>
> I think it should work, but I don't like the possibly endless retry loop. (well,
> it would never be an endless loop in practice)
>
> Is there a problem with synchronize_rcu() latency, given that it can take in bad
> scenarios a couple of seconds? (grace periods can be large ... but also very short)
>
> 4) disable local interrupts around non-atomic updates + let all CPUs perform
> atomic setting/clearing of the bit through smp_call_function_many().
>
> Disabling local interrupts is way too expensive. :(
>
> 5) disable preemption around non-atomic updates + let all CPUs perform
> atomic setting/clearing of the bit through schedule_on_each_cpu()
>
> Mixture of 3) + 4), but schedule_on_each_cpu() can also possibly take a long
> time (as long as we cannot schedule on a CPU ...).
>
> We could likely build something that remembers all pages with bits-to-set /
> bits-to-clear, to then kick off a per-cpu work ... and once all CPUs processed a
> page it was fully processed ... needs much more thought.
>
> 6) stop_machine()
>
> Big hammer. I think we'd still need to disable preemption around non-atomic updates.
>
> 7) Move hwpoison bit out of page->flags
>
> Use a sparse bitmap. Quite invasive, and any hwpoison bit checking code would
> get more expensive.
>
> Did I mess something up / forget something?
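To make steps 1-4 of the recheck idea concrete, here is a minimal userspace C model, assuming a single word of page flags and a deterministic, single-threaded replay of the race. struct fake_page, the FAKE_PG_* bits, replay_race(), hwpoison_recheck() and race_then_recheck() are hypothetical names for illustration only, not kernel APIs. The grace period itself (call_rcu_tasks() in the proposal) is not modelled; the sketch only shows how a non-atomic read-modify-write can drop the bit and what the callback would do once it runs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical bit positions for illustration only; these are not the
 * kernel's PG_* values, and struct fake_page is not struct page.
 */
#define FAKE_PG_DIRTY    (1UL << 0)
#define FAKE_PG_HWPOISON (1UL << 1)

struct fake_page {
	unsigned long flags;
};

/*
 * Replays the lost-update interleaving on one thread: CPU A performs a
 * non-atomic read-modify-write (as __set_bit() may), while CPU B sets
 * HWPoison between A's load and A's store.
 */
static void replay_race(struct fake_page *p)
{
	unsigned long snapshot = p->flags;	/* CPU A: load */

	p->flags |= FAKE_PG_HWPOISON;		/* CPU B: set HWPoison */
	p->flags = snapshot | FAKE_PG_DIRTY;	/* CPU A: store stale copy, bit lost */
}

/*
 * Hypothetical recheck callback (steps 3 and 4). Assumed to run after a
 * grace period that no in-flight non-atomic update can span. Returns
 * true if the bit was lost and had to be set again, i.e. the page needs
 * another round; false means the page can be dropped from the list.
 */
static bool hwpoison_recheck(struct fake_page *p)
{
	if (p->flags & FAKE_PG_HWPOISON)
		return false;			/* step 4: bit survived, done */
	p->flags |= FAKE_PG_HWPOISON;		/* step 3: lost, back to step 1 */
	return true;
}

/* Runs the race, then rechecks until the bit sticks; returns the extra rounds. */
static int race_then_recheck(struct fake_page *p)
{
	int rounds = 0;

	replay_race(p);
	while (hwpoison_recheck(p))
		rounds++;
	return rounds;
}
```

In this single-threaded model nothing can race with the recheck, so one extra round suffices; in the real scheme each extra round would cost another grace period, which is why the latency of that grace period matters less than the fact that it eventually ends.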
>
> > Or look into call_rcu_tasks/call_rcu_tasks_rude which might work without
> > changes in mm.
>
> Looking into call_rcu_tasks() (the first time) the semantics are interesting
> ("involuntary preemption is not a Tasks RCU quiescent state").
>
> I'm curious how this interacts with random page allocations (on the IRQ path?),
> and which design you envision given that we only get a single callback (who
> prevents code to re-enter).
>
> >
> > Or if someone else is gonna work on this, absolutely fine too.
>
> I can likely let someone work on that, but I think we should first figure out
> what can really be done. (and likely when we send it out, it should be an RFC)
>
> --
> Cheers,
>
> David
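On the single-callback question, one possible shape is a two-stage pending list: pages poisoned while a callback is already in flight wait for the next grace period instead of joining the current batch, since that grace period may have started before their bit was set. This is a hypothetical userspace sketch, not a proposed patch: struct recheck_queue, queue_recheck(), recheck_callback() and the FAKE_PG_HWPOISON bit are made-up names, locking is omitted, and it does not answer whether a Tasks RCU grace period actually covers updates done from IRQ context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_PG_HWPOISON (1UL << 1)	/* hypothetical bit, not a real PG_* value */
#define MAX_PENDING 64

struct fake_page {
	unsigned long flags;
};

struct page_list {
	struct fake_page *pages[MAX_PENDING];
	size_t nr;
};

/*
 * Hypothetical two-stage queue. "cur" holds pages covered by the grace
 * period of the single in-flight callback; "next" holds pages that must
 * wait for a later grace period. Real code would need a spinlock here;
 * this single-threaded model omits locking.
 */
struct recheck_queue {
	struct page_list cur, next;
	bool armed;	/* at most one callback in flight */
};

static void list_add_page(struct page_list *l, struct fake_page *p)
{
	assert(l->nr < MAX_PENDING);
	l->pages[l->nr++] = p;
}

/* Promote "next" to "cur"; the caller then starts a new grace period. */
static void start_batch(struct recheck_queue *q)
{
	q->cur = q->next;
	q->next.nr = 0;
}

/*
 * Steps 1-2: set the bit and queue the page. Returns true if the caller
 * must arm the callback (e.g. via call_rcu_tasks()); false if one is
 * already in flight, in which case the page waits in "next".
 */
static bool queue_recheck(struct recheck_queue *q, struct fake_page *p)
{
	p->flags |= FAKE_PG_HWPOISON;
	list_add_page(&q->next, p);
	if (q->armed)
		return false;
	q->armed = true;
	start_batch(q);
	return true;
}

/*
 * Steps 3-4, run once the grace period for "cur" has elapsed. Pages
 * whose bit survived are done; pages whose bit was lost are poisoned
 * again and moved to "next". Returns true if the callback must be
 * re-armed for another grace period.
 */
static bool recheck_callback(struct recheck_queue *q)
{
	for (size_t i = 0; i < q->cur.nr; i++) {
		struct fake_page *p = q->cur.pages[i];

		if (p->flags & FAKE_PG_HWPOISON)
			continue;		/* step 4: done with this page */
		p->flags |= FAKE_PG_HWPOISON;	/* step 3: back to step 1 */
		list_add_page(&q->next, p);
	}
	q->cur.nr = 0;

	q->armed = q->next.nr > 0;
	if (q->armed)
		start_batch(q);
	return q->armed;
}
```

A queue_recheck() call returning true is where the caller would arm the callback; each recheck_callback() returning true corresponds to arming it again for one more grace period, so re-entry is limited to one in-flight callback at a time.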