From: "Huang, Ying"
To: John Hubbard
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 "Liam R . Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Zi Yan, Matthew Brost,
 Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price,
 Alistair Popple, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li,
 Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, LKML,
 linux-mm@kvack.org
Subject: Re: [RFC PATCH 0/2] mm/migrate: wait for folio refcount during longterm pin migration
In-Reply-To: <20260410032333.400406-1-jhubbard@nvidia.com> (John Hubbard's message of "Thu, 9 Apr 2026 20:23:31 -0700")
References: <20260410032333.400406-1-jhubbard@nvidia.com>
Date: Tue, 21 Apr 2026 17:19:36 +0800
Message-ID: <87h5p4isbb.fsf@DESKTOP-5N7EMDA>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, John,

John Hubbard writes:

> Hi,
>
> This adds a bounded sleep to migration so that FOLL_LONGTERM pinning can
> wait for transient folio references to drain, instead of failing after a
> fixed number of retries. The wait uses a one-second timeout. An

Is the one-second timeout appropriate for all users? Do some users
prefer fail-fast behavior instead? If so, should we add another FOLL
flag to support a timed wait?

> alternative approach would be to call wait_var_event_killable() with no
> timeout, but that doesn't match as well with migration's "this will
> probably work" API. In other words, a short sleeping wait is more
> appropriate here.
>
> When migrating pages for FOLL_LONGTERM pinning, migration can fail with
> -EAGAIN if a folio has unexpected references. These references are often
> transient, but the current retry loop gives up too quickly. This series
> adds wait_var_event_timeout() at the retry points, paired with
> wake_up_var() in folio_put() to wake the sleeper as soon as the refcount
> drops.
>
> The wake_up_var() calls in folio_put() are gated behind a static key,
> disabled by default, so non-migration workloads pay zero cost.
> migrate_pages() enables the key on entry when the reason is
> MR_LONGTERM_PIN, and disables it on exit.
>
> Toggling the key is not free. folio_put() is static inline, so every
> compilation unit that calls it gets its own patch site (roughly 500 in
> vmlinux, plus modules). On x86, jump label patching is batched (256
> sites per batch, 3 IPI rounds per batch), so enabling the key costs
> 6-9 IPI broadcasts, a few hundred microseconds on a large machine.
> That cost is paid twice per migrate_pages() call. Migration itself
> spends several milliseconds per batch on LRU isolation, TLB flushes,
> and page copies. Concurrent longterm-pin migrations after the first
> just do an atomic_inc (no patching).
>
> Matthew Brost offered to performance-test this series [1], as Intel has
> tests that stress migration and good metrics to catch regressions.
>
> [1] https://lore.kernel.org/all/aX+oUorOWPt1xbgw@lstrano-desk.jf.intel.com/
>
> John Hubbard (2):
>   mm: wake up folio refcount waiters on folio_put()
>   mm/migrate: wait for folio refcount during longterm pin migration
>
>  include/linux/mm.h |  8 ++++++++
>  mm/migrate.c       | 30 ++++++++++++++++++++++++++++++
>  mm/swap.c          | 10 +++++++++-
>  3 files changed, 47 insertions(+), 1 deletion(-)
>
>
> base-commit: 9a9c8ce300cd3859cc87b408ef552cd697cc2ab7

---
Best Regards,
Huang, Ying