From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Wed, 4 Jun 2025 17:41:41 +0800
Subject: Re: reply: [RFC] pin_user_pages_fast failure count increased
To: David Hildenbrand
Cc: Hyesoo Yu, jaewon31.kim@samsung.com, John Hubbard, zhaoyang.huang@unisoc.com, surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim, linux-mm@kvack.org, Jang-Hyuck Kim

On Wed, Jun 4, 2025 at 5:12 PM David Hildenbrand wrote:
>
> On 04.06.25 03:04, Zhaoyang Huang wrote:
> > On Tue, Jun 3, 2025 at 9:12 PM David Hildenbrand wrote:
> >>
> >> On 28.05.25 12:59, Zhaoyang Huang wrote:
> >>> On Wed, May 28, 2025 at 3:55 PM David Hildenbrand wrote:
> >>>>
> >>>> On 28.05.25 05:36, Hyesoo Yu wrote:
> >>>>> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
> >>>>>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu wrote:
> >>>>>>>
> >>>>>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> >>>>>>>
> >>>>>>> Hello, Zhaoyang.
> >>>>>>>
> >>>>>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> >>>>>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
> >>>>>>>
> >>>>>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
> >>>>>> That depends on the user of CMA, yes for my scenario since it worked
> >>>>>> for the guest os. For common scenario such as the file/anon mapping,
> >>>>>> the page will be judged as unpinnable for long-term and be migrated
> >>>>>> out of CMA area.
> >>>>>
> >>>>> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
> >>>>> Even in common cases, the page may be in a non-LRU state temporarily, and in such situations,
> >>>>> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
> >>>>>
> >>>>
> >>>> Right. We just disallow long-term pinning CMA pages, because we don't
> >>>> know who the real owner is that would be okay with long-term pinning them.
> >>>>
> >>>>>>>
> >>>>>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> >>>>>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
> >>>>>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
> >>>>>> non-LRU CMA pages.
> >>>>>
> >>>>> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
> >>>>>
> >>>>> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
> >>>>> * that region. And so, CMA attempts to migrate the page before pinning, when
> >>>>> * FOLL_LONGTERM is specified.
> >>>>>
> >>>>> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
> >>>>
> >>>> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
> >>>>
> >>>>>>>
> >>>>>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> >>>>>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
> >>>>>> I don't think so. pin_user_pages is an exported API which can't make
> >>>>>> assumptions over the caller.
> >>>>>
> >>>>> My point is not to base the patch on assumptions about the caller,
> >>>>> but to define a clear mechanism that ensures safe behavior in the intended scenario.
> >>>>>
> >>>>> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
> >>>>
> >>>> Not sure which exact semantics you have in mind. But failing if we would
> >>>> have to migrate might be ok. Not sure if the caller should worry about
> >>>> that, though: the caller should not have to worry about page placement
> >>>> in general.
> >>> With going over the whole thread, I think the root cause is
> >>> collect_longterm_unpinnable_folios() hit the race window between
> >>> lru_add_drain_all() and folio_isolate_lru() by chance and returned
> >>> with ret=0 which finally have the CMA page pinned, right? However, I
> >>> find the proposed patch below will fail the PKVM
> >>> scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
> >>> pages never go to LRU which will have the __gup_longterm_locked loop
> >>> in do while(ret == -EAGAIN) as it did before 1aaf8c. I think the key
> >>> point is to find a way to distinguish the temporary(on the way to LRU)
> >>> and permanent CMA pages within collect_longterm_unpinnable_folios.
> >>>
> >>> static long
> >>> check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> >>> {
> >>> +       bool any_unpinnable;
> >>>         LIST_HEAD(movable_folio_list);
> >>>
> >>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >>> -       if (list_empty(&movable_folio_list))
> >>> -               return 0;
> >>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >>> +       if (list_empty(&movable_folio_list)) {
> >>> +               if (any_unpinnable)
> >>> +                       pofs_unpin(pofs);
> >>> +               return any_unpinnable ? -EAGAIN : 0;
> >>> +       }
> >>>
> >> So, what's the status of that? We should fix it upstream (*not* caring
> >> about controversial out-of-tree pkvm issues).
> > Leaving aside the pkvm issue, we should also care about the CMA pages
> > mapping to VM by special driver which are intended to be long term
> > pinned (actually they are fetched by cma_alloc and then mapped to VM
> > instead of alloc_pages during normal page fault).
>
> Is there any such "special driver" in the tree?
Not that I know of. However, pin_user_pages is an exported symbol that
loadable modules (ko) can use, so should we make it capable of handling
this scenario?
>
> > Could we distinguish
> > them by the patch below based on 1aaf8c122, that is, this kind of
> > pages is not on page cache and have equaled refcnt to mapcount
>
> No, not like that. We'd need some proper indication that this page was
> allocated by the CMA area owner, and that owner agrees that the folio
> can be long-term pinned (maybe that agreement is by mapping it into user
> space, tbd).
I think the key point is to distinguish the CMA pages that are allocated
as a GFP_MOVABLE fallback during ordinary page faults from the ones that
are obtained from cma_alloc within the special driver's vm_ops->fault.
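
A rough userspace sketch of that distinction may help frame the
discussion. It is a model only, not a proposed patch: struct model_page,
the owner_allows_longterm marker and enum pin_decision are hypothetical
names, and no such marker exists in the kernel today. The model also
ignores every other reason a page can be unpinnable for long-term use.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. None of these types or fields are kernel API;
 * they are assumptions made up for this illustration.
 */
struct model_page {
	bool in_cma;                /* page lives in a CMA area */
	bool on_lru;                /* page is currently on an LRU list */
	bool owner_allows_longterm; /* hypothetical: CMA owner agreed to long-term pins */
};

enum pin_decision {
	PIN_OK,      /* may be long-term pinned in place */
	PIN_MIGRATE, /* movable-fallback page on the LRU: migrate out of CMA first */
	PIN_RETRY,   /* not on the LRU right now (e.g. on the way to it): try again */
};

/* Decide what a long-term pin request should do with one page. */
static enum pin_decision longterm_pin_decision(const struct model_page *p)
{
	if (!p->in_cma)
		return PIN_OK;

	/* Allocated by the CMA area owner, who accepts long-term pinning. */
	if (p->owner_allows_longterm)
		return PIN_OK;

	/* Ordinary fallback allocation: must leave the CMA area before pinning. */
	return p->on_lru ? PIN_MIGRATE : PIN_RETRY;
}
```

In this model a page that is permanently off the LRU and carries no owner
marker returns PIN_RETRY on every call, which corresponds to the endless
-EAGAIN loop discussed above. An explicit owner marker is the only thing
that lets such a page reach PIN_OK.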
>
> Will you send the fix or should I do it? Discussing about broken use
> cases that do not apply upstream is not particularly helpful when we're
> dealing with a real upstream bug.
I would like to ask for your help on this, since I have no further ideas.
Thanks.
>
> --
> Cheers,
>
> David / dhildenb
>