From: Muchun Song
Subject: Re: CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is broken, was Re: [RFC PATCH 0/6] Deep talk about folio vmap
Date: Mon, 7 Apr 2025 10:57:13 +0800
To: Huan Yang
Cc: bingbu.cao@linux.intel.com, Christoph Hellwig, Matthew Wilcox, Gerd Hoffmann, Vivek Kasireddy, Sumit Semwal, Christian König, Andrew Morton, Uladzislau Rezki, Shuah Khan, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, opensource.kernel@vivo.com
References: <20250327092922.536-1-link@vivo.com> <20250404090111.GB11105@lst.de> <9A899641-BDED-4773-B349-56AF1DD58B21@linux.dev> <43DD699A-5C5D-429B-A2B5-61FBEAE2E252@linux.dev>

> On Apr 7, 2025, at 09:59, Huan Yang wrote:
>
>
> On 2025/4/4 18:07, Muchun Song wrote:
>>
>>> On Apr 4, 2025, at 17:38, Muchun Song wrote:
>>>
>>>
>>>
>>>> On Apr 4, 2025, at 17:01, Christoph Hellwig wrote:
>>>>
>>>> After the btrfs compressed bio discussion I think the hugetlb changes that
>>>> skip the tail pages are fundamentally unsafe in the current kernel.
>>>>
>>>> That is because the bio_vec representation assumes tail pages do exist, so
>>>> as soon as you are doing direct I/O that generates a bvec starting beyond
>>>> the present head page things will blow up. Other users of bio_vecs might
>>>> do the same, but the way the block bio_vecs are generated are very suspect
>>>> to that. So we'll first need to sort that out and a few other things
>>>> before we can even think of enabling such a feature.
>>>>
>>> I would like to express my gratitude to Christoph for including me in the
>>> thread. I have carefully read the cover letter in [1], which indicates
>>> that an issue has arisen due to the improper use of `vmap_pfn()`. I'm
>>> wondering if we could consider using `vmap()` instead. In the HVO scenario,
>>> the tail struct pages do **exist**, but they are read-only. I've examined
>>> the code of `vmap()`, and it appears that it only reads the struct page.
>>> Therefore, it seems feasible for us to use `vmap()` (I am not an expert in
>>> udmabuf). Right?
>> I believe my stance is correct.
>> I've also reviewed another thread in [2].
>> Allow me to clarify and correct the viewpoints you presented. You stated:
>> "
>> So by HVO, it also not backed by pages, only contains folio head, each
>> tail pfn's page struct go away.
>> "
>> This statement is entirely inaccurate. The tail pages do not cease to exist;
>> rather, they are read-only. For your specific use-case, please use `vmap()`
>> to resolve the issue at hand. If you wish to gain a comprehensive understanding
>
> I see the document gives a simple graph to make the point:
>
> +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> |           |                     |     0     | -------------> |     0     |
> |           |                     +-----------+                +-----------+
> |           |                     |     1     | -------------> |     1     |
> |           |                     +-----------+                +-----------+
> |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
> |           |                     +-----------+                   | | | | |
> |           |                     |     3     | ------------------+ | | | |
> |           |                     +-----------+                     | | | |
> |           |                     |     4     | --------------------+ | | |
> |    PMD    |                     +-----------+                       | | |
> |   level   |                     |     5     | ----------------------+ | |
> |  mapping  |                     +-----------+                         | |
> |           |                     |     6     | ------------------------+ |
> |           |                     +-----------+                           |
> |           |                     |     7     | --------------------------+
> |           |                     +-----------+
> |           |
> |           |
> |           |
> +-----------+
>
> If I understand correctly, the page structs of tails 2-7 are freed, so if I
> only need to map pages 2-7, can vmap still do the right thing?

The answer is yes, you can. It is essential to distinguish between virtual
addresses (VA) and physical addresses (PA). The VAs of the tail struct pages
aren't freed; they are remapped to the physical page mapped by the VA of the
head struct page (since the contents of those tail physical pages are the
same). Thus, what gets freed is the physical pages that the tail struct
pages were originally mapped to, not their virtual addresses.
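
To make the VA/PA distinction concrete, here is a minimal userspace sketch in
C. It is a toy model and not kernel code: `struct vmemmap_model`,
`model_optimize()` and the other names are hypothetical, and the `keep`
parameter (how many physical frames are retained) is an assumption made for
illustration. With `keep == 2` it mirrors the quoted diagram, where slots 2-7
end up sharing frame 1.

```c
#include <stdbool.h>

#define VMEMMAP_PAGES 8 /* vmemmap pages per HugeTLB page in the diagram */

/* One virtual vmemmap page: its backing frame and its write permission. */
struct vmemmap_slot {
	int frame;
	bool writable;
};

/* Toy model: eight fixed virtual slots over up to eight physical frames. */
struct vmemmap_model {
	struct vmemmap_slot slot[VMEMMAP_PAGES];
	bool frame_in_use[VMEMMAP_PAGES];
};

/* Before optimization: slot i is backed by its own writable frame i. */
static void model_init(struct vmemmap_model *m)
{
	for (int i = 0; i < VMEMMAP_PAGES; i++) {
		m->slot[i].frame = i;
		m->slot[i].writable = true;
		m->frame_in_use[i] = true;
	}
}

/*
 * Keep the first `keep` frames. Remap every later slot read-only onto the
 * last kept frame and free the frame it used to own. The slots (the VAs)
 * themselves never go away. Returns the number of frames freed, or -1 if
 * `keep` is out of range.
 */
static int model_optimize(struct vmemmap_model *m, int keep)
{
	int freed = 0;

	if (keep < 1 || keep > VMEMMAP_PAGES)
		return -1;

	for (int i = keep; i < VMEMMAP_PAGES; i++) {
		if (!m->slot[i].writable)
			continue; /* already remapped by an earlier call */
		m->frame_in_use[m->slot[i].frame] = false;
		m->slot[i].frame = keep - 1;
		m->slot[i].writable = false;
		freed++;
	}
	return freed;
}

/* A slot is readable as long as some live frame backs it. */
static bool model_can_read(const struct vmemmap_model *m, int idx)
{
	if (idx < 0 || idx >= VMEMMAP_PAGES)
		return false;
	return m->frame_in_use[m->slot[idx].frame];
}

/* Writes are allowed only through slots that still own their frame. */
static bool model_can_write(const struct vmemmap_model *m, int idx)
{
	return model_can_read(m, idx) && m->slot[idx].writable;
}

/* Number of physical frames still allocated. */
static int model_frames_in_use(const struct vmemmap_model *m)
{
	int n = 0;

	for (int i = 0; i < VMEMMAP_PAGES; i++)
		n += m->frame_in_use[i] ? 1 : 0;
	return n;
}
```

In this model, after `model_init()` followed by `model_optimize(&m, 2)`, six
frames are released, yet `model_can_read()` still succeeds for every slot,
while `model_can_write()` fails for slots 2-7.
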
Moreover, while the virtual addresses of these tail struct pages can be
read, writes to them are prohibited. That is acceptable because the kernel
is expected to write only to the head struct page of a compound page and to
only read the tail struct pages. BTW, the folio infrastructure is also based
on this assumption.

Thanks,
Muchun.

>
> Or if I still misunderstand something, please correct me.
>
> Thanks,
>
> Huan Yang
>
>> of the fundamentals of HVO, I kindly suggest a thorough review of the document
>> in [3].
>>
>> [2] https://lore.kernel.org/lkml/5229b24f-1984-4225-ae03-8b952de56e3b@vivo.com/#t
>> [3] Documentation/mm/vmemmap_dedup.rst
>>
>>> [1] https://lore.kernel.org/linux-mm/20250327092922.536-1-link@vivo.com/T/#m055b34978cf882fd44d2d08d929b50292d8502b4
>>>
>>> Thanks,
>>> Muchun.
>>>
>>