Subject: Re: CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is broken, was Re: [RFC PATCH 0/6] Deep talk about folio vmap
From: Muchun Song
Date: Mon, 7 Apr 2025 14:43:20 +0800
To: Huan Yang
Cc: bingbu.cao@linux.intel.com, Christoph Hellwig, Matthew Wilcox, Gerd Hoffmann, Vivek Kasireddy, Sumit Semwal, Christian König, Andrew Morton, Uladzislau Rezki, Shuah Khan, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, opensource.kernel@vivo.com
Message-Id: <35D26C00-952F-481C-8345-E339F0ED770B@linux.dev>
References: <20250327092922.536-1-link@vivo.com> <20250404090111.GB11105@lst.de> <9A899641-BDED-4773-B349-56AF1DD58B21@linux.dev> <43DD699A-5C5D-429B-A2B5-61FBEAE2E252@linux.dev> <6f76a497-248b-4f92-9448-755006c732c8@vivo.com>

> On Apr 7, 2025, at 11:37, Muchun Song wrote:
>
>> On Apr 7, 2025, at 11:21, Huan Yang wrote:
>>
>> On 2025/4/7 10:57, Muchun Song wrote:
>>>
>>>> On Apr 7, 2025, at 09:59, Huan Yang wrote:
>>>>
>>>> On 2025/4/4 18:07, Muchun Song wrote:
>>>>>> On Apr 4, 2025, at 17:38, Muchun Song wrote:
>>>>>>
>>>>>>> On Apr 4, 2025, at 17:01, Christoph Hellwig wrote:
>>>>>>>
>>>>>>> After the btrfs compressed bio discussion I think the hugetlb changes that
>>>>>>> skip the tail pages are fundamentally unsafe in the current kernel.
>>>>>>>
>>>>>>> That is because the bio_vec representation assumes tail pages do exist, so
>>>>>>> as soon as you are doing direct I/O that generates a bvec starting beyond
>>>>>>> the present head page things will blow up. Other users of bio_vecs might
>>>>>>> do the same, but the way the block bio_vecs are generated are very suspect
>>>>>>> to that. So we'll first need to sort that out and a few other things
>>>>>>> before we can even think of enabling such a feature.
>>>>>>>
>>>>>> I would like to express my gratitude to Christoph for including me in the
>>>>>> thread. I have carefully read the cover letter in [1], which indicates
>>>>>> that an issue has arisen due to the improper use of `vmap_pfn()`. I'm
>>>>>> wondering if we could consider using `vmap()` instead.
>>>>>> In the HVO scenario,
>>>>>> the tail struct pages do **exist**, but they are read-only. I've examined
>>>>>> the code of `vmap()`, and it appears that it only reads the struct page.
>>>>>> Therefore, it seems feasible for us to use `vmap()` (I am not an expert in
>>>>>> udmabuf). Right?
>>>>> I believe my stance is correct. I've also reviewed another thread in [2].
>>>>> Allow me to clarify and correct the viewpoints you presented. You stated:
>>>>> "
>>>>> So by HVO, it also not backed by pages, only contains folio head, each
>>>>> tail pfn's page struct go away.
>>>>> "
>>>>> This statement is entirely inaccurate. The tail pages do not cease to exist;
>>>>> rather, they are read-only. For your specific use-case, please use `vmap()`
>>>>> to resolve the issue at hand. If you wish to gain a comprehensive understanding
>>>> I see the document gives a simple diagram to illustrate this:
>>>>
>>>> +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
>>>> |           |                     |     0     | -------------> |     0     |
>>>> |           |                     +-----------+                +-----------+
>>>> |           |                     |     1     | -------------> |     1     |
>>>> |           |                     +-----------+                +-----------+
>>>> |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
>>>> |           |                     +-----------+                   | | | | |
>>>> |           |                     |     3     | ------------------+ | | | |
>>>> |           |                     +-----------+                     | | | |
>>>> |           |                     |     4     | --------------------+ | | |
>>>> |    PMD    |                     +-----------+                       | | |
>>>> |   level   |                     |     5     | ----------------------+ | |
>>>> |  mapping  |                     +-----------+                         | |
>>>> |           |                     |     6     | ------------------------+ |
>>>> |           |                     +-----------+                           |
>>>> |           |                     |     7     | --------------------------+
>>>> |           |                     +-----------+
>>>> |           |
>>>> |           |
>>>> |           |
>>>> +-----------+
>>>>
>>>> If I understand correctly, the page structs for tails 2-7 are freed, so if I
>>>> just need to map pages 2-7, can vmap do that correctly?
>>> The answer is you can. It is essential to distinguish between virtual
>>
>> Thanks for your reply, but I still can't understand it.
>> For example, I need to vmap pages 2-7 of a hugetlb HVO folio:
>>
>> /* six slots, one for each of pages 2-7 */
>> struct page **pages = kvmalloc_array(6, sizeof(*pages), GFP_KERNEL);
>>
>> for (i = 2; i < 8; ++i)
>>         pages[i - 2] = folio_page(folio, i); /* store pages 2-7 in slots 0-5 */
>>
>> void *vaddr = vmap(pages, 6, 0, PAGE_KERNEL);
>>
>> For non-HVO pages, this works. If HVO is enabled, does
>> "pages[i - 2] = folio_page(folio, i);" just get the head page? And how can
>> vmap correctly map each page?
>
> Why do you think folio_page(folio, i) (i ≠ 0) returns the head page?
> Is it speculation, or have you tested it? Please base it on the actual
> situation instead of indulging in wild thoughts.

By the way, in case you truly struggle to comprehend the fundamental
aspects of HVO, I would like to summarize for you the user-visible
behaviors in comparison to the situation where HVO is disabled.

HVO Status    Tail Page Structures    Head Page Structures
Enabled       Read-Only (RO)          Read-Write (RW)
Disabled      Read-Write (RW)         Read-Write (RW)

The sole distinction between the two scenarios lies in whether the tail
page structures are allowed to be written or not. Please refrain from
getting bogged down in the details of the implementation of HVO.

Thanks,
Muchun.

>
> Thanks,
> Muchun.
>
>>
>> Please correct me. :)
>>
>> Thanks,
>>
>> Huan Yang
>>
>>> address (VA) and physical address (PA). The VAs of tail struct pages
>>> aren't freed but remapped to the physical page mapped by the VA of the
>>> head struct page (since contents of those tail physical pages are the
>>> same). Thus, the freed pages are the physical pages mapped by original
>>> tail struct pages, not their virtual addresses.
>>> Moreover, while it
>>> is possible to read the virtual addresses of these tail struct pages,
>>> any write operations are prohibited since it is within the realm of
>>> acceptability that the kernel is expected to perform write operations
>>> solely on the head struct page of a compound head and conduct read
>>> operations only on the tail struct pages. BTW, folio infrastructure
>>> is also based on this assumption.
>>>
>>> Thanks,
>>> Muchun.
>>>
>>>> Or something I still misunderstand, please correct me.
>>>>
>>>> Thanks,
>>>>
>>>> Huan Yang
>>>>
>>>>> of the fundamentals of HVO, I kindly suggest a thorough review of the document
>>>>> in [3].
>>>>>
>>>>> [2] https://lore.kernel.org/lkml/5229b24f-1984-4225-ae03-8b952de56e3b@vivo.com/#t
>>>>> [3] Documentation/mm/vmemmap_dedup.rst
>>>>>
>>>>>> [1] https://lore.kernel.org/linux-mm/20250327092922.536-1-link@vivo.com/T/#m055b34978cf882fd44d2d08d929b50292d8502b4
>>>>>>
>>>>>> Thanks,
>>>>>> Muchun.
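
To make the VA/PA distinction discussed in this thread concrete, here is a
toy userspace model in C. It is only a sketch and not kernel code: the names
(toy_vmemmap_pages, toy_pte, toy_hvo_remap) are hypothetical, and it assumes
4 KiB base pages and a 64-byte struct page. Under those assumptions a 2 MiB
huge page needs 8 vmemmap pages, which would correspond to slots 0-7 in the
diagram quoted above. The number of vmemmap pages kept is left as a parameter
rather than taken from any particular kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB base pages, 64-byte struct page. */
#define TOY_BASE_PAGE_SIZE   4096UL
#define TOY_STRUCT_PAGE_SIZE 64UL

/* How many vmemmap pages hold the struct pages describing one huge page. */
static size_t toy_vmemmap_pages(size_t huge_page_size)
{
	size_t nr_struct_pages = huge_page_size / TOY_BASE_PAGE_SIZE;

	return nr_struct_pages * TOY_STRUCT_PAGE_SIZE / TOY_BASE_PAGE_SIZE;
}

/* Toy page-table entry for one vmemmap slot (a virtual page of struct pages). */
struct toy_pte {
	size_t frame;   /* index of the physical frame backing this slot */
	bool writable;  /* false models a read-only mapping */
};

/*
 * Start from an identity mapping, then point every slot from `kept` onward
 * at the last kept frame and mark it read-only. All slots (the virtual
 * addresses) stay valid; only frames that lost their mapping can be freed.
 * Returns the number of frames freed.
 */
static size_t toy_hvo_remap(struct toy_pte *pte, size_t nr, size_t kept)
{
	size_t i, freed = 0;

	assert(pte != NULL);

	for (i = 0; i < nr; i++) {
		pte[i].frame = i;
		pte[i].writable = true;
	}
	if (kept == 0 || kept > nr)
		return 0;

	for (i = kept; i < nr; i++) {
		pte[i].frame = kept - 1;
		pte[i].writable = false;
		freed++;
	}
	return freed;
}
```

With kept = 2 the model reproduces the quoted diagram: slots 2-7 alias frame 1
and six frames are freed. In the model every slot still resolves to a frame,
which mirrors the point that a tail struct page pointer stays readable and
only writes through the remapped slots are disallowed.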