* Re: [PATCH v2] Documentation: Refactored watchdog old doc
From: Guenter Roeck @ 2026-04-10 17:27 UTC
To: Jonathan Corbet
Cc: Sunny Patel, Wim Van Sebroeck, Shuah Khan, linux-watchdog,
linux-doc, linux-kernel
In-Reply-To: <87ik9y229e.fsf@trenco.lwn.net>
On Fri, Apr 10, 2026 at 10:45:01AM -0600, Jonathan Corbet wrote:
> Guenter Roeck <linux@roeck-us.net> writes:
>
> > On Fri, Apr 10, 2026 at 12:58:11PM +0530, Sunny Patel wrote:
> >> Good Point. So again revisited the watchdog core
> >> api and list out the deprecated one and marked
> >> as deprecated in doc and also mentioned it just
> >> for legacy driver and not for newer one.
> >>
> >> As someof the legacy driver still have reference
> >> to old api so just marked as deprecated in doc.
> >>
> >> Also checked with other watchdog related api
> >> which are deprecated in driver but still present
> >> in doc but didn't find any.
> >>
> >> ---
> >
> > The above would show up as commit message, there is no change log, and
> > this e-mail was sent as response to v1. And I can see that without even
> > looking at the patch itself.
> >
> > That makes me wonder what Documentation/process/submitting-patches.rst
> > is useful for. No one seems to bother reading it. We might as well
> > just remove it.
>
> It's good to point people at.
>
> I do think it needs a serious rewrite to, among other things, turn it
> into less of an intimidating tome. On my list of things to do. Now if
> I could only buy a larger drive to hold that whole list...
>
Let's have some fun:
1st AI prompt:
You are an experienced Linux kernel developer and an AI prompt expert.
Read linux/Documentation/process/submitting-patches.rst and generate an
AI prompt file named review.md which can be used by an AI agent to review
a patch submission and determine if it follows the guidance in
submitting-patches.rst.
2nd AI prompt:
Using @review.md, review the patch in index.html, which targets the
repository in the linux/ directory. Provide review output in review.log.
Result is below (raw file as generated by the AI agent, not reformatted).
Guenter
---
Status: NEEDS WORK
Summary:
The patch updates the watchdog API documentation by marking deprecated ioctls and adding descriptions for features like MAGICCLOSE and PRETIMEOUT. While the technical content is useful, the patch requires revisions to the commit
message and documentation text to fix typos, grammatical errors, and formatting issues.
Detailed Findings:
1. Commit Message & Description:
- The description starts with conversational filler ("Good Point. So again revisited...") which should be removed.
- The message is not written in the imperative mood (e.g., use "Mark WDIOC_GETTEMP as deprecated" instead of "Revisited... and marked...").
- Typos in the commit message: "someof" (some of), "Reierate" (Reiterate).
- The subject line "Documentation: Refactored watchdog old doc" should use the imperative mood: "Documentation: Refactor watchdog documentation".
2. Technical Content & Style:
- Several typos in the documentation text:
- "reboot tiemout" -> "reboot timeout" (Multiple occurrences).
- "characted 'V'" -> "character 'V'".
- "small number if legacy drivers" -> "small number of legacy drivers".
- Grammatical/Formatting issues:
- Missing space: "WDIOS_DISABLECARD,this" -> "WDIOS_DISABLECARD, this".
- Inconsistent capitalization: "USE WDIOC_SETPRETIMEOUT" should be "Use WDIOC_SETPRETIMEOUT".
- "New Drivers" should be "New drivers".
- The patch results in "No newline at end of file" for Documentation/watchdog/watchdog-api.rst.
3. Canonical Patch Format:
- The patch follows the basic canonical format (Subject, From, Sign-off, separators, changelog, and diffstat are all present).
Actionable Advice:
1. Rewrite the commit message to be professional and in the imperative mood. Remove any conversational references to previous discussions.
2. Perform a spell-check on the documentation changes. Specifically, fix "tiemout", "characted", and "if" (where "of" was intended).
3. Ensure proper spacing after punctuation (e.g., after the comma in the WDIOS_DISABLECARD description).
4. Add a newline at the end of Documentation/watchdog/watchdog-api.rst.
5. Use consistent sentence-case for instructions (e.g., "Use" instead of "USE").
* Re: [RFC PATCH] Documentation: Add managed interrupts
From: Aaron Tomlin @ 2026-04-10 16:59 UTC
To: Jonathan Corbet
Cc: Sebastian Andrzej Siewior, linux-doc, linux-kernel,
Christoph Hellwig, Frederic Weisbecker, Jens Axboe, Ming Lei,
Thomas Gleixner, Valentin Schneider, Waiman Long, Peter Zijlstra,
John Ogness
In-Reply-To: <87wlygb3wd.fsf@trenco.lwn.net>
On Thu, Apr 09, 2026 at 08:32:34AM -0600, Jonathan Corbet wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>
> > I stumbled upon "isolcpus=managed_irq" which is the last piece which
> > can only be handled by isolcpus= and has no runtime knob. I knew roughly
> > what managed interrupts should do but I lacked some details how it is
> > used and what the managed_irq sub parameter means in practise.
> >
> > This documents what we have as of today and how it works. I added some
> > examples how the parameter affects the configuration. Did I miss
> > something?
>
> There's been a lot of silence on this one... should I pick this one up,
> or are there other plans for it...?
>
> Thanks,
>
> jon
Hi Jonathan,
Sorry.
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
--
Aaron Tomlin
* Re: [PATCH v2] Documentation: Refactored watchdog old doc
From: Jonathan Corbet @ 2026-04-10 16:45 UTC
To: Guenter Roeck, Sunny Patel
Cc: Wim Van Sebroeck, Shuah Khan, linux-watchdog, linux-doc,
linux-kernel
In-Reply-To: <fe3de980-e918-47ae-862a-969a5b117ae0@roeck-us.net>
Guenter Roeck <linux@roeck-us.net> writes:
> On Fri, Apr 10, 2026 at 12:58:11PM +0530, Sunny Patel wrote:
>> Good Point. So again revisited the watchdog core
>> api and list out the deprecated one and marked
>> as deprecated in doc and also mentioned it just
>> for legacy driver and not for newer one.
>>
>> As someof the legacy driver still have reference
>> to old api so just marked as deprecated in doc.
>>
>> Also checked with other watchdog related api
>> which are deprecated in driver but still present
>> in doc but didn't find any.
>>
>> ---
>
> The above would show up as commit message, there is no change log, and
> this e-mail was sent as response to v1. And I can see that without even
> looking at the patch itself.
>
> That makes me wonder what Documentation/process/submitting-patches.rst
> is useful for. No one seems to bother reading it. We might as well
> just remove it.
It's good to point people at.
I do think it needs a serious rewrite to, among other things, turn it
into less of an intimidating tome. On my list of things to do. Now if
I could only buy a larger drive to hold that whole list...
jon
* Re: [PATCH v12 2/2] hwmon: add support for MCP998X
From: Guenter Roeck @ 2026-04-10 16:22 UTC
To: Victor Duicu
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
linux-hwmon, devicetree, linux-kernel, linux-doc, marius.cristea
In-Reply-To: <20260403-add-mcp9982-hwmon-v12-2-b3bfb26ff136@microchip.com>
On Fri, Apr 03, 2026 at 04:32:17PM +0300, Victor Duicu wrote:
> Add driver for Microchip MCP998X/33 and MCP998XD/33D
> Multichannel Automotive Temperature Monitor Family.
>
> Signed-off-by: Victor Duicu <victor.duicu@microchip.com>
Applied.
Thanks,
Guenter
* Re: [PATCH v9 0/2] Add support for Microchip EMC1812
From: Guenter Roeck @ 2026-04-10 15:51 UTC
To: Marius Cristea, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet
Cc: linux-hwmon, devicetree, linux-kernel, linux-doc, Conor Dooley
In-Reply-To: <20260403-hw_mon-emc1812-v9-0-1a798f31cf2e@microchip.com>
On 4/3/26 05:39, Marius Cristea wrote:
> This is the hwmon driver for EMC1812/13/14/15/33 multichannel Low-Voltage
> Remote Diode Sensor Family. The chips in the family have one internal
> and different numbers of external channels, ranging from 1 (EMC1812) to
> 4 channels (EMC1815).
> Reading diodes in anti-parallel connection is supported by EMC1814, EMC1815
> and EMC1833.
>
> Signed-off-by: Marius Cristea <marius.cristea@microchip.com>
Sashiko still reports numerous issues which I consider valid:
https://sashiko.dev/#/patchset/20260403-hw_mon-emc1812-v9-0-1a798f31cf2e%40microchip.com
Please fix.
Thanks,
Guenter
> ---
> Changes in v9:
> - improve the wording in the Documentation/hwmon/emc1812.rst file
> - add const to variables in the driver
> - initialize the EXT2_BETA_CONFIG only for the parts that support it
> - update the writable regmap table to exclude read-only registers
> - Link to v8: https://lore.kernel.org/r/20260310-hw_mon-emc1812-v8-0-bc155727e0d2@microchip.com
>
> Changes in v8:
> - remove "address scan" from emc1812.rst documentation
> - change the second dimension of emc1812_limit_regs_low[][] to 2
> - clamp input value before doing math on it to avoid overflow
> - use rounding instead of truncation for 8 bits limit registers
> - fix misleading comment when HW ID is not recognized
> - Link to v7: https://lore.kernel.org/r/20260223-hw_mon-emc1812-v7-0-51e2676f4e20@microchip.com
>
> Changes in v7:
> - driver
> - fix an overflow emc1812_set_hyst
> - remove unused parameter in emc1812_set_temp
> - devicetree binding:
> - remove unneeded restrictions to avoid bloating the binding
> - Link to v6: https://lore.kernel.org/r/20260212-hw_mon-emc1812-v6-0-e37e9b38d898@microchip.com
>
> Changes in v6:
> - driver
> - fix an overflow when writing more than 191875 to limits stored in an 8-bit
> register
> - remove "i2c_set_clientdata" from probe
> - fix discrepancy where writing 16ms and reading it back returns 15ms
> at update interval
> - skip setting the ideality factor for channels that are not available
> on the device
> - devicetree binding:
> - change the way interrupts are described/used
> - add "microchip,enable-anti-parallel"
> - rewrite "allOf" section to be more clear
> - Link to v5: https://lore.kernel.org/r/20260205-hw_mon-emc1812-v5-0-232835aefe8f@microchip.com
>
> Changes in v5:
> - fix calculation in emc1812_get_limit_temp
> - use i2c_get_match_data to cover the case when the driver is instantiated
> via I2C ID table.
> - replace dev_info with dev_warn
> - remove some unnecessary truncation to 8 bits
> - remove clamping when reading the temperature with hyst
> - do not change the conversion rate at probe time
> - use a generic define to remove duplicate channel_info entries
> - Link to v4: https://lore.kernel.org/r/20260127-hw_mon-emc1812-v4-0-6bf636b54847@microchip.com
>
> Changes in v4:
> - fix file permissions for read only properties
> - fix calculation when the limits are written
> - remove the temp_min_hyst because the part doesn't support it
> - Link to v3: https://lore.kernel.org/r/20251218-hw_mon-emc1812-v3-0-a123ada7b859@microchip.com
>
> Changes in v3:
> - remove messages that are not helpful
> - fix an issue related to NULL labels
> - fix sign/unsign calculation
> - replace E2BIG with EINVAL
> - use BIT() to create mask
> - Link to v2: https://lore.kernel.org/r/20251121-hw_mon-emc1812-v2-0-5b2070f8b778@microchip.com
>
> Changes in v2:
> - update the interrupt section from yaml file
> - update index.rst
> - remove fault condition from internal sensor
> - remove unused members from structures
> - update the driver to work on systems without device tree or
> firmware nodes
> - add missing include files
> - make NULL labels invisible
> - correct sign/unsign calculations
> - correct possible underflow for limits
> - Link to v1: https://lore.kernel.org/r/20251029-hw_mon-emc1812-v1-0-be4fd8af016a@microchip.com
>
> ---
> Marius Cristea (2):
> dt-bindings: hwmon: temperature: add support for EMC1812
> hwmon: temperature: add support for EMC1812
>
> .../bindings/hwmon/microchip,emc1812.yaml | 184 ++++
> Documentation/hwmon/emc1812.rst | 67 ++
> Documentation/hwmon/index.rst | 1 +
> MAINTAINERS | 8 +
> drivers/hwmon/Kconfig | 11 +
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/emc1812.c | 965 +++++++++++++++++++++
> 7 files changed, 1237 insertions(+)
> ---
> base-commit: d2b2fea3503e5e12b2e28784152937e48bcca6ff
> change-id: 20251002-hw_mon-emc1812-f1b806487d10
>
> Best regards,
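The v8 changelog items "clamp input value before doing math on it to avoid overflow" and "use rounding instead of truncation for 8 bits limit registers" describe a common hwmon pattern. Below is a minimal userspace sketch of it, assuming a hypothetical 8-bit limit register with 1 °C per LSB and a +64 °C offset (extended range). The register layout, the constants and the emc_temp_to_reg8() helper are illustrative assumptions, not code from drivers/hwmon/emc1812.c; the input is in millidegrees Celsius, as hwmon sysfs attributes are.

```c
#include <assert.h>

/*
 * Hypothetical 8-bit limit register: 1 degC per LSB, biased by +64 degC,
 * so 0x00..0xff covers -64000..191000 millidegrees Celsius.
 */
#define EMC_REG8_OFFSET_MC	64000L
#define EMC_TEMP_MIN_MC		(-64000L)
#define EMC_TEMP_MAX_MC		191000L

static long clamp_long(long val, long lo, long hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/* Round to nearest; only valid for n >= 0 and d > 0. */
static long div_round_closest_pos(long n, long d)
{
	assert(n >= 0 && d > 0);
	return (n + d / 2) / d;
}

/* Convert a sysfs value (millidegrees Celsius) to the register value. */
static unsigned char emc_temp_to_reg8(long temp_mc)
{
	/*
	 * Clamp before any arithmetic: this keeps temp_mc + offset from
	 * overflowing for huge inputs and guarantees that the rounded
	 * result fits in 8 bits.
	 */
	temp_mc = clamp_long(temp_mc, EMC_TEMP_MIN_MC, EMC_TEMP_MAX_MC);

	return (unsigned char)div_round_closest_pos(temp_mc + EMC_REG8_OFFSET_MC,
						    1000);
}
```

Under these assumptions, 25600 maps to 90 (89.6 rounded) where truncation would give 89, and 191875 is clamped to 191000 and maps to 0xff; without the clamp it would round to 256 and wrap to 0 in the u8.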
* Re: [RFC PATCH] Documentation: Add managed interrupts
From: Thomas Gleixner @ 2026-04-10 15:32 UTC
To: Jonathan Corbet, Sebastian Andrzej Siewior, linux-doc,
linux-kernel
Cc: Aaron Tomlin, Christoph Hellwig, Frederic Weisbecker, Jens Axboe,
Ming Lei, Valentin Schneider, Waiman Long, Peter Zijlstra,
John Ogness
In-Reply-To: <87wlygb3wd.fsf@trenco.lwn.net>
On Thu, Apr 09 2026 at 08:32, Jonathan Corbet wrote:
>> This documents what we have as of today and how it works. I added some
>> examples how the parameter affects the configuration. Did I miss
>> something?
>
> There's been a lot of silence on this one... should I pick this one up,
> or are there other plans for it...?
Looks good to me.
Acked-by: Thomas Gleixner <tglx@kernel.org>
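As a rough illustration of what the managed_irq sub-parameter changes for a managed interrupt: based on our reading of irq_do_set_affinity() in kernel/irq/manage.c, when isolcpus=managed_irq is in effect the requested affinity is intersected with the housekeeping CPUs, but only if that intersection contains an online CPU; otherwise the requested mask is used unchanged. The sketch below is simplified: 64-bit masks stand in for struct cpumask, the function name is hypothetical, and the interrupt-chip side is omitted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of the mask a managed interrupt is programmed with
 * when isolcpus=managed_irq is active. Bit n represents CPU n.
 *
 * requested:    affinity assigned to the managed interrupt
 * housekeeping: CPUs not listed in isolcpus=managed_irq,...
 * online:       currently online CPUs
 */
static uint64_t managed_irq_prog_mask(uint64_t requested, uint64_t housekeeping,
				      uint64_t online)
{
	uint64_t hk_only = requested & housekeeping;

	assert(requested != 0);

	/* Prefer housekeeping CPUs as long as one of them is online... */
	if (hk_only & online)
		return hk_only;

	/* ...otherwise fall back to the full mask, isolated CPUs included. */
	return requested;
}
```

So a queue whose mask spans CPUs 0-3 with CPUs 2-3 isolated is steered to CPUs 0-1, while a queue whose mask contains only isolated CPUs is left alone. This is why the isolation provided by managed_irq is best effort rather than a hard guarantee.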
* Re: [PATCH v11 10/16] KVM: guest_memfd: Add flag to remove from direct map
From: Nikita Kalyazin @ 2026-04-10 15:30 UTC
To: Ackerley Tng, Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <CAEvNRgGb95C3uRxPy2_Uj7SmTW45hNNdJxi5RR209s5KYcHgBw@mail.gmail.com>
On 23/03/2026 21:15, Ackerley Tng wrote:
> "Kalyazin, Nikita" <kalyazin@amazon.co.uk> writes:
>
>>
>> [...snip...]
>>
>> static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>> {
>> struct inode *inode = file_inode(vmf->vma->vm_file);
>> struct folio *folio;
>> vm_fault_t ret = VM_FAULT_LOCKED;
>> + int err;
>>
>> if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>> return VM_FAULT_SIGBUS;
>> @@ -418,6 +454,14 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>> folio_mark_uptodate(folio);
>> }
>>
>> + if (kvm_gmem_no_direct_map(folio_inode(folio))) {
>> + err = kvm_gmem_folio_zap_direct_map(folio);
>> + if (err) {
>> + ret = vmf_error(err);
>> + goto out_folio;
>> + }
>> + }
>> +
>> vmf->page = folio_file_page(folio, vmf->pgoff);
>>
>
> Sashiko pointed out that kvm_gmem_populate() might try and write to
> direct-map-removed folios, but I think that's handled because populate
> will first try and GUP folios, which is already blocked for
> direct-map-removed folios.
As far as I can see, it is a valid issue, as populate only GUPs the
source pages, not the gmem pages. I think this is similar to the TDX
discussion at some point, where it was decided to exclude TDX support [1].
I followed the same path and excluded SEV-SNP in patch 8 [2]. I kept
your and David's "Reviewed-by:" tags for that patch, but please let me
know if this makes you change your minds.
[1] https://lore.kernel.org/kvm/aWpcDrGVLrZOqdcg@google.com
[2] https://lore.kernel.org/kvm/20260410151746.61150-9-kalyazin@amazon.com
>
>> out_folio:
>> @@ -528,6 +572,9 @@ static void kvm_gmem_free_folio(struct folio *folio)
>> kvm_pfn_t pfn = page_to_pfn(page);
>> int order = folio_order(folio);
>>
>> + if (kvm_gmem_folio_no_direct_map(folio))
>> + kvm_gmem_folio_restore_direct_map(folio);
>> +
>> kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
>> }
>>
>
> Sashiko says to invalidate then restore direct map, I think in this case
> it doesn't matter since if the folio needed invalidation, it must be
> private, and the host shouldn't be writing to the private pages anyway.
>
> One benefit of retaining this order (restore, invalidate) is that it
> opens the invalidate hook to possibly do something regarding memory
> contents?
>
> Or perhaps we should just take the suggestion (invalidate, restore) and
> align that invalidate should not touch memory contents.
>
>> @@ -591,6 +638,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>> /* Unmovable mappings are supposed to be marked unevictable as well. */
>> WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
>>
>> + if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
>> + mapping_set_no_direct_map(inode->i_mapping);
>> +
>> GMEM_I(inode)->flags = flags;
>>
>> file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
>> @@ -803,13 +853,22 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>> }
>>
>> r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>> + if (r)
>> + goto out_unlock;
>>
>> + if (kvm_gmem_no_direct_map(folio_inode(folio))) {
>> + r = kvm_gmem_folio_zap_direct_map(folio);
>> + if (r)
>> + goto out_unlock;
>> + }
>> +
>>
>> [...snip...]
>>
>
> Preparing a folio used to involve zeroing, but that has since been
> refactored out, so I believe zapping can come before preparing.
>
> Similar to the above point on invalidation: perhaps we should take the
> suggestion to zap then prepare
>
> + And align that preparation should not touch memory contents
> + Avoid needing to undo the preparation on zapping failure (.free_folio
> is not called on folio_put(); it is only called on folio removal from
> the filemap).
I reordered both, thanks.
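The two orderings agreed on above (zap the direct map before preparing the folio in kvm_gmem_get_pfn(), and invalidate before restoring the direct map in kvm_gmem_free_folio()) can be modelled in a few lines. This is a hypothetical userspace sketch with stand-in types and helpers (all demo_* names are made up), not the guest_memfd code; it only shows why zapping first leaves nothing to undo when the zap fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a guest_memfd folio and its state. */
struct demo_folio {
	bool direct_map_zapped;
	bool prepared;
	bool invalidated;
	int zap_error;		/* value the fake zap returns; 0 means success */
};

static int demo_zap_direct_map(struct demo_folio *f)
{
	if (f->zap_error)
		return f->zap_error;
	f->direct_map_zapped = true;
	return 0;
}

static int demo_prepare_folio(struct demo_folio *f)
{
	f->prepared = true;
	return 0;
}

/* get_pfn path: zap first, so a failed zap leaves no preparation to undo. */
static int demo_get_pfn(struct demo_folio *f)
{
	int r = demo_zap_direct_map(f);

	if (r)
		return r;
	return demo_prepare_folio(f);
}

static void demo_restore_direct_map(struct demo_folio *f)
{
	/* Encodes the agreed order: invalidation must already have happened. */
	assert(f->invalidated);
	f->direct_map_zapped = false;
}

/* free path: invalidate first, then restore the direct map. */
static void demo_free_folio(struct demo_folio *f)
{
	f->invalidated = true;
	if (f->direct_map_zapped)
		demo_restore_direct_map(f);
}
```

With a failing zap, demo_get_pfn() returns the error while the folio is neither prepared nor zapped, so there is no preparation step that would have to be rolled back.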
* Re: [PATCH v11 10/16] KVM: guest_memfd: Add flag to remove from direct map
From: Nikita Kalyazin @ 2026-04-10 15:29 UTC
To: David Hildenbrand (Arm), Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
ackerleytng@google.com, yosry@kernel.org, ajones@ventanamicro.com,
maobibo@loongson.cn, tabba@google.com, prsampat@amd.com,
wu.fei9@sanechips.com.cn, mlevitsk@redhat.com,
jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <50bfaeb5-551e-403f-bd00-a7d8b6bbf6e2@kernel.org>
On 23/03/2026 18:05, David Hildenbrand (Arm) wrote:
> On 3/17/26 15:12, Kalyazin, Nikita wrote:
>> From: Patrick Roy <patrick.roy@linux.dev>
>>
>> Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
>> ioctl. When set, guest_memfd folios will be removed from the direct map
>> after preparation, with direct map entries only restored when the folios
>> are freed.
>>
>> To ensure these folios do not end up in places where the kernel cannot
>> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
>> address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
>>
>> Note that this flag causes removal of direct map entries for all
>> guest_memfd folios independent of whether they are "shared" or "private"
>> (although current guest_memfd only supports either all folios in the
>> "shared" state, or all folios in the "private" state if
>> GUEST_MEMFD_FLAG_MMAP is not set). The use case for also removing direct map
>> entries of the shared parts of guest_memfd is a special type of
>> non-CoCo VM where host userspace is trusted to have access to all of
>> guest memory, but where Spectre-style transient execution attacks
>> through the host kernel's direct map should still be mitigated. In this
>> setup, KVM retains access to guest memory via userspace mappings of
>> guest_memfd, which are reflected back into KVM's memslots via
>> userspace_addr. This is needed for things like MMIO emulation on x86_64
>> to work.
>>
>> Direct map entries are zapped right before guest or userspace mappings
>> of gmem folios are set up, e.g. in kvm_gmem_fault_user_mapping() or
>> kvm_gmem_get_pfn() [called from the KVM MMU code]. The only place where
>> a gmem folio can be allocated without being mapped anywhere is
>> kvm_gmem_populate(), where handling potential failures of direct map
>> removal is not possible (by the time direct map removal is attempted,
>> the folio is already marked as prepared, meaning attempting to re-try
>> kvm_gmem_populate() would just result in -EEXIST without fixing up the
>> direct map state). These folios are then removed from the direct map
>> upon kvm_gmem_get_pfn(), e.g. when they are mapped into the guest later.
>>
>> Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
>
> If you changed this patch significantly, you should likely add a
>
> Co-developed-by: Nikita Kalyazin <kalyazin@amazon.com>
>
> above your sob.
>
> (applies to other patches as well, please double check)
Added.
>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>> Documentation/virt/kvm/api.rst | 21 ++++++-----
>> include/linux/kvm_host.h | 3 ++
>> include/uapi/linux/kvm.h | 1 +
>> virt/kvm/guest_memfd.c | 67 ++++++++++++++++++++++++++++++++--
>> 4 files changed, 79 insertions(+), 13 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 032516783e96..8feec77b03fe 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6439,15 +6439,18 @@ a single guest_memfd file, but the bound ranges must not overlap).
>> The capability KVM_CAP_GUEST_MEMFD_FLAGS enumerates the `flags` that can be
>> specified via KVM_CREATE_GUEST_MEMFD. Currently defined flags:
>>
>> - ============================ ================================================
>> - GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
>> - descriptor.
>> - GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
>> - KVM_CREATE_GUEST_MEMFD (memory files created
>> - without INIT_SHARED will be marked private).
>> - Shared memory can be faulted into host userspace
>> - page tables. Private memory cannot.
>> - ============================ ================================================
>> + ============================== ================================================
>> + GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
>> + descriptor.
>> + GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
>> + KVM_CREATE_GUEST_MEMFD (memory files created
>> + without INIT_SHARED will be marked private).
>> + Shared memory can be faulted into host userspace
>> + page tables. Private memory cannot.
>> + GUEST_MEMFD_FLAG_NO_DIRECT_MAP The guest_memfd instance will unmap the memory
>> + backing it from the kernel's address space
>> + before passing it off to userspace or the guest.
>> + ============================== ================================================
>>
>> When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>> guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index ce8c5fdf2752..c95747e2278c 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -738,6 +738,9 @@ static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
>> if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
>> flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
>>
>> + if (!kvm || kvm_arch_gmem_supports_no_direct_map(kvm))
>> + flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
>> +
>> return flags;
>> }
>> #endif
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 80364d4dbebb..d864f67efdb7 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1642,6 +1642,7 @@ struct kvm_memory_attributes {
>> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
>> #define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
>> #define GUEST_MEMFD_FLAG_INIT_SHARED (1ULL << 1)
>> +#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP (1ULL << 2)
>>
>> struct kvm_create_guest_memfd {
>> __u64 size;
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 651649623448..c9344647579c 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -7,6 +7,7 @@
>> #include <linux/mempolicy.h>
>> #include <linux/pseudo_fs.h>
>> #include <linux/pagemap.h>
>> +#include <linux/set_memory.h>
>>
>> #include "kvm_mm.h"
>>
>> @@ -76,6 +77,35 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
>> return 0;
>> }
>>
>> +#define KVM_GMEM_FOLIO_NO_DIRECT_MAP BIT(0)
>> +
>> +static bool kvm_gmem_folio_no_direct_map(struct folio *folio)
>> +{
>> + return ((u64)folio->private) & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
>> +}
>> +
>> +static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
>> +{
>> + u64 gmem_flags = GMEM_I(folio_inode(folio))->flags;
>> + int r = 0;
>> +
>> + if (kvm_gmem_folio_no_direct_map(folio) || !(gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP))
>
> The function is only called when
>
> kvm_gmem_no_direct_map(folio_inode(folio))
>
> Does it really make sense to check for GUEST_MEMFD_FLAG_NO_DIRECT_MAP again?
>
> If, at all, it should be a warning if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is
> not set?
>
> Further, kvm_gmem_folio_zap_direct_map() uses the folio lock to
> synchronize, right? Might be worth pointing that out somehow (e.g.,
> lockdep check if possible).
Added a WARN_ON. I couldn't find a way to have a lockdep check here.
>
>> + goto out;
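For reference, the per-folio state used by kvm_gmem_folio_zap_direct_map() is a single flag bit kept in folio->private. Below is a hypothetical userspace sketch of that pattern with the redundant inode-flag check turned into a warning, as suggested above. The demo_* names and the warning counter are stand-ins for the kernel helpers and WARN_ON(), the flag value mirrors GUEST_MEMFD_FLAG_NO_DIRECT_MAP from the uapi hunk, and the folio locking David mentions is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors GUEST_MEMFD_FLAG_NO_DIRECT_MAP as proposed in the uapi hunk. */
#define DEMO_GMEM_FLAG_NO_DIRECT_MAP	((uint64_t)1 << 2)
/* Per-folio bit kept in folio->private, like KVM_GMEM_FOLIO_NO_DIRECT_MAP. */
#define DEMO_FOLIO_NO_DIRECT_MAP	((uintptr_t)1 << 0)

struct demo_gmem_folio {
	void *private;
};

static int demo_warn_count;	/* stands in for WARN_ON() splats */

static void demo_warn_on(int cond)
{
	if (cond)
		demo_warn_count++;
}

static int demo_folio_no_direct_map(const struct demo_gmem_folio *folio)
{
	return ((uintptr_t)folio->private & DEMO_FOLIO_NO_DIRECT_MAP) != 0;
}

/*
 * zap_result simulates the return value of folio_zap_direct_map(). The
 * caller is expected to have checked the guest_memfd flag already, so a
 * clear flag only triggers a warning here instead of an early return.
 */
static int demo_gmem_folio_zap(struct demo_gmem_folio *folio,
			       uint64_t gmem_flags, int zap_result)
{
	assert(folio != 0);
	demo_warn_on(!(gmem_flags & DEMO_GMEM_FLAG_NO_DIRECT_MAP));

	/* Already removed from the direct map: nothing to do. */
	if (demo_folio_no_direct_map(folio))
		return 0;

	/* Only record the state once the zap has actually succeeded. */
	if (!zap_result)
		folio->private = (void *)((uintptr_t)folio->private |
					  DEMO_FOLIO_NO_DIRECT_MAP);
	return zap_result;
}
```

The sketch uses uintptr_t for the pointer/integer round trip; in kernel code unsigned long usually plays that role, whereas a (u64) cast may draw pointer-to-int-cast warnings on 32-bit builds.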
* Re: [PATCH v11 10/16] KVM: guest_memfd: Add flag to remove from direct map
From: Nikita Kalyazin @ 2026-04-10 15:28 UTC
To: Ackerley Tng, David Hildenbrand (Arm), Kalyazin, Nikita,
kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <CAEvNRgEXp6busURR20cazeG2DQWdU5=ZaJv21OcSq+mhVKwJ4g@mail.gmail.com>
On 23/03/2026 20:47, Ackerley Tng wrote:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
>>
>> [...snip...]
>>
>>> +static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
>>> +{
>>> + u64 gmem_flags = GMEM_I(folio_inode(folio))->flags;
>>> + int r = 0;
>>> +
>>> + if (kvm_gmem_folio_no_direct_map(folio) || !(gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP))
>>
>> The function is only called when
>>
>> kvm_gmem_no_direct_map(folio_inode(folio))
>>
>> Does it really make sense to check for GUEST_MEMFD_FLAG_NO_DIRECT_MAP again?
>>
>
> Good point that GUEST_MEMFD_FLAG_NO_DIRECT_MAP was already checked in
> the caller. I think we can drop this second check.
Dropped, thanks.
>
>> If, at all, it should be a warning if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is
>> not set?
>>
>> Further, kvm_gmem_folio_zap_direct_map() uses the folio lock to
>> synchronize, right? Might be worth pointing that out somehow (e.g.,
>> lockdep check if possible).
>>
>>> + goto out;
>>> +
>>> + r = folio_zap_direct_map(folio);
>>> + if (!r)
>>> + folio->private = (void *)((u64)folio->private | KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>>> +
>>> +out:
>>> + return r;
>>> +}
>>> +
>>> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
>>> +{
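For illustration only, here is a userspace model of how kvm_gmem_folio_zap_direct_map() might look once the redundant GUEST_MEMFD_FLAG_NO_DIRECT_MAP check is turned into a warning and the folio-lock requirement is asserted, as suggested above. The struct, the bit values, and the mocked zap helper are assumptions rather than the KVM code. In the kernel, the lock assertion would more plausibly be VM_WARN_ON_ONCE(!folio_test_locked(folio)), because lockdep does not track the folio lock bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit values; the real definitions live in KVM's guest_memfd code. */
#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP (1ULL << 0)
#define KVM_GMEM_FOLIO_NO_DIRECT_MAP   ((uintptr_t)1 << 0)

/* Stand-in for struct folio plus the inode flags reached via GMEM_I(). */
struct model_folio {
	bool locked;          /* models the folio lock (PG_locked) */
	uintptr_t private;    /* models folio->private */
	uint64_t gmem_flags;  /* models GMEM_I(folio_inode(folio))->flags */
	int zap_calls;        /* how often the mocked zap helper ran */
};

/* Mock of folio_zap_direct_map(); always succeeds in this model. */
static int model_folio_zap_direct_map(struct model_folio *folio)
{
	folio->zap_calls++;
	return 0;
}

static bool model_gmem_folio_no_direct_map(const struct model_folio *folio)
{
	return folio->private & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
}

static int model_gmem_folio_zap_direct_map(struct model_folio *folio)
{
	int r;

	/* The folio lock serializes zapping; the caller must hold it. */
	assert(folio->locked);

	/*
	 * The caller already checked the inode flag, so a miss is a bug:
	 * warn (modelled by a message) instead of skipping silently.
	 */
	if (!(folio->gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)) {
		fprintf(stderr, "WARN: GUEST_MEMFD_FLAG_NO_DIRECT_MAP not set\n");
		return 0;
	}

	/* Already removed from the direct map: nothing to do. */
	if (model_gmem_folio_no_direct_map(folio))
		return 0;

	r = model_folio_zap_direct_map(folio);
	if (!r)
		folio->private |= KVM_GMEM_FOLIO_NO_DIRECT_MAP;
	return r;
}
```

A second call on the same folio returns early without zapping again, because the per-folio private bit records that the direct map entry is already gone.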
* Re: [PATCH v11 05/16] mm/gup: drop local variable in gup_fast_folio_allowed
From: Nikita Kalyazin @ 2026-04-10 15:27 UTC
To: Ackerley Tng, David Hildenbrand (Arm), Kalyazin, Nikita,
kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <CAEvNRgHqJGwmAfS8TuGBbUoQehpqY9GdtjUS=+Hc1ViK79RL4w@mail.gmail.com>
On 23/03/2026 20:22, Ackerley Tng wrote:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
>> On 3/17/26 15:11, Kalyazin, Nikita wrote:
>>> From: Nikita Kalyazin <kalyazin@amazon.com>
>>>
>>> Move the check for pinning closer to where the result is used.
>>> No functional changes.
>>>
>>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>>> ---
>>> mm/gup.c | 23 ++++++++++++-----------
>>> 1 file changed, 12 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 5856d35be385..869d79c8daa4 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -2737,18 +2737,9 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
>>> */
>>> static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>>> {
>>> - bool reject_file_backed = false;
>>> struct address_space *mapping;
>>> unsigned long mapping_flags;
>>>
>>> - /*
>>> - * If we aren't pinning then no problematic write can occur. A long term
>>> - * pin is the most egregious case so this is the one we disallow.
>>> - */
>>> - if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) ==
>>> - (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
>>> - reject_file_backed = true;
>>> -
>>> /* We hold a folio reference, so we can safely access folio fields. */
>>> if (WARN_ON_ONCE(folio_test_slab(folio)))
>>> return false;
>>> @@ -2793,8 +2784,18 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>>> */
>>> if (secretmem_mapping(mapping))
>>> return false;
>>> - /* The only remaining allowed file system is shmem. */
>>> - return !reject_file_backed || shmem_mapping(mapping);
>>> +
>>> + /*
>>> + * If we aren't pinning then no problematic write can occur. A writable
>>> + * long term pin is the most egregious case, so this is the one we
>>> + * allow only for ...
>>> + */
>>> + if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
>>> + (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
>>> + return true;
>>> +
>>> + /* ... hugetlb (which we allowed above already) and shared memory. */
>>> + return shmem_mapping(mapping);
>>
>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>>
>> I'm wondering if it would be a good idea to check for a hugetlb mapping
>> here instead of having the folio_test_hugetlb() check above.
>>
>
> I think it's nice that hugetlb folios are determined immediately to be
> eligible for GUP-fast regardless of whether the folio is file-backed or
> not.
Ok, I'll leave it as is for now.
>
>> Something to ponder about :)
>>
>> --
>> Cheers,
>>
>> David
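The "No functional changes" claim for this refactor can be checked exhaustively on a reduced model. The sketch below assumes, as the commit message implies, that reject_file_backed is only read in the final return. It uses hypothetical bit values for FOLL_PIN, FOLL_LONGTERM and FOLL_WRITE (only their distinctness matters) and reduces shmem_mapping(mapping) to a boolean input, then compares the old and new tails of gup_fast_folio_allowed() for every combination.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only their distinctness matters here. */
#define FOLL_WRITE    (1u << 0)
#define FOLL_PIN      (1u << 1)
#define FOLL_LONGTERM (1u << 2)
#define FOLL_MASK     (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)

/* Old tail: compute reject_file_backed up front, use it at the end. */
static bool old_tail(unsigned int flags, bool is_shmem)
{
	bool reject_file_backed = false;

	if ((flags & FOLL_MASK) == FOLL_MASK)
		reject_file_backed = true;
	return !reject_file_backed || is_shmem;
}

/* New tail: test the flags right where the result is used. */
static bool new_tail(unsigned int flags, bool is_shmem)
{
	if ((flags & FOLL_MASK) != FOLL_MASK)
		return true;
	return is_shmem;
}

/* True if old and new agree for all 8 flag combinations times shmem/not. */
static bool tails_equivalent(void)
{
	for (unsigned int flags = 0; flags <= FOLL_MASK; flags++)
		for (int shmem = 0; shmem <= 1; shmem++)
			if (old_tail(flags, shmem) != new_tail(flags, shmem))
				return false;
	return true;
}
```

Only the writable long-term pin (all three bits set) ever depends on the mapping type; every other combination returns true in both versions.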
* Re: [PATCH v11 04/16] mm/gup: drop secretmem optimization from gup_fast_folio_allowed
From: Nikita Kalyazin @ 2026-04-10 15:27 UTC
To: David Hildenbrand (Arm), Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
ackerleytng@google.com, yosry@kernel.org, ajones@ventanamicro.com,
maobibo@loongson.cn, tabba@google.com, prsampat@amd.com,
wu.fei9@sanechips.com.cn, mlevitsk@redhat.com,
jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek, Vlastimil Babka, Dan Williams, Alistair Popple
In-Reply-To: <76f14d3e-c8fb-4380-9d12-2375f199b53f@kernel.org>
On 23/03/2026 18:31, David Hildenbrand (Arm) wrote:
> On 3/17/26 15:11, Kalyazin, Nikita wrote:
>> From: Patrick Roy <patrick.roy@linux.dev>
>>
>> This drops an optimization in gup_fast_folio_allowed() where
>> secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is
>> enabled by default since commit b758fe6df50d ("mm/secretmem: make it on
>> by default"), so the secretmem check did not actually end up elided in
>> most cases anymore anyway.
>>
>> This is in preparation for the generalization of handling mappings where
>> direct map entries of folios are set to not present. Currently,
>> mappings that match this description are secretmem mappings
>> (memfd_secret()). Later, some guest_memfd configurations will also fall
>> into this category.
>>
>> Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>> mm/gup.c | 11 +----------
>> 1 file changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 8e7dc2c6ee73..5856d35be385 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2739,7 +2739,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>> {
>> bool reject_file_backed = false;
>> struct address_space *mapping;
>> - bool check_secretmem = false;
>> unsigned long mapping_flags;
>>
>> /*
>> @@ -2751,14 +2750,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>> reject_file_backed = true;
>>
>> /* We hold a folio reference, so we can safely access folio fields. */
>> -
>> - /* secretmem folios are always order-0 folios. */
>> - if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))
>> - check_secretmem = true;
>> -
>> - if (!reject_file_backed && !check_secretmem)
>> - return true;
>> -
>
> The AI review says that this will force all small folios through the
> mapping check (which we obviously need later :) ).
>
> It brings up two cases where page->mapping is not set up:
>
> 1) ZONE_DEVICE pages (like Device DAX and PCI P2PDMA)
>
> 2) large shmem folios in the swap cache
>
>
> 2) doesn't make sense, because the folio cannot be mapped in user space
> when that happens.
>
> I am also skeptical about 1), especially as large folios are also
> supported for device dax and would be problematic here.
> __dev_dax_pte_fault() clearly sets folio->mapping through dax_set_mapping().
>
>
> If 1) is ever a case we could allow them by checking for
> folio_is_zone_device(). But I am not sure if that is really required.
> Sounds weird.
I went ahead and added a check for folio_is_zone_device().
>
>
> --
> Cheers,
>
> David
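The point of a folio_is_zone_device() check is ordering: it has to run before the code relies on folio->mapping, otherwise a ZONE_DEVICE folio without a mapping would be pushed to the GUP slow path. The following is a heavily simplified userspace model under that assumption. It is not the mm/gup.c logic; the struct and the exact placement of the check are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the folio properties this model inspects. */
struct model_folio {
	bool zone_device;     /* models folio_is_zone_device() */
	const void *mapping;  /* models folio->mapping; NULL if never set up */
	bool secretmem;       /* models secretmem_mapping(mapping) */
};

static bool model_gup_fast_folio_allowed(const struct model_folio *folio)
{
	/*
	 * Hypothetical placement of the new check: accept ZONE_DEVICE
	 * folios before looking at folio->mapping, which some of them
	 * may not have set up.
	 */
	if (folio->zone_device)
		return true;

	/* No mapping: refuse GUP-fast and let the slow path handle it. */
	if (!folio->mapping)
		return false;

	/* Mappings without direct map entries, such as secretmem. */
	if (folio->secretmem)
		return false;

	return true;
}
```

Without the first check, a ZONE_DEVICE folio with a NULL mapping would hit the "no mapping" branch and fall back to the slow path.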
* Re: [PATCH v11 03/16] mm/secretmem: make use of folio_{zap, restore}_direct_map
From: Nikita Kalyazin @ 2026-04-10 15:26 UTC
To: Ackerley Tng, Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <CAEvNRgEBdi49ZkfGo0xmM+J1yzKOzfT2ThAXEN=S0j7vC7Fu3w@mail.gmail.com>
On 23/03/2026 18:46, Ackerley Tng wrote:
> "Kalyazin, Nikita" <kalyazin@amazon.co.uk> writes:
>
>> From: Nikita Kalyazin <kalyazin@amazon.com>
>>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>> mm/secretmem.c | 8 ++------
>> 1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/secretmem.c b/mm/secretmem.c
>> index fd29b33c6764..27b176af8fc4 100644
>> --- a/mm/secretmem.c
>> +++ b/mm/secretmem.c
>> @@ -53,7 +53,6 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
>> struct inode *inode = file_inode(vmf->vma->vm_file);
>> pgoff_t offset = vmf->pgoff;
>> gfp_t gfp = vmf->gfp_mask;
>> - unsigned long addr;
>> struct folio *folio;
>> vm_fault_t ret;
>> int err;
>> @@ -72,7 +71,7 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
>> goto out;
>> }
>>
>> - err = set_direct_map_invalid_noflush(folio_address(folio));
>> + err = folio_zap_direct_map(folio);
>> if (err) {
>> folio_put(folio);
>> ret = vmf_error(err);
>> @@ -87,7 +86,7 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
>> * already happened when we marked the page invalid
>> * which guarantees that this call won't fail
>> */
>> - set_direct_map_default_noflush(folio_address(folio));
>> + folio_restore_direct_map(folio);
>> folio_put(folio);
>> if (err == -EEXIST)
>> goto retry;
>> @@ -95,9 +94,6 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
>> ret = vmf_error(err);
>> goto out;
>> }
>> -
>> - addr = (unsigned long)folio_address(folio);
>> - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>> }
>>
>> vmf->page = folio_file_page(folio, vmf->pgoff);
>> --
>> 2.50.1
>
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Thank you.
* Re: [PATCH v11 03/16] mm/secretmem: make use of folio_{zap, restore}_direct_map
From: Nikita Kalyazin @ 2026-04-10 15:26 UTC
To: David Hildenbrand (Arm), Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
ackerleytng@google.com, yosry@kernel.org, ajones@ventanamicro.com,
maobibo@loongson.cn, tabba@google.com, prsampat@amd.com,
wu.fei9@sanechips.com.cn, mlevitsk@redhat.com,
jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <187fa189-b6d7-4ba0-98a4-7a525cbaf4f9@kernel.org>
On 23/03/2026 17:53, David Hildenbrand (Arm) wrote:
> On 3/17/26 15:11, Kalyazin, Nikita wrote:
>> From: Nikita Kalyazin <kalyazin@amazon.com>
>>
>
> Describe your change :)
>
> And also worth mentioning that we now flush the TLB even though
> filemap_add_folio() failed -- which shouldn't matter in practice I guess.
>
> With that
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Done, thanks.
>
> --
> Cheers,
>
> David
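The behavioural difference David asks to mention can be shown with a small userspace model of the secretmem_fault() allocation path. Here, add_ret stands in for the return value of filemap_add_folio(), and the counters stand in for flush_tlb_kernel_range() calls and direct map state. These names are assumptions made for the sketch, and the -EEXIST retry is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_state {
	bool direct_mapped;  /* is the folio present in the direct map? */
	int tlb_flushes;     /* number of kernel TLB range flushes issued */
};

/* Before the patch: invalidate without flushing, flush only on success. */
static int old_fault_path(struct model_state *s, int add_ret)
{
	s->direct_mapped = false;         /* set_direct_map_invalid_noflush() */
	if (add_ret) {
		s->direct_mapped = true;  /* set_direct_map_default_noflush() */
		return add_ret;
	}
	s->tlb_flushes++;                 /* flush_tlb_kernel_range() */
	return 0;
}

/* After the patch: folio_zap_direct_map() invalidates and flushes up front. */
static int new_fault_path(struct model_state *s, int add_ret)
{
	s->direct_mapped = false;
	s->tlb_flushes++;                 /* flush inside folio_zap_direct_map() */
	if (add_ret) {
		s->direct_mapped = true;  /* folio_restore_direct_map() */
		return add_ret;
	}
	return 0;
}
```

On success, both versions issue exactly one flush. On a failed insert such as -EEXIST, the new version has already flushed once, which is the extra work the commit message should mention.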
* Re: [PATCH v11 02/16] set_memory: add folio_{zap, restore}_direct_map helpers
From: Nikita Kalyazin @ 2026-04-10 15:25 UTC
To: Ackerley Tng, Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <CAEvNRgEFBexkZCjOMFHJRQFHOpiUezD2jbfDVFrGhYXODdpMjg@mail.gmail.com>
On 23/03/2026 18:43, Ackerley Tng wrote:
> "Kalyazin, Nikita" <kalyazin@amazon.co.uk> writes:
>
>> From: Nikita Kalyazin <kalyazin@amazon.com>
>>
>> Let's provide folio_{zap,restore}_direct_map helpers as preparation for
>> supporting removal of the direct map for guest_memfd folios.
>> In folio_zap_direct_map(), flush TLB to make sure the data is not
>> accessible.
>>
>> The new helpers need to be accessible to KVM on architectures that
>> support guest_memfd (x86 and arm64).
>>
>> Direct map removal gives guest_memfd the same protection that
>> memfd_secret does, such as hardening against Spectre-like attacks
>> through in-kernel gadgets.
>>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>> include/linux/set_memory.h | 13 ++++++++++++
>> mm/memory.c | 42 ++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 55 insertions(+)
>>
>> diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
>> index 1a2563f525fc..24caea2931f9 100644
>> --- a/include/linux/set_memory.h
>> +++ b/include/linux/set_memory.h
>> @@ -41,6 +41,15 @@ static inline int set_direct_map_valid_noflush(const void *addr,
>> return 0;
>> }
>>
>> +static inline int folio_zap_direct_map(struct folio *folio)
>> +{
>> + return 0;
>> +}
>> +
>> +static inline void folio_restore_direct_map(struct folio *folio)
>> +{
>> +}
>> +
>> static inline bool kernel_page_present(struct page *page)
>> {
>> return true;
>> @@ -57,6 +66,10 @@ static inline bool can_set_direct_map(void)
>> }
>> #define can_set_direct_map can_set_direct_map
>> #endif
>> +
>> +int folio_zap_direct_map(struct folio *folio);
>> +void folio_restore_direct_map(struct folio *folio);
>> +
>> #endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
>>
>> #ifdef CONFIG_X86_64
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 07778814b4a8..cab6bb237fc0 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -78,6 +78,7 @@
>> #include <linux/sched/sysctl.h>
>> #include <linux/pgalloc.h>
>> #include <linux/uaccess.h>
>> +#include <linux/set_memory.h>
>>
>> #include <trace/events/kmem.h>
>>
>> @@ -7478,3 +7479,44 @@ void vma_pgtable_walk_end(struct vm_area_struct *vma)
>> if (is_vm_hugetlb_page(vma))
>> hugetlb_vma_unlock_read(vma);
>> }
>> +
>> +#ifdef CONFIG_ARCH_HAS_SET_DIRECT_MAP
>> +/**
>> + * folio_zap_direct_map - remove a folio from the kernel direct map
>> + * @folio: folio to remove from the direct map
>> + *
>> + * Removes the folio from the kernel direct map and flushes the TLB. This may
>> + * require splitting huge pages in the direct map, which can fail due to memory
>> + * allocation.
>> + *
>> + * Return: 0 on success, or a negative error code on failure.
>> + */
>> +int folio_zap_direct_map(struct folio *folio)
>> +{
>> + const void *addr = folio_address(folio);
>> + int ret;
>> +
>> + ret = set_direct_map_valid_noflush(addr, folio_nr_pages(folio), false);
>> + flush_tlb_kernel_range((unsigned long)addr,
>> + (unsigned long)addr + folio_size(folio));
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_FOR_MODULES(folio_zap_direct_map, "kvm");
>> +
>> +/**
>> + * folio_restore_direct_map - restore the kernel direct map entry for a folio
>> + * @folio: folio whose direct map entry is to be restored
>> + *
>> + * This may only be called after a prior successful folio_zap_direct_map() on
>> + * the same folio. Because the zap will have already split any huge pages in
>> + * the direct map, restoration here only updates protection bits and cannot
>> + * fail.
>> + */
>> +void folio_restore_direct_map(struct folio *folio)
>> +{
>> + WARN_ON_ONCE(set_direct_map_valid_noflush(folio_address(folio),
>> + folio_nr_pages(folio), true));
>> +}
>> +EXPORT_SYMBOL_FOR_MODULES(folio_restore_direct_map, "kvm");
>> +#endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
>> --
>> 2.50.1
>
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
>
> I also took a look at Sashiko's [1] comments, and I think checking for
> the highmem folio issues should be the caller's responsibility.
Thank you.
>
> [1] https://sashiko.dev/#/patchset/20260317141031.514-1-kalyazin%40amazon.com
* Re: [PATCH v11 02/16] set_memory: add folio_{zap, restore}_direct_map helpers
From: Nikita Kalyazin @ 2026-04-10 15:25 UTC
To: David Hildenbrand (Arm), Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
ackerleytng@google.com, yosry@kernel.org, ajones@ventanamicro.com,
maobibo@loongson.cn, tabba@google.com, prsampat@amd.com,
wu.fei9@sanechips.com.cn, mlevitsk@redhat.com,
jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <54f9b5a7-b8a9-486a-9c12-a910f5287947@kernel.org>
On 23/03/2026 17:51, David Hildenbrand (Arm) wrote:
> On 3/17/26 15:10, Kalyazin, Nikita wrote:
>> From: Nikita Kalyazin <kalyazin@amazon.com>
>>
>> Let's provide folio_{zap,restore}_direct_map helpers as preparation for
>> supporting removal of the direct map for guest_memfd folios.
>> In folio_zap_direct_map(), flush TLB to make sure the data is not
>> accessible.
>>
>> The new helpers need to be accessible to KVM on architectures that
>> support guest_memfd (x86 and arm64).
>>
>> Direct map removal gives guest_memfd the same protection that
>> memfd_secret does, such as hardening against Spectre-like attacks
>> through in-kernel gadgets.
>
> Maybe mention that there might be a double TLB flush on some
> architectures, but that that is something to figure out later. Same
> behavior in secretmem code where this will be used next.
Added, thanks.
>
>>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>> include/linux/set_memory.h | 13 ++++++++++++
>> mm/memory.c | 42 ++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 55 insertions(+)
>>
>> diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
>> index 1a2563f525fc..24caea2931f9 100644
>> --- a/include/linux/set_memory.h
>> +++ b/include/linux/set_memory.h
>> @@ -41,6 +41,15 @@ static inline int set_direct_map_valid_noflush(const void *addr,
>> return 0;
>> }
>>
>> +static inline int folio_zap_direct_map(struct folio *folio)
>> +{
>> + return 0;
>
> Should we return -ENOSYS here or similar?
I'm not sure about that, because the set_direct_map_*() stubs also return 0
in this case. Do we want the folio helpers to behave differently?
>
>> +}
>> +
>> +static inline void folio_restore_direct_map(struct folio *folio)
>> +{
>> +}
>> +
>> static inline bool kernel_page_present(struct page *page)
>> {
>> return true;
>> @@ -57,6 +66,10 @@ static inline bool can_set_direct_map(void)
>> }
>> #define can_set_direct_map can_set_direct_map
>> #endif
>> +
>> +int folio_zap_direct_map(struct folio *folio);
>> +void folio_restore_direct_map(struct folio *folio);
>> +
>> #endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
>>
>> #ifdef CONFIG_X86_64
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 07778814b4a8..cab6bb237fc0 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -78,6 +78,7 @@
>> #include <linux/sched/sysctl.h>
>> #include <linux/pgalloc.h>
>> #include <linux/uaccess.h>
>> +#include <linux/set_memory.h>
>>
>> #include <trace/events/kmem.h>
>>
>> @@ -7478,3 +7479,44 @@ void vma_pgtable_walk_end(struct vm_area_struct *vma)
>> if (is_vm_hugetlb_page(vma))
>> hugetlb_vma_unlock_read(vma);
>> }
>> +
>> +#ifdef CONFIG_ARCH_HAS_SET_DIRECT_MAP
>> +/**
>> + * folio_zap_direct_map - remove a folio from the kernel direct map
>> + * @folio: folio to remove from the direct map
>> + *
>> + * Removes the folio from the kernel direct map and flushes the TLB. This may
>> + * require splitting huge pages in the direct map, which can fail due to memory
>> + * allocation.
>
> Best to mention
>
> "So far, only order-0 folios are supported." and then ...
>
>> + *
>> + * Return: 0 on success, or a negative error code on failure.
>> + */
>> +int folio_zap_direct_map(struct folio *folio)
>> +{
>> + const void *addr = folio_address(folio);
>> + int ret;
>> +
>
> if (folio_test_large(folio))
> return -EINVAL;
Added, thanks.
>
>
> With that,
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Thank you.
>
> --
> Cheers,
>
> David
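With the order-0 restriction David suggests, the helper pair can be modelled in userspace as shown below. set_direct_map_valid_noflush() is mocked to always succeed, and the TLB flush is reduced to a counter. The struct and mock names are hypothetical; only the control flow follows the quoted patch plus the added folio_test_large() check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_folio {
	unsigned int order;  /* folio_order(); 0 means a single page */
	bool present;        /* are the direct map entries valid? */
	int flushes;         /* kernel TLB flushes issued for this folio */
};

static bool model_folio_test_large(const struct model_folio *f)
{
	return f->order > 0;
}

/* Mock of set_direct_map_valid_noflush(addr, nr, valid). */
static int model_set_direct_map_valid_noflush(struct model_folio *f, bool valid)
{
	f->present = valid;
	return 0;
}

static int model_folio_zap_direct_map(struct model_folio *f)
{
	int ret;

	/* "So far, only order-0 folios are supported." */
	if (model_folio_test_large(f))
		return -EINVAL;

	ret = model_set_direct_map_valid_noflush(f, false);
	f->flushes++;  /* flush_tlb_kernel_range() over folio_size() */
	return ret;
}

static void model_folio_restore_direct_map(struct model_folio *f)
{
	/* Cannot fail after a successful zap; the kernel wraps it in WARN_ON_ONCE(). */
	int ret = model_set_direct_map_valid_noflush(f, true);

	assert(ret == 0);
	(void)ret;
}
```

A large folio is rejected before any direct map change or flush happens, so callers see -EINVAL with the mapping left intact.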
* Re: [PATCH v11 01/16] set_memory: set_direct_map_* to take address
From: Nikita Kalyazin @ 2026-04-10 15:25 UTC
To: Ackerley Tng, Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <CAEvNRgFUXsO4HLVvyjfU=UX9cO6UrRppyTu1X0_+6SXLhDEN=w@mail.gmail.com>
On 23/03/2026 18:00, Ackerley Tng wrote:
> "Kalyazin, Nikita" <kalyazin@amazon.co.uk> writes:
>
>> From: Nikita Kalyazin <kalyazin@amazon.com>
>>
>> This is to avoid excessive conversions folio->page->address when adding
>> helpers on top of set_direct_map_valid_noflush() in the next patch.
>>
>
> I can't take credit for what Sashiko [1] spotted.
>
>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>> arch/arm64/include/asm/set_memory.h | 7 ++++---
>> arch/arm64/mm/pageattr.c | 19 +++++++++----------
>> arch/loongarch/include/asm/set_memory.h | 7 ++++---
>> arch/loongarch/mm/pageattr.c | 25 +++++++++++--------------
>> arch/riscv/include/asm/set_memory.h | 7 ++++---
>> arch/riscv/mm/pageattr.c | 17 +++++++++--------
>> arch/s390/include/asm/set_memory.h | 7 ++++---
>> arch/s390/mm/pageattr.c | 13 +++++++------
>> arch/x86/include/asm/set_memory.h | 7 ++++---
>> arch/x86/mm/pat/set_memory.c | 23 ++++++++++++-----------
>> include/linux/set_memory.h | 9 +++++----
>> kernel/power/snapshot.c | 4 ++--
>> mm/execmem.c | 6 ++++--
>> mm/secretmem.c | 6 +++---
>> mm/vmalloc.c | 11 +++++++----
>> 15 files changed, 89 insertions(+), 79 deletions(-)
>>
>>
>> [...snip...]
>>
>> diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
>> index f5e910b68229..9e08905d3624 100644
>> --- a/arch/loongarch/mm/pageattr.c
>> +++ b/arch/loongarch/mm/pageattr.c
>> @@ -198,32 +198,29 @@ bool kernel_page_present(struct page *page)
>> return pte_present(ptep_get(pte));
>> }
>>
>> -int set_direct_map_default_noflush(struct page *page)
>> +int set_direct_map_default_noflush(const void *addr)
>> {
>> - unsigned long addr = (unsigned long)page_address(page);
>> -
>> - if (addr < vm_map_base)
>> + if ((unsigned long)addr < vm_map_base)
>> return 0;
>>
>> - return __set_memory(addr, 1, PAGE_KERNEL, __pgprot(0));
>> + return __set_memory((unsigned long)addr, 1, PAGE_KERNEL, __pgprot(0));
>> }
>>
>> -int set_direct_map_invalid_noflush(struct page *page)
>> +int set_direct_map_invalid_noflush(const void *addr)
>> {
>> - unsigned long addr = (unsigned long)page_address(page);
>> -
>> - if (addr < vm_map_base)
>> + if ((unsigned long)addr < vm_map_base)
>> return 0;
>>
>> - return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
>> + return __set_memory((unsigned long)addr, 1, __pgprot(0),
>> + __pgprot(_PAGE_PRESENT | _PAGE_VALID));
>> }
>>
>> -int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>> +int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
>> + bool valid)
>> {
>> - unsigned long addr = (unsigned long)page_address(page);
>> pgprot_t set, clear;
>>
>> - if (addr < vm_map_base)
>> + if ((unsigned long)addr < vm_map_base)
>> return 0;
>>
>> if (valid) {
>> @@ -234,5 +231,5 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>> clear = __pgprot(_PAGE_PRESENT | _PAGE_VALID);
>> }
>>
>> - return __set_memory(addr, 1, set, clear);
>> + return __set_memory((unsigned long)addr, 1, set, clear);
>
> Sashiko also spotted that there is a hard-coded 1 here. Before this
> change, it was already hard-coded to 1. Not sure if this is a
> bug.
>
> Could this be addressed in a separate patch series?
Yes, I agree, it looks out of scope for this series.
>
>> }
>>
>> [...snip...]
>>
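The hard-coded `1` that Sashiko flagged in the loongarch `set_direct_map_valid_noflush()` can be illustrated with a toy model. Everything here (`toy_present[]`, `toy_set_memory()` and the two variants) is hypothetical and only mimics the shape of the code quoted above: when the page count is dropped and `1` is passed on, a multi-page request changes only its first page. Whether any loongarch caller actually passes more than one page is left open in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

/* Toy direct map: one "present" bit per page (hypothetical model). */
static bool toy_present[TOY_NPAGES];

/* Models __set_memory(addr, numpages, ...): updates numpages entries. */
static int toy_set_memory(size_t first, size_t numpages, bool present)
{
	if (first + numpages > TOY_NPAGES)
		return -1;
	for (size_t i = 0; i < numpages; i++)
		toy_present[first + i] = present;
	return 0;
}

/* Shape of the quoted code: numpages is accepted, but a literal 1 is passed on. */
static int toy_valid_noflush_hardcoded(size_t first, size_t numpages, bool valid)
{
	(void)numpages;
	return toy_set_memory(first, 1, valid);
}

/* What honoring the argument would look like. */
static int toy_valid_noflush_honored(size_t first, size_t numpages, bool valid)
{
	return toy_set_memory(first, numpages, valid);
}

/* Counts how many toy pages are currently present. */
static size_t toy_count_present(void)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NPAGES; i++)
		n += toy_present[i];
	return n;
}
```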
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>> index 40581a720fe8..6aea1f470fd5 100644
>> --- a/arch/x86/mm/pat/set_memory.c
>> +++ b/arch/x86/mm/pat/set_memory.c
>> @@ -2587,9 +2587,9 @@ int set_pages_rw(struct page *page, int numpages)
>> return set_memory_rw(addr, numpages);
>> }
>>
>> -static int __set_pages_p(struct page *page, int numpages)
>> +static int __set_pages_p(const void *addr, int numpages)
>> {
>> - unsigned long tempaddr = (unsigned long) page_address(page);
>> + unsigned long tempaddr = (unsigned long)addr;
>> struct cpa_data cpa = { .vaddr = &tempaddr,
>> .pgd = NULL,
>> .numpages = numpages,
>> @@ -2606,9 +2606,9 @@ static int __set_pages_p(struct page *page, int numpages)
>> return __change_page_attr_set_clr(&cpa, 1);
>> }
>>
>> -static int __set_pages_np(struct page *page, int numpages)
>> +static int __set_pages_np(const void *addr, int numpages)
>> {
>> - unsigned long tempaddr = (unsigned long) page_address(page);
>> + unsigned long tempaddr = (unsigned long)addr;
>> struct cpa_data cpa = { .vaddr = &tempaddr,
>> .pgd = NULL,
>> .numpages = numpages,
>> @@ -2625,22 +2625,23 @@ static int __set_pages_np(struct page *page, int numpages)
>> return __change_page_attr_set_clr(&cpa, 1);
>> }
>>
>
> I agree that __kernel_map_pages() in arch/x86/mm/pat/set_memory.c has
> calls to __set_pages_p() and __set_pages_np() that seem to have been
> missed in this patch. Those calls still pass a struct page *. Maybe
> that's because __kernel_map_pages() is guarded by
> CONFIG_DEBUG_PAGEALLOC, so an LSP-guided refactoring would have missed
> those calls.
Fixed, thanks!
>
> Should probably try a grep to see what else needs replacing :)
>
> [1] https://sashiko.dev/#/patchset/20260317141031.514-1-kalyazin%40amazon.com
>
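A related reason such leftover call sites are easy to miss (a general property of C, not something stated in the thread): any object pointer converts implicitly to `const void *`, so a stale call that still passes a `struct page *` to a helper that now takes an address compiles without a diagnostic and silently operates on the wrong address. That is one argument for the grep rather than relying on the compiler. The sketch uses hypothetical `toy_*` stand-ins, not the kernel types.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct page. */
struct toy_page {
	uintptr_t vaddr; /* the direct-map address this page describes */
};

/* Models page_address(): returns the address the page describes. */
static const void *toy_page_address(const struct toy_page *page)
{
	return (const void *)page->vaddr;
}

/* Models a converted helper: it now expects a direct-map address. */
static uintptr_t toy_set_pages_np(const void *addr, int numpages)
{
	(void)numpages;
	return (uintptr_t)addr; /* the address that would be operated on */
}
```

With `struct toy_page pg`, `toy_set_pages_np(toy_page_address(&pg), 1)` is the correct call, while `toy_set_pages_np(&pg, 1)` also compiles cleanly but operates on the address of `pg` itself.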
>> -int set_direct_map_invalid_noflush(struct page *page)
>> +int set_direct_map_invalid_noflush(const void *addr)
>> {
>> - return __set_pages_np(page, 1);
>> + return __set_pages_np(addr, 1);
>> }
>>
>> -int set_direct_map_default_noflush(struct page *page)
>> +int set_direct_map_default_noflush(const void *addr)
>> {
>> - return __set_pages_p(page, 1);
>> + return __set_pages_p(addr, 1);
>> }
>>
>> -int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>> +int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
>> + bool valid)
>> {
>> if (valid)
>> - return __set_pages_p(page, nr);
>> + return __set_pages_p(addr, numpages);
>>
>> - return __set_pages_np(page, nr);
>> + return __set_pages_np(addr, numpages);
>> }
>>
>> #ifdef CONFIG_DEBUG_PAGEALLOC
>>
>> [...snip...]
>>
* Re: [PATCH v11 01/16] set_memory: set_direct_map_* to take address
From: Nikita Kalyazin @ 2026-04-10 15:24 UTC
To: David Hildenbrand (Arm), Kalyazin, Nikita, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, kevin.brodsky@arm.com,
ackerleytng@google.com, yosry@kernel.org, ajones@ventanamicro.com,
maobibo@loongson.cn, tabba@google.com, prsampat@amd.com,
wu.fei9@sanechips.com.cn, mlevitsk@redhat.com,
jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
dev.jain@arm.com, gor@linux.ibm.com, hca@linux.ibm.com,
palmer@dabbelt.com, pjw@kernel.org, shijie@os.amperecomputing.com,
svens@linux.ibm.com, thuth@redhat.com, wyihan@google.com,
yang@os.amperecomputing.com, Jonathan.Cameron@huawei.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, osalvador@suse.de,
pavel@kernel.org, rafael@kernel.org, vannapurve@google.com,
jackmanb@google.com, aneesh.kumar@kernel.org,
patrick.roy@linux.dev, Thomson, Jack, Itazuri, Takahiro,
Manwaring, Derek
In-Reply-To: <7b785e34-4d49-48d2-b438-afa1def3ec0f@kernel.org>
On 23/03/2026 17:44, David Hildenbrand (Arm) wrote:
> On 3/17/26 15:10, Kalyazin, Nikita wrote:
>> From: Nikita Kalyazin <kalyazin@amazon.com>
>>
>
> Just a nit while reading over it once more: restate what the patch
> subject says.
>
> Like "Let's convert set_direct_map_*() to take an address instead of a
> page to prepare for adding helpers that operate on folios; it will be
> more efficient to convert from a folio directly to an address without
> going through a page first."
Done, thanks.
>
>> This is to avoid excessive conversions folio->page->address when adding
>> helpers on top of set_direct_map_valid_noflush() in the next patch.
>>
>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>
> --
> Cheers,
>
> David
* Re: [PATCH V10 00/10] famfs: port into fuse
From: Bernd Schubert @ 2026-04-10 15:24 UTC
To: John Groves, Joanne Koong
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
Alison Schofield, John Groves, Jonathan Corbet, Shuah Khan,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
In-Reply-To: <adkDq0m5Wt9YhJ8A@groves.net>
On 4/10/26 16:46, John Groves wrote:
> On 26/04/06 10:43AM, Joanne Koong wrote:
>> On Tue, Mar 31, 2026 at 5:37 AM John Groves <john@jagalactic.com> wrote:
>>>
>>> From: John Groves <john@groves.net>
>>>
>>> NOTE: this series depends on the famfs dax series in Ira's for-7.1/dax-famfs
>>> branch [0]
>>>
>>> Changes v9 -> v10
>>> - Rebased to Ira's for-7.1/dax-famfs branch [0], which contains the required
>>> dax patches
>>> - Add parentheses to FUSE_IS_VIRTIO_DAX() macro, in case something bad is
>>> passed in as fuse_inode (thanks Jonathan's AI)
>>>
>>> Description:
>>>
>>> This patch series introduces famfs into the fuse file system framework.
>>> Famfs depends on the bundled dax patch set.
>>>
>>> The famfs user space code can be found at [1].
>>>
>>> Fuse Overview:
>>>
>>> Famfs started as a standalone file system, but this series is intended to
>>> permanently supersede that implementation. At a high level, famfs adds
>>> two new fuse server messages:
>>>
>>> GET_FMAP - Retrieves a famfs fmap (the file-to-dax map for a famfs
>>> file)
>>> GET_DAXDEV - Retrieves the details of a particular daxdev that was
>>> referenced by an fmap
>>>
>>> Famfs Overview
>>>
>>> Famfs exposes shared memory as a file system. Famfs consumes shared
>>> memory from dax devices, and provides memory-mappable files that map
>>> directly to the memory - no page cache involvement. Famfs differs from
>>> conventional file systems in fs-dax mode, in that it handles in-memory
>>> metadata in a sharable way (which begins with never caching dirty shared
>>> metadata).
>>>
>>> Famfs started as a standalone file system [2,3], but the consensus at
>>> LSFMM was that it should be ported into fuse [4,5].
>>>
>>> The key performance requirement is that famfs must resolve mapping faults
>>> without upcalls. This is achieved by fully caching the file-to-devdax
>>> metadata for all active files. This is done via two fuse client/server
>>> message/response pairs: GET_FMAP and GET_DAXDEV.
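The no-upcall fault path described above comes down to resolving a file offset against cached metadata. A minimal sketch, assuming a simple fmap made of back-to-back extents; `struct toy_extent`, `struct toy_fmap` and `toy_fmap_resolve()` are hypothetical and do not reflect the actual famfs fmap format (the series defines that in fs/fuse/famfs_kfmap.h).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simple extent: a contiguous run of file bytes on one daxdev. */
struct toy_extent {
	unsigned int devindex; /* which daxdev (index into a table) */
	uint64_t dev_offset;   /* byte offset within that daxdev */
	uint64_t len;          /* length in bytes */
};

/* Hypothetical cached fmap: extents laid out back to back in file order. */
struct toy_fmap {
	const struct toy_extent *ext;
	size_t nextents;
};

/*
 * Resolves a file offset to (devindex, dev_offset) purely from the cached
 * fmap, so no request to the fuse server is needed at fault time.
 * Returns 0 on success, -1 if the offset lies beyond the mapped extents.
 */
static int toy_fmap_resolve(const struct toy_fmap *fmap, uint64_t file_ofs,
			    unsigned int *devindex, uint64_t *dev_ofs)
{
	uint64_t ext_start = 0;

	for (size_t i = 0; i < fmap->nextents; i++) {
		const struct toy_extent *e = &fmap->ext[i];

		if (file_ofs < ext_start + e->len) {
			*devindex = e->devindex;
			*dev_ofs = e->dev_offset + (file_ofs - ext_start);
			return 0;
		}
		ext_start += e->len;
	}
	return -1;
}
```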
>>>
>>> Famfs remains the first fs-dax file system that is backed by devdax
>>> rather than pmem in fs-dax mode (hence the need for the new dax mode).
>>>
>>> Notes
>>>
>>> - When a file is opened in a famfs mount, the OPEN is followed by a
>>> GET_FMAP message and response. The "fmap" is the full file-to-dax
>>> mapping, allowing the fuse/famfs kernel code to handle
>>> read/write/fault without any upcalls.
>>>
>>> - After each GET_FMAP, the fmap is checked for extents that reference
>>> previously-unknown daxdevs. Each such occurrence is handled with a
>>> GET_DAXDEV message and response.
>>>
>>> - Daxdevs are stored in a table (which might become an xarray at some
>>> point). When entries are added to the table, we acquire exclusive
>>> access to the daxdev via the fs_dax_get() call (modeled after how
>>> fs-dax handles this with pmem devices). Famfs provides
>>> holder_operations to devdax, providing a notification path in the
>>> event of memory errors or forced reconfiguration.
>>>
>>> - If devdax notifies famfs of memory errors on a dax device, famfs
>>> currently blocks all subsequent accesses to data on that device. The
>>> recovery is to re-initialize the memory and file system. Famfs is
>>> memory, not storage...
>>>
>>> - Because famfs uses backing (devdax) devices, only privileged mounts are
>>> supported (i.e. the fuse server requires CAP_SYS_RAWIO).
>>>
>>> - The famfs kernel code never accesses the memory directly - it only
>>> facilitates read, write and mmap on behalf of user processes, using
>>> fmap metadata provided by its privileged fuse server. As such, the
>>> RAS of the shared memory affects applications, but not the kernel.
>>>
>>> - Famfs has backing device(s), but they are devdax (char) rather than
>>> block. Right now there is no way to tell the vfs layer that famfs has a
>>> char backing device (unless we say it's block, but it's not). Currently
>>> we use the standard anonymous fuse fs_type - but I'm not sure that's
>>> ultimately optimal (thoughts?)
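The GET_FMAP then GET_DAXDEV sequence in the notes above can be sketched as a scan over the device indices referenced by a newly received fmap. The table layout, the fixed table size and `toy_resolve_new_daxdevs()` are assumptions for illustration; the counter stands in for sending GET_DAXDEV and acquiring the device, which the cover letter says is done via fs_dax_get().

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DAXDEVS 16

/* Hypothetical daxdev table: tracks which device indices are already known. */
struct toy_daxdev_table {
	bool known[TOY_MAX_DAXDEVS];
};

/*
 * After a GET_FMAP response, walk the extents' device indices and issue one
 * GET_DAXDEV (modeled by a counter) for each index not yet in the table.
 * Returns the number of GET_DAXDEV requests, or -1 on an out-of-range index.
 */
static int toy_resolve_new_daxdevs(struct toy_daxdev_table *tbl,
				   const unsigned int *devindex, size_t n)
{
	int requests = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int idx = devindex[i];

		if (idx >= TOY_MAX_DAXDEVS)
			return -1;
		if (!tbl->known[idx]) {
			/* Real code would send GET_DAXDEV and take the device here. */
			tbl->known[idx] = true;
			requests++;
		}
	}
	return requests;
}
```

Repeated references to the same device within one fmap, or across fmaps, cost a single request.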
>>>
>>> Changes v8 -> v9
>>> - Kconfig: fs/fuse/Kconfig:CONFIG_FUSE_FAMFS_DAX now depends on the
>>> new CONFIG_DEV_DAX_FSDEV (from drivers/dax/Kconfig) rather than
>>> just CONFIG_DEV_DAX and CONFIG_FS_DAX. (CONFIG_FUSE_FAMFS_DAX
>>> depends on those...)
>>>
>>> Changes v7 -> v8
>>> - Moved to inline __free declarations in fuse_get_fmap(),
>>> famfs_fuse_meta_alloc() and famfs_teardown()
>>> - Adopted FIELD_PREP() macro rather than manual bitfield manipulation
>>> - Minor doc edits
>>> - I dropped adding magic numbers to include/uapi/linux/magic.h. That
>>> can be done later if appropriate
>>>
>>> Changes v6 -> v7
>>> - Fixed a regression in famfs_interleave_fileofs_to_daxofs() that
>>> was reported by Intel's kernel test robot
>>> - Added a check in __fsdev_dax_direct_access() for negative return
>>> from pgoff_to_phys(), which would indicate an out-of-range offset
>>> - Fixed a bug in __famfs_meta_free(), where not all interleaved
>>> extents were freed
>>> - Added chunksize alignment checks in famfs_fuse_meta_alloc() and
>>> famfs_interleave_fileofs_to_daxofs() as interleaved chunks must
>>> be PTE or PMD aligned
>>> - Simplified famfs_file_init_dax() a bit
>>> - Re-ran CM's kernel code review prompts on the entire series and
>>> fixed several minor issues
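For readers unfamiliar with interleaved extents, the offset arithmetic and the chunk alignment checks mentioned in the v7 changes can be sketched as conventional round-robin striping. This is an assumption, not taken from famfs_interleave_fileofs_to_daxofs(): the struct, the 4 KiB PTE and 2 MiB PMD sizes (typical on x86-64) and the helper names are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes, as on x86-64: 4 KiB base pages (PTE), 2 MiB PMD mappings. */
#define TOY_PTE_SIZE (UINT64_C(4) << 10)
#define TOY_PMD_SIZE (UINT64_C(2) << 20)

/* Hypothetical interleaved extent: nstrips strips, one base offset per strip. */
struct toy_interleave {
	unsigned int nstrips;
	uint64_t chunk_size;        /* bytes per chunk before moving to the next strip */
	const uint64_t *strip_base; /* start offset of each strip on its device */
};

/* A chunk must be a nonzero multiple of the PTE size so no page straddles strips. */
static bool toy_chunk_size_ok(uint64_t chunk_size)
{
	return chunk_size != 0 && chunk_size % TOY_PTE_SIZE == 0;
}

/* PMD-multiple chunks also permit PMD-sized mappings (if strip bases align too). */
static bool toy_chunk_pmd_mappable(uint64_t chunk_size)
{
	return chunk_size != 0 && chunk_size % TOY_PMD_SIZE == 0;
}

/*
 * Round-robin striping: chunk k of the file lives on strip k % nstrips,
 * in row k / nstrips of that strip.
 */
static uint64_t toy_interleave_fileofs_to_devofs(const struct toy_interleave *il,
						 uint64_t file_ofs,
						 unsigned int *strip)
{
	uint64_t chunk = file_ofs / il->chunk_size;
	uint64_t within = file_ofs % il->chunk_size;
	uint64_t row = chunk / il->nstrips;

	*strip = (unsigned int)(chunk % il->nstrips);
	return il->strip_base[*strip] + row * il->chunk_size + within;
}
```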
>>>
>>> Changes v4 -> v5 -> v6
>>> - None. Re-sending due to technical difficulties
>>>
>>> Changes v3 [9] -> v4
>>> - The patch "dax: prevent driver unbind while filesystem holds device"
>>> has been dropped. Dan Williams indicated that the favored behavior is
>>> for a file system to stop working if an underlying driver is unbound,
>>> rather than preventing the unbind.
>>> - The patch "famfs_fuse: Famfs mount opt: -o shadow=<shadowpath>" has
>>> been dropped. Found a way for the famfs user space to do without the
>>> -o opt (via getxattr).
>>> - Squashed the fs/fuse/Kconfig patch into the first subsequent patch
>>> that needed the change
>>> ("famfs_fuse: Basic fuse kernel ABI enablement for famfs")
>>> - Many review comments addressed.
>>> - Addressed minor kerneldoc infractions reported by test robot.
>>>
>>> Changes v2 [7] -> v3
>>> - Dax: Completely new fsdev driver (drivers/dax/fsdev.c) replaces the
>>> dev_dax_iomap modifications to bus.c/device.c. Devdax devices can now
>>> be switched among 'devdax', 'famfs' and 'system-ram' modes via daxctl
>>> or sysfs.
>>> - Dax: fsdev uses MEMORY_DEVICE_FS_DAX type and leaves folios at order-0
>>> (no vmemmap_shift), allowing fs-dax to manage folio lifecycles
>>> dynamically like pmem does.
>>> - Dax: The "poisoned page" problem is properly fixed via
>>> fsdev_clear_folio_state(), which clears stale mapping/compound state
>>> when fsdev binds. The temporary WARN_ON_ONCE workaround in fs/dax.c
>>> has been removed.
>>> - Dax: Added dax_set_ops() so fsdev can set dax_operations at bind time
>>> (and clear them on unbind), since the dax_device is created before we
>>> know which driver will bind.
>>> - Dax: Added custom bind/unbind sysfs handlers; unbind returns -EBUSY if a
>>> filesystem holds the device, preventing unbind while famfs is mounted.
>>> - Fuse: Famfs mounts now require that the fuse server/daemon has
>>> CAP_SYS_RAWIO because they expose raw memory devices.
>>> - Fuse: Added DAX address_space_operations with noop_dirty_folio since
>>> famfs is memory-backed with no writeback required.
>>> - Rebased to latest kernels, fully compatible with Alistair Popple
>>> et al.'s recent dax refactoring.
>>> - Ran this series through Chris Mason's code review AI prompts to check
>>> for issues - several subtle problems found and fixed.
>>> - Dropped RFC status - this version is intended to be mergeable.
>>>
>>> Changes v1 [8] -> v2:
>>>
>>> - The GET_FMAP message/response has been moved from LOOKUP to OPEN, as
>>> was the pretty much unanimous consensus.
>>> - Made the response payload to GET_FMAP variable sized (patch 12)
>>> - Dodgy kerneldoc comments cleaned up or removed.
>>> - Fixed memory leak of fc->shadow in patch 11 (thanks Joanne)
>>> - Dropped many pr_debug and pr_notice calls
>>>
>>>
>>> References
>>>
>>> [0] - https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/
>>> [1] - https://famfs.org (famfs user space)
>>> [2] - https://lore.kernel.org/linux-cxl/cover.1708709155.git.john@groves.net/
>>> [3] - https://lore.kernel.org/linux-cxl/cover.1714409084.git.john@groves.net/
>>> [4] - https://lwn.net/Articles/983105/ (lsfmm 2024)
>>> [5] - https://lwn.net/Articles/1020170/ (lsfmm 2025)
>>> [6] - https://lore.kernel.org/linux-cxl/cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.1740713401.git-series.apopple@nvidia.com/
>>> [7] - https://lore.kernel.org/linux-fsdevel/20250703185032.46568-1-john@groves.net/ (famfs fuse v2)
>>> [8] - https://lore.kernel.org/linux-fsdevel/20250421013346.32530-1-john@groves.net/ (famfs fuse v1)
>>> [9] - https://lore.kernel.org/linux-fsdevel/20260107153244.64703-1-john@groves.net/T/#mb2c868801be16eca82dab239a1d201628534aea7 (famfs fuse v3)
>>>
>>>
>>> John Groves (10):
>>> famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/
>>> famfs_fuse: Basic fuse kernel ABI enablement for famfs
>>> famfs_fuse: Plumb the GET_FMAP message/response
>>> famfs_fuse: Create files with famfs fmaps
>>> famfs_fuse: GET_DAXDEV message and daxdev_table
>>> famfs_fuse: Plumb dax iomap and fuse read/write/mmap
>>> famfs_fuse: Add holder_operations for dax notify_failure()
>>> famfs_fuse: Add DAX address_space_operations with noop_dirty_folio
>>> famfs_fuse: Add famfs fmap metadata documentation
>>> famfs_fuse: Add documentation
>>>
>>> Documentation/filesystems/famfs.rst | 142 ++++
>>> Documentation/filesystems/index.rst | 1 +
>>> MAINTAINERS | 10 +
>>> fs/fuse/Kconfig | 13 +
>>> fs/fuse/Makefile | 1 +
>>> fs/fuse/dir.c | 2 +-
>>> fs/fuse/famfs.c | 1180 +++++++++++++++++++++++++++
>>> fs/fuse/famfs_kfmap.h | 167 ++++
>>> fs/fuse/file.c | 45 +-
>>> fs/fuse/fuse_i.h | 116 ++-
>>> fs/fuse/inode.c | 35 +-
>>> fs/fuse/iomode.c | 2 +-
>>> fs/namei.c | 1 +
>>> include/uapi/linux/fuse.h | 88 ++
>>> 14 files changed, 1790 insertions(+), 13 deletions(-)
>>> create mode 100644 Documentation/filesystems/famfs.rst
>>> create mode 100644 fs/fuse/famfs.c
>>> create mode 100644 fs/fuse/famfs_kfmap.h
>>>
>>>
>>> base-commit: 2ae624d5a555d47a735fb3f4d850402859a4db77
>>> --
>>> 2.53.0
>>>
>>
>> Hi John,
>>
>> I’m curious to hear your thoughts on whether you think it makes sense
>> for the famfs-specific logic in this series to be moved to a bpf
>> program that goes through a generic fuse iomap dax layer.
>>
>> Based on [1], this gives feature parity with the famfs logic in this
>> series. In my opinion, having famfs go through a generic fuse iomap
>> dax layer makes the fuse kernel code more extensible for future
>> servers that will also want to use dax iomap, and keeps the fuse code
>> cleaner by not having famfs-specific logic hardcoded in and having to
>> introduce new fuse uapis for something famfs-specific. In my
>> understanding of it, fuse is meant to be generic and it feels like
>> adding server-specific logic goes against that design philosophy and
>> sets a precedent for other servers wanting similar special-casing in
>> the future. I'd like to explore whether the bpf and generic fuse iomap
>> dax layer approach can preserve that philosophy while still giving
>> famfs the flexibility it needs.
>>
>> I think moving the famfs logic to bpf benefits famfs as well:
>> - Instead of needing to issue a FUSE_GET_FMAP request after a file is
>> opened, the server can directly populate the metadata map from
>> userspace with the mapping info when it processes the FUSE_OPEN
>> request, which gets rid of the roundtrip cost
>> - The server can dynamically update the metadata / bpf maps during
>> runtime from userspace if any mapping info needs to change
>> - Future code changes / updates for famfs are all server-side and can
>> be deployed immediately instead of needing to go through the upstream
>> kernel mailing list process
>> - Famfs updates / new releases can ship independently of kernel releases
>>
>> I'd appreciate the chance to discuss tradeoffs or if you'd rather
>> discuss this at the fuse BoF at lsf, that sounds great too.
>>
>> Thanks,
>> Joanne
>>
>
> Hi Joanne,
>
> I'm definitely up for discussing it, and talking before LSFMM would be
> good because then I'd have some time to think about it before we discuss
> it at LSFMM.
>
> I have not had a chance to really study this, in part since I've never even
> written a "hello world" BPF program.
>
> I'll ping off-list about times to talk.
>
> However...
>
> I would object vehemently to this sort of re-write prior to going upstream,
> as would users and vendors who need famfs now that the memory products it
> enables have started to ship.
>
> This work started over 3 years ago, initial patches over 2 years ago,
> community decision that it should go into fuse 2 years ago, first fuse
> patches a year ago.
>
> This implementation is pretty much exactly in line with expectation-setting
> starting two years ago. Famfs is a complicated orchestration between dax,
> fuse, ndctl (for daxctl), libfuse and the extensive famfs user space. Famfs
> has a fairly small kernel footprint, but its user space is much larger.
> This could set it back a year if we re-write now.
>
> Two things are true at once: I think this is a serious idea worth
> considering, and I think it's too late to make this sort of change before
> going upstream. Products need this enablement, and quite a long process has
> run in order to make it available in a timely fashion (which means soon
> now). I hope we can avoid making the perfect the enemy of the good.
>
> I believe the risk of merging famfs soon is quite low, because famfs will
> not affect anybody who doesn't use it. I hope we can run this discussion and
> analysis in parallel with merging the current implementation of famfs soon.
Hi John,
One question: assuming most of these things can be done with eBPF, would
you convert to eBPF after the merge?
(I also need to find the time to review at least all of your libfuse
changes; I feel guilty that I still haven't done it.)
Thanks,
Bernd
* [PATCH v12 16/16] KVM: selftests: Test guest execution from direct map removed gmem
From: Kalyazin, Nikita @ 2026-04-10 15:20 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Add a selftest that loads itself into guest_memfd (via
GUEST_MEMFD_FLAG_MMAP) and triggers an MMIO exit when executed. This
exercises x86 MMIO emulation code inside KVM for guest_memfd-backed
memslots where the guest_memfd folios are direct map removed.
In particular, it validates that x86 MMIO emulation code (guest page
table walks + instruction fetch) correctly accesses gmem through the VMA
that's been reflected into the memslot's userspace_addr field (instead
of trying to do direct map accesses).
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
.../selftests/kvm/set_memory_region_test.c | 52 +++++++++++++++++--
1 file changed, 48 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index 7fe427ff9b38..cb445d420e8c 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -602,6 +602,41 @@ static void test_mmio_during_vectoring(void)
kvm_vm_free(vm);
}
+
+static void guest_code_trigger_mmio(void)
+{
+ /*
+ * Read some GPA that is not backed by a memslot. KVM considers this
+ * MMIO and tells userspace to emulate the read.
+ */
+ READ_ONCE(*((uint64_t *)MEM_REGION_GPA));
+
+ GUEST_DONE();
+}
+
+static void test_guest_memfd_mmio(void)
+{
+ struct kvm_vm *vm;
+ struct kvm_vcpu *vcpu;
+ struct vm_shape shape = {
+ .mode = VM_MODE_DEFAULT,
+ .src_type = VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
+ };
+ pthread_t vcpu_thread;
+
+ pr_info("Testing MMIO emulation for instructions in gmem\n");
+
+ vm = __vm_create_shape_with_one_vcpu(shape, &vcpu, 0, guest_code_trigger_mmio);
+
+ virt_map(vm, MEM_REGION_GPA, MEM_REGION_GPA, 1);
+
+ pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
+
+ /* If the MMIO read was successfully emulated, the vcpu thread will exit */
+ pthread_join(vcpu_thread, NULL);
+
+ kvm_vm_free(vm);
+}
#endif
int main(int argc, char *argv[])
@@ -625,10 +660,19 @@ int main(int argc, char *argv[])
test_add_max_memory_regions();
#ifdef __x86_64__
- if (kvm_has_cap(KVM_CAP_GUEST_MEMFD) &&
- (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))) {
- test_add_private_memory_region();
- test_add_overlapping_private_memory_regions();
+ if (kvm_has_cap(KVM_CAP_GUEST_MEMFD)) {
+ uint64_t valid_flags = kvm_check_cap(KVM_CAP_GUEST_MEMFD_FLAGS);
+
+ if (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM)) {
+ test_add_private_memory_region();
+ test_add_overlapping_private_memory_regions();
+ }
+
+ if ((valid_flags & GUEST_MEMFD_FLAG_MMAP) &&
+ (valid_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP))
+ test_guest_memfd_mmio();
+ else
+ pr_info("Skipping tests requiring GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_NO_DIRECT_MAP\n");
} else {
pr_info("Skipping tests for KVM_MEM_GUEST_MEMFD memory regions\n");
}
--
2.50.1
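The capability check in main() above requires both guest_memfd flags. A small sketch of that idiom, with hypothetical bit values standing in for the real GUEST_MEMFD_FLAG_* constants: `(valid & mask) == mask` is equivalent to the two-term check in the patch, whereas a bare `valid & mask` would also be true when only one of the flags is supported.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real GUEST_MEMFD_FLAG_* values come from uapi headers. */
#define TOY_FLAG_MMAP          (UINT64_C(1) << 0)
#define TOY_FLAG_NO_DIRECT_MAP (UINT64_C(1) << 1)

/* True only if every bit in required is advertised in valid_flags. */
static bool toy_has_all_flags(uint64_t valid_flags, uint64_t required)
{
	return (valid_flags & required) == required;
}
```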
* [PATCH v12 15/16] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
From: Kalyazin, Nikita @ 2026-04-10 15:20 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Use one of the padding fields in struct vm_shape to carry an enum
vm_mem_backing_src_type value, making it possible to override the
default of VM_MEM_SRC_ANONYMOUS in __vm_create().
Overriding this default will allow tests to create VMs whose test
code is backed by mmap'd guest_memfd instead of anonymous memory.
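For readers unfamiliar with the trick, here is a minimal standalone sketch of reusing a padding byte for a new field. All names (struct shape, enum backing_src, make_shape(), with_src()) are hypothetical stand-ins rather than the selftest API; the point is that the struct stays 8 bytes and the constructor fills in an anonymous-memory default that callers can override.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the selftest backing-source enum values. */
enum backing_src { SRC_ANONYMOUS = 0, SRC_SHMEM, SRC_GUEST_MEMFD };

struct shape {
	uint32_t type;
	uint8_t mode;
	uint8_t src_type;	/* previously an unused padding byte */
	uint16_t pad1;
};

/* Repurposing a padding byte must not change the struct's size. */
_Static_assert(sizeof(struct shape) == sizeof(uint64_t), "shape must stay 8 bytes");

/* Default constructor: anonymous memory unless the caller overrides it. */
static struct shape make_shape(uint8_t mode)
{
	struct shape s = {
		.type = 0,
		.mode = mode,
		.src_type = SRC_ANONYMOUS,
	};
	return s;
}

/* Override the backing source, e.g. for guest_memfd-backed test code. */
static struct shape with_src(struct shape s, enum backing_src src)
{
	s.src_type = (uint8_t)src;
	return s;
}
```

With designated initializers, omitted members are zero-initialized. If the anonymous type is the enum's zero value (as SRC_ANONYMOUS is here), leaving .src_type out would select it implicitly; spelling it out, as the sev.c and pre_fault_memory_test.c hunks do, avoids depending on enum ordering.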
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
.../testing/selftests/kvm/include/kvm_util.h | 19 ++++++++++---------
tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
tools/testing/selftests/kvm/lib/x86/sev.c | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 1 +
4 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 056a003a63c0..48b6ee8223aa 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -215,7 +215,7 @@ enum vm_guest_mode {
struct vm_shape {
uint32_t type;
uint8_t mode;
- uint8_t pad0;
+ uint8_t src_type;
uint16_t pad1;
};
@@ -223,14 +223,15 @@ kvm_static_assert(sizeof(struct vm_shape) == sizeof(uint64_t));
#define VM_TYPE_DEFAULT 0
-#define VM_SHAPE(__mode) \
-({ \
- struct vm_shape shape = { \
- .mode = (__mode), \
- .type = VM_TYPE_DEFAULT \
- }; \
- \
- shape; \
+#define VM_SHAPE(__mode) \
+({ \
+ struct vm_shape shape = { \
+ .mode = (__mode), \
+ .type = VM_TYPE_DEFAULT, \
+ .src_type = VM_MEM_SRC_ANONYMOUS \
+ }; \
+ \
+ shape; \
})
extern enum vm_guest_mode vm_mode_default;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index fa4a2fc236fe..824c94c64864 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -500,7 +500,7 @@ struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
if (is_guest_memfd_required(shape))
flags |= KVM_MEM_GUEST_MEMFD;
- vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, flags);
+ vm_userspace_mem_region_add(vm, shape.src_type, 0, 0, nr_pages, flags);
for (i = 0; i < NR_MEM_REGIONS; i++)
vm->memslots[i] = 0;
diff --git a/tools/testing/selftests/kvm/lib/x86/sev.c b/tools/testing/selftests/kvm/lib/x86/sev.c
index c3a9838f4806..d920880e4fc0 100644
--- a/tools/testing/selftests/kvm/lib/x86/sev.c
+++ b/tools/testing/selftests/kvm/lib/x86/sev.c
@@ -164,6 +164,7 @@ struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t type, void *guest_code,
struct vm_shape shape = {
.mode = VM_MODE_DEFAULT,
.type = type,
+ .src_type = VM_MEM_SRC_ANONYMOUS,
};
struct kvm_vm *vm;
struct kvm_vcpu *cpus[1];
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 93e603d91311..8a4d5af53fab 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -165,6 +165,7 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
const struct vm_shape shape = {
.mode = VM_MODE_DEFAULT,
.type = vm_type,
+ .src_type = VM_MEM_SRC_ANONYMOUS,
};
struct kvm_vcpu *vcpu;
struct kvm_run *run;
--
2.50.1
* [PATCH v12 14/16] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing selftests
From: Kalyazin, Nikita @ 2026-04-10 15:20 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Extend the mem conversion selftests to cover the scenario where the guest
can fault in and write gmem-backed guest memory even if that memory has
been removed from the direct map. Also cover the new flag in the
guest_memfd_test.c tests.
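The flag combinations that the updated test_guest_memfd() exercises can be sketched as a small pure function. This is a hedged illustration, not the selftest code: the FLAG_* values are placeholders rather than the real uAPI bit assignments, and unlike the hunk below, the sketch also requires FLAG_MMAP before adding the three-flag combination, since that combination includes the MMAP bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only, not the uAPI definitions. */
#define FLAG_MMAP          (1ULL << 0)
#define FLAG_INIT_SHARED   (1ULL << 1)
#define FLAG_NO_DIRECT_MAP (1ULL << 2)

/*
 * Fill @out with the NO_DIRECT_MAP flag combinations worth testing, given
 * the flags KVM reports as supported. Returns the number of entries written.
 */
static size_t no_direct_map_combos(uint64_t supported, uint64_t out[3])
{
	size_t n = 0;

	if (!(supported & FLAG_NO_DIRECT_MAP))
		return 0;

	out[n++] = FLAG_NO_DIRECT_MAP;
	if (supported & FLAG_MMAP)
		out[n++] = FLAG_NO_DIRECT_MAP | FLAG_MMAP;
	/* Defensive: the three-flag set needs both MMAP and INIT_SHARED. */
	if ((supported & FLAG_MMAP) && (supported & FLAG_INIT_SHARED))
		out[n++] = FLAG_NO_DIRECT_MAP | FLAG_MMAP | FLAG_INIT_SHARED;
	return n;
}
```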
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
tools/testing/selftests/kvm/guest_memfd_test.c | 17 ++++++++++++++++-
.../kvm/x86/private_mem_conversions_test.c | 7 ++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index cc329b57ce2e..64c1200c182e 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -403,6 +403,17 @@ static void test_guest_memfd(unsigned long vm_type)
__test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP |
GUEST_MEMFD_FLAG_INIT_SHARED);
+ if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) {
+ __test_guest_memfd(vm, GUEST_MEMFD_FLAG_NO_DIRECT_MAP);
+ if (flags & GUEST_MEMFD_FLAG_MMAP)
+ __test_guest_memfd(vm, GUEST_MEMFD_FLAG_NO_DIRECT_MAP |
+ GUEST_MEMFD_FLAG_MMAP);
+ if (flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+ __test_guest_memfd(vm, GUEST_MEMFD_FLAG_NO_DIRECT_MAP |
+ GUEST_MEMFD_FLAG_MMAP |
+ GUEST_MEMFD_FLAG_INIT_SHARED);
+ }
+
kvm_vm_free(vm);
}
@@ -445,10 +456,14 @@ static void test_guest_memfd_guest(void)
TEST_ASSERT(vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS) & GUEST_MEMFD_FLAG_INIT_SHARED,
"Default VM type should support INIT_SHARED, supported flags = 0x%x",
vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS));
+ TEST_ASSERT(vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS) & GUEST_MEMFD_FLAG_NO_DIRECT_MAP,
+ "Default VM type should support NO_DIRECT_MAP, supported flags = 0x%x",
+ vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS));
size = vm->page_size;
fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
- GUEST_MEMFD_FLAG_INIT_SHARED);
+ GUEST_MEMFD_FLAG_INIT_SHARED |
+ GUEST_MEMFD_FLAG_NO_DIRECT_MAP);
vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index 1969f4ab9b28..8767cb4a037e 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -367,7 +367,7 @@ static void *__test_mem_conversions(void *__vcpu)
}
static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t nr_vcpus,
- uint32_t nr_memslots)
+ uint32_t nr_memslots, uint64_t gmem_flags)
{
/*
* Allocate enough memory so that each vCPU's chunk of memory can be
@@ -394,7 +394,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));
- memfd = vm_create_guest_memfd(vm, memfd_size, 0);
+ memfd = vm_create_guest_memfd(vm, memfd_size, gmem_flags);
for (i = 0; i < nr_memslots; i++)
vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
@@ -474,7 +474,8 @@ int main(int argc, char *argv[])
}
}
- test_mem_conversions(src_type, nr_vcpus, nr_memslots);
+ test_mem_conversions(src_type, nr_vcpus, nr_memslots, 0);
+ test_mem_conversions(src_type, nr_vcpus, nr_memslots, GUEST_MEMFD_FLAG_NO_DIRECT_MAP);
return 0;
}
--
2.50.1
* [PATCH v12 13/16] KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
From: Kalyazin, Nikita @ 2026-04-10 15:20 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Allow selftests to configure their memslots such that userspace_addr is
set to a MAP_SHARED mapping of the guest_memfd associated with the
memslot. This is the configuration used for non-CoCo VMs, where all
guest memory is backed by a guest_memfd whose folios are all marked
shared, yet KVM is still able to access guest memory to provide
functionality such as MMIO emulation on x86.
Add backing types for plain guest_memfd, as well as for guest_memfd
with its direct map entries removed.
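A standalone sketch of the backing-type-to-flags mapping described above, using hypothetical names (enum src_type, src_gmem_flags(), src_is_guest_memfd()) and placeholder bit values rather than the selftest definitions. It shows the fallthrough idiom: the NO_DIRECT_MAP variant sets its extra bit and then shares the MMAP | INIT_SHARED bits with plain guest_memfd, which is what lets the host map the memory at userspace_addr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits (hypothetical values, for illustration only). */
#define FLAG_MMAP          (1u << 0)
#define FLAG_INIT_SHARED   (1u << 1)
#define FLAG_NO_DIRECT_MAP (1u << 2)

/* Trimmed-down, hypothetical version of the backing-source enum. */
enum src_type {
	SRC_ANONYMOUS,
	SRC_SHMEM,
	SRC_GUEST_MEMFD,
	SRC_GUEST_MEMFD_NO_DIRECT_MAP,
};

static bool src_is_guest_memfd(enum src_type t)
{
	return t == SRC_GUEST_MEMFD || t == SRC_GUEST_MEMFD_NO_DIRECT_MAP;
}

/* guest_memfd sources must be mmap()able and start out shared. */
static uint32_t src_gmem_flags(enum src_type t)
{
	uint32_t flags = 0;

	switch (t) {
	case SRC_GUEST_MEMFD_NO_DIRECT_MAP:
		flags |= FLAG_NO_DIRECT_MAP;
		/* fall through */
	case SRC_GUEST_MEMFD:
		flags |= FLAG_MMAP | FLAG_INIT_SHARED;
		break;
	default:
		break;	/* non-guest_memfd sources need no guest_memfd flags */
	}
	return flags;
}
```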
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
.../testing/selftests/kvm/include/kvm_util.h | 18 ++++++
.../testing/selftests/kvm/include/test_util.h | 7 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 61 ++++++++++---------
tools/testing/selftests/kvm/lib/test_util.c | 8 +++
4 files changed, 65 insertions(+), 29 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 8b39cb919f4f..056a003a63c0 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -664,6 +664,24 @@ static inline bool is_smt_on(void)
void vm_create_irqchip(struct kvm_vm *vm);
+static inline uint32_t backing_src_guest_memfd_flags(enum vm_mem_backing_src_type t)
+{
+ uint32_t flags = 0;
+
+ switch (t) {
+ case VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP:
+ flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+ fallthrough;
+ case VM_MEM_SRC_GUEST_MEMFD:
+ flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
+ break;
+ default:
+ break;
+ }
+
+ return flags;
+}
+
static inline int __vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
uint64_t flags)
{
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 8140e59b59e5..ea6de20ce8ef 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -152,6 +152,8 @@ enum vm_mem_backing_src_type {
VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
VM_MEM_SRC_SHMEM,
VM_MEM_SRC_SHARED_HUGETLB,
+ VM_MEM_SRC_GUEST_MEMFD,
+ VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
NUM_SRC_TYPES,
};
@@ -184,6 +186,11 @@ static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t)
return vm_mem_backing_src_alias(t)->flag & MAP_SHARED;
}
+static inline bool backing_src_is_guest_memfd(enum vm_mem_backing_src_type t)
+{
+ return t == VM_MEM_SRC_GUEST_MEMFD || t == VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP;
+}
+
static inline bool backing_src_can_be_huge(enum vm_mem_backing_src_type t)
{
return t != VM_MEM_SRC_ANONYMOUS && t != VM_MEM_SRC_SHMEM;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 5b0865683047..fa4a2fc236fe 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1046,6 +1046,33 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
alignment = 1;
#endif
+ if (guest_memfd < 0) {
+ if ((flags & KVM_MEM_GUEST_MEMFD) || backing_src_is_guest_memfd(src_type)) {
+ uint32_t guest_memfd_flags = backing_src_guest_memfd_flags(src_type);
+
+ TEST_ASSERT(!guest_memfd_offset,
+ "Offset must be zero when creating new guest_memfd");
+ guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
+ }
+ } else {
+ /*
+ * Install a unique fd for each memslot so that the fd
+ * can be closed when the region is deleted without
+ * needing to track if the fd is owned by the framework
+ * or by the caller.
+ */
+ guest_memfd = kvm_dup(guest_memfd);
+ }
+
+ if (guest_memfd >= 0) {
+ flags |= KVM_MEM_GUEST_MEMFD;
+
+ region->region.guest_memfd = guest_memfd;
+ region->region.guest_memfd_offset = guest_memfd_offset;
+ } else {
+ region->region.guest_memfd = -1;
+ }
+
/*
* When using THP mmap is not guaranteed to returned a hugepage aligned
* address so we have to pad the mmap. Padding is not needed for HugeTLB
@@ -1061,10 +1088,13 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
if (alignment > 1)
region->mmap_size += alignment;
- region->fd = -1;
- if (backing_src_is_shared(src_type))
+ if (backing_src_is_guest_memfd(src_type))
+ region->fd = guest_memfd;
+ else if (backing_src_is_shared(src_type))
region->fd = kvm_memfd_alloc(region->mmap_size,
src_type == VM_MEM_SRC_SHARED_HUGETLB);
+ else
+ region->fd = -1;
region->mmap_start = kvm_mmap(region->mmap_size, PROT_READ | PROT_WRITE,
vm_mem_backing_src_alias(src_type)->flag,
@@ -1089,33 +1119,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
}
region->backing_src_type = src_type;
-
- if (guest_memfd < 0) {
- if (flags & KVM_MEM_GUEST_MEMFD) {
- uint32_t guest_memfd_flags = 0;
- TEST_ASSERT(!guest_memfd_offset,
- "Offset must be zero when creating new guest_memfd");
- guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
- }
- } else {
- /*
- * Install a unique fd for each memslot so that the fd
- * can be closed when the region is deleted without
- * needing to track if the fd is owned by the framework
- * or by the caller.
- */
- guest_memfd = kvm_dup(guest_memfd);
- }
-
- if (guest_memfd >= 0) {
- flags |= KVM_MEM_GUEST_MEMFD;
-
- region->region.guest_memfd = guest_memfd;
- region->region.guest_memfd_offset = guest_memfd_offset;
- } else {
- region->region.guest_memfd = -1;
- }
-
region->unused_phy_pages = sparsebit_alloc();
if (vm_arch_has_protected_memory(vm))
region->protected_phy_pages = sparsebit_alloc();
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 8a1848586a85..ce9fe0271515 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -306,6 +306,14 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
*/
.flag = MAP_SHARED,
},
+ [VM_MEM_SRC_GUEST_MEMFD] = {
+ .name = "guest_memfd",
+ .flag = MAP_SHARED,
+ },
+ [VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP] = {
+ .name = "guest_memfd_no_direct_map",
+ .flag = MAP_SHARED,
+ },
};
_Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
"Missing new backing src types?");
--
2.50.1
* [PATCH v12 12/16] KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Have vm_mem_add() always set KVM_MEM_GUEST_MEMFD in the memslot flags if
a guest_memfd is passed in as an argument. This eliminates the case
where a guest_memfd instance is passed to vm_mem_add() but ends up being
ignored because the flags argument does not also specify
KVM_MEM_GUEST_MEMFD.
This makes it easy to support more scenarios in which vm_mem_add() is
not passed a guest_memfd instance but is expected to allocate one.
Currently, this only happens if guest_memfd == -1 but
(flags & KVM_MEM_GUEST_MEMFD) != 0. Later, vm_mem_add() will gain
support for loading the test code itself into guest_memfd (via
GUEST_MEMFD_FLAG_MMAP) if requested via a special
vm_mem_backing_src_type, at which point having to keep src_type and
flags in sync becomes cumbersome.
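The fd-handling rule described above boils down to a small decision function. Below is a hedged sketch with hypothetical names (gmem_plan(), enum gmem_action) and a placeholder MEM_GUEST_MEMFD bit standing in for KVM_MEM_GUEST_MEMFD. It also folds in the backing-type check that the commit message says comes later, so it models the end state rather than this patch alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_GUEST_MEMFD (1u << 2)	/* placeholder for KVM_MEM_GUEST_MEMFD */

enum gmem_action { GMEM_NONE, GMEM_CREATE, GMEM_DUP };

/*
 * Decide what to do with guest_memfd for a new memslot: a caller-supplied
 * fd is always used, and a missing fd is only allocated when requested via
 * the memslot flag or a guest_memfd backing type.
 */
static enum gmem_action gmem_plan(int guest_memfd, uint32_t *flags, bool src_is_gmem)
{
	enum gmem_action act;

	if (guest_memfd >= 0)
		act = GMEM_DUP;		/* dup so the memslot owns its own fd */
	else if ((*flags & MEM_GUEST_MEMFD) || src_is_gmem)
		act = GMEM_CREATE;	/* allocate a fresh guest_memfd */
	else
		act = GMEM_NONE;

	/* Whenever a guest_memfd ends up attached, set the memslot flag. */
	if (act != GMEM_NONE)
		*flags |= MEM_GUEST_MEMFD;
	return act;
}
```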
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 24 +++++++++++++---------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 1959bf556e88..5b0865683047 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1090,21 +1090,25 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
region->backing_src_type = src_type;
- if (flags & KVM_MEM_GUEST_MEMFD) {
- if (guest_memfd < 0) {
+ if (guest_memfd < 0) {
+ if (flags & KVM_MEM_GUEST_MEMFD) {
uint32_t guest_memfd_flags = 0;
TEST_ASSERT(!guest_memfd_offset,
"Offset must be zero when creating new guest_memfd");
guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
- } else {
- /*
- * Install a unique fd for each memslot so that the fd
- * can be closed when the region is deleted without
- * needing to track if the fd is owned by the framework
- * or by the caller.
- */
- guest_memfd = kvm_dup(guest_memfd);
}
+ } else {
+ /*
+ * Install a unique fd for each memslot so that the fd
+ * can be closed when the region is deleted without
+ * needing to track if the fd is owned by the framework
+ * or by the caller.
+ */
+ guest_memfd = kvm_dup(guest_memfd);
+ }
+
+ if (guest_memfd >= 0) {
+ flags |= KVM_MEM_GUEST_MEMFD;
region->region.guest_memfd = guest_memfd;
region->region.guest_memfd_offset = guest_memfd_offset;
--
2.50.1
* [PATCH v12 11/16] KVM: selftests: load elf via bounce buffer
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
If guest memory is backed by a VMA that does not allow GUP (e.g. a
userspace mapping of a guest_memfd created with
GUEST_MEMFD_FLAG_NO_DIRECT_MAP), then loading the test ELF binary
directly into it via read(2) may fail. To support loading binaries in
such cases, do the read(2) into a bounce buffer and then memcpy from the
bounce buffer into guest memory.
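The bounce-buffer technique can be sketched in plain POSIX C. This is an illustration under assumptions, not the selftest helper: read_bounce() is a hypothetical name, it returns -1 on error instead of asserting, and the destination is only ever written by memcpy(), never handed to read(2).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to @count bytes from @fd into @dst via a heap bounce buffer, so
 * read(2) only ever targets ordinary anonymous memory. Retries on EINTR
 * and short reads, stops at EOF. Returns bytes copied, or -1 on error.
 */
static ssize_t read_bounce(int fd, void *dst, size_t count)
{
	char *bounce = malloc(count ? count : 1);
	size_t done = 0;

	if (!bounce)
		return -1;

	while (done < count) {
		ssize_t n = read(fd, bounce + done, count - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			free(bounce);
			return -1;
		}
		if (n == 0)
			break;	/* EOF */
		done += (size_t)n;
	}

	/* Only this memcpy() touches the destination (e.g. guest memory). */
	memcpy(dst, bounce, done);
	free(bounce);
	return (ssize_t)done;
}
```

The patch's test_read_bounce() instead delegates the retry loop to test_read(), so the elf.c call sites only change the function name.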
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
.../testing/selftests/kvm/include/test_util.h | 1 +
tools/testing/selftests/kvm/lib/elf.c | 8 +++----
tools/testing/selftests/kvm/lib/io.c | 23 +++++++++++++++++++
3 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index b4872ba8ed12..8140e59b59e5 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -48,6 +48,7 @@ do { \
ssize_t test_write(int fd, const void *buf, size_t count);
ssize_t test_read(int fd, void *buf, size_t count);
+ssize_t test_read_bounce(int fd, void *buf, size_t count);
int test_seq_read(const char *path, char **bufp, size_t *sizep);
void __printf(5, 6) test_assert(bool exp, const char *exp_str,
diff --git a/tools/testing/selftests/kvm/lib/elf.c b/tools/testing/selftests/kvm/lib/elf.c
index f34d926d9735..e829fbe0a11e 100644
--- a/tools/testing/selftests/kvm/lib/elf.c
+++ b/tools/testing/selftests/kvm/lib/elf.c
@@ -31,7 +31,7 @@ static void elfhdr_get(const char *filename, Elf64_Ehdr *hdrp)
* the real size of the ELF header.
*/
unsigned char ident[EI_NIDENT];
- test_read(fd, ident, sizeof(ident));
+ test_read_bounce(fd, ident, sizeof(ident));
TEST_ASSERT((ident[EI_MAG0] == ELFMAG0) && (ident[EI_MAG1] == ELFMAG1)
&& (ident[EI_MAG2] == ELFMAG2) && (ident[EI_MAG3] == ELFMAG3),
"ELF MAGIC Mismatch,\n"
@@ -79,7 +79,7 @@ static void elfhdr_get(const char *filename, Elf64_Ehdr *hdrp)
offset_rv = lseek(fd, 0, SEEK_SET);
TEST_ASSERT(offset_rv == 0, "Seek to ELF header failed,\n"
" rv: %zi expected: %i", offset_rv, 0);
- test_read(fd, hdrp, sizeof(*hdrp));
+ test_read_bounce(fd, hdrp, sizeof(*hdrp));
TEST_ASSERT(hdrp->e_phentsize == sizeof(Elf64_Phdr),
"Unexpected physical header size,\n"
" hdrp->e_phentsize: %x\n"
@@ -146,7 +146,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
/* Read in the program header. */
Elf64_Phdr phdr;
- test_read(fd, &phdr, sizeof(phdr));
+ test_read_bounce(fd, &phdr, sizeof(phdr));
/* Skip if this header doesn't describe a loadable segment. */
if (phdr.p_type != PT_LOAD)
@@ -187,7 +187,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
" expected: 0x%jx",
n1, errno, (intmax_t) offset_rv,
(intmax_t) phdr.p_offset);
- test_read(fd, addr_gva2hva(vm, phdr.p_vaddr),
+ test_read_bounce(fd, addr_gva2hva(vm, phdr.p_vaddr),
phdr.p_filesz);
}
}
diff --git a/tools/testing/selftests/kvm/lib/io.c b/tools/testing/selftests/kvm/lib/io.c
index fedb2a741f0b..60613dce6cfd 100644
--- a/tools/testing/selftests/kvm/lib/io.c
+++ b/tools/testing/selftests/kvm/lib/io.c
@@ -155,3 +155,26 @@ ssize_t test_read(int fd, void *buf, size_t count)
return num_read;
}
+
+/* Test read via intermediary buffer
+ *
+ * Same as test_read, except read(2)s happen into a bounce buffer that is memcpy'd
+ * to buf. For use with buffers that cannot be GUP'd (e.g. guest_memfd VMAs if
+ * guest_memfd was created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP).
+ */
+ssize_t test_read_bounce(int fd, void *buf, size_t count)
+{
+ void *bounce_buffer;
+ ssize_t num_read;
+
+ TEST_ASSERT(count > 0, "Unexpected count, count: %zu", count);
+
+ bounce_buffer = malloc(count);
+ TEST_ASSERT(bounce_buffer != NULL, "Failed to allocate bounce buffer");
+
+ num_read = test_read(fd, bounce_buffer, count);
+ memcpy(buf, bounce_buffer, num_read);
+ free(bounce_buffer);
+
+ return num_read;
+}
--
2.50.1
* [PATCH v12 10/16] KVM: guest_memfd: Add flag to remove from direct map
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita, Nikita Kalyazin
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Add the GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for the KVM_CREATE_GUEST_MEMFD
ioctl. When set, guest_memfd folios are removed from the direct map
before they are mapped into the guest or host userspace, and their direct
map entries are restored only when the folios are freed.

To ensure these folios do not end up in places where the kernel cannot
deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.

Note that this flag removes the direct map entries of all guest_memfd
folios, regardless of whether they are "shared" or "private" (although
guest_memfd currently supports only either all folios in the "shared"
state or, if GUEST_MEMFD_FLAG_MMAP is not set, all folios in the
"private" state). The use case for also removing the direct map entries
of the shared parts of guest_memfd is a special type of non-CoCo VM in
which host userspace is trusted to access all of guest memory, but
Spectre-style transient execution attacks through the host kernel's
direct map should still be mitigated. In this setup, KVM retains access
to guest memory via userspace mappings of guest_memfd, which are
reflected back into KVM's memslots via userspace_addr. This is needed
for things like MMIO emulation on x86_64 to work.

Direct map entries are zapped right before guest or userspace mappings
of gmem folios are set up, e.g. in kvm_gmem_fault_user_mapping() or
kvm_gmem_get_pfn() (called from the KVM MMU code). At present, direct
map removal is not supported on platforms that support
kvm_gmem_populate(). Should such support be added in the future, the
code already maintains the following ordering: zap, then prepare;
invalidate, then restore. This avoids guest-owned pages being
temporarily mapped by the host, and assumes that the preparation and
invalidation code does not access the page content.

Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Co-developed-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
Documentation/virt/kvm/api.rst | 21 +++++-----
include/linux/kvm_host.h | 3 ++
include/uapi/linux/kvm.h | 1 +
virt/kvm/guest_memfd.c | 71 ++++++++++++++++++++++++++++++++--
4 files changed, 83 insertions(+), 13 deletions(-)
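To illustrate how the capability mask and the new flag bit fit together, here is a userspace model of the supported-flags computation in kvm_gmem_get_supported_flags(). It is a sketch under stated assumptions. struct arch_caps and the model_* helpers are hypothetical. GUEST_MEMFD_FLAG_MMAP is assumed to be offered unconditionally, because the start of the real function is not shown in this patch. The rule that a request must set no bits outside the supported mask is also an assumed validation rule, not code from this series. The bit values match the uapi header hunk in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as in include/uapi/linux/kvm.h after this patch. */
#define GUEST_MEMFD_FLAG_MMAP          (1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED   (1ULL << 1)
#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP (1ULL << 2)

/* Hypothetical per-architecture capabilities of one VM. */
struct arch_caps {
	bool init_shared;
	bool no_direct_map;
};

/*
 * Hypothetical model of building the supported-flags mask. A NULL @caps
 * plays the role of "!kvm" in the patch: no VM, so report every flag
 * that might be available.
 */
static uint64_t model_supported_flags(const struct arch_caps *caps)
{
	uint64_t flags = GUEST_MEMFD_FLAG_MMAP;	/* assumption: always offered */

	if (!caps || caps->init_shared)
		flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
	if (!caps || caps->no_direct_map)
		flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
	return flags;
}

/* Assumed rule: every bit the request sets must be in the supported mask. */
static bool model_flags_ok(uint64_t supported, uint64_t requested)
{
	return (requested & ~supported) == 0;
}
```

With this model, a VM whose architecture lacks direct map removal would reject GUEST_MEMFD_FLAG_NO_DIRECT_MAP. A system-wide query without a VM would still advertise the flag.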
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 032516783e96..8feec77b03fe 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6439,15 +6439,18 @@ a single guest_memfd file, but the bound ranges must not overlap).
The capability KVM_CAP_GUEST_MEMFD_FLAGS enumerates the `flags` that can be
specified via KVM_CREATE_GUEST_MEMFD. Currently defined flags:
- ============================ ================================================
- GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
- descriptor.
- GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
- KVM_CREATE_GUEST_MEMFD (memory files created
- without INIT_SHARED will be marked private).
- Shared memory can be faulted into host userspace
- page tables. Private memory cannot.
- ============================ ================================================
+ ============================== ================================================
+ GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
+ descriptor.
+ GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
+ KVM_CREATE_GUEST_MEMFD (memory files created
+ without INIT_SHARED will be marked private).
+ Shared memory can be faulted into host userspace
+ page tables. Private memory cannot.
+ GUEST_MEMFD_FLAG_NO_DIRECT_MAP The guest_memfd instance will unmap the memory
+ backing it from the kernel's address space
+ before passing it off to userspace or the guest.
+ ============================== ================================================
When the KVM MMU performs a PFN lookup to service a guest fault and the backing
guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ce8c5fdf2752..c95747e2278c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -738,6 +738,9 @@ static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
+ if (!kvm || kvm_arch_gmem_supports_no_direct_map(kvm))
+ flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+
return flags;
}
#endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 80364d4dbebb..d864f67efdb7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1642,6 +1642,7 @@ struct kvm_memory_attributes {
#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED (1ULL << 1)
+#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP (1ULL << 2)
struct kvm_create_guest_memfd {
__u64 size;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 651649623448..80d4a6aca128 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,6 +7,7 @@
#include <linux/mempolicy.h>
#include <linux/pseudo_fs.h>
#include <linux/pagemap.h>
+#include <linux/set_memory.h>
#include "kvm_mm.h"
@@ -76,6 +77,39 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
return 0;
}
+#define KVM_GMEM_FOLIO_NO_DIRECT_MAP BIT(0)
+
+static bool kvm_gmem_folio_no_direct_map(struct folio *folio)
+{
+ return ((u64)folio->private) & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
+}
+
+static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
+{
+ int r = 0;
+
+ VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+
+ if (WARN_ON_ONCE(!(GMEM_I(folio_inode(folio))->flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)))
+ return -EINVAL;
+
+ if (kvm_gmem_folio_no_direct_map(folio))
+ goto out;
+
+ r = folio_zap_direct_map(folio);
+ if (!r)
+ folio->private = (void *)((u64)folio->private | KVM_GMEM_FOLIO_NO_DIRECT_MAP);
+
+out:
+ return r;
+}
+
+static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
+{
+ folio_restore_direct_map(folio);
+ folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
+}
+
/*
* Process @folio, which contains @gfn, so that the guest can use it.
* The folio must be locked and the gfn must be contained in @slot.
@@ -388,11 +422,17 @@ static bool kvm_gmem_supports_mmap(struct inode *inode)
return GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_MMAP;
}
+static bool kvm_gmem_no_direct_map(struct inode *inode)
+{
+ return GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+}
+
static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
{
struct inode *inode = file_inode(vmf->vma->vm_file);
struct folio *folio;
vm_fault_t ret = VM_FAULT_LOCKED;
+ int err;
if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
return VM_FAULT_SIGBUS;
@@ -418,6 +458,14 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
folio_mark_uptodate(folio);
}
+ if (kvm_gmem_no_direct_map(folio_inode(folio))) {
+ err = kvm_gmem_folio_zap_direct_map(folio);
+ if (err) {
+ ret = vmf_error(err);
+ goto out_folio;
+ }
+ }
+
vmf->page = folio_file_page(folio, vmf->pgoff);
out_folio:
@@ -529,6 +577,9 @@ static void kvm_gmem_free_folio(struct folio *folio)
int order = folio_order(folio);
kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
+
+ if (kvm_gmem_folio_no_direct_map(folio))
+ kvm_gmem_folio_restore_direct_map(folio);
}
static const struct address_space_operations kvm_gmem_aops = {
@@ -591,6 +642,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
/* Unmovable mappings are supposed to be marked unevictable as well. */
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
+ if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
+ mapping_set_no_direct_map(inode->i_mapping);
+
GMEM_I(inode)->flags = flags;
file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
@@ -802,14 +856,23 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
folio_mark_uptodate(folio);
}
+ if (kvm_gmem_no_direct_map(folio_inode(folio))) {
+ r = kvm_gmem_folio_zap_direct_map(folio);
+ if (r)
+ goto out_unlock;
+ }
+
r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
+ if (r)
+ goto out_unlock;
+ *page = folio_file_page(folio, index);
folio_unlock(folio);
+ return 0;
- if (!r)
- *page = folio_file_page(folio, index);
- else
- folio_put(folio);
+out_unlock:
+ folio_unlock(folio);
+ folio_put(folio);
return r;
}
--
2.50.1
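The per-folio bookkeeping in kvm_gmem_folio_zap_direct_map() and kvm_gmem_folio_restore_direct_map() can be modelled in plain userspace C. In this hypothetical sketch, struct model_folio stands in for struct folio and a bool stands in for the real direct map entries. Comments mark where folio_zap_direct_map(), folio_restore_direct_map(), kvm_gmem_prepare_folio() and kvm_arch_gmem_invalidate() would run. The sketch shows three properties described in the commit message. The state bit lives in the low bit of the private pointer. Zapping is idempotent. The orderings zap-then-prepare and invalidate-then-restore hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct folio: only what this sketch needs. */
struct model_folio {
	void *private;		/* per-folio state bits, like folio->private */
	bool in_direct_map;	/* models the real direct map entries */
	int zap_calls;		/* how many times the zap actually ran */
};

#define MODEL_FOLIO_NO_DIRECT_MAP ((uintptr_t)1 << 0)

static bool model_no_direct_map(const struct model_folio *f)
{
	return (uintptr_t)f->private & MODEL_FOLIO_NO_DIRECT_MAP;
}

/* Idempotent: a folio whose state bit is already set is left alone. */
static int model_zap(struct model_folio *f)
{
	if (model_no_direct_map(f))
		return 0;

	f->in_direct_map = false;	/* folio_zap_direct_map() would run here */
	f->zap_calls++;
	f->private = (void *)((uintptr_t)f->private | MODEL_FOLIO_NO_DIRECT_MAP);
	return 0;
}

static void model_restore(struct model_folio *f)
{
	f->in_direct_map = true;	/* folio_restore_direct_map() would run here */
	f->private = (void *)((uintptr_t)f->private & ~MODEL_FOLIO_NO_DIRECT_MAP);
}

/*
 * PFN lookup path: zap first, then prepare. *unmapped_at_prepare records
 * whether the folio was already out of the direct map at preparation time.
 */
static int model_get_pfn(struct model_folio *f, bool *unmapped_at_prepare)
{
	int r = model_zap(f);

	if (r)
		return r;
	/* kvm_gmem_prepare_folio() would run here */
	*unmapped_at_prepare = !f->in_direct_map;
	return 0;
}

/* Free path: invalidate first, then restore only if the folio was zapped. */
static void model_free(struct model_folio *f)
{
	/* kvm_arch_gmem_invalidate() would run here */
	if (model_no_direct_map(f))
		model_restore(f);
}
```

The sketch uses uintptr_t for the pointer/integer round trip because that type is intended to hold a converted object pointer. The patch itself casts through u64.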