* linux-next: KVM/s390x regression (was: [v7 03/16] mm/huge_memory: add device-private THP support to PMD operations)
In-Reply-To: <20251001065707.920170-4-balbirs@nvidia.com>
From: Christian Borntraeger @ 2025-10-17 14:49 UTC
To: balbirs
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, david, dev.jain, dri-devel, francois.dugast,
gourry, joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes,
lyude, matthew.brost, mpenttil, npache, osalvador, rakie.kim,
rcampbell, ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
This patch triggers a regression for s390x kvm as qemu guests can no longer start
error: kvm run failed Cannot allocate memory
PSW=mask 0000000180000000 addr 000000007fd00600
R00=0000000000000000 R01=0000000000000000 R02=0000000000000000 R03=0000000000000000
R04=0000000000000000 R05=0000000000000000 R06=0000000000000000 R07=0000000000000000
R08=0000000000000000 R09=0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
C00=00000000000000e0 C01=0000000000000000 C02=0000000000000000 C03=0000000000000000
C04=0000000000000000 C05=0000000000000000 C06=0000000000000000 C07=0000000000000000
C08=0000000000000000 C09=0000000000000000 C10=0000000000000000 C11=0000000000000000
C12=0000000000000000 C13=0000000000000000 C14=00000000c2000000 C15=0000000000000000
KVM on s390x does not use THP so far, will investigate. Does anyone have a quick idea?
Christian Borntraeger
* Re: linux-next: KVM/s390x regression
From: David Hildenbrand @ 2025-10-17 14:54 UTC
To: Christian Borntraeger, balbirs
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 17.10.25 16:49, Christian Borntraeger wrote:
> This patch triggers a regression for s390x kvm as qemu guests can no longer start
> KVM on s390x does not use THP so far, will investigate. Does anyone have a quick idea?
Only when running KVM guests and apart from that everything else seems
to be fine?
That's weird :)
--
Cheers
David / dhildenb
* Re: linux-next: KVM/s390x regression
From: Christian Borntraeger @ 2025-10-17 15:01 UTC
To: David Hildenbrand, balbirs
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 17.10.25 16:54, David Hildenbrand wrote:
> Only when running KVM guests and apart from that everything else seems to be fine?
We have other weirdness in linux-next but in different areas. Could that somehow be
related to us disabling THP for the kvm address space?
* Re: linux-next: KVM/s390x regression
From: David Hildenbrand @ 2025-10-17 15:07 UTC
To: Christian Borntraeger, balbirs
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 17.10.25 17:01, Christian Borntraeger wrote:
> We have other weirdness in linux-next but in different areas. Could that somehow be
> related to us disabling THP for the kvm address space?
Not sure ... it's a bit weird. I mean, when KVM disables THPs we
essentially just remap everything to be mapped by PTEs. So there
shouldn't be any PMDs in that whole process.
Remapping a file THP (shmem) implies zapping the THP completely.
I assume your kernel config has CONFIG_ZONE_DEVICE and
CONFIG_ARCH_ENABLE_THP_MIGRATION set, right?
I'd rule out copy_huge_pmd(), zap_huge_pmd() as well.
What happens if you revert the change in mm/pgtable-generic.c?
But the whole -ENOMEM error is a weird symptom.
--
Cheers
David / dhildenb
* Re: linux-next: KVM/s390x regression
From: Christian Borntraeger @ 2025-10-17 15:20 UTC
To: David Hildenbrand, balbirs
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 17.10.25 17:07, David Hildenbrand wrote:
> I assume your kernel config has CONFIG_ZONE_DEVICE and CONFIG_ARCH_ENABLE_THP_MIGRATION set, right?
yes.
> What happens if you revert the change in mm/pgtable-generic.c?
That partial revert seems to fix the issue
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 0c847cdf4fd3..567e2d084071 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 
 	if (pmdvalp)
 		*pmdvalp = pmdval;
-	if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
+	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
 		goto nomap;
 	if (unlikely(pmd_trans_huge(pmdval)))
 		goto nomap;
* Re: linux-next: KVM/s390x regression
From: David Hildenbrand @ 2025-10-17 17:07 UTC
To: Christian Borntraeger, balbirs
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 17.10.25 17:20, Christian Borntraeger wrote:
> That partial revert seems to fix the issue
> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
> + if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
Okay, but that means that effectively we stumble over a PMD entry that
is not a migration entry but still non-present.
And I would expect that it's a page table, because otherwise the change
wouldn't make a difference.
And the weird thing is that this only triggers sometimes, because if
it always triggered, nothing would ever work.
Is there some weird scenario where s390x might set a leaf page table
mapped in a PMD to non-present?
Staring at the definition of pmd_present() on s390x it's really just
return (pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT) != 0;
Maybe this is happening in the gmap code only and not actually in the
core-mm code?
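As a rough illustration of the reasoning above, here is a minimal userspace sketch in C. It assumes, as suspected in this message, that the problematic PMD entry links a page table but lacks the s390 software present bit. The SEG_* bit values, the model_* helpers and the marker used for a migration entry are hypothetical placeholders, not the real encodings from arch/s390/include/asm/pgtable.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of an s390-style segment-table (PMD) entry.
 * The bit values are illustrative placeholders, NOT the real definitions.
 */
#define SEG_PRESENT   0x001UL     /* software "present" bit (_SEGMENT_ENTRY_PRESENT) */
#define SEG_INVALID   0x020UL     /* hardware "invalid" bit (_SEGMENT_ENTRY_INVALID) */
#define SEG_EMPTY     SEG_INVALID /* an empty entry (_SEGMENT_ENTRY_EMPTY) */
#define SEG_MIGRATION 0x040UL     /* hypothetical marker for a migration entry */

static_assert((SEG_PRESENT & (SEG_INVALID | SEG_MIGRATION)) == 0,
	      "placeholder bits must not overlap");

typedef uint64_t pmdval_t;

static bool model_pmd_none(pmdval_t v)
{
	return v == SEG_EMPTY;
}

static bool model_pmd_present(pmdval_t v)
{
	return (v & SEG_PRESENT) != 0;
}

/* Non-empty, non-present entry whose (modelled) swap type is "migration". */
static bool model_is_pmd_migration_entry(pmdval_t v)
{
	return !model_pmd_none(v) && !model_pmd_present(v) &&
	       (v & SEG_MIGRATION) != 0;
}

/* linux-next: take the "goto nomap" path for every non-present PMD. */
static bool nomap_linux_next(pmdval_t v)
{
	return model_pmd_none(v) || !model_pmd_present(v);
}

/* Partial revert: take "goto nomap" only for empty PMDs and migration entries. */
static bool nomap_partial_revert(pmdval_t v)
{
	return model_pmd_none(v) || model_is_pmd_migration_entry(v);
}

/* The same page table origin, with and without the software present bit. */
static const pmdval_t user_pmd = 0x12345000UL | SEG_PRESENT;
static const pmdval_t gmap_pmd = 0x12345000UL; /* the suspected gmap case */
```

In this model, the two checks disagree exactly for entries that are neither empty, present, nor migration entries. A page table linked without the software present bit (gmap_pmd) is one of them, so only the linux-next variant takes the nomap path for it.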
--
Cheers
David / dhildenb
* Re: linux-next: KVM/s390x regression
From: Balbir Singh @ 2025-10-17 21:56 UTC
To: David Hildenbrand, Christian Borntraeger
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 10/18/25 04:07, David Hildenbrand wrote:
> Is there some weird scenario where s390x might set a leaf page table mapped in a PMD to non-present?
>
Good point
> Staring at the definition of pmd_present() on s390x it's really just
>
> return (pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT) != 0;
>
>
> Maybe this is happening in the gmap code only and not actually in the core-mm code?
>
I am not an s390 expert, but just looking at the code, the check on s390 is
effectively:
segment_entry/present = false or segment_entry_empty/invalid = true
Given that the revert works, the check changes to:
segment_entry/present = false or pmd_migration_entry (PAGE_INVALID | PAGE_PROTECT)
So the difference isn't in the first check (segment_entry/present = false).
It sounds like for s390 we would want __pte_offset_map() to allow mappings
with segment_entry_empty/invalid entries?
Any chance we can get the stack trace and a dump of the PMD entry when the issue occurs?
In the meanwhile, does this fix/workaround work?
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 0c847cdf4fd3..31c1754d5bd4 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 
 	if (pmdvalp)
 		*pmdvalp = pmdval;
-	if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
+	if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
 		goto nomap;
 	if (unlikely(pmd_trans_huge(pmdval)))
 		goto nomap;
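The proposed check can be illustrated with a small userspace model in C. This is a sketch under stated assumptions: the SEG_* encodings and m_* helpers are hypothetical placeholders (not the s390 bit layout), and is_pmd_non_present_folio_entry() is assumed to mean "a PMD migration entry or a device-private PMD entry"; its real definition lives in the series under discussion and is not quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder encodings only; NOT the real s390 bit layout. */
#define SEG_PRESENT       0x001UL     /* software "present" bit */
#define SEG_INVALID       0x020UL     /* hardware "invalid" bit */
#define SEG_EMPTY         SEG_INVALID /* empty entry */
#define SEG_SWP_MIGRATION 0x040UL     /* hypothetical: migration swap type */
#define SEG_SWP_DEVPRIV   0x080UL     /* hypothetical: device-private swap type */

static_assert((SEG_PRESENT &
	       (SEG_INVALID | SEG_SWP_MIGRATION | SEG_SWP_DEVPRIV)) == 0,
	      "placeholder bits must not overlap");

typedef uint64_t pmdval_t;

static bool m_pmd_none(pmdval_t v)
{
	return v == SEG_EMPTY;
}

static bool m_pmd_present(pmdval_t v)
{
	return (v & SEG_PRESENT) != 0;
}

/* Models is_swap_pmd(): neither empty nor present. */
static bool m_is_swap_pmd(pmdval_t v)
{
	return !m_pmd_none(v) && !m_pmd_present(v);
}

/* Assumed shape of is_pmd_non_present_folio_entry(): migration or device-private. */
static bool m_is_pmd_non_present_folio_entry(pmdval_t v)
{
	return m_is_swap_pmd(v) &&
	       (v & (SEG_SWP_MIGRATION | SEG_SWP_DEVPRIV)) != 0;
}

/* linux-next check: any non-present PMD means "goto nomap". */
static bool nomap_linux_next(pmdval_t v)
{
	return m_pmd_none(v) || !m_pmd_present(v);
}

/* Proposed workaround: only empty PMDs and non-present folio entries bail out. */
static bool nomap_proposed(pmdval_t v)
{
	return m_pmd_none(v) || m_is_pmd_non_present_folio_entry(v);
}
```

Under these assumptions, a page table linked without the software present bit is no longer sent down the nomap path, while empty entries and the non-present folio entries (migration, device-private) still are.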
Thanks David and Christian!
Balbir
* Re: linux-next: KVM/s390x regression
From: David Hildenbrand @ 2025-10-17 22:15 UTC
To: Balbir Singh, Christian Borntraeger
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 17.10.25 23:56, Balbir Singh wrote:
> I am not an s390 expert, but just looking at the code
>
> So the check on s390 effectively
>
> segment_entry/present = false or segment_entry_empty/invalid = true
pmd_present() == true iff _SEGMENT_ENTRY_PRESENT is set
because
return (pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT) != 0;
is the same as
return pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT;
But that means we have something where _SEGMENT_ENTRY_PRESENT is not set.
I suspect that can only be the gmap tables.
Likely __gmap_link() does not set _SEGMENT_ENTRY_PRESENT, which is fine
because it's a software managed bit for "ordinary" page tables, not gmap
tables.
Which raises the question why someone would wrongly use
pte_offset_map()/__pte_offset_map() on the gmap tables.
I cannot immediately spot any such usage in kvm/gmap code, though.
--
Cheers
David / dhildenb
* Re: linux-next: KVM/s390x regression
From: David Hildenbrand @ 2025-10-17 22:41 UTC
To: Balbir Singh, Christian Borntraeger
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 18.10.25 00:15, David Hildenbrand wrote:
> But that means we have something where _SEGMENT_ENTRY_PRESENT is not set.
>
> I suspect that can only be the gmap tables.
>
> Likely __gmap_link() does not set _SEGMENT_ENTRY_PRESENT, which is fine
> because it's a software managed bit for "ordinary" page tables, not gmap
> tables.
>
> Which raises the question why someone would wrongly use
> pte_offset_map()/__pte_offset_map() on the gmap tables.
>
> I cannot immediately spot any such usage in kvm/gmap code, though.
>
Ah, it's all that pte_alloc_map_lock() stuff in gmap.c.
Oh my.
So we're mapping a user PTE table that is linked into the gmap tables
through a PMD table that does not have the sw bits set that we would
expect in a user PMD table.
What's also scary is that pte_alloc_map_lock() would try to pte_alloc()
a user page table in the gmap, which sounds completely wrong?
Yeah, when walking the gmap and wanting to lock the linked user PTE
table, we should probably never use the pte_*map variants but obtain
the lock through pte_lockptr().
All the magic we end up doing with RCU etc. in __pte_offset_map_lock()
does not apply to the gmap PMD table.
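A minimal userspace model of the locking approach suggested above: take the lock of the linked PTE table directly, as pte_lockptr() does, instead of going through the pte_*map() helpers that validate the PMD like a user page table. Everything here is hypothetical: the model_* types and functions, the placeholder MODEL_PMD_SW_PRESENT bit, a boolean flag standing in for the split page-table spinlock, and a caller that turns a failed mapping into -ENOMEM (one plausible way the reported "Cannot allocate memory" could surface, not something verified in this thread).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the split page-table spinlock (a flag keeps the model simple). */
struct model_lock {
	bool held;
};

/* Hypothetical user PTE table; the lock lives with the table itself. */
struct model_pte_table {
	struct model_lock ptl;
	uint64_t pte[8];
};

/* Hypothetical PMD entry: the linked PTE table plus software flag bits. */
struct model_pmd {
	struct model_pte_table *table;
	unsigned long flags;
};

#define MODEL_PMD_SW_PRESENT 0x1UL /* placeholder for the software present bit */

static void model_lock(struct model_lock *l)
{
	assert(!l->held); /* a real spinlock would spin here instead */
	l->held = true;
}

static void model_unlock(struct model_lock *l)
{
	l->held = false;
}

/* Like the pte_*map_lock() path: insists on a "proper" user PMD first. */
static uint64_t *model_map_lock(struct model_pmd *pmd, size_t idx,
				struct model_lock **ptlp)
{
	if (!pmd->table || !(pmd->flags & MODEL_PMD_SW_PRESENT))
		return NULL; /* "nomap": a gmap-linked entry ends up here */
	*ptlp = &pmd->table->ptl;
	model_lock(*ptlp);
	return &pmd->table->pte[idx];
}

/* Like pte_lockptr(): the lock of the linked table, no sw-bit checks. */
static struct model_lock *model_lockptr(struct model_pmd *pmd)
{
	return &pmd->table->ptl;
}

/* Hypothetical gmap operation that sets a placeholder bit in one PTE. */
static int model_gmap_op(struct model_pmd *pmd, size_t idx, bool use_lockptr)
{
	struct model_lock *ptl = NULL;
	uint64_t *ptep;

	if (use_lockptr) {
		if (!pmd->table)
			return -EFAULT; /* hypothetical: nothing linked yet */
		ptl = model_lockptr(pmd);
		model_lock(ptl);
		ptep = &pmd->table->pte[idx];
	} else {
		ptep = model_map_lock(pmd, idx, &ptl);
		if (!ptep)
			return -ENOMEM; /* hypothetical error mapping */
	}
	*ptep |= 0x4;
	model_unlock(ptl);
	return 0;
}
```

With a PMD that links a table but lacks the software bit, the checked path fails with -ENOMEM, while the lockptr-style path locks, updates and unlocks the PTE. The model deliberately ignores the RCU handling and concurrent page-table freeing that the real helpers have to care about.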
--
Cheers
David / dhildenb
* Re: linux-next: KVM/s390x regression
From: Christian Borntraeger @ 2025-10-20 7:00 UTC
To: Balbir Singh, David Hildenbrand, Claudio Imbrenda
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 17.10.25 23:56, Balbir Singh wrote:
> In the meanwhile, does this fix/workaround work?
>
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index 0c847cdf4fd3..31c1754d5bd4 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>
> if (pmdvalp)
> *pmdvalp = pmdval;
> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
> + if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
> goto nomap;
> if (unlikely(pmd_trans_huge(pmdval)))
> goto nomap;
>
Yes, this seems to work.
CC Claudio.
* Re: linux-next: KVM/s390x regression
2025-10-17 22:41 ` David Hildenbrand
@ 2025-10-20 7:01 ` Christian Borntraeger
0 siblings, 0 replies; 23+ messages in thread
From: Christian Borntraeger @ 2025-10-20 7:01 UTC (permalink / raw)
To: David Hildenbrand, Balbir Singh, Claudio Imbrenda
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
Am 18.10.25 um 00:41 schrieb David Hildenbrand:
> On 18.10.25 00:15, David Hildenbrand wrote:
>> On 17.10.25 23:56, Balbir Singh wrote:
>>> On 10/18/25 04:07, David Hildenbrand wrote:
>>>> On 17.10.25 17:20, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> Am 17.10.25 um 17:07 schrieb David Hildenbrand:
>>>>>> On 17.10.25 17:01, Christian Borntraeger wrote:
>>>>>>> Am 17.10.25 um 16:54 schrieb David Hildenbrand:
>>>>>>>> On 17.10.25 16:49, Christian Borntraeger wrote:
>>>>>>>>> This patch triggers a regression for s390x kvm as qemu guests can no longer start
>>>>>>>>>
>>>>>>>>> error: kvm run failed Cannot allocate memory
>>>>>>>>> PSW=mask 0000000180000000 addr 000000007fd00600
>>>>>>>>> R00=0000000000000000 R01=0000000000000000 R02=0000000000000000 R03=0000000000000000
>>>>>>>>> R04=0000000000000000 R05=0000000000000000 R06=0000000000000000 R07=0000000000000000
>>>>>>>>> R08=0000000000000000 R09=0000000000000000 R10=0000000000000000 R11=0000000000000000
>>>>>>>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>>>>>>>> C00=00000000000000e0 C01=0000000000000000 C02=0000000000000000 C03=0000000000000000
>>>>>>>>> C04=0000000000000000 C05=0000000000000000 C06=0000000000000000 C07=0000000000000000
>>>>>>>>> C08=0000000000000000 C09=0000000000000000 C10=0000000000000000 C11=0000000000000000
>>>>>>>>> C12=0000000000000000 C13=0000000000000000 C14=00000000c2000000 C15=0000000000000000
>>>>>>>>>
>>>>>>>>> KVM on s390x does not use THP so far, will investigate. Does anyone have a quick idea?
>>>>>>>>
>>>>>>>> Only when running KVM guests and apart from that everything else seems to be fine?
>>>>>>>
>>>>>>> We have other weirdness in linux-next but in different areas. Could that somehow be
>>>>>>> related to us disabling THP for the kvm address space?
>>>>>>
>>>>>> Not sure ... it's a bit weird. I mean, when KVM disables THPs we essentially just remap everything to be mapped by PTEs. So there shouldn't be any PMDs in that whole process.
>>>>>>
>>>>>> Remapping a file THP (shmem) implies zapping the THP completely.
>>>>>>
>>>>>>
>>>>>> I assume your kernel config has CONFIG_ZONE_DEVICE and CONFIG_ARCH_ENABLE_THP_MIGRATION set, right?
>>>>>
>>>>> yes.
>>>>>
>>>>>>
>>>>>> I'd rule out copy_huge_pmd(), zap_huge_pmd() as well.
>>>>>>
>>>>>>
>>>>>> What happens if you revert the change in mm/pgtable-generic.c?
>>>>>
>>>>> That partial revert seems to fix the issue
>>>>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>>>>> index 0c847cdf4fd3..567e2d084071 100644
>>>>> --- a/mm/pgtable-generic.c
>>>>> +++ b/mm/pgtable-generic.c
>>>>> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>>>>> if (pmdvalp)
>>>>> *pmdvalp = pmdval;
>>>>> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
>>>>> + if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
>>>>
>>>> Okay, but that means that effectively we stumble over a PMD entry that is not a migration entry but still non-present.
>>>>
>>>> And I would expect that it's a page table, because otherwise the change
>>>> wouldn't make a difference.
>>>>
>>>> And the weird thing is that this only triggers sometimes, because if
>>>> it would always trigger nothing would ever work.
>>>>
>>>> Is there some weird scenario where s390x might set a left page table mapped in a PMD to non-present?
>>>>
>>>
>>> Good point
>>>
>>>> Staring at the definition of pmd_present() on s390x it's really just
>>>>
>>>> return (pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT) != 0;
>>>>
>>>>
>>>> Maybe this is happening in the gmap code only and not actually in the core-mm code?
>>>>
>>>
>>>
>>> I am not an s390 expert, but just looking at the code
>>>
>>> So the check on s390 effectively
>>>
>>> segment_entry/present = false or segment_entry_empty/invalid = true
>>
>> pmd_present() == true iff _SEGMENT_ENTRY_PRESENT is set
>>
>> because
>>
>> return (pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT) != 0;
>>
>> is the same as
>>
>> return pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT;
>>
>> But that means we have something where _SEGMENT_ENTRY_PRESENT is not set.
>>
>> I suspect that can only be the gmap tables.
>>
>> Likely __gmap_link() does not set _SEGMENT_ENTRY_PRESENT, which is fine
>> because it's a software managed bit for "ordinary" page tables, not gmap
>> tables.
>>
>> Which raises the question why someone would wrongly use
>> pte_offset_map()/__pte_offset_map() on the gmap tables.
>>
>> I cannot immediately spot any such usage in kvm/gmap code, though.
>>
>
> Ah, it's all that pte_alloc_map_lock() stuff in gmap.c.
>
> Oh my.
>
> So we're mapping a user PTE table that is linked into the gmap tables through a PMD table that does not have the right sw bits set we would expect in a user PMD table.
>
> What's also scary is that pte_alloc_map_lock() would try to pte_alloc() a user page table in the gmap, which sounds completely wrong?
>
> Yeah, when walking the gmap and wanting to lock the linked user PTE table, we should probably never use the pte_*map variants but obtain
> the lock through pte_lockptr().
>
> All magic we end up doing with RCU etc in __pte_offset_map_lock()
> does not apply to the gmap PMD table.
>
CC Claudio.
* Re: linux-next: KVM/s390x regression
2025-10-20 7:00 ` Christian Borntraeger
@ 2025-10-20 8:41 ` David Hildenbrand
2025-10-20 9:04 ` Claudio Imbrenda
2025-10-27 16:47 ` Claudio Imbrenda
0 siblings, 2 replies; 23+ messages in thread
From: David Hildenbrand @ 2025-10-20 8:41 UTC (permalink / raw)
To: Christian Borntraeger, Balbir Singh, Claudio Imbrenda
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next
On 20.10.25 09:00, Christian Borntraeger wrote:
> Am 17.10.25 um 23:56 schrieb Balbir Singh:
>
>> In the meanwhile, does this fix/workaround work?
>>
>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>> index 0c847cdf4fd3..31c1754d5bd4 100644
>> --- a/mm/pgtable-generic.c
>> +++ b/mm/pgtable-generic.c
>> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>>
>> if (pmdvalp)
>> *pmdvalp = pmdval;
>> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
>> + if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
>> goto nomap;
>> if (unlikely(pmd_trans_huge(pmdval)))
>> goto nomap;
>>
>
> Yes, this seems to work.
Right, but that's not what we will want here. We'll have to adjust the s390x
gmap code (which is getting redesigned either way) to only take the page
table lock.
In the end, we'll later want a single check here:
if (!pmd_present(pmdval))
goto nomap;
--
Cheers
David / dhildenb
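The variants of the check in ___pte_offset_map() discussed in this thread can be compared with a simplified model. Each boolean field stands for the result of the kernel predicate named in its comment; `struct model_pmdval` and the `nomap_*` helpers are hypothetical. One further assumption is labeled in the code: `is_pmd_non_present_folio_entry()` is modeled as "migration entry or device-private entry". A gmap segment entry that links to a PTE table but lacks the software present bit is rejected by the `!pmd_present()` check, yet accepted by both the partial revert and Balbir's workaround, which matches the test results reported above.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified view of a PMD value: each flag stands for the
 * result of the kernel predicate named in its comment.
 */
struct model_pmdval {
	bool none;           /* pmd_none() */
	bool present;        /* pmd_present(), i.e. the SW present bit */
	bool migration;      /* is_pmd_migration_entry() */
	bool device_private; /* non-present device-private THP entry */
	bool trans_huge;     /* pmd_trans_huge() */
};

/* The linux-next check, and the single check wanted long term. */
static bool nomap_present_check(struct model_pmdval v)
{
	return v.none || !v.present || v.trans_huge;
}

/* The partial revert: only migration entries are rejected. */
static bool nomap_partial_revert(struct model_pmdval v)
{
	return v.none || v.migration || v.trans_huge;
}

/*
 * Balbir's workaround. ASSUMPTION: is_pmd_non_present_folio_entry() is
 * modeled as "migration entry or device-private entry".
 */
static bool nomap_workaround(struct model_pmdval v)
{
	return v.none || v.migration || v.device_private || v.trans_huge;
}
```

In this model the partial revert would also let a device-private THP entry through as if it were a page-table link, which is presumably why the series moved to `!pmd_present()`. Once gmap entries carry the present bit, that single check no longer rejects them.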
* Re: linux-next: KVM/s390x regression
2025-10-20 8:41 ` David Hildenbrand
@ 2025-10-20 9:04 ` Claudio Imbrenda
2025-10-27 16:47 ` Claudio Imbrenda
1 sibling, 0 replies; 23+ messages in thread
From: Claudio Imbrenda @ 2025-10-20 9:04 UTC (permalink / raw)
To: David Hildenbrand
Cc: Christian Borntraeger, Balbir Singh, Liam.Howlett, airlied, akpm,
apopple, baohua, baolin.wang, byungchul, dakr, dev.jain,
dri-devel, francois.dugast, gourry, joshua.hahnjy, linux-kernel,
linux-mm, lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next
On Mon, 20 Oct 2025 10:41:28 +0200
David Hildenbrand <david@redhat.com> wrote:
> On 20.10.25 09:00, Christian Borntraeger wrote:
> > Am 17.10.25 um 23:56 schrieb Balbir Singh:
> >
> >> In the meanwhile, does this fix/workaround work?
> >>
> >> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> >> index 0c847cdf4fd3..31c1754d5bd4 100644
> >> --- a/mm/pgtable-generic.c
> >> +++ b/mm/pgtable-generic.c
> >> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
> >>
> >> if (pmdvalp)
> >> *pmdvalp = pmdval;
> >> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
> >> + if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
> >> goto nomap;
> >> if (unlikely(pmd_trans_huge(pmdval)))
> >> goto nomap;
> >>
> >
> > Yes, this seems to work.
>
> Right, but that's not what we will want here. We'll have to adjust s390x
I'm looking into that
> gmap code (which is getting redesigned either way) to only take the page
Unfortunately the rework won't make it into 6.18, so I'll have to quickly
cobble together a fix.
> lock.
>
> In the end, we'll want here later a single
>
> if (!pmd_present(pmdval))
> goto nomap;
>
* Re: linux-next: KVM/s390x regression
2025-10-20 8:41 ` David Hildenbrand
2025-10-20 9:04 ` Claudio Imbrenda
@ 2025-10-27 16:47 ` Claudio Imbrenda
2025-10-27 16:59 ` David Hildenbrand
2025-10-27 17:06 ` Christian Borntraeger
1 sibling, 2 replies; 23+ messages in thread
From: Claudio Imbrenda @ 2025-10-27 16:47 UTC (permalink / raw)
To: David Hildenbrand
Cc: Christian Borntraeger, Balbir Singh, Liam.Howlett, airlied, akpm,
apopple, baohua, baolin.wang, byungchul, dakr, dev.jain,
dri-devel, francois.dugast, gourry, joshua.hahnjy, linux-kernel,
linux-mm, lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next
On Mon, 20 Oct 2025 10:41:28 +0200
David Hildenbrand <david@redhat.com> wrote:
> On 20.10.25 09:00, Christian Borntraeger wrote:
> > Am 17.10.25 um 23:56 schrieb Balbir Singh:
> >
> >> In the meanwhile, does this fix/workaround work?
> >>
> >> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> >> index 0c847cdf4fd3..31c1754d5bd4 100644
> >> --- a/mm/pgtable-generic.c
> >> +++ b/mm/pgtable-generic.c
> >> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
> >>
> >> if (pmdvalp)
> >> *pmdvalp = pmdval;
> >> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
> >> + if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
> >> goto nomap;
> >> if (unlikely(pmd_trans_huge(pmdval)))
> >> goto nomap;
> >>
> >
> > Yes, this seems to work.
>
> Right, but that's not what we will want here. We'll have to adjust s390x
> gmap code (which is getting redesigned either way) to only take the page
> lock.
>
> In the end, we'll want here later a single
>
> if (!pmd_present(pmdval))
> goto nomap;
>
this seems to do the trick:
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8ff6bba107e8..22c448b32340 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -599,8 +599,9 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
| _SEGMENT_ENTRY_GMAP_UC
| _SEGMENT_ENTRY;
} else
- *table = pmd_val(*pmd) &
- _SEGMENT_ENTRY_HARDWARE_BITS;
+ *table = (pmd_val(*pmd) &
+ _SEGMENT_ENTRY_HARDWARE_BITS)
+ | _SEGMENT_ENTRY;
}
} else if (*table & _SEGMENT_ENTRY_PROTECT &&
!(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
It marks non-leaf gmap segment (pmd) entries as present, just as normal
pmds would be.
I think it's a good enough fix for now, pending the rewrite, which I
hope to get into the next merge window.
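The effect of the fix can be modeled in a few lines. The bit values here are illustrative placeholders, not the real s390 definitions from arch/s390/include/asm/pgtable.h. The model only assumes what the thread implies: that `pmd_present()` tests a software bit outside `_SEGMENT_ENTRY_HARDWARE_BITS`, and that `_SEGMENT_ENTRY` includes that bit. Masking a user PMD down to its hardware bits drops the present bit; OR-ing in the segment-entry value puts it back while leaving the hardware bits untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder values for illustration only; they are NOT the real s390
 * definitions. The present bit is assumed to be a software bit that lies
 * outside the hardware-bit mask.
 */
#define MODEL_SEGMENT_ENTRY_PRESENT	0x1ULL	/* SW bit tested by pmd_present() */
#define MODEL_SEGMENT_SW_BITS		0x3ULL	/* low bits reserved for software */
#define MODEL_SEGMENT_HW_BITS		(~MODEL_SEGMENT_SW_BITS)
/* Assumed, as the fix implies: the plain segment-entry value carries PRESENT. */
#define MODEL_SEGMENT_ENTRY		MODEL_SEGMENT_ENTRY_PRESENT

/* Modeled on s390 pmd_present(): only the software present bit counts. */
static bool model_pmd_present(uint64_t pmdval)
{
	return (pmdval & MODEL_SEGMENT_ENTRY_PRESENT) != 0;
}

/* Old __gmap_link() behaviour: copy only the hardware bits. */
static uint64_t model_gmap_link_old(uint64_t user_pmd)
{
	return user_pmd & MODEL_SEGMENT_HW_BITS;
}

/* Patched behaviour: hardware bits plus the segment-entry value. */
static uint64_t model_gmap_link_new(uint64_t user_pmd)
{
	return (user_pmd & MODEL_SEGMENT_HW_BITS) | MODEL_SEGMENT_ENTRY;
}
```

For a user PMD with the present bit set, `model_gmap_link_old()` yields an entry that `model_pmd_present()` rejects, while `model_gmap_link_new()` yields one it accepts, with the same hardware bits.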
* Re: linux-next: KVM/s390x regression
2025-10-27 16:47 ` Claudio Imbrenda
@ 2025-10-27 16:59 ` David Hildenbrand
2025-10-27 17:06 ` Christian Borntraeger
1 sibling, 0 replies; 23+ messages in thread
From: David Hildenbrand @ 2025-10-27 16:59 UTC (permalink / raw)
To: Claudio Imbrenda
Cc: Christian Borntraeger, Balbir Singh, Liam.Howlett, airlied, akpm,
apopple, baohua, baolin.wang, byungchul, dakr, dev.jain,
dri-devel, francois.dugast, gourry, joshua.hahnjy, linux-kernel,
linux-mm, lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next
On 27.10.25 17:47, Claudio Imbrenda wrote:
> On Mon, 20 Oct 2025 10:41:28 +0200
> David Hildenbrand <david@redhat.com> wrote:
>
>> On 20.10.25 09:00, Christian Borntraeger wrote:
>>> Am 17.10.25 um 23:56 schrieb Balbir Singh:
>>>
>>>> In the meanwhile, does this fix/workaround work?
>>>>
>>>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>>>> index 0c847cdf4fd3..31c1754d5bd4 100644
>>>> --- a/mm/pgtable-generic.c
>>>> +++ b/mm/pgtable-generic.c
>>>> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>>>>
>>>> if (pmdvalp)
>>>> *pmdvalp = pmdval;
>>>> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
>>>> + if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
>>>> goto nomap;
>>>> if (unlikely(pmd_trans_huge(pmdval)))
>>>> goto nomap;
>>>>
>>>
>>> Yes, this seems to work.
>>
>> Right, but that's not what we will want here. We'll have to adjust s390x
>> gmap code (which is getting redesigned either way) to only take the page
>> lock.
>>
>> In the end, we'll want here later a single
>>
>> if (!pmd_present(pmdval))
>> goto nomap;
>>
>
> this seems to do the trick:
>
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 8ff6bba107e8..22c448b32340 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -599,8 +599,9 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> | _SEGMENT_ENTRY_GMAP_UC
> | _SEGMENT_ENTRY;
> } else
> - *table = pmd_val(*pmd) &
> - _SEGMENT_ENTRY_HARDWARE_BITS;
> + *table = (pmd_val(*pmd) &
> + _SEGMENT_ENTRY_HARDWARE_BITS)
> + | _SEGMENT_ENTRY;
Probably worth adding a comment. I remember we don't reuse this bit as a
SW bit in gmap code, right?
> }
> } else if (*table & _SEGMENT_ENTRY_PROTECT &&
> !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
>
>
>
> it marks non-leaf gmap segment (pmd) entries as present, just as normal
> pmds would be.
Yeah, I looked into hand-coding the PTL lookup but it just gets nasty
real quick.
>
> I think it's a good enough fix for now, pending the rewrite, which I
> hope to get in the next merge window
Agreed.
--
Cheers
David / dhildenb
* Re: linux-next: KVM/s390x regression
2025-10-27 16:47 ` Claudio Imbrenda
2025-10-27 16:59 ` David Hildenbrand
@ 2025-10-27 17:06 ` Christian Borntraeger
2025-10-28 9:24 ` Balbir Singh
2025-10-28 13:01 ` [PATCH v1 0/1] KVM: s390: Fix missing present bit for gmap puds Claudio Imbrenda
1 sibling, 2 replies; 23+ messages in thread
From: Christian Borntraeger @ 2025-10-27 17:06 UTC (permalink / raw)
To: Claudio Imbrenda, David Hildenbrand
Cc: Balbir Singh, Liam.Howlett, airlied, akpm, apopple, baohua,
baolin.wang, byungchul, dakr, dev.jain, dri-devel,
francois.dugast, gourry, joshua.hahnjy, linux-kernel, linux-mm,
lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev
Am 27.10.25 um 17:47 schrieb Claudio Imbrenda:
> On Mon, 20 Oct 2025 10:41:28 +0200
> David Hildenbrand <david@redhat.com> wrote:
>
>> On 20.10.25 09:00, Christian Borntraeger wrote:
>>> Am 17.10.25 um 23:56 schrieb Balbir Singh:
>>>
>>>> In the meanwhile, does this fix/workaround work?
>>>>
>>>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>>>> index 0c847cdf4fd3..31c1754d5bd4 100644
>>>> --- a/mm/pgtable-generic.c
>>>> +++ b/mm/pgtable-generic.c
>>>> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>>>>
>>>> if (pmdvalp)
>>>> *pmdvalp = pmdval;
>>>> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
>>>> + if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
>>>> goto nomap;
>>>> if (unlikely(pmd_trans_huge(pmdval)))
>>>> goto nomap;
>>>>
>>>
>>> Yes, this seems to work.
>>
>> Right, but that's not what we will want here. We'll have to adjust s390x
>> gmap code (which is getting redesigned either way) to only take the page
>> lock.
>>
>> In the end, we'll want here later a single
>>
>> if (!pmd_present(pmdval))
>> goto nomap;
>>
>
> this seems to do the trick:
>
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 8ff6bba107e8..22c448b32340 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -599,8 +599,9 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> | _SEGMENT_ENTRY_GMAP_UC
> | _SEGMENT_ENTRY;
> } else
> - *table = pmd_val(*pmd) &
> - _SEGMENT_ENTRY_HARDWARE_BITS;
> + *table = (pmd_val(*pmd) &
> + _SEGMENT_ENTRY_HARDWARE_BITS)
> + | _SEGMENT_ENTRY;
> }
> } else if (*table & _SEGMENT_ENTRY_PROTECT &&
> !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
>
>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Can you send a proper patch? I guess we should add it to Andrew's mm tree to keep it close to the patch that uncovered the issue.
s390 maintainers cced.
* Re: linux-next: KVM/s390x regression
2025-10-27 17:06 ` Christian Borntraeger
@ 2025-10-28 9:24 ` Balbir Singh
2025-10-28 13:01 ` [PATCH v1 0/1] KVM: s390: Fix missing present bit for gmap puds Claudio Imbrenda
1 sibling, 0 replies; 23+ messages in thread
From: Balbir Singh @ 2025-10-28 9:24 UTC (permalink / raw)
To: Christian Borntraeger, Claudio Imbrenda, David Hildenbrand
Cc: Liam.Howlett, airlied, akpm, apopple, baohua, baolin.wang,
byungchul, dakr, dev.jain, dri-devel, francois.dugast, gourry,
joshua.hahnjy, linux-kernel, linux-mm, lorenzo.stoakes, lyude,
matthew.brost, mpenttil, npache, osalvador, rakie.kim, rcampbell,
ryan.roberts, simona, ying.huang, ziy, kvm, linux-s390,
linux-next, Heiko Carstens, Vasily Gorbik, Alexander Gordeev
On 10/28/25 04:06, Christian Borntraeger wrote:
> Am 27.10.25 um 17:47 schrieb Claudio Imbrenda:
>> On Mon, 20 Oct 2025 10:41:28 +0200
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 20.10.25 09:00, Christian Borntraeger wrote:
>>>> Am 17.10.25 um 23:56 schrieb Balbir Singh:
>>>>
>>>>> In the meanwhile, does this fix/workaround work?
>>>>>
>>>>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>>>>> index 0c847cdf4fd3..31c1754d5bd4 100644
>>>>> --- a/mm/pgtable-generic.c
>>>>> +++ b/mm/pgtable-generic.c
>>>>> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>>>>> if (pmdvalp)
>>>>> *pmdvalp = pmdval;
>>>>> - if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
>>>>> + if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
>>>>> goto nomap;
>>>>> if (unlikely(pmd_trans_huge(pmdval)))
>>>>> goto nomap;
>>>>>
>>>>
>>>> Yes, this seems to work.
>>>
>>> Right, but that's not what we will want here. We'll have to adjust s390x
>>> gmap code (which is getting redesigned either way) to only take the page
>>> lock.
>>>
>>> In the end, we'll want here later a single
>>>
>>> if (!pmd_present(pmdval))
>>> goto nomap;
>>>
>>
>> this seems to do the trick:
>>
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index 8ff6bba107e8..22c448b32340 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -599,8 +599,9 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>> | _SEGMENT_ENTRY_GMAP_UC
>> | _SEGMENT_ENTRY;
>> } else
>> - *table = pmd_val(*pmd) &
>> - _SEGMENT_ENTRY_HARDWARE_BITS;
>> + *table = (pmd_val(*pmd) &
>> + _SEGMENT_ENTRY_HARDWARE_BITS)
>> + | _SEGMENT_ENTRY;
>> }
>> } else if (*table & _SEGMENT_ENTRY_PROTECT &&
>> !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
>>
>>
>
> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
>
> Can you send a proper patch? I guess we should add it to Andrew's mm tree to keep it close to the patch that uncovered the issue.
> s390 maintainers cced.
Thanks for finding the fix. Ideally, we want this fix just before my series if possible!
Balbir
* [PATCH v1 0/1] KVM: s390: Fix missing present bit for gmap puds
2025-10-27 17:06 ` Christian Borntraeger
2025-10-28 9:24 ` Balbir Singh
@ 2025-10-28 13:01 ` Claudio Imbrenda
2025-10-28 13:01 ` [PATCH v1 1/1] " Claudio Imbrenda
2025-10-28 22:53 ` [PATCH v1 0/1] " Andrew Morton
1 sibling, 2 replies; 23+ messages in thread
From: Claudio Imbrenda @ 2025-10-28 13:01 UTC (permalink / raw)
To: akpm
Cc: balbirs, borntraeger, david, Liam.Howlett, airlied, apopple,
baohua, baolin.wang, byungchul, dakr, dev.jain, dri-devel,
francois.dugast, gourry, joshua.hahnjy, linux-kernel, linux-mm,
lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next, hca, gor, agordeev
This patch solves the issue uncovered by patch caf527048be8
("mm/huge_memory: add device-private THP support to PMD operations"),
which is at the moment in -next.
@Andrew: do you think it's possible to squeeze this patch into -next
_before_ the patches that introduce the issue? That would guarantee the
fix is merged first and that bisection is not broken once merged.
Claudio Imbrenda (1):
KVM: s390: Fix missing present bit for gmap puds
arch/s390/mm/gmap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--
2.51.0
* [PATCH v1 1/1] KVM: s390: Fix missing present bit for gmap puds
2025-10-28 13:01 ` [PATCH v1 0/1] KVM: s390: Fix missing present bit for gmap puds Claudio Imbrenda
@ 2025-10-28 13:01 ` Claudio Imbrenda
2025-10-28 21:23 ` Balbir Singh
2025-10-29 10:00 ` David Hildenbrand
2025-10-28 22:53 ` [PATCH v1 0/1] " Andrew Morton
1 sibling, 2 replies; 23+ messages in thread
From: Claudio Imbrenda @ 2025-10-28 13:01 UTC (permalink / raw)
To: akpm
Cc: balbirs, borntraeger, david, Liam.Howlett, airlied, apopple,
baohua, baolin.wang, byungchul, dakr, dev.jain, dri-devel,
francois.dugast, gourry, joshua.hahnjy, linux-kernel, linux-mm,
lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next, hca, gor, agordeev
For hugetlbs, gmap pmds have the present bit set. For normal pmds
(which point to ptes), the bit is not set. This is in contrast to
normal userspace pmds, which always have the bit set when present.
This causes issues when ___pte_offset_map() is modified to only check
for the present bit.
The solution to the problem is simply to always set the present bit for
present gmap pmds.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/lkml/20251017144924.10034-1-borntraeger@linux.ibm.com/
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
---
arch/s390/mm/gmap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8ff6bba107e8..22c448b32340 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -599,8 +599,9 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
| _SEGMENT_ENTRY_GMAP_UC
| _SEGMENT_ENTRY;
} else
- *table = pmd_val(*pmd) &
- _SEGMENT_ENTRY_HARDWARE_BITS;
+ *table = (pmd_val(*pmd) &
+ _SEGMENT_ENTRY_HARDWARE_BITS)
+ | _SEGMENT_ENTRY;
}
} else if (*table & _SEGMENT_ENTRY_PROTECT &&
!(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
--
2.51.0
* Re: [PATCH v1 1/1] KVM: s390: Fix missing present bit for gmap puds
2025-10-28 13:01 ` [PATCH v1 1/1] " Claudio Imbrenda
@ 2025-10-28 21:23 ` Balbir Singh
2025-10-29 10:00 ` David Hildenbrand
1 sibling, 0 replies; 23+ messages in thread
From: Balbir Singh @ 2025-10-28 21:23 UTC (permalink / raw)
To: Claudio Imbrenda, akpm
Cc: borntraeger, david, Liam.Howlett, airlied, apopple, baohua,
baolin.wang, byungchul, dakr, dev.jain, dri-devel,
francois.dugast, gourry, joshua.hahnjy, linux-kernel, linux-mm,
lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next, hca, gor, agordeev
On 10/29/25 00:01, Claudio Imbrenda wrote:
> For hugetlbs, gmap puds have the present bit set. For normal puds
> (which point to ptes), the bit is not set. This is in contrast to the
> normal userspace puds, which always have the bit set for present pmds.
>
> This causes issues when ___pte_offset_map() is modified to only check
> for the present bit.
>
> The solution to the problem is simply to always set the present bit for
> present gmap pmds.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Link: https://lore.kernel.org/lkml/20251017144924.10034-1-borntraeger@linux.ibm.com/
> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> ---
> arch/s390/mm/gmap.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 8ff6bba107e8..22c448b32340 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -599,8 +599,9 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> | _SEGMENT_ENTRY_GMAP_UC
> | _SEGMENT_ENTRY;
> } else
> - *table = pmd_val(*pmd) &
> - _SEGMENT_ENTRY_HARDWARE_BITS;
> + *table = (pmd_val(*pmd) &
> + _SEGMENT_ENTRY_HARDWARE_BITS)
> + | _SEGMENT_ENTRY;
> }
> } else if (*table & _SEGMENT_ENTRY_PROTECT &&
> !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
Acked-by: Balbir Singh <balbirs@nvidia.com>
* Re: [PATCH v1 0/1] KVM: s390: Fix missing present bit for gmap puds
2025-10-28 13:01 ` [PATCH v1 0/1] KVM: s390: Fix missing present bit for gmap puds Claudio Imbrenda
2025-10-28 13:01 ` [PATCH v1 1/1] " Claudio Imbrenda
@ 2025-10-28 22:53 ` Andrew Morton
1 sibling, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2025-10-28 22:53 UTC (permalink / raw)
To: Claudio Imbrenda
Cc: balbirs, borntraeger, david, Liam.Howlett, airlied, apopple,
baohua, baolin.wang, byungchul, dakr, dev.jain, dri-devel,
francois.dugast, gourry, joshua.hahnjy, linux-kernel, linux-mm,
lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next, hca, gor, agordeev
On Tue, 28 Oct 2025 14:01:49 +0100 Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> @Andrew: do you think it's possible to squeeze this patch in -next
> _before_ the patches that introduce the issue? This will guarantee that
> the patch is merged first, and will not break bisections once merged.
no problem, thanks.
* Re: [PATCH v1 1/1] KVM: s390: Fix missing present bit for gmap puds
2025-10-28 13:01 ` [PATCH v1 1/1] " Claudio Imbrenda
2025-10-28 21:23 ` Balbir Singh
@ 2025-10-29 10:00 ` David Hildenbrand
2025-10-29 10:20 ` Claudio Imbrenda
1 sibling, 1 reply; 23+ messages in thread
From: David Hildenbrand @ 2025-10-29 10:00 UTC (permalink / raw)
To: Claudio Imbrenda, akpm
Cc: balbirs, borntraeger, Liam.Howlett, airlied, apopple, baohua,
baolin.wang, byungchul, dakr, dev.jain, dri-devel,
francois.dugast, gourry, joshua.hahnjy, linux-kernel, linux-mm,
lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next, hca, gor, agordeev
On 28.10.25 14:01, Claudio Imbrenda wrote:
> For hugetlbs, gmap puds have the present bit set. For normal puds
> (which point to ptes), the bit is not set. This is in contrast to the
> normal userspace puds, which always have the bit set for present pmds.
>
> This causes issues when ___pte_offset_map() is modified to only check
> for the present bit.
>
> The solution to the problem is simply to always set the present bit for
> present gmap pmds.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Link: https://lore.kernel.org/lkml/20251017144924.10034-1-borntraeger@linux.ibm.com/
> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> ---
> arch/s390/mm/gmap.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 8ff6bba107e8..22c448b32340 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -599,8 +599,9 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> | _SEGMENT_ENTRY_GMAP_UC
> | _SEGMENT_ENTRY;
> } else
> - *table = pmd_val(*pmd) &
> - _SEGMENT_ENTRY_HARDWARE_BITS;
I'd add a comment here like
/* Make sure that pmd_present() will work on these entries. */
> + *table = (pmd_val(*pmd) &
> + _SEGMENT_ENTRY_HARDWARE_BITS)
> + | _SEGMENT_ENTRY;
> }
> } else if (*table & _SEGMENT_ENTRY_PROTECT &&
> !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb
* Re: [PATCH v1 1/1] KVM: s390: Fix missing present bit for gmap puds
2025-10-29 10:00 ` David Hildenbrand
@ 2025-10-29 10:20 ` Claudio Imbrenda
0 siblings, 0 replies; 23+ messages in thread
From: Claudio Imbrenda @ 2025-10-29 10:20 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, balbirs, borntraeger, Liam.Howlett, airlied, apopple,
baohua, baolin.wang, byungchul, dakr, dev.jain, dri-devel,
francois.dugast, gourry, joshua.hahnjy, linux-kernel, linux-mm,
lorenzo.stoakes, lyude, matthew.brost, mpenttil, npache,
osalvador, rakie.kim, rcampbell, ryan.roberts, simona, ying.huang,
ziy, kvm, linux-s390, linux-next, hca, gor, agordeev
On Wed, 29 Oct 2025 11:00:14 +0100
David Hildenbrand <david@redhat.com> wrote:
> On 28.10.25 14:01, Claudio Imbrenda wrote:
> > For hugetlbs, gmap puds have the present bit set. For normal puds
> > (which point to ptes), the bit is not set. This is in contrast to the
> > normal userspace puds, which always have the bit set for present pmds.
> >
> > This causes issues when ___pte_offset_map() is modified to only check
> > for the present bit.
> >
> > The solution to the problem is simply to always set the present bit for
> > present gmap pmds.
> >
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > Link: https://lore.kernel.org/lkml/20251017144924.10034-1-borntraeger@linux.ibm.com/
> > Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> > Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> > ---
> > arch/s390/mm/gmap.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> > index 8ff6bba107e8..22c448b32340 100644
> > --- a/arch/s390/mm/gmap.c
> > +++ b/arch/s390/mm/gmap.c
> > @@ -599,8 +599,9 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> > | _SEGMENT_ENTRY_GMAP_UC
> > | _SEGMENT_ENTRY;
> > } else
> > - *table = pmd_val(*pmd) &
> > - _SEGMENT_ENTRY_HARDWARE_BITS;
>
> I'd add a comment here like
>
> /* Make sure that pmd_present() will work on these entries. */
the whole file is going away very soon anyway
>
> > + *table = (pmd_val(*pmd) &
> > + _SEGMENT_ENTRY_HARDWARE_BITS)
> > + | _SEGMENT_ENTRY;
> > }
> > } else if (*table & _SEGMENT_ENTRY_PROTECT &&
> > !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
>
end of thread
Thread overview: 23+ messages
[not found] <20251001065707.920170-4-balbirs@nvidia.com>
2025-10-17 14:49 ` linux-next: KVM/s390x regression (was: [v7 03/16] mm/huge_memory: add device-private THP support to PMD operations) Christian Borntraeger
2025-10-17 14:54 ` linux-next: KVM/s390x regression David Hildenbrand
2025-10-17 15:01 ` Christian Borntraeger
2025-10-17 15:07 ` David Hildenbrand
2025-10-17 15:20 ` Christian Borntraeger
2025-10-17 17:07 ` David Hildenbrand
2025-10-17 21:56 ` Balbir Singh
2025-10-17 22:15 ` David Hildenbrand
2025-10-17 22:41 ` David Hildenbrand
2025-10-20 7:01 ` Christian Borntraeger
2025-10-20 7:00 ` Christian Borntraeger
2025-10-20 8:41 ` David Hildenbrand
2025-10-20 9:04 ` Claudio Imbrenda
2025-10-27 16:47 ` Claudio Imbrenda
2025-10-27 16:59 ` David Hildenbrand
2025-10-27 17:06 ` Christian Borntraeger
2025-10-28 9:24 ` Balbir Singh
2025-10-28 13:01 ` [PATCH v1 0/1] KVM: s390: Fix missing present bit for gmap puds Claudio Imbrenda
2025-10-28 13:01 ` [PATCH v1 1/1] " Claudio Imbrenda
2025-10-28 21:23 ` Balbir Singh
2025-10-29 10:00 ` David Hildenbrand
2025-10-29 10:20 ` Claudio Imbrenda
2025-10-28 22:53 ` [PATCH v1 0/1] " Andrew Morton