* [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-14 14:46 [PATCH v2 0/7] virtio-mem: s390 support David Hildenbrand
@ 2024-10-14 14:46 ` David Hildenbrand
2024-10-14 18:20 ` Heiko Carstens
2024-10-14 14:46 ` [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall David Hildenbrand
` (6 subsequent siblings)
7 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 14:46 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, linux-doc, kvm,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet, Mario Casquero
s390 currently always results in is_kdump_kernel() == false, because
it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to
deactivate the elfcorehdr= kernel parameter.
Let's follow the powerpc example and implement our own logic.
This is required for virtio-mem to reliably identify a kdump
environment to not try hotplugging memory.
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/include/asm/kexec.h | 4 ++++
arch/s390/kernel/crash_dump.c | 6 ++++++
2 files changed, 10 insertions(+)
diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 1bd08eb56d5f..bd20543515f5 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void);
void arch_kexec_unprotect_crashkres(void);
#define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres
+
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
#endif
#ifdef CONFIG_KEXEC_FILE
@@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
int arch_kimage_file_post_load_cleanup(struct kimage *image);
#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
#endif
+
#endif /*_S390_KEXEC_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index edae13416196..cca1827d3d2e 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
prot);
}
+bool is_kdump_kernel(void)
+{
+ return oldmem_data.start && !is_ipl_type_dump();
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
static const char *nt_name(Elf64_Word type)
{
const char *name = "LINUX";
--
2.46.1
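[Editorial aside: the predicate's behavior across the dump modes can be sketched in a few lines of self-contained C. This is illustrative only — the variables below are stand-ins for oldmem_data.start and is_ipl_type_dump(), and probe() is a hypothetical consumer in the spirit of virtio-mem, not actual kernel code.]

```c
#include <stdbool.h>

/* Stand-ins for the s390 kernel state (hypothetical, not the real symbols). */
static unsigned long oldmem_start; /* models oldmem_data.start */
static bool ipl_type_dump;         /* models is_ipl_type_dump() */

/* Same logic as the patch: a kdump capture kernel, but not a dump IPL. */
static bool is_kdump_kernel(void)
{
	return oldmem_start && !ipl_type_dump;
}

/* Hypothetical consumer in the spirit of virtio-mem: skip memory hotplug
 * in a dump capture environment. */
static int probe(void)
{
	return is_kdump_kernel() ? -1 /* e.g. -EBUSY */ : 0;
}
```

A regular kernel and a stand-alone dump IPL both report false; only a kexec-based kdump capture kernel reports true and makes the consumer bail out.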
^ permalink raw reply related [flat|nested] 59+ messages in thread
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-14 14:46 ` [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel() David Hildenbrand
@ 2024-10-14 18:20 ` Heiko Carstens
2024-10-14 19:26 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-14 18:20 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 04:46:13PM +0200, David Hildenbrand wrote:
> s390 currently always results in is_kdump_kernel() == false, because
> it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to
> deactivate the elfcorehdr= kernel parameter.
>
> Let's follow the powerpc example and implement our own logic.
>
> This is required for virtio-mem to reliably identify a kdump
> environment to not try hotplugging memory.
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> arch/s390/include/asm/kexec.h | 4 ++++
> arch/s390/kernel/crash_dump.c | 6 ++++++
> 2 files changed, 10 insertions(+)
Looks like this could work. But the comment in smp.c above
dump_available() needs to be updated.
Are you willing to do that, or should I provide an addon patch?
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-14 18:20 ` Heiko Carstens
@ 2024-10-14 19:26 ` David Hildenbrand
2024-10-15 8:30 ` Heiko Carstens
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 19:26 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 14.10.24 20:20, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:13PM +0200, David Hildenbrand wrote:
>> s390 currently always results in is_kdump_kernel() == false, because
>> it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to
>> deactivate the elfcorehdr= kernel parameter.
>>
>> Let's follow the powerpc example and implement our own logic.
>>
>> This is required for virtio-mem to reliably identify a kdump
>> environment to not try hotplugging memory.
>>
>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> arch/s390/include/asm/kexec.h | 4 ++++
>> arch/s390/kernel/crash_dump.c | 6 ++++++
>> 2 files changed, 10 insertions(+)
>
> Looks like this could work. But the comment in smp.c above
> dump_available() needs to be updated.
Ah right, I remember that there was some outdated documentation.
>
> Are you willing to do that, or should I provide an addon patch?
>
I can squash the following:
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4df56fdb2488..a4f538876462 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -587,16 +587,16 @@ int smp_store_status(int cpu)
* with sigp stop-and-store-status. The firmware or the boot-loader
* stored the registers of the boot CPU in the absolute lowcore in the
* memory of the old system.
- * 3) kdump and the old kernel did not store the CPU state,
- * or stand-alone kdump for DASD
- * condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
+ * 3) kdump or stand-alone kdump for DASD
+ * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump()
* The state for all CPUs except the boot CPU needs to be collected
* with sigp stop-and-store-status. The kexec code or the boot-loader
* stored the registers of the boot CPU in the memory of the old system.
- * 4) kdump and the old kernel stored the CPU state
- * condition: OLDMEM_BASE != NULL && is_kdump_kernel()
- * This case does not exist for s390 anymore, setup_arch explicitly
- * deactivates the elfcorehdr= kernel parameter
+ *
+ * Note that the old Kdump mode where the old kernel stored the CPU state
+ * no longer exists: setup_arch explicitly deactivates the elfcorehdr=
+ * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
+ * of the elfcorehdr= parameter.
*/
static bool dump_available(void)
{
Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump"?
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-14 19:26 ` David Hildenbrand
@ 2024-10-15 8:30 ` Heiko Carstens
2024-10-15 8:41 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-15 8:30 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
> On 14.10.24 20:20, Heiko Carstens wrote:
> > Looks like this could work. But the comment in smp.c above
> > dump_available() needs to be updated.
>
> Ah right, I remember that there was some outdated documentation.
>
> >
> > Are you willing to do that, or should I provide an addon patch?
> >
>
> I can squash the following:
>
> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
> index 4df56fdb2488..a4f538876462 100644
> --- a/arch/s390/kernel/smp.c
> +++ b/arch/s390/kernel/smp.c
> @@ -587,16 +587,16 @@ int smp_store_status(int cpu)
> * with sigp stop-and-store-status. The firmware or the boot-loader
> * stored the registers of the boot CPU in the absolute lowcore in the
> * memory of the old system.
> - * 3) kdump and the old kernel did not store the CPU state,
> - * or stand-alone kdump for DASD
> - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
> + * 3) kdump or stand-alone kdump for DASD
> + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump()
> * The state for all CPUs except the boot CPU needs to be collected
> * with sigp stop-and-store-status. The kexec code or the boot-loader
> * stored the registers of the boot CPU in the memory of the old system.
> - * 4) kdump and the old kernel stored the CPU state
> - * condition: OLDMEM_BASE != NULL && is_kdump_kernel()
> - * This case does not exist for s390 anymore, setup_arch explicitly
> - * deactivates the elfcorehdr= kernel parameter
> + *
> + * Note that the old Kdump mode where the old kernel stored the CPU state
To be consistent with the rest of the comment, please write kdump in
all lower case characters.
> + * no longer exists: setup_arch explicitly deactivates the elfcorehdr=
> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
Typo: kudmp.
> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump"?
Yes, it is some sort of kdump, even though a bit odd. But the comment
as it is doesn't need to be changed. Only at the very top, please also
change: "There are four cases" into "There are three cases".
Then it all looks good. Thanks a lot!
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-15 8:30 ` Heiko Carstens
@ 2024-10-15 8:41 ` David Hildenbrand
2024-10-15 8:53 ` David Hildenbrand
2024-10-15 10:08 ` Heiko Carstens
0 siblings, 2 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-15 8:41 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 15.10.24 10:30, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
>> On 14.10.24 20:20, Heiko Carstens wrote:
>>> Looks like this could work. But the comment in smp.c above
>>> dump_available() needs to be updated.
>>
>> Ah right, I remember that there was some outdated documentation.
>>
>>>
>>> Are you willing to do that, or should I provide an addon patch?
>>>
>>
>> I can squash the following:
>>
>> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
>> index 4df56fdb2488..a4f538876462 100644
>> --- a/arch/s390/kernel/smp.c
>> +++ b/arch/s390/kernel/smp.c
>> @@ -587,16 +587,16 @@ int smp_store_status(int cpu)
>> * with sigp stop-and-store-status. The firmware or the boot-loader
>> * stored the registers of the boot CPU in the absolute lowcore in the
>> * memory of the old system.
>> - * 3) kdump and the old kernel did not store the CPU state,
>> - * or stand-alone kdump for DASD
>> - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
>> + * 3) kdump or stand-alone kdump for DASD
>> + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump()
>> * The state for all CPUs except the boot CPU needs to be collected
>> * with sigp stop-and-store-status. The kexec code or the boot-loader
>> * stored the registers of the boot CPU in the memory of the old system.
>> - * 4) kdump and the old kernel stored the CPU state
>> - * condition: OLDMEM_BASE != NULL && is_kdump_kernel()
>> - * This case does not exist for s390 anymore, setup_arch explicitly
>> - * deactivates the elfcorehdr= kernel parameter
>> + *
>> + * Note that the old Kdump mode where the old kernel stored the CPU state
>
> To be consistent with the rest of the comment, please write kdump in
> all lower case characters.
It obviously was too late in the evening for me :) Thanks!
>
>> + * no longer exists: setup_arch explicitly deactivates the elfcorehdr=
>> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
>
> Typo: kudmp.
>
>> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
>> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump"?
>
> Yes, it is some sort of kdump, even though a bit odd.
My concern is that we'll now have
bool is_kdump_kernel(void)
{
return oldmem_data.start && !is_ipl_type_dump();
}
Which matches 3), but if 2) is also called "kdump", then should it
actually be
bool is_kdump_kernel(void)
{
return oldmem_data.start;
}
?
When I wrote that code I was rather convinced that the variant in this
patch is the right thing to do.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-15 8:41 ` David Hildenbrand
@ 2024-10-15 8:53 ` David Hildenbrand
2024-10-15 8:56 ` David Hildenbrand
2024-10-15 10:08 ` Heiko Carstens
1 sibling, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-15 8:53 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 15.10.24 10:41, David Hildenbrand wrote:
> On 15.10.24 10:30, Heiko Carstens wrote:
>> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
>>> On 14.10.24 20:20, Heiko Carstens wrote:
>>>> Looks like this could work. But the comment in smp.c above
>>>> dump_available() needs to be updated.
>>>
>>> Ah right, I remember that there was some outdated documentation.
>>>
>>>>
>>>> Are you willing to do that, or should I provide an addon patch?
>>>>
>>>
>>> I can squash the following:
>>>
>>> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
>>> index 4df56fdb2488..a4f538876462 100644
>>> --- a/arch/s390/kernel/smp.c
>>> +++ b/arch/s390/kernel/smp.c
>>> @@ -587,16 +587,16 @@ int smp_store_status(int cpu)
>>> * with sigp stop-and-store-status. The firmware or the boot-loader
>>> * stored the registers of the boot CPU in the absolute lowcore in the
>>> * memory of the old system.
>>> - * 3) kdump and the old kernel did not store the CPU state,
>>> - * or stand-alone kdump for DASD
>>> - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
>>> + * 3) kdump or stand-alone kdump for DASD
>>> + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump()
>>> * The state for all CPUs except the boot CPU needs to be collected
>>> * with sigp stop-and-store-status. The kexec code or the boot-loader
>>> * stored the registers of the boot CPU in the memory of the old system.
>>> - * 4) kdump and the old kernel stored the CPU state
>>> - * condition: OLDMEM_BASE != NULL && is_kdump_kernel()
>>> - * This case does not exist for s390 anymore, setup_arch explicitly
>>> - * deactivates the elfcorehdr= kernel parameter
>>> + *
>>> + * Note that the old Kdump mode where the old kernel stored the CPU state
>>
>> To be consistent with the rest of the comment, please write kdump in
>> all lower case characters.
>
> It obviously was too late in the evening for me :) Thanks!
>
>>
>>> + * no longer exists: setup_arch explicitly deactivates the elfcorehdr=
>>> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
>>
>> Typo: kudmp.
>>
>>> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
>>> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump"?
>>
>> Yes, it is some sort of kdump, even though a bit odd.
>
> My concern is that we'll now have
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start && !is_ipl_type_dump();
> }
>
> Which matches 3), but if 2) is also called "kdump", then should it
> actually be
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> ?
>
> When I wrote that code I was rather convinced that the variant in this
> patch is the right thing to do.
>
I think we can do some follow up cleanups, assuming is_kdump_kernel() here is correct:
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index cca1827d3d2e..fbc5de66d03b 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
u64 hdr_off;
/* If we are not in kdump or zfcp/nvme dump mode return */
- if (!oldmem_data.start && !is_ipl_type_dump())
+ if (!dump_available())
return 0;
/* If we cannot get HSA size for zfcp/nvme dump return error */
if (is_ipl_type_dump() && !sclp.hsa_size)
diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c
index b695f980bbde..09578f400ef7 100644
--- a/arch/s390/kernel/os_info.c
+++ b/arch/s390/kernel/os_info.c
@@ -148,7 +148,7 @@ static void os_info_old_init(void)
if (os_info_init)
return;
- if (!oldmem_data.start && !is_ipl_type_dump())
+ if (!dump_available())
goto fail;
if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr)))
goto fail;
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index 33cebb91b933..6a194b4f6ba5 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -300,9 +300,7 @@ static int __init zcore_init(void)
unsigned char arch;
int rc;
- if (!is_ipl_type_dump())
- return -ENODATA;
- if (oldmem_data.start)
+ if (is_kdump_kernel())
return -ENODATA;
zcore_dbf = debug_register("zcore", 4, 1, 4 * sizeof(long));
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-15 8:53 ` David Hildenbrand
@ 2024-10-15 8:56 ` David Hildenbrand
0 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-15 8:56 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 15.10.24 10:53, David Hildenbrand wrote:
> On 15.10.24 10:41, David Hildenbrand wrote:
>> On 15.10.24 10:30, Heiko Carstens wrote:
>>> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
>>>> On 14.10.24 20:20, Heiko Carstens wrote:
>>>>> Looks like this could work. But the comment in smp.c above
>>>>> dump_available() needs to be updated.
>>>>
>>>> Ah right, I remember that there was some outdated documentation.
>>>>
>>>>>
>>>>> Are you willing to do that, or should I provide an addon patch?
>>>>>
>>>>
>>>> I can squash the following:
>>>>
>>>> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
>>>> index 4df56fdb2488..a4f538876462 100644
>>>> --- a/arch/s390/kernel/smp.c
>>>> +++ b/arch/s390/kernel/smp.c
>>>> @@ -587,16 +587,16 @@ int smp_store_status(int cpu)
>>>> * with sigp stop-and-store-status. The firmware or the boot-loader
>>>> * stored the registers of the boot CPU in the absolute lowcore in the
>>>> * memory of the old system.
>>>> - * 3) kdump and the old kernel did not store the CPU state,
>>>> - * or stand-alone kdump for DASD
>>>> - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
>>>> + * 3) kdump or stand-alone kdump for DASD
>>>> + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump()
>>>> * The state for all CPUs except the boot CPU needs to be collected
>>>> * with sigp stop-and-store-status. The kexec code or the boot-loader
>>>> * stored the registers of the boot CPU in the memory of the old system.
>>>> - * 4) kdump and the old kernel stored the CPU state
>>>> - * condition: OLDMEM_BASE != NULL && is_kdump_kernel()
>>>> - * This case does not exist for s390 anymore, setup_arch explicitly
>>>> - * deactivates the elfcorehdr= kernel parameter
>>>> + *
>>>> + * Note that the old Kdump mode where the old kernel stored the CPU state
>>>
>>> To be consistent with the rest of the comment, please write kdump in
>>> all lower case characters.
>>
>> It obviously was too late in the evening for me :) Thanks!
>>
>>>
>>>> + * no longer exists: setup_arch explicitly deactivates the elfcorehdr=
>>>> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
>>>
>>> Typo: kudmp.
>>>
>>>> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
>>>> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump"?
>>>
>>> Yes, it is some sort of kdump, even though a bit odd.
>>
>> My concern is that we'll now have
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start && !is_ipl_type_dump();
>> }
>>
>> Which matches 3), but if 2) is also called "kdump", then should it
>> actually be
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start;
>> }
>>
>> ?
>>
>> When I wrote that code I was rather convinced that the variant in this
>> patch is the right thing to do.
>>
>
> I think we can do some follow up cleanups, assuming is_kdump_kernel() here is correct:
>
> diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
> index cca1827d3d2e..fbc5de66d03b 100644
> --- a/arch/s390/kernel/crash_dump.c
> +++ b/arch/s390/kernel/crash_dump.c
> @@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
> u64 hdr_off;
>
> /* If we are not in kdump or zfcp/nvme dump mode return */
> - if (!oldmem_data.start && !is_ipl_type_dump())
> + if (!dump_available())
> return 0;
> /* If we cannot get HSA size for zfcp/nvme dump return error */
> if (is_ipl_type_dump() && !sclp.hsa_size)
> diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c
> index b695f980bbde..09578f400ef7 100644
> --- a/arch/s390/kernel/os_info.c
> +++ b/arch/s390/kernel/os_info.c
> @@ -148,7 +148,7 @@ static void os_info_old_init(void)
>
> if (os_info_init)
> return;
> - if (!oldmem_data.start && !is_ipl_type_dump())
> + if (!dump_available())
> goto fail;
> if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr)))
> goto fail;
> diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
> index 33cebb91b933..6a194b4f6ba5 100644
> --- a/drivers/s390/char/zcore.c
> +++ b/drivers/s390/char/zcore.c
> @@ -300,9 +300,7 @@ static int __init zcore_init(void)
> unsigned char arch;
> int rc;
>
> - if (!is_ipl_type_dump())
> - return -ENODATA;
> - if (oldmem_data.start)
> + if (is_kdump_kernel())
> return -ENODATA;
>
> zcore_dbf = debug_register("zcore", 4, 1, 4 * sizeof(long));
>
>
Ugh, ignore the last one, I'm just confused about dumping options on
s390x at this point :)
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-15 8:41 ` David Hildenbrand
2024-10-15 8:53 ` David Hildenbrand
@ 2024-10-15 10:08 ` Heiko Carstens
2024-10-15 10:40 ` David Hildenbrand
1 sibling, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-15 10:08 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero, Alexander Egorenkov, Mikhail Zaslonko
On Tue, Oct 15, 2024 at 10:41:21AM +0200, David Hildenbrand wrote:
> On 15.10.24 10:30, Heiko Carstens wrote:
> > On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
> > > On 14.10.24 20:20, Heiko Carstens wrote:
> > > > Looks like this could work. But the comment in smp.c above
> > > > dump_available() needs to be updated.
> > >
> > > Ah right, I remember that there was some outdated documentation.
...
> My concern is that we'll now have
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start && !is_ipl_type_dump();
> }
>
> Which matches 3), but if 2) is also called "kdump", then should it actually
> be
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> ?
>
> When I wrote that code I was rather convinced that the variant in this patch
> is the right thing to do.
Oh well, we simply have too many dump options. I'll let Alexander and
Mikhail better comment on this :)
Alexander, Mikhail, the discussion starts here, and we need a sane
is_kdump_kernel() for s390:
https://lore.kernel.org/all/20241014144622.876731-1-david@redhat.com/
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-15 10:08 ` Heiko Carstens
@ 2024-10-15 10:40 ` David Hildenbrand
2024-10-16 13:35 ` Alexander Egorenkov
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-15 10:40 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero, Alexander Egorenkov, Mikhail Zaslonko
On 15.10.24 12:08, Heiko Carstens wrote:
> On Tue, Oct 15, 2024 at 10:41:21AM +0200, David Hildenbrand wrote:
>> On 15.10.24 10:30, Heiko Carstens wrote:
>>> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
>>>> On 14.10.24 20:20, Heiko Carstens wrote:
>>>>> Looks like this could work. But the comment in smp.c above
>>>>> dump_available() needs to be updated.
>>>>
>>>> Ah right, I remember that there was some outdated documentation.
>
> ...
>
>> My concern is that we'll now have
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start && !is_ipl_type_dump();
>> }
>>
>> Which matches 3), but if 2) is also called "kdump", then should it actually
>> be
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start;
>> }
>>
>> ?
>>
>> When I wrote that code I was rather convinced that the variant in this patch
>> is the right thing to do.
>
> Oh well, we simply have too many dump options. I'll let Alexander and
> Mikhail better comment on this :)
Thanks!
>
> Alexander, Mikhail, the discussion starts here, and we need a sane
> is_kdump_kernel() for s390:
> https://lore.kernel.org/all/20241014144622.876731-1-david@redhat.com/
>
With the following cleanup in mind:
From 9fbbff43f725a8482ce9e85857316ab8484ff3c8 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 15 Oct 2024 11:07:43 +0200
Subject: [PATCH] s390: use !dump_available() instead of "!oldmem_data.start &&
!is_ipl_type_dump()"
Let's make the code more readable:
!dump_available()
-> !(oldmem_data.start || is_ipl_type_dump())
-> !oldmem_data.start && !is_ipl_type_dump()
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/kernel/crash_dump.c | 2 +-
arch/s390/kernel/os_info.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index cca1827d3d2e..fbc5de66d03b 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
u64 hdr_off;
/* If we are not in kdump or zfcp/nvme dump mode return */
- if (!oldmem_data.start && !is_ipl_type_dump())
+ if (!dump_available())
return 0;
/* If we cannot get HSA size for zfcp/nvme dump return error */
if (is_ipl_type_dump() && !sclp.hsa_size)
diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c
index b695f980bbde..09578f400ef7 100644
--- a/arch/s390/kernel/os_info.c
+++ b/arch/s390/kernel/os_info.c
@@ -148,7 +148,7 @@ static void os_info_old_init(void)
if (os_info_init)
return;
- if (!oldmem_data.start && !is_ipl_type_dump())
+ if (!dump_available())
goto fail;
if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr)))
goto fail;
--
2.46.1
It looks like we create /proc/vmcore, allocating the elfcorehdr essentially
if dump_available() && (!is_ipl_type_dump() || !sclp.hsa_size).
So in the condition, we would have previously made is_kdump_kernel() happy *after* the
fs init calls.
So I'm wondering if is_kdump_kernel() should simply translate to dump_available(). It
doesn't sound completely right to create /proc/vmcore for "standard zfcp/nvme dump",
but that is what we currently do.
I would have thought we have that special zcore thingy for that.
So I think we should either have
bool is_kdump_kernel(void)
{
return dump_available();
}
Or
bool is_kdump_kernel(void)
{
return dump_available() && (!is_ipl_type_dump() || !sclp.hsa_size);
}
I'd prefer the first version.
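[Editorial aside: the candidates above differ only for the stand-alone dump cases; a tiny self-contained C model makes the split explicit. The booleans are stand-ins for the real s390 state, not kernel code, and the variant names are invented for this sketch.]

```c
#include <stdbool.h>

/* Stand-ins for the s390 state (hypothetical). */
static bool oldmem;   /* models oldmem_data.start != 0 */
static bool ipl_dump; /* models is_ipl_type_dump() */
static bool hsa;      /* models sclp.hsa_size != 0 */

static bool dump_available(void) { return oldmem || ipl_dump; }

/* Variant in the posted patch: kdump, but not a dump IPL. */
static bool kdump_patch(void)  { return oldmem && !ipl_dump; }

/* First alternative: any dump environment counts as kdump. */
static bool kdump_any(void)    { return dump_available(); }

/* Second alternative: mirrors the elfcorehdr_alloc()/vmcore condition. */
static bool kdump_vmcore(void) { return dump_available() && (!ipl_dump || !hsa); }
```

For a zfcpdump-style dump IPL with HSA available, only the dump_available() variant reports "kdump"; all three agree for a regular kexec-based kdump and for a kernel with no dump at all.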
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-15 10:40 ` David Hildenbrand
@ 2024-10-16 13:35 ` Alexander Egorenkov
2024-10-16 15:47 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Alexander Egorenkov @ 2024-10-16 13:35 UTC (permalink / raw)
To: david
Cc: agordeev, akpm, borntraeger, cohuck, corbet, egorenar, eperezma,
frankja, gor, hca, imbrenda, jasowang, kvm, linux-doc,
linux-kernel, linux-mm, linux-s390, mcasquer, mst, svens, thuth,
virtualization, xuanzhuo, zaslonko
Hi David,
> My concern is that we'll now have
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start && !is_ipl_type_dump();
> }
>
> Which matches 3), but if 2) is also called "kdump", then should it actually
> be
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> ?
>
> When I wrote that code I was rather convinced that the variant in this patch
> is the right thing to do.
A short explanation about what a stand-alone kdump is.
* First, it's not really a _regular_ kdump activated with kexec-tools and
executed by Linux itself but a regular stand-alone dump (SCSI) from the
FW's perspective (one has to use HMC or dumpconf to execute it and not
with kexec-tools like for the _regular_ kdump).
* One has to reserve crashkernel memory region in the old crashed kernel
even if it remains unused until the dump starts.
* zipl uses regular kdump kernel and initramfs to create stand-alone
dumper images and to write them to a dump disk which is used for
IPLing the stand-alone dumper.
* The zipl bootloader takes care of transferring the old kernel memory
saved in HSA by the FW to the crashkernel memory region reserved by the old
crashed kernel before it enters the dumper. The HSA memory is released
by the zipl bootloader _before_ the dumper image is entered;
therefore, we cannot use HSA to read old kernel memory, and instead
use memory from crashkernel region, just like the regular kdump.
* is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
the dumper like a regular stand-alone dump (e.g. zfcpdump).
* Summarized, zipl bootloader prepares an environment which is expected by
the regular kdump for a stand-alone kdump dumper before it is entered.
In my opinion, the correct version of is_kdump_kernel() would be
bool is_kdump_kernel(void)
{
return oldmem_data.start;
}
because Linux kernel doesn't differentiate between both the regular
and the stand-alone kdump where it matters while performing dumper
operations (e.g. reading saved old kernel memory from crashkernel memory region).
Furthermore, if I'm not mistaken then the purpose of is_kdump_kernel()
is to tell us whether Linux kernel runs in a kdump like environment and not
whether the current mode is identical to the proper and true kdump,
right ? And if stand-alone kdump swims like a duck, quacks like one, then it
is one, regardless how it was started, by kexecing or IPLing
from a disk.
The stand-alone kdump has a very special use case which most users will
never encounter. And usually, one just takes zfcpdump instead which is
more robust and much smaller considering how big kdump initrd can get.
stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.
Regards
Alex
^ permalink raw reply [flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-16 13:35 ` Alexander Egorenkov
@ 2024-10-16 15:47 ` David Hildenbrand
2024-10-16 15:54 ` David Hildenbrand
2024-10-21 12:46 ` Alexander Egorenkov
0 siblings, 2 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-16 15:47 UTC (permalink / raw)
To: Alexander Egorenkov
Cc: agordeev, akpm, borntraeger, cohuck, corbet, eperezma, frankja,
gor, hca, imbrenda, jasowang, kvm, linux-doc, linux-kernel,
linux-mm, linux-s390, mcasquer, mst, svens, thuth, virtualization,
xuanzhuo, zaslonko
>>
>> When I wrote that code I was rather convinced that the variant in this patch
>> is the right thing to do.
>
> A short explanation about what a stand-alone kdump is.
>
> * First, it's not really a _regular_ kdump activated with kexec-tools and
> executed by Linux itself but a regular stand-alone dump (SCSI) from the
> FW's perspective (one has to use HMC or dumpconf to execute it and not
> with kexec-tools like for the _regular_ kdump).
Ah, that makes sense.
> * One has to reserve crashkernel memory region in the old crashed kernel
> even if it remains unused until the dump starts.
> * zipl uses regular kdump kernel and initramfs to create stand-alone
> dumper images and to write them to a dump disk which is used for
> IPLing the stand-alone dumper.
> * The zipl bootloader takes care of transferring the old kernel memory
> saved in HSA by the FW to the crashkernel memory region reserved by the old
> crashed kernel before it enters the dumper. The HSA memory is released
> by the zipl bootloader _before_ the dumper image is entered,
> therefore, we cannot use HSA to read old kernel memory, and instead
> use memory from crashkernel region, just like the regular kdump.
> * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
> the dumper like a regular stand-alone dump (e.g. zfcpdump).
> * Summarized, zipl bootloader prepares an environment which is expected by
> the regular kdump for a stand-alone kdump dumper before it is entered.
Thanks for the details!
>
> In my opinion, the correct version of is_kdump_kernel() would be
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> because Linux kernel doesn't differentiate between both the regular
> and the stand-alone kdump where it matters while performing dumper
> operations (e.g. reading saved old kernel memory from crashkernel memory region).
>
Right, but if we consider "/proc/vmcore is available", a better version
would IMHO be:
bool is_kdump_kernel(void)
{
return dump_available();
}
Because that is mostly (not completely) how is_kdump_kernel() would have
worked right now *after* we had the elfcorehdr_alloc() during the
fs_init call.
> Furthermore, if I'm not mistaken then the purpose of is_kdump_kernel()
> is to tell us whether Linux kernel runs in a kdump like environment and not
> whether the current mode is identical to the proper and true kdump,
> right ? And if stand-alone kdump swims like a duck, quacks like one, then it
> is one, regardless how it was started, by kexecing or IPLing
> from a disk.
Same thinking here.
>
> The stand-alone kdump has a very special use case which most users will
> never encounter. And usually, one just takes zfcpdump instead which is
> more robust and much smaller considering how big kdump initrd can get.
> stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.
Makes sense, so it boils down to either
bool is_kdump_kernel(void)
{
return oldmem_data.start;
}
Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
available or
bool is_kdump_kernel(void)
{
return dump_available();
}
Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
available. There is the chance of is_kdump_kernel() being "true" if
"elfcorehdr_alloc()" fails with -ENODEV.
Your call :) Thanks!
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-16 15:47 ` David Hildenbrand
@ 2024-10-16 15:54 ` David Hildenbrand
2024-10-21 12:46 ` Alexander Egorenkov
1 sibling, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-16 15:54 UTC (permalink / raw)
To: Alexander Egorenkov
Cc: agordeev, akpm, borntraeger, cohuck, corbet, eperezma, frankja,
gor, hca, imbrenda, jasowang, kvm, linux-doc, linux-kernel,
linux-mm, linux-s390, mcasquer, mst, svens, thuth, virtualization,
xuanzhuo, zaslonko
On 16.10.24 17:47, David Hildenbrand wrote:
>>>
>>> When I wrote that code I was rather convinced that the variant in this patch
>>> is the right thing to do.
>>
>> A short explanation about what a stand-alone kdump is.
>>
>> * First, it's not really a _regular_ kdump activated with kexec-tools and
>> executed by Linux itself but a regular stand-alone dump (SCSI) from the
>> FW's perspective (one has to use HMC or dumpconf to execute it and not
>> with kexec-tools like for the _regular_ kdump).
>
> Ah, that makes sense.
>
>> * One has to reserve crashkernel memory region in the old crashed kernel
>> even if it remains unused until the dump starts.
>> * zipl uses regular kdump kernel and initramfs to create stand-alone
>> dumper images and to write them to a dump disk which is used for
>> IPLing the stand-alone dumper.
>> * The zipl bootloader takes care of transferring the old kernel memory
>> saved in HSA by the FW to the crashkernel memory region reserved by the old
>> crashed kernel before it enters the dumper. The HSA memory is released
>> by the zipl bootloader _before_ the dumper image is entered,
>> therefore, we cannot use HSA to read old kernel memory, and instead
>> use memory from crashkernel region, just like the regular kdump.
>> * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
>> the dumper like a regular stand-alone dump (e.g. zfcpdump).
>> * Summarized, zipl bootloader prepares an environment which is expected by
>> the regular kdump for a stand-alone kdump dumper before it is entered.
>
> Thanks for the details!
>
>>
>> In my opinion, the correct version of is_kdump_kernel() would be
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start;
>> }
>>
>> because Linux kernel doesn't differentiate between both the regular
>> and the stand-alone kdump where it matters while performing dumper
>> operations (e.g. reading saved old kernel memory from crashkernel memory region).
>>
>
> Right, but if we consider "/proc/vmcore is available", a better version
> would IMHO be:
>
> bool is_kdump_kernel(void)
> {
> return dump_available();
> }
>
> Because that is mostly (not completely) how is_kdump_kernel() would have
> worked right now *after* we had the elfcorehdr_alloc() during the
> fs_init call.
>
>
>> Furthermore, if I'm not mistaken then the purpose of is_kdump_kernel()
>> is to tell us whether Linux kernel runs in a kdump like environment and not
>> whether the current mode is identical to the proper and true kdump,
>> right ? And if stand-alone kdump swims like a duck, quacks like one, then it
>> is one, regardless how it was started, by kexecing or IPLing
>> from a disk.
>
> Same thinking here.
>
>>
>> The stand-alone kdump has a very special use case which most users will
>> never encounter. And usually, one just takes zfcpdump instead which is
>> more robust and much smaller considering how big kdump initrd can get.
>> stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.
>
> Makes sense, so it boils down to either
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
> available or
>
> bool is_kdump_kernel(void)
> {
> return dump_available();
> }
>
> Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
> available. There is the chance of is_kdump_kernel() being "true" if
> "elfcorehdr_alloc()" fails with -ENODEV.
>
>
> Your call :) Thanks!
>
What I think we should do is the following (improved comment + patch
description), but I'll do whatever you think is better:
From e86194b5195c743eff33f563796b9c725fecc65f Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Wed, 4 Sep 2024 14:57:10 +0200
Subject: [PATCH] s390/kdump: provide custom is_kdump_kernel()
s390 currently always results in is_kdump_kernel() == false until
vmcore_init()->elfcorehdr_alloc() ran, because it sets
"elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to deactivate
any elfcorehdr= kernel parameter.
Let's follow the powerpc example and implement our own logic. Let's use
"dump_available()", because this is mostly (with one exception when
elfcorehdr_alloc() fails with -ENODEV) when we would create /proc/vmcore
and when is_kdump_kernel() would have returned "true" after
vmcore_init().
This is required for virtio-mem to reliably identify a kdump
environment before vmcore_init() was called to not try hotplugging memory.
Update the documentation above dump_available().
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/include/asm/kexec.h | 4 ++++
arch/s390/kernel/crash_dump.c | 6 ++++++
arch/s390/kernel/smp.c | 16 ++++++++--------
3 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 1bd08eb56d5f..bd20543515f5 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void);
void arch_kexec_unprotect_crashkres(void);
#define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres
+
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
#endif
#ifdef CONFIG_KEXEC_FILE
@@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
int arch_kimage_file_post_load_cleanup(struct kimage *image);
#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
#endif
+
#endif /*_S390_KEXEC_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 51313ed7e617..43bbaf534dd2 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
prot);
}
+bool is_kdump_kernel(void)
+{
+ return dump_available();
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
static const char *nt_name(Elf64_Word type)
{
const char *name = "LINUX";
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4df56fdb2488..bd41e35a27a0 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -574,7 +574,7 @@ int smp_store_status(int cpu)
/*
* Collect CPU state of the previous, crashed system.
- * There are four cases:
+ * There are three cases:
* 1) standard zfcp/nvme dump
* condition: OLDMEM_BASE == NULL && is_ipl_type_dump() == true
* The state for all CPUs except the boot CPU needs to be collected
@@ -587,16 +587,16 @@ int smp_store_status(int cpu)
* with sigp stop-and-store-status. The firmware or the boot-loader
* stored the registers of the boot CPU in the absolute lowcore in the
* memory of the old system.
- * 3) kdump and the old kernel did not store the CPU state,
- * or stand-alone kdump for DASD
- * condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
+ * 3) kdump or stand-alone kdump for DASD
+ * condition: OLDMEM_BASE != NULL && is_ipl_type_dump() == false
* The state for all CPUs except the boot CPU needs to be collected
* with sigp stop-and-store-status. The kexec code or the boot-loader
* stored the registers of the boot CPU in the memory of the old system.
- * 4) kdump and the old kernel stored the CPU state
- * condition: OLDMEM_BASE != NULL && is_kdump_kernel()
- * This case does not exist for s390 anymore, setup_arch explicitly
- * deactivates the elfcorehdr= kernel parameter
+ *
+ * Note that the old kdump mode where the old kernel stored the CPU state
+ * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
+ * kernel parameter. The is_kdump_kernel() implementation on s390 is independent
+ * of the elfcorehdr= parameter, and is purely based on dump_available().
*/
static bool dump_available(void)
{
--
2.46.1
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-16 15:47 ` David Hildenbrand
2024-10-16 15:54 ` David Hildenbrand
@ 2024-10-21 12:46 ` Alexander Egorenkov
2024-10-21 14:45 ` David Hildenbrand
1 sibling, 1 reply; 59+ messages in thread
From: Alexander Egorenkov @ 2024-10-21 12:46 UTC (permalink / raw)
To: David Hildenbrand
Cc: agordeev, akpm, borntraeger, cohuck, corbet, eperezma, frankja,
gor, hca, imbrenda, jasowang, kvm, linux-doc, linux-kernel,
linux-mm, linux-s390, mcasquer, mst, svens, thuth, virtualization,
xuanzhuo, zaslonko
Hi David,
David Hildenbrand <david@redhat.com> writes:
> Makes sense, so it boils down to either
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
> available or
>
> bool is_kdump_kernel(void)
> {
> return dump_available();
> }
>
> Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
> available. There is the chance of is_kdump_kernel() being "true" if
> "elfcorehdr_alloc()" fails with -ENODEV.
Do you consider is_kdump_kernel() returning "true" in case of zfcpdump or
nvme/eckd+ldipl dump (also called NGDump) okay ? Because
dump_available() would return "true" in such cases too.
If yes, then please explain why; I might have missed a previous
explanation from you.
I'm afraid everyone will make wrong assumptions while reading the name
of is_kdump_kernel() and assuming that it only applies to kdump or
kdump-alike dumps (like stand-alone kdump), and, therefore, introduce
bugs because the name of the function conveys the wrong idea to code
readers. I consider dump_available() as a superset of is_kdump_kernel()
and, therefore, to me they are not equivalent.
I have the feeling you consider is_kdump_kernel() equivalent to
"/proc/vmcore" being present and not really saying anything about
whether kdump is active ?
Regards
Alex
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-21 12:46 ` Alexander Egorenkov
@ 2024-10-21 14:45 ` David Hildenbrand
2024-10-23 7:42 ` Heiko Carstens
2024-10-23 11:17 ` Alexander Egorenkov
0 siblings, 2 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-21 14:45 UTC (permalink / raw)
To: Alexander Egorenkov
Cc: agordeev, akpm, borntraeger, cohuck, corbet, eperezma, frankja,
gor, hca, imbrenda, jasowang, kvm, linux-doc, linux-kernel,
linux-mm, linux-s390, mcasquer, mst, svens, thuth, virtualization,
xuanzhuo, zaslonko
Am 21.10.24 um 14:46 schrieb Alexander Egorenkov:
> Hi David,
>
> David Hildenbrand <david@redhat.com> writes:
>
>> Makes sense, so it boils down to either
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start;
>> }
>>
>> Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
>> available or
>>
>> bool is_kdump_kernel(void)
>> {
>> return dump_available();
>> }
>>
>> Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
>> available. There is the chance of is_kdump_kernel() being "true" if
>> "elfcorehdr_alloc()" fails with -ENODEV.
Thanks for having another look!
>
> Do you consider is_kdump_kernel() returning "true" in case of zfcpdump or
> nvme/eckd+ldipl dump (also called NGDump) okay ? Because
> dump_available() would return "true" in such cases too.
> If yes then please explain why, i might have missed a previous
> explanation from you.
I consider it okay because this is the current behavior after elfcorehdr_alloc()
succeeded and set elfcorehdr_addr.
Not sure if it is the right thing to do, though :)
Whatever we do, we should ensure on s390 that the result of is_kdump_kernel()
is consistent throughout the booting stages, just like on all other architectures.
Right now it goes from false->true when /proc/vmcore gets initialized (and only
if it gets initialized properly).
>
> I'm afraid everyone will make wrong assumptions while reading the name
> of is_kdump_kernel() and assuming that it only applies to kdump or
> kdump-alike dumps (like stand-alone kdump), and, therefore, introduce
> bugs because the name of the function conveys the wrong idea to code
> readers. I consider dump_available() as a superset of is_kdump_kernel()
> and, therefore, to me they are not equivalent.
>
> I have the feeling you consider is_kdump_kernel() equivalent to
> "/proc/vmcore" being present and not really saying anything about
> whether kdump is active ?
Yes, but primarily because this is the existing handling.
Staring at the powerpc implementation:
/*
* Return true only when kexec based kernel dump capturing method is used.
* This ensures all restritions applied for kdump case are not automatically
* applied for fadump case.
*/
bool is_kdump_kernel(void)
{
return !is_fadump_active() && elfcorehdr_addr != ELFCORE_ADDR_MAX;
}
EXPORT_SYMBOL_GPL(is_kdump_kernel);
Which was added by
commit b098f1c32365304633077d73e4ae21c72d4241b3
Author: Hari Bathini <hbathini@linux.ibm.com>
Date: Tue Sep 12 13:59:50 2023 +0530
powerpc/fadump: make is_kdump_kernel() return false when fadump is active
Currently, is_kdump_kernel() returns true in crash dump capture kernel
for both kdump and fadump crash dump capturing methods, as both these
methods set elfcorehdr_addr. Some restrictions enforced for crash dump
capture kernel, based on is_kdump_kernel(), are specifically meant for
kdump case and not desirable for fadump - eg. IO queues restriction in
device drivers. So, define is_kdump_kernel() to return false when f/w
assisted dump is active.
For my purpose (virtio-mem), it's sufficient to only support "kexec triggered
kdump" either way, so I don't care.
So for me it's good enough to have
bool is_kdump_kernel(void)
{
return oldmem_data.start;
}
And trying to document the situation in a comment like powerpc does :)
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-21 14:45 ` David Hildenbrand
@ 2024-10-23 7:42 ` Heiko Carstens
2024-10-23 7:45 ` David Hildenbrand
2024-10-23 11:17 ` Alexander Egorenkov
1 sibling, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-23 7:42 UTC (permalink / raw)
To: David Hildenbrand
Cc: Alexander Egorenkov, agordeev, akpm, borntraeger, cohuck, corbet,
eperezma, frankja, gor, imbrenda, jasowang, kvm, linux-doc,
linux-kernel, linux-mm, linux-s390, mcasquer, mst, svens, thuth,
virtualization, xuanzhuo, zaslonko
On Mon, Oct 21, 2024 at 04:45:59PM +0200, David Hildenbrand wrote:
> For my purpose (virtio-mem), it's sufficient to only support "kexec
> triggered kdump" either way, so I don't care.
>
> So for me it's good enough to have
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> And trying to document the situation in a comment like powerpc does :)
Then let's go forward with this, since as Alexander wrote, this is returning
what is actually happening. If this is not sufficient or something breaks we
can still address this.
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-23 7:42 ` Heiko Carstens
@ 2024-10-23 7:45 ` David Hildenbrand
0 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-23 7:45 UTC (permalink / raw)
To: Heiko Carstens
Cc: Alexander Egorenkov, agordeev, akpm, borntraeger, cohuck, corbet,
eperezma, frankja, gor, imbrenda, jasowang, kvm, linux-doc,
linux-kernel, linux-mm, linux-s390, mcasquer, mst, svens, thuth,
virtualization, xuanzhuo, zaslonko
On 23.10.24 09:42, Heiko Carstens wrote:
> On Mon, Oct 21, 2024 at 04:45:59PM +0200, David Hildenbrand wrote:
>> For my purpose (virtio-mem), it's sufficient to only support "kexec
>> triggered kdump" either way, so I don't care.
>>
>> So for me it's good enough to have
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start;
>> }
>>
>> And trying to document the situation in a comment like powerpc does :)
>
> Then let's go forward with this, since as Alexander wrote, this is returning
> what is actually happening. If this is not sufficient or something breaks we
> can still address this.
>
Yes, I'll send this change separately from the other virtio-mem stuff
out today.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
2024-10-21 14:45 ` David Hildenbrand
2024-10-23 7:42 ` Heiko Carstens
@ 2024-10-23 11:17 ` Alexander Egorenkov
1 sibling, 0 replies; 59+ messages in thread
From: Alexander Egorenkov @ 2024-10-23 11:17 UTC (permalink / raw)
To: David Hildenbrand
Cc: agordeev, akpm, borntraeger, cohuck, corbet, eperezma, frankja,
gor, hca, imbrenda, jasowang, kvm, linux-doc, linux-kernel,
linux-mm, linux-s390, mcasquer, mst, svens, thuth, virtualization,
xuanzhuo, zaslonko
Hi David,
David Hildenbrand <david@redhat.com> writes:
> Staring at the powerpc implementation:
>
> /*
> * Return true only when kexec based kernel dump capturing method is used.
> * This ensures all restritions applied for kdump case are not automatically
> * applied for fadump case.
> */
> bool is_kdump_kernel(void)
> {
> return !is_fadump_active() && elfcorehdr_addr != ELFCORE_ADDR_MAX;
> }
> EXPORT_SYMBOL_GPL(is_kdump_kernel);
Thanks for the pointer.
I would say power's version is semantically equivalent to what I have in
mind for s390 :) If a dump kernel is running, but not a stand-alone
one (apart from sa kdump), then it's a kdump kernel.
Regards
Alex
* [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-14 14:46 [PATCH v2 0/7] virtio-mem: s390 support David Hildenbrand
2024-10-14 14:46 ` [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel() David Hildenbrand
@ 2024-10-14 14:46 ` David Hildenbrand
2024-10-14 18:04 ` Heiko Carstens
2024-10-14 14:46 ` [PATCH v2 3/7] Documentation: s390-diag.rst: document diag500(STORAGE LIMIT) subfunction David Hildenbrand
` (5 subsequent siblings)
7 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 14:46 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, linux-doc, kvm,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet
Let's make it a generic KVM hypercall, allowing other subfunctions to
be more independent of virtio.
This is a preparation for documenting a new hypercall.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
Documentation/virt/kvm/s390/s390-diag.rst | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/Documentation/virt/kvm/s390/s390-diag.rst b/Documentation/virt/kvm/s390/s390-diag.rst
index ca85f030eb0b..d9b7c6cbc99e 100644
--- a/Documentation/virt/kvm/s390/s390-diag.rst
+++ b/Documentation/virt/kvm/s390/s390-diag.rst
@@ -35,15 +35,16 @@ DIAGNOSE function codes not specific to KVM, please refer to the
documentation for the s390 hypervisors defining them.
-DIAGNOSE function code 'X'500' - KVM virtio functions
------------------------------------------------------
+DIAGNOSE function code 'X'500' - KVM functions
+----------------------------------------------
-If the function code specifies 0x500, various virtio-related functions
-are performed.
+If the function code specifies 0x500, various KVM-specific functions
+are performed, including virtio functions.
-General register 1 contains the virtio subfunction code. Supported
-virtio subfunctions depend on KVM's userspace. Generally, userspace
-provides either s390-virtio (subcodes 0-2) or virtio-ccw (subcode 3).
+General register 1 contains the subfunction code. Supported subfunctions
+depend on KVM's userspace. Regarding virtio subfunctions, generally
+userspace provides either s390-virtio (subcodes 0-2) or virtio-ccw
+(subcode 3).
Upon completion of the DIAGNOSE instruction, general register 2 contains
the function's return code, which is either a return code or a subcode
--
2.46.1
* Re: [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-14 14:46 ` [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall David Hildenbrand
@ 2024-10-14 18:04 ` Heiko Carstens
2024-10-14 19:35 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-14 18:04 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On Mon, Oct 14, 2024 at 04:46:14PM +0200, David Hildenbrand wrote:
> Let's make it a generic KVM hypercall, allowing other subfunctions to
> be more independent of virtio.
>
> This is a preparation for documenting a new hypercall.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> Documentation/virt/kvm/s390/s390-diag.rst | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
...
> -DIAGNOSE function code 'X'500' - KVM virtio functions
> ------------------------------------------------------
> +DIAGNOSE function code 'X'500' - KVM functions
> +----------------------------------------------
>
> -If the function code specifies 0x500, various virtio-related functions
> -are performed.
> +If the function code specifies 0x500, various KVM-specific functions
> +are performed, including virtio functions.
>
> -General register 1 contains the virtio subfunction code. Supported
> -virtio subfunctions depend on KVM's userspace. Generally, userspace
> -provides either s390-virtio (subcodes 0-2) or virtio-ccw (subcode 3).
> +General register 1 contains the subfunction code. Supported subfunctions
> +depend on KVM's userspace. Regarding virtio subfunctions, generally
> +userspace provides either s390-virtio (subcodes 0-2) or virtio-ccw
> +(subcode 3).
Reading this file leaves a number of questions open: how does one know
which subcodes are supported, and what happens if an unsupported
subcode is used?
I'm afraid there is no indication available and the only way to figure
out is to try and if it is unsupported the result is a specification
exception. Is that correct?
If so, it would be nice to document that too; but that is not
necessarily your problem.
I guess we won't see too many new diag 500 subcodes, or would it make
sense to implement some query subcode?
* Re: [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-14 18:04 ` Heiko Carstens
@ 2024-10-14 19:35 ` David Hildenbrand
2024-10-15 8:12 ` Heiko Carstens
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 19:35 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On 14.10.24 20:04, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:14PM +0200, David Hildenbrand wrote:
>> Let's make it a generic KVM hypercall, allowing other subfunctions to
>> be more independent of virtio.
>>
>> This is a preparation for documenting a new hypercall.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> Documentation/virt/kvm/s390/s390-diag.rst | 15 ++++++++-------
>> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> ...
>
>> -DIAGNOSE function code 'X'500' - KVM virtio functions
>> ------------------------------------------------------
>> +DIAGNOSE function code 'X'500' - KVM functions
>> +----------------------------------------------
>>
>> -If the function code specifies 0x500, various virtio-related functions
>> -are performed.
>> +If the function code specifies 0x500, various KVM-specific functions
>> +are performed, including virtio functions.
>>
>> -General register 1 contains the virtio subfunction code. Supported
>> -virtio subfunctions depend on KVM's userspace. Generally, userspace
>> -provides either s390-virtio (subcodes 0-2) or virtio-ccw (subcode 3).
>> +General register 1 contains the subfunction code. Supported subfunctions
>> +depend on KVM's userspace. Regarding virtio subfunctions, generally
>> +userspace provides either s390-virtio (subcodes 0-2) or virtio-ccw
>> +(subcode 3).
>
> Reading this file leaves a number of questions open: how does one know
> which subcodes are supported, and what happens if an unsupported
> subcode is used?
Currently they have to be sensed.
>
> I'm afraid there is no indication available and the only way to figure
> out is to try and if it is unsupported the result is a specification
> exception. Is that correct?
Yes exactly.
>
> If so, it would be nice to document that too; but that is not
> necessarily your problem.
I can squash:
diff --git a/Documentation/virt/kvm/s390/s390-diag.rst b/Documentation/virt/kvm/s390/s390-diag.rst
index d9b7c6cbc99e..48a326d41cc0 100644
--- a/Documentation/virt/kvm/s390/s390-diag.rst
+++ b/Documentation/virt/kvm/s390/s390-diag.rst
@@ -50,6 +50,9 @@ Upon completion of the DIAGNOSE instruction, general register 2 contains
the function's return code, which is either a return code or a subcode
specific value.
+If the specified subfunction is not supported, a SPECIFICATION exception
+will be triggered.
+
Subcode 0 - s390-virtio notification and early console printk
Handled by userspace.
>
> I guess we won't see too many new diag 500 subcodes, or would it make
> sense to implement some query subcode?
In the context of STORAGE LIMIT, a "query" subfunction is not really beneficial:
it's either one invocation of "query", conditionally followed by one invocation of "STORAGE LIMIT"
vs. one invocation of "STORAGE LIMIT".
Once there might be a bunch of other subfunctions, a "query" might make more sense.
Thanks!
--
Cheers,
David / dhildenb
* Re: [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-14 19:35 ` David Hildenbrand
@ 2024-10-15 8:12 ` Heiko Carstens
2024-10-15 8:16 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-15 8:12 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On Mon, Oct 14, 2024 at 09:35:27PM +0200, David Hildenbrand wrote:
> On 14.10.24 20:04, Heiko Carstens wrote:
> > On Mon, Oct 14, 2024 at 04:46:14PM +0200, David Hildenbrand wrote:
> > If so, it would be nice to document that too; but that is not
> > necessarily your problem.
>
> I can squash:
>
> diff --git a/Documentation/virt/kvm/s390/s390-diag.rst b/Documentation/virt/kvm/s390/s390-diag.rst
> index d9b7c6cbc99e..48a326d41cc0 100644
> --- a/Documentation/virt/kvm/s390/s390-diag.rst
> +++ b/Documentation/virt/kvm/s390/s390-diag.rst
> @@ -50,6 +50,9 @@ Upon completion of the DIAGNOSE instruction, general register 2 contains
> the function's return code, which is either a return code or a subcode
> specific value.
> +If the specified subfunction is not supported, a SPECIFICATION exception
> +will be triggered.
> +
Looks good. Thanks!
> > I guess we won't see too many new diag 500 subcodes, or would it make
> > sense to implement some query subcode?
>
> In the context of STORAGE LIMIT, a "query" subfunction is not really beneficial:
>
> it's either one invocation of "query", conditionally followed by one invocation of "STORAGE LIMIT"
> vs. one invocation of "STORAGE LIMIT".
>
> Once there might be a bunch of other subfunctions, a "query" might make more sense.
"If only there would be a query subcode available, so that the program
check handling would not be necessary; but in particular my new subcode
is not worth adding it" :)
Anyway, I do not care too much.
* Re: [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-15 8:12 ` Heiko Carstens
@ 2024-10-15 8:16 ` David Hildenbrand
2024-10-15 8:21 ` Heiko Carstens
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-15 8:16 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On 15.10.24 10:12, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 09:35:27PM +0200, David Hildenbrand wrote:
>> On 14.10.24 20:04, Heiko Carstens wrote:
>>> On Mon, Oct 14, 2024 at 04:46:14PM +0200, David Hildenbrand wrote:
>>> If so, it would be nice to document that too; but that is not
>>> necessarily your problem.
>>
>> I can squash:
>>
>> diff --git a/Documentation/virt/kvm/s390/s390-diag.rst b/Documentation/virt/kvm/s390/s390-diag.rst
>> index d9b7c6cbc99e..48a326d41cc0 100644
>> --- a/Documentation/virt/kvm/s390/s390-diag.rst
>> +++ b/Documentation/virt/kvm/s390/s390-diag.rst
>> @@ -50,6 +50,9 @@ Upon completion of the DIAGNOSE instruction, general register 2 contains
>> the function's return code, which is either a return code or a subcode
>> specific value.
>> +If the specified subfunction is not supported, a SPECIFICATION exception
>> +will be triggered.
>> +
>
> Looks good. Thanks!
>
>>> I guess we won't see too many new diag 500 subcodes, or would it make
>>> sense to implement some query subcode?
>>
>> In the context of STORAGE LIMIT, a "query" subfunction is not really beneficial:
>>
>> it's either one invocation of "query", conditionally followed by one invocation of "STORAGE LIMIT"
>> vs. one invocation of "STORAGE LIMIT".
>>
>> Once there might be a bunch of other subfunctions, a "query" might make more sense.
>
> "If only there would be a query subcode available, so that the program
> check handling would not be necessary; but in particular my new subcode
> is not worth adding it" :)
>
> Anyway, I do not care too much.
>
Okay, I see your point: it would allow for removing the program check
handling from the STORAGE LIMIT invocation.
... if only we wouldn't need the exact same program check handling for
the new query subfunction :P
--
Cheers,
David / dhildenb
* Re: [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-15 8:16 ` David Hildenbrand
@ 2024-10-15 8:21 ` Heiko Carstens
2024-10-15 8:32 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-15 8:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On Tue, Oct 15, 2024 at 10:16:20AM +0200, David Hildenbrand wrote:
> On 15.10.24 10:12, Heiko Carstens wrote:
> > On Mon, Oct 14, 2024 at 09:35:27PM +0200, David Hildenbrand wrote:
> > > On 14.10.24 20:04, Heiko Carstens wrote:
> > "If only there would be a query subcode available, so that the program
> > check handling would not be necessary; but in particular my new subcode
> > is not worth adding it" :)
> >
> > Anyway, I do not care too much.
> >
>
> Okay, I see your point: it would allow for removing the program check
> handling from the STORAGE LIMIT invocation.
>
> ... if only we wouldn't need the exact same program check handling for the
> new query subfunction :P
Yeah yeah, but I think you got that this might help in the future.
* Re: [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-15 8:21 ` Heiko Carstens
@ 2024-10-15 8:32 ` David Hildenbrand
2024-10-15 8:46 ` Heiko Carstens
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-15 8:32 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On 15.10.24 10:21, Heiko Carstens wrote:
> On Tue, Oct 15, 2024 at 10:16:20AM +0200, David Hildenbrand wrote:
>> On 15.10.24 10:12, Heiko Carstens wrote:
>>> On Mon, Oct 14, 2024 at 09:35:27PM +0200, David Hildenbrand wrote:
>>>> On 14.10.24 20:04, Heiko Carstens wrote:
>>> "If only there would be a query subcode available, so that the program
>>> check handling would not be necessary; but in particular my new subcode
>>> is not worth adding it" :)
>>>
>>> Anyway, I do not care too much.
>>>
>>
>> Okay, I see your point: it would allow for removing the program check
>> handling from the STORAGE LIMIT invocation.
>>
>> ... if only we wouldn't need the exact same program check handling for the
>> new query subfunction :P
>
> Yeah yeah, but I think you got that this might help in the future.
Right. Adding it later also doesn't quite help to get rid of the checks
here, because some user space might implement STORAGE LIMIT without QUERY.
So strategically, the right approach would indeed be to add QUERY now.
Thoughts from the KVM folks? Unfortunately subfunction 0 is taken, which
is usually QUERY IIRC.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-15 8:32 ` David Hildenbrand
@ 2024-10-15 8:46 ` Heiko Carstens
2024-10-15 8:48 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-15 8:46 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On Tue, Oct 15, 2024 at 10:32:43AM +0200, David Hildenbrand wrote:
> On 15.10.24 10:21, Heiko Carstens wrote:
> > On Tue, Oct 15, 2024 at 10:16:20AM +0200, David Hildenbrand wrote:
> > > On 15.10.24 10:12, Heiko Carstens wrote:
> > > > On Mon, Oct 14, 2024 at 09:35:27PM +0200, David Hildenbrand wrote:
> > > > > On 14.10.24 20:04, Heiko Carstens wrote:
> > > > "If only there would be a query subcode available, so that the program
> > > > check handling would not be necessary; but in particular my new subcode
> > > > is not worth adding it" :)
> > > >
> > > > Anyway, I do not care too much.
> > > >
> > >
> > > Okay, I see your point: it would allow for removing the program check
> > > handling from the STORAGE LIMIT invocation.
> > >
> > > ... if only we wouldn't need the exact same program check handling for the
> > > new query subfunction :P
> >
> > Yeah yeah, but I think you got that this might help in the future.
>
> Right. Adding it later also doesn't quite help to get rid of the checks
> here, because some user space might implement STORAGE LIMIT without QUERY.
This would only help if the diag500 documentation would state that
implementation of the QUERY subcode is mandatory. That is: for every
new subcode larger than the QUERY subcode QUERY must also exist.
That way we only would have to implement program check handling once,
if a program check happens on QUERY none of the newer subcodes is
available, otherwise the return value would indicate that.
Otherwise this whole exercise would be pointless.
* Re: [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall
2024-10-15 8:46 ` Heiko Carstens
@ 2024-10-15 8:48 ` David Hildenbrand
0 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-15 8:48 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On 15.10.24 10:46, Heiko Carstens wrote:
> On Tue, Oct 15, 2024 at 10:32:43AM +0200, David Hildenbrand wrote:
>> On 15.10.24 10:21, Heiko Carstens wrote:
>>> On Tue, Oct 15, 2024 at 10:16:20AM +0200, David Hildenbrand wrote:
>>>> On 15.10.24 10:12, Heiko Carstens wrote:
>>>>> On Mon, Oct 14, 2024 at 09:35:27PM +0200, David Hildenbrand wrote:
>>>>>> On 14.10.24 20:04, Heiko Carstens wrote:
>>>>> "If only there would be a query subcode available, so that the program
>>>>> check handling would not be necessary; but in particular my new subcode
>>>>> is not worth adding it" :)
>>>>>
>>>>> Anyway, I do not care too much.
>>>>>
>>>>
>>>> Okay, I see your point: it would allow for removing the program check
>>>> handling from the STORAGE LIMIT invocation.
>>>>
>>>> ... if only we wouldn't need the exact same program check handling for the
>>>> new query subfunction :P
>>>
>>> Yeah yeah, but I think you got that this might help in the future.
>>
>> Right. Adding it later also doesn't quite help to get rid of the checks
>> here, because some user space might implement STORAGE LIMIT without QUERY.
>
> This would only help if the diag500 documentation would state that
> implementation of the QUERY subcode is mandatory. That is: for every
> new subcode larger than the QUERY subcode QUERY must also exist.
>
> That way we only would have to implement program check handling once,
> if a program check happens on QUERY none of the newer subcodes is
> available, otherwise the return value would indicate that.
>
> Otherwise this whole exercise would be pointless.
Yes, that would be the idea.
--
Cheers,
David / dhildenb
* [PATCH v2 3/7] Documentation: s390-diag.rst: document diag500(STORAGE LIMIT) subfunction
2024-10-14 14:46 [PATCH v2 0/7] virtio-mem: s390 support David Hildenbrand
2024-10-14 14:46 ` [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel() David Hildenbrand
2024-10-14 14:46 ` [PATCH v2 2/7] Documentation: s390-diag.rst: make diag500 a generic KVM hypercall David Hildenbrand
@ 2024-10-14 14:46 ` David Hildenbrand
2024-10-14 14:46 ` [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices David Hildenbrand
` (4 subsequent siblings)
7 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 14:46 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, linux-doc, kvm,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet
Let's document our new diag500 subfunction that can be implemented by
userspace.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
Documentation/virt/kvm/s390/s390-diag.rst | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/Documentation/virt/kvm/s390/s390-diag.rst b/Documentation/virt/kvm/s390/s390-diag.rst
index d9b7c6cbc99e..c69c6e0fa71b 100644
--- a/Documentation/virt/kvm/s390/s390-diag.rst
+++ b/Documentation/virt/kvm/s390/s390-diag.rst
@@ -77,6 +77,23 @@ Subcode 3 - virtio-ccw notification
See also the virtio standard for a discussion of this hypercall.
+Subcode 4 - storage-limit
+ Handled by userspace.
+
+ After completion of the DIAGNOSE call, general register 2 will
+ contain the storage limit: the maximum physical address that might be
+ used for storage throughout the lifetime of the VM.
+
+ The storage limit does not indicate currently usable storage, it may
+ include holes, standby storage and areas reserved for other means, such
+ as memory hotplug or virtio-mem devices. Other interfaces for detecting
+ actually usable storage, such as SCLP, must be used in conjunction with
+ this subfunction.
+
+ Note that the storage limit can be larger, but never smaller than the
+ maximum storage address indicated by SCLP via the "maximum storage
+ increment" and the "increment size".
+
DIAGNOSE function code 'X'501 - KVM breakpoint
----------------------------------------------
--
2.46.1
* [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-14 14:46 [PATCH v2 0/7] virtio-mem: s390 support David Hildenbrand
` (2 preceding siblings ...)
2024-10-14 14:46 ` [PATCH v2 3/7] Documentation: s390-diag.rst: document diag500(STORAGE LIMIT) subfunction David Hildenbrand
@ 2024-10-14 14:46 ` David Hildenbrand
2024-10-14 18:43 ` Heiko Carstens
` (2 more replies)
2024-10-14 14:46 ` [PATCH v2 5/7] virtio-mem: s390 support David Hildenbrand
` (3 subsequent siblings)
7 siblings, 3 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 14:46 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, linux-doc, kvm,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet, Mario Casquero
To support memory devices under QEMU/KVM, such as virtio-mem,
we have to prepare our kernel virtual address space accordingly and
have to know the highest possible physical memory address we might see
later: the storage limit. The good old SCLP interface is not suitable for
this use case.
In particular, memory owned by memory devices has no relationship to
storage increments, it is always detected using the device driver, and
unaware OSes (no driver) must never try making use of that memory.
Consequently this memory is located outside of the "maximum storage
increment"-indicated memory range.
Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
limit that can exceed the "maximum storage increment", and use the
existing interfaces (i.e., SCLP) to obtain information about the initial
memory that is not owned+managed by memory devices.
If a hypervisor does not support such memory devices, the address exposed
through diag500 STORAGE_LIMIT will correspond to the maximum storage
increment exposed through SCLP.
To teach kdump on s390 to include memory owned by memory devices, there
will be ways to query the relevant memory ranges from the device via a
driver running in special kdump mode (like virtio-mem already implements
to filter /proc/vmcore access so we don't end up reading from unplugged
device blocks).
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/boot/physmem_info.c | 46 ++++++++++++++++++++++++++--
arch/s390/include/asm/physmem_info.h | 3 ++
2 files changed, 46 insertions(+), 3 deletions(-)
diff --git a/arch/s390/boot/physmem_info.c b/arch/s390/boot/physmem_info.c
index 1d131a81cb8b..fb4e66e80fd8 100644
--- a/arch/s390/boot/physmem_info.c
+++ b/arch/s390/boot/physmem_info.c
@@ -109,6 +109,38 @@ static int diag260(void)
return 0;
}
+static int diag500_storage_limit(unsigned long *max_physmem_end)
+{
+ register unsigned long __nr asm("1") = 0x4;
+ register unsigned long __storage_limit asm("2") = 0;
+ unsigned long reg1, reg2;
+ psw_t old;
+
+ asm volatile(
+ " mvc 0(16,%[psw_old]),0(%[psw_pgm])\n"
+ " epsw %[reg1],%[reg2]\n"
+ " st %[reg1],0(%[psw_pgm])\n"
+ " st %[reg2],4(%[psw_pgm])\n"
+ " larl %[reg1],1f\n"
+ " stg %[reg1],8(%[psw_pgm])\n"
+ " diag 2,4,0x500\n"
+ "1: mvc 0(16,%[psw_pgm]),0(%[psw_old])\n"
+ : [reg1] "=&d" (reg1),
+ [reg2] "=&a" (reg2),
+ "+&d" (__storage_limit),
+ "=Q" (get_lowcore()->program_new_psw),
+ "=Q" (old)
+ : [psw_old] "a" (&old),
+ [psw_pgm] "a" (&get_lowcore()->program_new_psw),
+ "d" (__nr)
+ : "memory");
+ if (!__storage_limit)
+ return -EINVAL;
+ /* convert inclusive end to exclusive end. */
+ *max_physmem_end = __storage_limit + 1;
+ return 0;
+}
+
static int tprot(unsigned long addr)
{
unsigned long reg1, reg2;
@@ -157,7 +189,9 @@ unsigned long detect_max_physmem_end(void)
{
unsigned long max_physmem_end = 0;
- if (!sclp_early_get_memsize(&max_physmem_end)) {
+ if (!diag500_storage_limit(&max_physmem_end)) {
+ physmem_info.info_source = MEM_DETECT_DIAG500_STOR_LIMIT;
+ } else if (!sclp_early_get_memsize(&max_physmem_end)) {
physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
} else {
max_physmem_end = search_mem_end();
@@ -170,11 +204,17 @@ void detect_physmem_online_ranges(unsigned long max_physmem_end)
{
if (!sclp_early_read_storage_info()) {
physmem_info.info_source = MEM_DETECT_SCLP_STOR_INFO;
+ return;
} else if (!diag260()) {
physmem_info.info_source = MEM_DETECT_DIAG260;
- } else if (max_physmem_end) {
- add_physmem_online_range(0, max_physmem_end);
+ return;
+ } else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
+ max_physmem_end = 0;
+ if (!sclp_early_get_memsize(&max_physmem_end))
+ physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
}
+ if (max_physmem_end)
+ add_physmem_online_range(0, max_physmem_end);
}
void physmem_set_usable_limit(unsigned long limit)
diff --git a/arch/s390/include/asm/physmem_info.h b/arch/s390/include/asm/physmem_info.h
index f45cfc8bc233..51b68a43e195 100644
--- a/arch/s390/include/asm/physmem_info.h
+++ b/arch/s390/include/asm/physmem_info.h
@@ -9,6 +9,7 @@ enum physmem_info_source {
MEM_DETECT_NONE = 0,
MEM_DETECT_SCLP_STOR_INFO,
MEM_DETECT_DIAG260,
+ MEM_DETECT_DIAG500_STOR_LIMIT,
MEM_DETECT_SCLP_READ_INFO,
MEM_DETECT_BIN_SEARCH
};
@@ -107,6 +108,8 @@ static inline const char *get_physmem_info_source(void)
return "sclp storage info";
case MEM_DETECT_DIAG260:
return "diag260";
+ case MEM_DETECT_DIAG500_STOR_LIMIT:
+ return "diag500 storage limit";
case MEM_DETECT_SCLP_READ_INFO:
return "sclp read info";
case MEM_DETECT_BIN_SEARCH:
--
2.46.1
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-14 14:46 ` [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices David Hildenbrand
@ 2024-10-14 18:43 ` Heiko Carstens
2024-10-14 19:42 ` David Hildenbrand
2024-10-15 15:01 ` Eric Farman
2024-10-17 7:36 ` Alexander Gordeev
2024-10-30 14:30 ` Alexander Gordeev
2 siblings, 2 replies; 59+ messages in thread
From: Heiko Carstens @ 2024-10-14 18:43 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 04:46:16PM +0200, David Hildenbrand wrote:
> To support memory devices under QEMU/KVM, such as virtio-mem,
> we have to prepare our kernel virtual address space accordingly and
> have to know the highest possible physical memory address we might see
> later: the storage limit. The good old SCLP interface is not suitable for
> this use case.
>
> In particular, memory owned by memory devices has no relationship to
> storage increments, it is always detected using the device driver, and
> unaware OSes (no driver) must never try making use of that memory.
> Consequently this memory is located outside of the "maximum storage
> increment"-indicated memory range.
>
> Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
> limit that can exceed the "maximum storage increment", and use the
> existing interfaces (i.e., SCLP) to obtain information about the initial
> memory that is not owned+managed by memory devices.
>
> If a hypervisor does not support such memory devices, the address exposed
> through diag500 STORAGE_LIMIT will correspond to the maximum storage
> increment exposed through SCLP.
>
> To teach kdump on s390 to include memory owned by memory devices, there
> will be ways to query the relevant memory ranges from the device via a
> driver running in special kdump mode (like virtio-mem already implements
> to filter /proc/vmcore access so we don't end up reading from unplugged
> device blocks).
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> arch/s390/boot/physmem_info.c | 46 ++++++++++++++++++++++++++--
> arch/s390/include/asm/physmem_info.h | 3 ++
> 2 files changed, 46 insertions(+), 3 deletions(-)
...
> +static int diag500_storage_limit(unsigned long *max_physmem_end)
> +{
> + register unsigned long __nr asm("1") = 0x4;
> + register unsigned long __storage_limit asm("2") = 0;
> + unsigned long reg1, reg2;
> + psw_t old;
In general we do not allow register asm usage anymore in s390 code,
except for a very few defined places. This is due to all the problems
that we've seen with code instrumentation and register corruption.
The patch below changes your code accordingly, but it is
untested. Please verify that your code still works.
> @@ -157,7 +189,9 @@ unsigned long detect_max_physmem_end(void)
> {
> unsigned long max_physmem_end = 0;
>
> - if (!sclp_early_get_memsize(&max_physmem_end)) {
> + if (!diag500_storage_limit(&max_physmem_end)) {
> + physmem_info.info_source = MEM_DETECT_DIAG500_STOR_LIMIT;
> + } else if (!sclp_early_get_memsize(&max_physmem_end)) {
> physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
> } else {
> max_physmem_end = search_mem_end();
> @@ -170,11 +204,17 @@ void detect_physmem_online_ranges(unsigned long max_physmem_end)
> {
> if (!sclp_early_read_storage_info()) {
> physmem_info.info_source = MEM_DETECT_SCLP_STOR_INFO;
> + return;
> } else if (!diag260()) {
> physmem_info.info_source = MEM_DETECT_DIAG260;
> - } else if (max_physmem_end) {
> - add_physmem_online_range(0, max_physmem_end);
> + return;
> + } else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
> + max_physmem_end = 0;
> + if (!sclp_early_get_memsize(&max_physmem_end))
> + physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
> }
> + if (max_physmem_end)
> + add_physmem_online_range(0, max_physmem_end);
> }
In general looks good to me, but I'd like to see that Vasily or
Alexander give an Ack to this patch.
diff --git a/arch/s390/boot/physmem_info.c b/arch/s390/boot/physmem_info.c
index fb4e66e80fd8..975fc478e0e3 100644
--- a/arch/s390/boot/physmem_info.c
+++ b/arch/s390/boot/physmem_info.c
@@ -109,10 +109,11 @@ static int diag260(void)
return 0;
}
+#define DIAG500_SC_STOR_LIMIT 4
+
static int diag500_storage_limit(unsigned long *max_physmem_end)
{
- register unsigned long __nr asm("1") = 0x4;
- register unsigned long __storage_limit asm("2") = 0;
+ unsigned long storage_limit;
unsigned long reg1, reg2;
psw_t old;
@@ -123,21 +124,24 @@ static int diag500_storage_limit(unsigned long *max_physmem_end)
" st %[reg2],4(%[psw_pgm])\n"
" larl %[reg1],1f\n"
" stg %[reg1],8(%[psw_pgm])\n"
+ " lghi 1,%[subcode]\n"
+ " lghi 2,0\n"
" diag 2,4,0x500\n"
"1: mvc 0(16,%[psw_pgm]),0(%[psw_old])\n"
+ " lgr %[slimit],2\n"
: [reg1] "=&d" (reg1),
[reg2] "=&a" (reg2),
- "+&d" (__storage_limit),
+ [slimit] "=d" (storage_limit),
"=Q" (get_lowcore()->program_new_psw),
"=Q" (old)
: [psw_old] "a" (&old),
[psw_pgm] "a" (&get_lowcore()->program_new_psw),
- "d" (__nr)
- : "memory");
- if (!__storage_limit)
- return -EINVAL;
- /* convert inclusive end to exclusive end. */
- *max_physmem_end = __storage_limit + 1;
+ [subcode] "i" (DIAG500_SC_STOR_LIMIT)
+ : "memory", "1", "2");
+ if (!storage_limit)
+ return -EINVAL;
+ /* Convert inclusive end to exclusive end */
+ *max_physmem_end = storage_limit + 1;
return 0;
}
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-14 18:43 ` Heiko Carstens
@ 2024-10-14 19:42 ` David Hildenbrand
2024-10-15 15:01 ` Eric Farman
1 sibling, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 19:42 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 14.10.24 20:43, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:16PM +0200, David Hildenbrand wrote:
>> To support memory devices under QEMU/KVM, such as virtio-mem,
>> we have to prepare our kernel virtual address space accordingly and
>> have to know the highest possible physical memory address we might see
>> later: the storage limit. The good old SCLP interface is not suitable for
>> this use case.
>>
>> In particular, memory owned by memory devices has no relationship to
>> storage increments, it is always detected using the device driver, and
>> unaware OSes (no driver) must never try making use of that memory.
>> Consequently this memory is located outside of the "maximum storage
>> increment"-indicated memory range.
>>
>> Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
>> limit that can exceed the "maximum storage increment", and use the
>> existing interfaces (i.e., SCLP) to obtain information about the initial
>> memory that is not owned+managed by memory devices.
>>
>> If a hypervisor does not support such memory devices, the address exposed
>> through diag500 STORAGE_LIMIT will correspond to the maximum storage
>> increment exposed through SCLP.
>>
>> To teach kdump on s390 to include memory owned by memory devices, there
>> will be ways to query the relevant memory ranges from the device via a
>> driver running in special kdump mode (like virtio-mem already implements
>> to filter /proc/vmcore access so we don't end up reading from unplugged
>> device blocks).
>>
>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> arch/s390/boot/physmem_info.c | 46 ++++++++++++++++++++++++++--
>> arch/s390/include/asm/physmem_info.h | 3 ++
>> 2 files changed, 46 insertions(+), 3 deletions(-)
>
> ...
>
>> +static int diag500_storage_limit(unsigned long *max_physmem_end)
>> +{
>> + register unsigned long __nr asm("1") = 0x4;
>> + register unsigned long __storage_limit asm("2") = 0;
>> + unsigned long reg1, reg2;
>> + psw_t old;
>
> In general we do not allow register asm usage anymore in s390 code,
> except for a very few defined places. This is due to all the problems
> that we've seen with code instrumentation and register corruption.
Makes sense. Note that I was inspired by GENERATE_KVM_HYPERCALL_FUNC
that also still uses register asm usage.
>
> The patch below changes your code accordingly, but it is
> untested. Please verify that your code still works.
Below LGTM, I'll give it a churn, thanks!
--
Cheers,
David / dhildenb
2024-10-14 18:43 ` Heiko Carstens
2024-10-14 19:42 ` David Hildenbrand
@ 2024-10-15 15:01 ` Eric Farman
2024-10-15 15:20 ` Heiko Carstens
2024-10-16 10:37 ` Halil Pasic
1 sibling, 2 replies; 59+ messages in thread
From: Eric Farman @ 2024-10-15 15:01 UTC (permalink / raw)
To: Heiko Carstens, David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, 2024-10-14 at 20:43 +0200, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:16PM +0200, David Hildenbrand wrote:
> > To support memory devices under QEMU/KVM, such as virtio-mem,
> > we have to prepare our kernel virtual address space accordingly and
> > have to know the highest possible physical memory address we might see
> > later: the storage limit. The good old SCLP interface is not suitable for
> > this use case.
> >
> > In particular, memory owned by memory devices has no relationship to
> > storage increments, it is always detected using the device driver, and
> > unaware OSes (no driver) must never try making use of that memory.
> > Consequently this memory is located outside of the "maximum storage
> > increment"-indicated memory range.
> >
> > Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
> > limit that can exceed the "maximum storage increment", and use the
> > existing interfaces (i.e., SCLP) to obtain information about the initial
> > memory that is not owned+managed by memory devices.
...snip...
> The patch below changes your code accordingly, but it is
> untested. Please verify that your code still works.
...snip...
> diff --git a/arch/s390/boot/physmem_info.c b/arch/s390/boot/physmem_info.c
> index fb4e66e80fd8..975fc478e0e3 100644
> --- a/arch/s390/boot/physmem_info.c
> +++ b/arch/s390/boot/physmem_info.c
> @@ -109,10 +109,11 @@ static int diag260(void)
> return 0;
> }
>
> +#define DIAG500_SC_STOR_LIMIT 4
> +
> static int diag500_storage_limit(unsigned long *max_physmem_end)
> {
> - register unsigned long __nr asm("1") = 0x4;
> - register unsigned long __storage_limit asm("2") = 0;
> + unsigned long storage_limit;
> unsigned long reg1, reg2;
> psw_t old;
>
> @@ -123,21 +124,24 @@ static int diag500_storage_limit(unsigned long *max_physmem_end)
> " st %[reg2],4(%[psw_pgm])\n"
> " larl %[reg1],1f\n"
> " stg %[reg1],8(%[psw_pgm])\n"
> + " lghi 1,%[subcode]\n"
> + " lghi 2,0\n"
> " diag 2,4,0x500\n"
> "1: mvc 0(16,%[psw_pgm]),0(%[psw_old])\n"
> + " lgr %[slimit],2\n"
> : [reg1] "=&d" (reg1),
> [reg2] "=&a" (reg2),
> - "+&d" (__storage_limit),
> + [slimit] "=d" (storage_limit),
> "=Q" (get_lowcore()->program_new_psw),
> "=Q" (old)
> : [psw_old] "a" (&old),
> [psw_pgm] "a" (&get_lowcore()->program_new_psw),
> - "d" (__nr)
> - : "memory");
> - if (!__storage_limit)
> - return -EINVAL;
> - /* convert inclusive end to exclusive end. */
> - *max_physmem_end = __storage_limit + 1;
> + [subcode] "i" (DIAG500_SC_STOR_LIMIT)
> + : "memory", "1", "2");
> + if (!storage_limit)
> + return -EINVAL;
> + /* Convert inclusive end to exclusive end */
> + *max_physmem_end = storage_limit + 1;
> return 0;
> }
>
>
I like the idea of a defined constant here instead of hardcoded, but maybe it should be placed
somewhere in include/uapi so that QEMU can pick it up with update-linux-headers.sh and be in sync
with the kernel, instead of just an equivalent definition in [1] ?
[1] https://lore.kernel.org/qemu-devel/20241008105455.2302628-8-david@redhat.com/
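A quick aside on the inclusive-to-exclusive conversion in the hunk above: diag500 STORAGE_LIMIT reports the highest addressable byte, so the exclusive end is that value plus one. A trivial sketch (the limit value below is hypothetical, chosen for a 2 GiB guest):

```python
# diag500 STORAGE_LIMIT reports the highest addressable byte (inclusive end).
storage_limit = 0x7fffffff             # hypothetical: a 2 GiB guest reports 2 GiB - 1
max_physmem_end = storage_limit + 1    # exclusive end == total addressable bytes
print(max_physmem_end // (1024 ** 3))  # -> 2
```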
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-15 15:01 ` Eric Farman
@ 2024-10-15 15:20 ` Heiko Carstens
2024-10-25 10:52 ` David Hildenbrand
2024-10-16 10:37 ` Halil Pasic
1 sibling, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-15 15:20 UTC (permalink / raw)
To: Eric Farman
Cc: David Hildenbrand, linux-kernel, linux-mm, linux-s390,
virtualization, linux-doc, kvm, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Michael S. Tsirkin, Jason Wang,
Xuan Zhuo, Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Tue, Oct 15, 2024 at 11:01:44AM -0400, Eric Farman wrote:
> On Mon, 2024-10-14 at 20:43 +0200, Heiko Carstens wrote:
> > On Mon, Oct 14, 2024 at 04:46:16PM +0200, David Hildenbrand wrote:
...
> > +#define DIAG500_SC_STOR_LIMIT 4
...
> I like the idea of a defined constant here instead of hardcoded, but maybe it should be placed
> somewhere in include/uapi so that QEMU can pick it up with update-linux-headers.sh and be in sync
> with the kernel, instead of just an equivalent definition in [1] ?
>
> [1] https://lore.kernel.org/qemu-devel/20241008105455.2302628-8-david@redhat.com/
It is already a mess; we already have subcode 3 defined:
#define KVM_S390_VIRTIO_CCW_NOTIFY 3
in
arch/s390/include/uapi/asm/virtio-ccw.h
which for some reason is uapi. But it doesn't make sense to put the
new subcode 4 there too. So what is the end result?
Another uapi file? I think resolving this would be a project on its own.
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-15 15:20 ` Heiko Carstens
@ 2024-10-25 10:52 ` David Hildenbrand
0 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-25 10:52 UTC (permalink / raw)
To: Heiko Carstens, Eric Farman
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 15.10.24 17:20, Heiko Carstens wrote:
> On Tue, Oct 15, 2024 at 11:01:44AM -0400, Eric Farman wrote:
>> On Mon, 2024-10-14 at 20:43 +0200, Heiko Carstens wrote:
>>> On Mon, Oct 14, 2024 at 04:46:16PM +0200, David Hildenbrand wrote:
> ...
>>> +#define DIAG500_SC_STOR_LIMIT 4
> ...
>> I like the idea of a defined constant here instead of hardcoded, but maybe it should be placed
>> somewhere in include/uapi so that QEMU can pick it up with update-linux-headers.sh and be in sync
>> with the kernel, instead of just an equivalent definition in [1] ?
>>
>> [1] https://lore.kernel.org/qemu-devel/20241008105455.2302628-8-david@redhat.com/
>
> It is already a mess; we already have subcode 3 defined:
>
> #define KVM_S390_VIRTIO_CCW_NOTIFY 3
>
> in
>
> arch/s390/include/uapi/asm/virtio-ccw.h
>
> which for some reason is uapi. But it doesn't make sense to put the
> new subcode 4 there too. So what is the end result?
>
> Another uapi file? I think resolving this would be a project on its own.
Agreed, thanks all!
--
Cheers,
David / dhildenb
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-15 15:01 ` Eric Farman
2024-10-15 15:20 ` Heiko Carstens
@ 2024-10-16 10:37 ` Halil Pasic
1 sibling, 0 replies; 59+ messages in thread
From: Halil Pasic @ 2024-10-16 10:37 UTC (permalink / raw)
To: Eric Farman
Cc: Heiko Carstens, David Hildenbrand, linux-kernel, linux-mm,
linux-s390, virtualization, linux-doc, kvm, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet, Mario Casquero, Halil Pasic
On Tue, 15 Oct 2024 11:01:44 -0400
Eric Farman <farman@linux.ibm.com> wrote:
> > + [subcode] "i" (DIAG500_SC_STOR_LIMIT)
> > + : "memory", "1", "2");
> > + if (!storage_limit)
> > + return -EINVAL;
> > + /* Convert inclusive end to exclusive end */
> > + *max_physmem_end = storage_limit + 1;
> > return 0;
> > }
> >
> >
>
> I like the idea of a defined constant here instead of hardcoded, but maybe it should be placed
> somewhere in include/uapi so that QEMU can pick it up with update-linux-headers.sh and be in sync
> with the kernel, instead of just an equivalent definition in [1] ?
>
> [1] https://lore.kernel.org/qemu-devel/20241008105455.2302628-8-david@redhat.com/
I think it is fine to have equivalent definitions. This is more or less
an ISA thing we are introducing here. And IMHO it would be fine to have
such a definition even if the emulator was supposed to run on an OS that
is not Linux and without KVM.
Regards,
Halil
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-14 14:46 ` [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices David Hildenbrand
2024-10-14 18:43 ` Heiko Carstens
@ 2024-10-17 7:36 ` Alexander Gordeev
2024-10-17 8:19 ` David Hildenbrand
2024-10-30 14:30 ` Alexander Gordeev
2 siblings, 1 reply; 59+ messages in thread
From: Alexander Gordeev @ 2024-10-17 7:36 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 04:46:16PM +0200, David Hildenbrand wrote:
Hi David!
> @@ -157,7 +189,9 @@ unsigned long detect_max_physmem_end(void)
> {
> unsigned long max_physmem_end = 0;
>
> - if (!sclp_early_get_memsize(&max_physmem_end)) {
> + if (!diag500_storage_limit(&max_physmem_end)) {
> + physmem_info.info_source = MEM_DETECT_DIAG500_STOR_LIMIT;
> + } else if (!sclp_early_get_memsize(&max_physmem_end)) {
> physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
> } else {
> max_physmem_end = search_mem_end();
> @@ -170,11 +204,17 @@ void detect_physmem_online_ranges(unsigned long max_physmem_end)
> {
> if (!sclp_early_read_storage_info()) {
> physmem_info.info_source = MEM_DETECT_SCLP_STOR_INFO;
> + return;
> } else if (!diag260()) {
> physmem_info.info_source = MEM_DETECT_DIAG260;
> - } else if (max_physmem_end) {
> - add_physmem_online_range(0, max_physmem_end);
> + return;
> + } else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
> + max_physmem_end = 0;
> + if (!sclp_early_get_memsize(&max_physmem_end))
> + physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
Why is search_mem_end() not tried in case sclp_early_get_memsize() fails?
> }
> + if (max_physmem_end)
> + add_physmem_online_range(0, max_physmem_end);
> }
>
> void physmem_set_usable_limit(unsigned long limit)
> diff --git a/arch/s390/include/asm/physmem_info.h b/arch/s390/include/asm/physmem_info.h
> index f45cfc8bc233..51b68a43e195 100644
> --- a/arch/s390/include/asm/physmem_info.h
> +++ b/arch/s390/include/asm/physmem_info.h
> @@ -9,6 +9,7 @@ enum physmem_info_source {
> MEM_DETECT_NONE = 0,
> MEM_DETECT_SCLP_STOR_INFO,
> MEM_DETECT_DIAG260,
> + MEM_DETECT_DIAG500_STOR_LIMIT,
> MEM_DETECT_SCLP_READ_INFO,
> MEM_DETECT_BIN_SEARCH
> };
> @@ -107,6 +108,8 @@ static inline const char *get_physmem_info_source(void)
> return "sclp storage info";
> case MEM_DETECT_DIAG260:
> return "diag260";
> + case MEM_DETECT_DIAG500_STOR_LIMIT:
> + return "diag500 storage limit";
AFAIU you want to always override MEM_DETECT_DIAG500_STOR_LIMIT method
with an online memory detection method. In that case this code is dead.
> case MEM_DETECT_SCLP_READ_INFO:
> return "sclp read info";
> case MEM_DETECT_BIN_SEARCH:
Thanks!
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-17 7:36 ` Alexander Gordeev
@ 2024-10-17 8:19 ` David Hildenbrand
2024-10-17 9:53 ` Alexander Gordeev
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-17 8:19 UTC (permalink / raw)
To: Alexander Gordeev
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 17.10.24 09:36, Alexander Gordeev wrote:
> On Mon, Oct 14, 2024 at 04:46:16PM +0200, David Hildenbrand wrote:
>
> Hi David!
Hi Alexander!
>
>> @@ -157,7 +189,9 @@ unsigned long detect_max_physmem_end(void)
>> {
>> unsigned long max_physmem_end = 0;
>>
>> - if (!sclp_early_get_memsize(&max_physmem_end)) {
>> + if (!diag500_storage_limit(&max_physmem_end)) {
>> + physmem_info.info_source = MEM_DETECT_DIAG500_STOR_LIMIT;
>> + } else if (!sclp_early_get_memsize(&max_physmem_end)) {
>> physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
>> } else {
>> max_physmem_end = search_mem_end();
>> @@ -170,11 +204,17 @@ void detect_physmem_online_ranges(unsigned long max_physmem_end)
>> {
>> if (!sclp_early_read_storage_info()) {
>> physmem_info.info_source = MEM_DETECT_SCLP_STOR_INFO;
>> + return;
>> } else if (!diag260()) {
>> physmem_info.info_source = MEM_DETECT_DIAG260;
>> - } else if (max_physmem_end) {
>> - add_physmem_online_range(0, max_physmem_end);
>> + return;
>> + } else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
>> + max_physmem_end = 0;
>> + if (!sclp_early_get_memsize(&max_physmem_end))
>> + physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
>
> Why is search_mem_end() not tried in case sclp_early_get_memsize() fails?
Patch #3 documents that:
+ The storage limit does not indicate currently usable storage, it may
+ include holes, standby storage and areas reserved for other means, such
+ as memory hotplug or virtio-mem devices. Other interfaces for detecting
+ actually usable storage, such as SCLP, must be used in conjunction with
+ this subfunction.
If SCLP were to fail, something would be seriously wrong and we should just crash
instead of trying to fall back to the legacy way of scanning.
>
>> }
>> + if (max_physmem_end)
>> + add_physmem_online_range(0, max_physmem_end);
>> }
>>
>> void physmem_set_usable_limit(unsigned long limit)
>> diff --git a/arch/s390/include/asm/physmem_info.h b/arch/s390/include/asm/physmem_info.h
>> index f45cfc8bc233..51b68a43e195 100644
>> --- a/arch/s390/include/asm/physmem_info.h
>> +++ b/arch/s390/include/asm/physmem_info.h
>> @@ -9,6 +9,7 @@ enum physmem_info_source {
>> MEM_DETECT_NONE = 0,
>> MEM_DETECT_SCLP_STOR_INFO,
>> MEM_DETECT_DIAG260,
>> + MEM_DETECT_DIAG500_STOR_LIMIT,
>> MEM_DETECT_SCLP_READ_INFO,
>> MEM_DETECT_BIN_SEARCH
>> };
>> @@ -107,6 +108,8 @@ static inline const char *get_physmem_info_source(void)
>> return "sclp storage info";
>> case MEM_DETECT_DIAG260:
>> return "diag260";
>> + case MEM_DETECT_DIAG500_STOR_LIMIT:
>> + return "diag500 storage limit";
>
> AFAIU you want to always override MEM_DETECT_DIAG500_STOR_LIMIT method
> with an online memory detection method. In that case this code is dead.
Not in the pathological case above, where something went wrong
during sclp_early_get_memsize(). In that scenario, die_oom() would indicate
that there are no memory ranges but that "diag500 storage limit" worked.
Does that make sense?
Thanks for the review!
--
Cheers,
David / dhildenb
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-17 8:19 ` David Hildenbrand
@ 2024-10-17 9:53 ` Alexander Gordeev
2024-10-17 10:00 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Alexander Gordeev @ 2024-10-17 9:53 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
> > Why is search_mem_end() not tried in case sclp_early_get_memsize() fails?
>
> Patch #3 documents that:
>
> + The storage limit does not indicate currently usable storage, it may
> + include holes, standby storage and areas reserved for other means, such
> + as memory hotplug or virtio-mem devices. Other interfaces for detecting
> + actually usable storage, such as SCLP, must be used in conjunction with
> + this subfunction.
Yes, I read this and that is exactly what causes my confusion. In this wording it
sounds like SCLP *or* other methods are fine to use. But then you use SCLP or
DIAGNOSE 260, but not memory scanning. So I am still confused ;)
> If SCLP were to fail, something would be seriously wrong and we should just crash
> instead of trying to fall back to the legacy way of scanning.
But what is wrong with the legacy way of scanning?
> > > + case MEM_DETECT_DIAG500_STOR_LIMIT:
> > > + return "diag500 storage limit";
> >
> > AFAIU you want to always override MEM_DETECT_DIAG500_STOR_LIMIT method
> > with an online memory detection method. In that case this code is dead.
>
> Not in the pathological case above, where something went wrong
> during sclp_early_get_memsize(). In that scenario, die_oom() would indicate
> that there are no memory ranges but that "diag500 storage limit" worked.
>
> Does that make sense?
Yes, I get your approach.
> Thanks for the review!
Thanks!
> --
> Cheers,
>
> David / dhildenb
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-17 9:53 ` Alexander Gordeev
@ 2024-10-17 10:00 ` David Hildenbrand
2024-10-17 12:07 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-17 10:00 UTC (permalink / raw)
To: Alexander Gordeev
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 17.10.24 11:53, Alexander Gordeev wrote:
>>> Why is search_mem_end() not tried in case sclp_early_get_memsize() fails?
>>
>> Patch #3 documents that:
>>
>> + The storage limit does not indicate currently usable storage, it may
>> + include holes, standby storage and areas reserved for other means, such
>> + as memory hotplug or virtio-mem devices. Other interfaces for detecting
>> + actually usable storage, such as SCLP, must be used in conjunction with
>> + this subfunction.
>
> Yes, I read this and that is exactly what causes my confusion. In this wording it
> sounds like SCLP *or* other methods are fine to use. But then you use SCLP or
> DIAGNOSE 260, but not memory scanning. So I am still confused ;)
Well, DIAGNOSE 260 is z/VM only and DIAG 500 is KVM only. So there are
currently not really any other reasonable ways besides SCLP.
>
>> If SCLP were to fail, something would be seriously wrong and we should just crash
>> instead of trying to fall back to the legacy way of scanning.
>
> But what is wrong with the legacy way of scanning?
Failing to detect holes and starting to use them, or detecting and using
device memory without negotiating with the device ... it all falls to
pieces.
>
>>>> + case MEM_DETECT_DIAG500_STOR_LIMIT:
>>>> + return "diag500 storage limit";
>>>
>>> AFAIU you want to always override MEM_DETECT_DIAG500_STOR_LIMIT method
>>> with an online memory detection method. In that case this code is dead.
>>
>> Not in the pathological case above, where something went wrong
>> during sclp_early_get_memsize(). In that scenario, die_oom() would indicate
>> that there are no memory ranges but that "diag500 storage limit" worked.
>>
>> Does that make sense?
>
> Yes, I get your approach.
Thanks, please let me know if I should make it clearer in the
description, or if you think we can improve the code.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-17 10:00 ` David Hildenbrand
@ 2024-10-17 12:07 ` David Hildenbrand
2024-10-17 14:32 ` Alexander Gordeev
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-17 12:07 UTC (permalink / raw)
To: Alexander Gordeev
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 17.10.24 12:00, David Hildenbrand wrote:
> On 17.10.24 11:53, Alexander Gordeev wrote:
>>>> Why is search_mem_end() not tried in case sclp_early_get_memsize() fails?
>>>
>>> Patch #3 documents that:
>>>
>>> + The storage limit does not indicate currently usable storage, it may
>>> + include holes, standby storage and areas reserved for other means, such
>>> + as memory hotplug or virtio-mem devices. Other interfaces for detecting
>>> + actually usable storage, such as SCLP, must be used in conjunction with
>>> + this subfunction.
>>
>> Yes, I read this and that is exactly what causes my confusion. In this wording it
>> sounds like SCLP *or* other methods are fine to use. But then you use SCLP or
>> DIAGNOSE 260, but not memory scanning. So I am still confused ;)
>
> Well, DIAGNOSE 260 is z/VM only and DIAG 500 is KVM only. So there are
> currently not really any other reasonable ways besides SCLP.
Correction: Staring at the code again, in detect_physmem_online_ranges()
we will indeed try:
a) sclp_early_read_storage_info()
b) diag260()
But if neither works, we cannot blindly add all that memory; something is
messed up. So we'll fall back to
c) sclp_early_get_memsize()
But if none of that works, something is seriously wrong.
I will squash the following:
diff --git a/arch/s390/boot/physmem_info.c b/arch/s390/boot/physmem_info.c
index 975fc478e0e3..6ad3ac2050eb 100644
--- a/arch/s390/boot/physmem_info.c
+++ b/arch/s390/boot/physmem_info.c
@@ -214,6 +214,12 @@ void detect_physmem_online_ranges(unsigned long max_physmem_end)
return;
} else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
max_physmem_end = 0;
+ /*
+ * If we know the storage limit but do not find any other
+ * indication of usable initial memory, something is messed
+ * up. In that case, we'll not add any physical memory so
+ * we'll run into die_oom() later.
+ */
if (!sclp_early_get_memsize(&max_physmem_end))
physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
}
--
Cheers,
David / dhildenb
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-17 12:07 ` David Hildenbrand
@ 2024-10-17 14:32 ` Alexander Gordeev
2024-10-17 14:36 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Alexander Gordeev @ 2024-10-17 14:32 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Thu, Oct 17, 2024 at 02:07:12PM +0200, David Hildenbrand wrote:
> On 17.10.24 12:00, David Hildenbrand wrote:
> > Well, DIAGNOSE 260 is z/VM only and DIAG 500 is KVM only. So there are
> > currently not really any other reasonable ways besides SCLP.
>
> Correction: Staring at the code again, in detect_physmem_online_ranges()
> we will indeed try:
>
> a) sclp_early_read_storage_info()
> b) diag260()
So why care to call diag260() in case of DIAGNOSE 500? What about the below?
void detect_physmem_online_ranges(unsigned long max_physmem_end)
{
if (!sclp_early_read_storage_info()) {
physmem_info.info_source = MEM_DETECT_SCLP_STOR_INFO;
} else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
unsigned long online_end;
if (!sclp_early_get_memsize(&online_end)) {
physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
add_physmem_online_range(0, online_end);
}
} else if (!diag260()) {
physmem_info.info_source = MEM_DETECT_DIAG260;
} else if (max_physmem_end) {
add_physmem_online_range(0, max_physmem_end);
}
}
> But if neither works, we cannot blindly add all that memory; something is
> messed up. So we'll fall back to
>
> c) sclp_early_get_memsize()
>
> But if none of that works, something is seriously wrong.
Ok, thanks for the clarification.
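To summarize the detection order settled on in this subthread, here is a small model of the selection logic. This is only an illustrative sketch of the fallback order, not the kernel code; the function name and boolean parameters are invented for the example:

```python
def pick_info_source(sclp_stor_info_ok, diag500_limit_known,
                     sclp_memsize_ok, diag260_ok):
    """Model of the fallback order for online-range detection (illustrative)."""
    if sclp_stor_info_ok:
        return "sclp storage info"
    if diag500_limit_known:
        # Under diag500, SCLP read info is the only acceptable fallback;
        # legacy scanning could swallow holes or unplugged device memory.
        return "sclp read info" if sclp_memsize_ok else "none (die_oom later)"
    if diag260_ok:
        return "diag260"
    return "add [0, max_physmem_end) as-is"
```

For example, a KVM guest where SCLP storage info fails but diag500 and SCLP read info work lands on "sclp read info", and diag260() is never tried.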
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-17 14:32 ` Alexander Gordeev
@ 2024-10-17 14:36 ` David Hildenbrand
0 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-17 14:36 UTC (permalink / raw)
To: Alexander Gordeev
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 17.10.24 16:32, Alexander Gordeev wrote:
> On Thu, Oct 17, 2024 at 02:07:12PM +0200, David Hildenbrand wrote:
>> On 17.10.24 12:00, David Hildenbrand wrote:
>>> Well, DIAGNOSE 260 is z/VM only and DIAG 500 is KVM only. So there are
>>> currently not really any other reasonable ways besides SCLP.
>>
>> Correction: Staring at the code again, in detect_physmem_online_ranges()
>> we will indeed try:
>>
>> a) sclp_early_read_storage_info()
>> b) diag260()
>
> So why care to call diag260() in case of DIAGNOSE 500? What about the below?
>
> void detect_physmem_online_ranges(unsigned long max_physmem_end)
> {
> if (!sclp_early_read_storage_info()) {
> physmem_info.info_source = MEM_DETECT_SCLP_STOR_INFO;
> } else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
> unsigned long online_end;
>
> if (!sclp_early_get_memsize(&online_end)) {
> physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
> add_physmem_online_range(0, online_end);
> }
> } else if (!diag260()) {
> physmem_info.info_source = MEM_DETECT_DIAG260;
> } else if (max_physmem_end) {
> add_physmem_online_range(0, max_physmem_end);
> }
> }
Works for me, thanks!
--
Cheers,
David / dhildenb
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-14 14:46 ` [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices David Hildenbrand
2024-10-14 18:43 ` Heiko Carstens
2024-10-17 7:36 ` Alexander Gordeev
@ 2024-10-30 14:30 ` Alexander Gordeev
2024-10-30 14:33 ` Alexander Gordeev
2 siblings, 1 reply; 59+ messages in thread
From: Alexander Gordeev @ 2024-10-30 14:30 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 04:46:16PM +0200, David Hildenbrand wrote:
Hi David,
> To support memory devices under QEMU/KVM, such as virtio-mem,
> we have to prepare our kernel virtual address space accordingly and
> have to know the highest possible physical memory address we might see
> later: the storage limit. The good old SCLP interface is not suitable for
> this use case.
>
> In particular, memory owned by memory devices has no relationship to
> storage increments, it is always detected using the device driver, and
> unaware OSes (no driver) must never try making use of that memory.
> Consequently this memory is located outside of the "maximum storage
> increment"-indicated memory range.
>
> Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
> limit that can exceed the "maximum storage increment", and use the
> existing interfaces (i.e., SCLP) to obtain information about the initial
> memory that is not owned+managed by memory devices.
>
> If a hypervisor does not support such memory devices, the address exposed
> through diag500 STORAGE_LIMIT will correspond to the maximum storage
> increment exposed through SCLP.
>
> To teach kdump on s390 to include memory owned by memory devices, there
> will be ways to query the relevant memory ranges from the device via a
> driver running in special kdump mode (like virtio-mem already implements
> to filter /proc/vmcore access so we don't end up reading from unplugged
> device blocks).
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> arch/s390/boot/physmem_info.c | 46 ++++++++++++++++++++++++++--
> arch/s390/include/asm/physmem_info.h | 3 ++
> 2 files changed, 46 insertions(+), 3 deletions(-)
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Thanks!
* Re: [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
2024-10-30 14:30 ` Alexander Gordeev
@ 2024-10-30 14:33 ` Alexander Gordeev
0 siblings, 0 replies; 59+ messages in thread
From: Alexander Gordeev @ 2024-10-30 14:33 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
> > arch/s390/boot/physmem_info.c | 46 ++++++++++++++++++++++++++--
> > arch/s390/include/asm/physmem_info.h | 3 ++
> > 2 files changed, 46 insertions(+), 3 deletions(-)
>
> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Sorry, it was supposed to be for v3, so please dismiss this one.
> Thanks!
* [PATCH v2 5/7] virtio-mem: s390 support
2024-10-14 14:46 [PATCH v2 0/7] virtio-mem: s390 support David Hildenbrand
` (3 preceding siblings ...)
2024-10-14 14:46 ` [PATCH v2 4/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices David Hildenbrand
@ 2024-10-14 14:46 ` David Hildenbrand
2024-10-14 18:48 ` Heiko Carstens
2024-10-14 14:46 ` [PATCH v2 6/7] lib/Kconfig.debug: default STRICT_DEVMEM to "y" on s390 David Hildenbrand
` (2 subsequent siblings)
7 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 14:46 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, linux-doc, kvm,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet, Mario Casquero
Now that s390 code is prepared for memory devices that reside above the
maximum storage increment exposed through SCLP, everything is in place
to unlock virtio-mem support.
As virtio-mem in Linux currently supports logically onlining/offlining
memory in pageblock granularity, we have an effective hot(un)plug
granularity of 1 MiB on s390.
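The 1 MiB figure follows from the s390 pageblock size; a quick back-of-the-envelope check (the 4 KiB base page size is the s390 value, and a pageblock order of 8 is assumed here as the order matching the stated granularity):

```python
PAGE_SIZE = 4 * 1024      # s390 base page size: 4 KiB
PAGEBLOCK_ORDER = 8       # assumed order yielding the stated granularity
pageblock_bytes = PAGE_SIZE << PAGEBLOCK_ORDER
print(pageblock_bytes // (1024 * 1024))  # -> 1 (MiB)
```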
As virtio-mem adds/removes individual Linux memory blocks (256 MiB), we
will currently never use gigantic pages in the identity mapping.
It is worth noting that neither storage keys nor storage attributes (e.g.,
data / nodat) are touched when onlining memory blocks, which is good
because we are not supposed to touch these parts for unplugged device
blocks that are logically offline in Linux.
We will currently never initialize storage keys for virtio-mem
memory -- IOW, storage_key_init_range() is never called. It could be added
in the future when plugging device blocks. But as that function
essentially does nothing without modifying the code (changing
PAGE_DEFAULT_ACC), that's just fine for now.
kexec should work as intended and just like on other architectures that
support virtio-mem: we will never place kexec binaries on virtio-mem
memory, and never indicate virtio-mem memory to the 2nd kernel. The
device driver in the 2nd kernel can simply reset the device --
turning all memory unplugged, to then start plugging memory and adding
them to Linux, without causing trouble because the memory is already
used elsewhere.
The special s390 kdump mode, whereby the 2nd kernel creates the ELF
core header, won't currently dump virtio-mem memory. The virtio-mem
driver has a special kdump mode, from where we can detect memory ranges
to dump. Based on this, support for dumping virtio-mem memory can be
added in the future fairly easily.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/Kconfig | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 42a48ac763ee..fb320eea70fe 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -122,7 +122,7 @@ config VIRTIO_BALLOON
config VIRTIO_MEM
tristate "Virtio mem driver"
- depends on X86_64 || ARM64 || RISCV
+ depends on X86_64 || ARM64 || RISCV || S390
depends on VIRTIO
depends on MEMORY_HOTPLUG
depends on MEMORY_HOTREMOVE
@@ -132,11 +132,11 @@ config VIRTIO_MEM
This driver provides access to virtio-mem paravirtualized memory
devices, allowing to hotplug and hotunplug memory.
- This driver currently only supports x86-64 and arm64. Although it
- should compile on other architectures that implement memory
- hot(un)plug, architecture-specific and/or common
- code changes may be required for virtio-mem, kdump and kexec to work as
- expected.
+ This driver currently supports x86-64, arm64, riscv and s390x.
+ Although it should compile on other architectures that implement
+ memory hot(un)plug, architecture-specific and/or common
+ code changes may be required for virtio-mem, kdump and kexec to
+ work as expected.
If unsure, say M.
--
2.46.1
^ permalink raw reply related [flat|nested] 59+ messages in thread

* Re: [PATCH v2 5/7] virtio-mem: s390 support
2024-10-14 14:46 ` [PATCH v2 5/7] virtio-mem: s390 support David Hildenbrand
@ 2024-10-14 18:48 ` Heiko Carstens
2024-10-14 19:16 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-14 18:48 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 04:46:17PM +0200, David Hildenbrand wrote:
> The special s390 kdump mode, whereby the 2nd kernel creates the ELF
> core header, won't currently dump virtio-mem memory. The virtio-mem
> driver has a special kdump mode, from where we can detect memory ranges
> to dump. Based on this, support for dumping virtio-mem memory can be
> added in the future fairly easily.
Hm.. who will add this support? This looks like a showstopper to me.
Who is supposed to debug crash dumps where memory parts are missing?
* Re: [PATCH v2 5/7] virtio-mem: s390 support
2024-10-14 18:48 ` Heiko Carstens
@ 2024-10-14 19:16 ` David Hildenbrand
2024-10-15 8:37 ` Heiko Carstens
0 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 19:16 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 14.10.24 20:48, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:17PM +0200, David Hildenbrand wrote:
>> The special s390 kdump mode, whereby the 2nd kernel creates the ELF
>> core header, won't currently dump virtio-mem memory. The virtio-mem
>> driver has a special kdump mode, from where we can detect memory ranges
>> to dump. Based on this, support for dumping virtio-mem memory can be
>> added in the future fairly easily.
>
Thanks for the review.
> Hm.. who will add this support? This looks like a showstopper to me.
The cover letter is clearer on that: "One remaining work item is kdump
support for virtio-mem memory. This will be sent out separately once
initial support landed."
I had a prototype, but need to spend some time to clean it up -- or find
someone to hand it over to clean it up.
I have to choose wisely what I work on nowadays, and cannot spend that
time if the basic support won't get ACKed.
> Who is supposed to debug crash dumps where memory parts are missing?
For many production use cases it certainly needs to exist.
But note that virtio-mem can be used with ZONE_MOVABLE, in which case
mostly only user data (e.g., pagecache, anon) ends up on hotplugged
memory, which would get excluded from makedumpfile in the default
configs either way.
It's not uncommon to let kdump support be added later (e.g., AMD SNP
variants).
--
Cheers,
David / dhildenb
* Re: [PATCH v2 5/7] virtio-mem: s390 support
2024-10-14 19:16 ` David Hildenbrand
@ 2024-10-15 8:37 ` Heiko Carstens
2024-10-21 6:33 ` Christian Borntraeger
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-15 8:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 09:16:45PM +0200, David Hildenbrand wrote:
> On 14.10.24 20:48, Heiko Carstens wrote:
> > On Mon, Oct 14, 2024 at 04:46:17PM +0200, David Hildenbrand wrote:
> > > to dump. Based on this, support for dumping virtio-mem memory can be
> > Hm.. who will add this support? This looks like a showstopper to me.
>
> The cover letter is clearer on that: "One remaining work item is kdump
> support for virtio-mem memory. This will be sent out separately once initial
> support landed."
>
> I had a prototype, but need to spend some time to clean it up -- or find
> someone to hand it over to clean it up.
>
> I have to choose wisely what I work on nowadays, and cannot spend that time
> if the basic support won't get ACKed.
>
> > Who is supposed to debug crash dumps where memory parts are missing?
>
> For many production use cases it certainly needs to exist.
>
> But note that virtio-mem can be used with ZONE_MOVABLE, in which case mostly
> only user data (e.g., pagecache, anon) ends up on hotplugged memory, which
> would get excluded from makedumpfile in the default configs either way.
>
> It's not uncommon to let kdump support be added later (e.g., AMD SNP
> variants).
I'll leave it up to kvm folks to decide if we need kdump support from
the beginning or if we are good with the current implementation.
* Re: [PATCH v2 5/7] virtio-mem: s390 support
2024-10-15 8:37 ` Heiko Carstens
@ 2024-10-21 6:33 ` Christian Borntraeger
2024-10-21 12:19 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Christian Borntraeger @ 2024-10-21 6:33 UTC (permalink / raw)
To: Heiko Carstens, David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Sven Schnelle, Thomas Huth,
Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet, Mario Casquero
Am 15.10.24 um 10:37 schrieb Heiko Carstens:
> On Mon, Oct 14, 2024 at 09:16:45PM +0200, David Hildenbrand wrote:
>> On 14.10.24 20:48, Heiko Carstens wrote:
>>> On Mon, Oct 14, 2024 at 04:46:17PM +0200, David Hildenbrand wrote:
>>>> to dump. Based on this, support for dumping virtio-mem memory can be
>>> Hm.. who will add this support? This looks like a showstopper to me.
>>
>> The cover letter is clearer on that: "One remaining work item is kdump
>> support for virtio-mem memory. This will be sent out separately once initial
>> support landed."
>>
>> I had a prototype, but need to spend some time to clean it up -- or find
>> someone to hand it over to clean it up.
>>
>> I have to choose wisely what I work on nowadays, and cannot spend that time
>> if the basic support won't get ACKed.
>>
>>> Who is supposed to debug crash dumps where memory parts are missing?
>>
>> For many production use cases it certainly needs to exist.
>>
>> But note that virtio-mem can be used with ZONE_MOVABLE, in which case mostly
>> only user data (e.g., pagecache, anon) ends up on hotplugged memory, which
>> would get excluded from makedumpfile in the default configs either way.
>>
>> It's not uncommon to let kdump support be added later (e.g., AMD SNP
>> variants).
>
> I'll leave it up to kvm folks to decide if we need kdump support from
> the beginning or if we are good with the current implementation.
If David confirms that he has a plan for this, I am fine with a staged approach
for upstream.
* Re: [PATCH v2 5/7] virtio-mem: s390 support
2024-10-21 6:33 ` Christian Borntraeger
@ 2024-10-21 12:19 ` David Hildenbrand
0 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-21 12:19 UTC (permalink / raw)
To: Christian Borntraeger, Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Sven Schnelle, Thomas Huth,
Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet, Mario Casquero
Am 21.10.24 um 08:33 schrieb Christian Borntraeger:
>
>
> Am 15.10.24 um 10:37 schrieb Heiko Carstens:
>> On Mon, Oct 14, 2024 at 09:16:45PM +0200, David Hildenbrand wrote:
>>> On 14.10.24 20:48, Heiko Carstens wrote:
>>>
>>> The cover letter is clearer on that: "One remaining work item is kdump
>>> support for virtio-mem memory. This will be sent out separately once initial
>>> support landed."
>>>
>>> I had a prototype, but need to spend some time to clean it up -- or find
>>> someone to hand it over to clean it up.
>>>
>>> I have to choose wisely what I work on nowadays, and cannot spend that time
>>> if the basic support won't get ACKed.
>>>
>>>
>>> For many production use cases it certainly needs to exist.
>>>
>>> But note that virtio-mem can be used with ZONE_MOVABLE, in which case mostly
>>> only user data (e.g., pagecache, anon) ends up on hotplugged memory, which
>>> would get excluded from makedumpfile in the default configs either way.
>>>
>>> It's not uncommon to let kdump support be added later (e.g., AMD SNP
>>> variants).
>>
>> I'll leave it up to kvm folks to decide if we need kdump support from
>> the beginning or if we are good with the current implementation.
>
> If David confirms that he has a plan for this, I am fine with a staged approach
> for upstream.
I do have a plan and even a semi-working prototype that I am currently
improving. In summary, the virtio-mem driver in kdump mode can report ranges
with plugged memory to the core so we can include them in the elfcore hdr. That
is the easy part.
The "challenge" is when the virtio-mem driver is built as a module and gets
loaded after building/allocating the elfcore hdr (which happens when creating
/proc/vmcore). We have to defer detecting+adding the ranges to the time
/proc/vmcore gets opened. Not super complicated, but needs some thought to get
it done in a clean way / with minimal churn.
--
Cheers,
David / dhildenb
* [PATCH v2 6/7] lib/Kconfig.debug: default STRICT_DEVMEM to "y" on s390
2024-10-14 14:46 [PATCH v2 0/7] virtio-mem: s390 support David Hildenbrand
` (4 preceding siblings ...)
2024-10-14 14:46 ` [PATCH v2 5/7] virtio-mem: s390 support David Hildenbrand
@ 2024-10-14 14:46 ` David Hildenbrand
2024-10-14 18:53 ` Heiko Carstens
2024-10-14 14:46 ` [PATCH v2 7/7] s390/sparsemem: reduce section size to 128 MiB David Hildenbrand
2024-10-14 18:56 ` [PATCH v2 0/7] virtio-mem: s390 support Heiko Carstens
7 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 14:46 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, linux-doc, kvm,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet, Mario Casquero
virtio-mem currently depends on !DEVMEM | STRICT_DEVMEM. Let's default
STRICT_DEVMEM to "y" just like we do for arm64 and x86.
There could be ways in the future to filter access to virtio-mem device
memory even without STRICT_DEVMEM, but for now let's just keep it
simple.
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
lib/Kconfig.debug | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7315f643817a..e7a917540e2a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1905,7 +1905,7 @@ config STRICT_DEVMEM
bool "Filter access to /dev/mem"
depends on MMU && DEVMEM
depends on ARCH_HAS_DEVMEM_IS_ALLOWED || GENERIC_LIB_DEVMEM_IS_ALLOWED
- default y if PPC || X86 || ARM64
+ default y if PPC || X86 || ARM64 || S390
help
If this option is disabled, you allow userspace (root) access to all
of memory, including kernel and userspace memory. Accidental
--
2.46.1
* Re: [PATCH v2 6/7] lib/Kconfig.debug: default STRICT_DEVMEM to "y" on s390
2024-10-14 14:46 ` [PATCH v2 6/7] lib/Kconfig.debug: default STRICT_DEVMEM to "y" on s390 David Hildenbrand
@ 2024-10-14 18:53 ` Heiko Carstens
0 siblings, 0 replies; 59+ messages in thread
From: Heiko Carstens @ 2024-10-14 18:53 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 04:46:18PM +0200, David Hildenbrand wrote:
> virtio-mem currently depends on !DEVMEM | STRICT_DEVMEM. Let's default
> STRICT_DEVMEM to "y" just like we do for arm64 and x86.
>
> There could be ways in the future to filter access to virtio-mem device
> memory even without STRICT_DEVMEM, but for now let's just keep it
> simple.
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> lib/Kconfig.debug | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Acked-by: Heiko Carstens <hca@linux.ibm.com>
* [PATCH v2 7/7] s390/sparsemem: reduce section size to 128 MiB
2024-10-14 14:46 [PATCH v2 0/7] virtio-mem: s390 support David Hildenbrand
` (5 preceding siblings ...)
2024-10-14 14:46 ` [PATCH v2 6/7] lib/Kconfig.debug: default STRICT_DEVMEM to "y" on s390 David Hildenbrand
@ 2024-10-14 14:46 ` David Hildenbrand
2024-10-14 17:53 ` Heiko Carstens
2024-10-14 18:56 ` [PATCH v2 0/7] virtio-mem: s390 support Heiko Carstens
7 siblings, 1 reply; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 14:46 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, linux-doc, kvm,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Cornelia Huck, Janosch Frank, Claudio Imbrenda,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet, Mario Casquero
Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
we've been using a section size of 256 MiB on s390x and 32 MiB on s390 (31 bit).
Before that, we were using a section size of 32 MiB on both
architectures.
Likely the reason was that we'd expect a storage increment size of
256 MiB under z/VM back then. As we didn't support memory blocks spanning
multiple memory sections, we would have had to handle having multiple
memory blocks for a single storage increment, which complicates things.
Although that issue reappeared with even bigger storage increment sizes
later, nowadays we have memory blocks that can span multiple memory
sections and we avoid any such issue completely.
Now that we have a new mechanism to expose additional memory to a VM --
virtio-mem -- reduce the section size to 128 MiB to allow for more
flexibility and reduce the metadata overhead when dealing with hot(un)plug
granularity smaller than 256 MiB.
128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
base pages switched to 128 MiB as well: it's just big enough on these
architectures to allow for using a huge page (2 MiB) in the vmemmap in
sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
in the direct mapping, while still allowing for small hot(un)plug
granularity.
For s390, we could even switch to a 64 MiB section size, as our huge page
size is 1 MiB: but the smaller the section size, the more sections we'll
have to manage especially on bigger machines. Making it consistent with
x86-64 and arm64 feels like the right thing for now.
Note that the smallest memory hot(un)plug granularity is also limited by
the memory block size, determined by extracting the memory increment
size from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose an increment size of 0;
therefore, we'll end up with a memory block size of 128 MiB with a
128 MiB section size.
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/include/asm/sparsemem.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/include/asm/sparsemem.h b/arch/s390/include/asm/sparsemem.h
index c549893602ea..ff628c50afac 100644
--- a/arch/s390/include/asm/sparsemem.h
+++ b/arch/s390/include/asm/sparsemem.h
@@ -2,7 +2,7 @@
#ifndef _ASM_S390_SPARSEMEM_H
#define _ASM_S390_SPARSEMEM_H
-#define SECTION_SIZE_BITS 28
+#define SECTION_SIZE_BITS 27
#define MAX_PHYSMEM_BITS CONFIG_MAX_PHYSMEM_BITS
#endif /* _ASM_S390_SPARSEMEM_H */
--
2.46.1
* Re: [PATCH v2 7/7] s390/sparsemem: reduce section size to 128 MiB
2024-10-14 14:46 ` [PATCH v2 7/7] s390/sparsemem: reduce section size to 128 MiB David Hildenbrand
@ 2024-10-14 17:53 ` Heiko Carstens
2024-10-14 19:47 ` David Hildenbrand
0 siblings, 1 reply; 59+ messages in thread
From: Heiko Carstens @ 2024-10-14 17:53 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On Mon, Oct 14, 2024 at 04:46:19PM +0200, David Hildenbrand wrote:
> Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
> we've been using a section size of 256 MiB on s390x and 32 MiB on s390 (31 bit).
> Before that, we were using a section size of 32 MiB on both
> architectures.
>
> Likely the reason was that we'd expect a storage increment size of
> 256 MiB under z/VM back then. As we didn't support memory blocks spanning
> multiple memory sections, we would have had to handle having multiple
> memory blocks for a single storage increment, which complicates things.
> Although that issue reappeared with even bigger storage increment sizes
> later, nowadays we have memory blocks that can span multiple memory
> sections and we avoid any such issue completely.
I doubt that z/VM had support for memory hotplug back then already; and the
sclp memory hotplug code was always written in a way that it could handle
increment sizes smaller, larger or equal to section sizes.
If I remember correctly the section size was also used to represent each
piece of memory in sysfs (aka memory block). So the different sizes were
chosen to avoid an excessive number of sysfs entries on 64 bit.
This problem went away later with the introduction of memory_block_size.
Even further back in time I think there were static arrays which had
2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) elements.
I just gave it a try and, as nowadays expected, bloat-o-meter doesn't
indicate anything like that anymore.
> 128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
> base pages switched to 128 MiB as well: it's just big enough on these
> architectures to allow for using a huge page (2 MiB) in the vmemmap in
> sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
> in the direct mapping, while still allowing for small hot(un)plug
> granularity.
>
> For s390, we could even switch to a 64 MiB section size, as our huge page
> size is 1 MiB: but the smaller the section size, the more sections we'll
> have to manage especially on bigger machines. Making it consistent with
> x86-64 and arm64 feels like the right thing for now.
That's fine with me.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
* Re: [PATCH v2 7/7] s390/sparsemem: reduce section size to 128 MiB
2024-10-14 17:53 ` Heiko Carstens
@ 2024-10-14 19:47 ` David Hildenbrand
0 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 19:47 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet,
Mario Casquero
On 14.10.24 19:53, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:19PM +0200, David Hildenbrand wrote:
>> Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
>> we've been using a section size of 256 MiB on s390x and 32 MiB on s390 (31 bit).
>> Before that, we were using a section size of 32 MiB on both
>> architectures.
>>
>> Likely the reason was that we'd expect a storage increment size of
>> 256 MiB under z/VM back then. As we didn't support memory blocks spanning
>> multiple memory sections, we would have had to handle having multiple
>> memory blocks for a single storage increment, which complicates things.
>> Although that issue reappeared with even bigger storage increment sizes
>> later, nowadays we have memory blocks that can span multiple memory
>> sections and we avoid any such issue completely.
>
> I doubt that z/VM had support for memory hotplug back then already; and the
> sclp memory hotplug code was always written in a way that it could handle
> increment sizes smaller, larger or equal to section sizes.
> If I remember correctly the section size was also used to represent each
> piece of memory in sysfs (aka memory block). So the different sizes were
> chosen to avoid an excessive number of sysfs entries on 64 bit.
>
> This problem went away later with the introduction of memory_block_size.
>
> Even further back in time I think there were static arrays which had
> 2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) elements.
Interesting. I'll drop the "Likely ..." paragraph then!
>
> I just gave it a try and, as nowadays expected, bloat-o-meter doesn't
> indicate anything like that anymore.
>
>> 128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
>> base pages switched to 128 MiB as well: it's just big enough on these
>> architectures to allow for using a huge page (2 MiB) in the vmemmap in
>> sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
>> in the direct mapping, while still allowing for small hot(un)plug
>> granularity.
>>
>> For s390, we could even switch to a 64 MiB section size, as our huge page
>> size is 1 MiB: but the smaller the section size, the more sections we'll
>> have to manage especially on bigger machines. Making it consistent with
>> x86-64 and arm64 feels like the right thing for now.
>
> That's fine with me.
>
> Acked-by: Heiko Carstens <hca@linux.ibm.com>
>
Thanks!
--
Cheers,
David / dhildenb
* Re: [PATCH v2 0/7] virtio-mem: s390 support
2024-10-14 14:46 [PATCH v2 0/7] virtio-mem: s390 support David Hildenbrand
` (6 preceding siblings ...)
2024-10-14 14:46 ` [PATCH v2 7/7] s390/sparsemem: reduce section size to 128 MiB David Hildenbrand
@ 2024-10-14 18:56 ` Heiko Carstens
2024-10-14 19:17 ` David Hildenbrand
2024-10-15 7:57 ` Claudio Imbrenda
7 siblings, 2 replies; 59+ messages in thread
From: Heiko Carstens @ 2024-10-14 18:56 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On Mon, Oct 14, 2024 at 04:46:12PM +0200, David Hildenbrand wrote:
> Let's finally add s390 support for virtio-mem; my last RFC was sent
> 4 years ago, and a lot changed in the meantime.
>
> The latest QEMU series is available at [1], which contains some more
> details and a usage example on s390 (last patch).
>
> There is not too much in here: The biggest part is querying a new diag(500)
> STORAGE_LIMIT hypercall to obtain the proper "max_physmem_end".
>
> The last two patches are not strictly required but certainly nice-to-have.
>
> Note that -- in contrast to standby memory -- virtio-mem memory must be
> configured to be automatically onlined as soon as hotplugged. The easiest
> approach is using the "memhp_default_state=" kernel parameter or by using
> proper udev rules. More details can be found at [2].
>
> I have reviving+upstreaming a systemd service to handle configuring
> that on my todo list, but for some reason I keep getting distracted ...
>
> I tested various things, including:
> * Various memory hotplug/hotunplug combinations
> * Device hotplug/hotunplug
> * /proc/iomem output
> * reboot
> * kexec
> * kdump: make sure we don't hotplug memory
>
> One remaining work item is kdump support for virtio-mem memory. This will
> be sent out separately once initial support landed.
Besides the open kdump question, which I think is quite important, how
is this supposed to go upstream?
This could go via s390, however in any case this needs reviews and/or
Acks from kvm folks.
* Re: [PATCH v2 0/7] virtio-mem: s390 support
2024-10-14 18:56 ` [PATCH v2 0/7] virtio-mem: s390 support Heiko Carstens
@ 2024-10-14 19:17 ` David Hildenbrand
2024-10-15 7:57 ` Claudio Imbrenda
1 sibling, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-14 19:17 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Claudio Imbrenda, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On 14.10.24 20:56, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:12PM +0200, David Hildenbrand wrote:
>> Let's finally add s390 support for virtio-mem; my last RFC was sent
>> 4 years ago, and a lot changed in the meantime.
>>
>> The latest QEMU series is available at [1], which contains some more
>> details and a usage example on s390 (last patch).
>>
>> There is not too much in here: The biggest part is querying a new diag(500)
>> STORAGE_LIMIT hypercall to obtain the proper "max_physmem_end".
>>
>> The last two patches are not strictly required but certainly nice-to-have.
>>
>> Note that -- in contrast to standby memory -- virtio-mem memory must be
>> configured to be automatically onlined as soon as hotplugged. The easiest
>> approach is using the "memhp_default_state=" kernel parameter or by using
>> proper udev rules. More details can be found at [2].
>>
>> I have reviving+upstreaming a systemd service to handle configuring
>> that on my todo list, but for some reason I keep getting distracted ...
>>
>> I tested various things, including:
>> * Various memory hotplug/hotunplug combinations
>> * Device hotplug/hotunplug
>> * /proc/iomem output
>> * reboot
>> * kexec
>> * kdump: make sure we don't hotplug memory
>>
>> One remaining work item is kdump support for virtio-mem memory. This will
>> be sent out separately once initial support landed.
>
> Besides the open kdump question, which I think is quite important, how
> is this supposed to go upstream?
MST suggested via the s390 tree in v1.
>
> This could go via s390, however in any case this needs reviews and/or
> Acks from kvm folks.
Yes, hoping there will be some review (there was some on the QEMU series).
--
Cheers,
David / dhildenb
* Re: [PATCH v2 0/7] virtio-mem: s390 support
2024-10-14 18:56 ` [PATCH v2 0/7] virtio-mem: s390 support Heiko Carstens
2024-10-14 19:17 ` David Hildenbrand
@ 2024-10-15 7:57 ` Claudio Imbrenda
2024-10-25 10:54 ` David Hildenbrand
1 sibling, 1 reply; 59+ messages in thread
From: Claudio Imbrenda @ 2024-10-15 7:57 UTC (permalink / raw)
To: Heiko Carstens
Cc: David Hildenbrand, linux-kernel, linux-mm, linux-s390,
virtualization, linux-doc, kvm, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Huth, Cornelia Huck,
Janosch Frank, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Andrew Morton, Jonathan Corbet
On Mon, 14 Oct 2024 20:56:59 +0200
Heiko Carstens <hca@linux.ibm.com> wrote:
> On Mon, Oct 14, 2024 at 04:46:12PM +0200, David Hildenbrand wrote:
> > Let's finally add s390 support for virtio-mem; my last RFC was sent
> > 4 years ago, and a lot changed in the meantime.
> >
> > The latest QEMU series is available at [1], which contains some more
> > details and a usage example on s390 (last patch).
> >
> > There is not too much in here: The biggest part is querying a new diag(500)
> > STORAGE_LIMIT hypercall to obtain the proper "max_physmem_end".
> >
> > The last two patches are not strictly required but certainly nice-to-have.
> >
> > Note that -- in contrast to standby memory -- virtio-mem memory must be
> > configured to be automatically onlined as soon as hotplugged. The easiest
> > approach is using the "memhp_default_state=" kernel parameter or by using
> > proper udev rules. More details can be found at [2].
> >
> > I have reviving+upstreaming a systemd service to handle configuring
> > that on my todo list, but for some reason I keep getting distracted ...
> >
> > I tested various things, including:
> > * Various memory hotplug/hotunplug combinations
> > * Device hotplug/hotunplug
> > * /proc/iomem output
> > * reboot
> > * kexec
> > * kdump: make sure we don't hotplug memory
> >
> > One remaining work item is kdump support for virtio-mem memory. This will
> > be sent out separately once initial support landed.
>
> Besides the open kdump question, which I think is quite important, how
> is this supposed to go upstream?
>
> This could go via s390, however in any case this needs reviews and/or
> Acks from kvm folks.
we're working on it :)
* Re: [PATCH v2 0/7] virtio-mem: s390 support
2024-10-15 7:57 ` Claudio Imbrenda
@ 2024-10-25 10:54 ` David Hildenbrand
0 siblings, 0 replies; 59+ messages in thread
From: David Hildenbrand @ 2024-10-25 10:54 UTC (permalink / raw)
To: Claudio Imbrenda, Heiko Carstens
Cc: linux-kernel, linux-mm, linux-s390, virtualization, linux-doc,
kvm, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Huth, Cornelia Huck, Janosch Frank,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Andrew Morton, Jonathan Corbet
On 15.10.24 09:57, Claudio Imbrenda wrote:
> On Mon, 14 Oct 2024 20:56:59 +0200
> Heiko Carstens <hca@linux.ibm.com> wrote:
>
>> On Mon, Oct 14, 2024 at 04:46:12PM +0200, David Hildenbrand wrote:
>>> Let's finally add s390 support for virtio-mem; my last RFC was sent
>>> 4 years ago, and a lot changed in the meantime.
>>>
>>> The latest QEMU series is available at [1], which contains some more
>>> details and a usage example on s390 (last patch).
>>>
>>> There is not too much in here: The biggest part is querying a new diag(500)
>>> STORAGE_LIMIT hypercall to obtain the proper "max_physmem_end".
>>>
>>> The last two patches are not strictly required but certainly nice-to-have.
>>>
>>> Note that -- in contrast to standby memory -- virtio-mem memory must be
>>> configured to be automatically onlined as soon as hotplugged. The easiest
>>> approach is using the "memhp_default_state=" kernel parameter or by using
>>> proper udev rules. More details can be found at [2].
>>>
>>> I have reviving+upstreaming a systemd service to handle configuring
>>> that on my todo list, but for some reason I keep getting distracted ...
>>>
>>> I tested various things, including:
>>> * Various memory hotplug/hotunplug combinations
>>> * Device hotplug/hotunplug
>>> * /proc/iomem output
>>> * reboot
>>> * kexec
>>> * kdump: make sure we don't hotplug memory
>>>
>>> One remaining work item is kdump support for virtio-mem memory. This will
>>> be sent out separately once initial support landed.
>>
>> Besides the open kdump question, which I think is quite important, how
>> is this supposed to go upstream?
>>
>> This could go via s390, however in any case this needs reviews and/or
>> Acks from kvm folks.
>
> we're working on it :)
I'll be sending a v3 later today, and a v1 of kdump support (which
involves a bunch of changes to fs/proc/vmcore.c and virtio-mem) separately.
--
Cheers,
David / dhildenb