* [PATCH v1 01/11] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-11-15 9:30 ` Baoquan He
2024-10-25 15:11 ` [PATCH v1 02/11] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex David Hildenbrand
` (11 subsequent siblings)
12 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
We want to protect vmcore modifications from concurrent opening of
the vmcore, and also serialize vmcore modifications. Let's convert the
spinlock into a mutex, because some of the operations we'll be
protecting might sleep (e.g., memory allocations) and might take a bit
longer.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index b52d85f8ad59..110ce193d20f 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -62,7 +62,8 @@ core_param(novmcoredd, vmcoredd_disabled, bool, 0);
/* Device Dump Size */
static size_t vmcoredd_orig_sz;
-static DEFINE_SPINLOCK(vmcore_cb_lock);
+static DEFINE_MUTEX(vmcore_mutex);
+
DEFINE_STATIC_SRCU(vmcore_cb_srcu);
/* List of registered vmcore callbacks. */
static LIST_HEAD(vmcore_cb_list);
@@ -72,7 +73,7 @@ static bool vmcore_opened;
void register_vmcore_cb(struct vmcore_cb *cb)
{
INIT_LIST_HEAD(&cb->next);
- spin_lock(&vmcore_cb_lock);
+ mutex_lock(&vmcore_mutex);
list_add_tail(&cb->next, &vmcore_cb_list);
/*
* Registering a vmcore callback after the vmcore was opened is
@@ -80,13 +81,13 @@ void register_vmcore_cb(struct vmcore_cb *cb)
*/
if (vmcore_opened)
pr_warn_once("Unexpected vmcore callback registration\n");
- spin_unlock(&vmcore_cb_lock);
+ mutex_unlock(&vmcore_mutex);
}
EXPORT_SYMBOL_GPL(register_vmcore_cb);
void unregister_vmcore_cb(struct vmcore_cb *cb)
{
- spin_lock(&vmcore_cb_lock);
+ mutex_lock(&vmcore_mutex);
list_del_rcu(&cb->next);
/*
* Unregistering a vmcore callback after the vmcore was opened is
@@ -95,7 +96,7 @@ void unregister_vmcore_cb(struct vmcore_cb *cb)
*/
if (vmcore_opened)
pr_warn_once("Unexpected vmcore callback unregistration\n");
- spin_unlock(&vmcore_cb_lock);
+ mutex_unlock(&vmcore_mutex);
synchronize_srcu(&vmcore_cb_srcu);
}
@@ -120,9 +121,9 @@ static bool pfn_is_ram(unsigned long pfn)
static int open_vmcore(struct inode *inode, struct file *file)
{
- spin_lock(&vmcore_cb_lock);
+ mutex_lock(&vmcore_mutex);
vmcore_opened = true;
- spin_unlock(&vmcore_cb_lock);
+ mutex_unlock(&vmcore_mutex);
return 0;
}
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 01/11] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
2024-10-25 15:11 ` [PATCH v1 01/11] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex David Hildenbrand
@ 2024-11-15 9:30 ` Baoquan He
2024-11-15 10:03 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-15 9:30 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> We want to protect vmcore modifications from concurrent opening of
> the vmcore, and also serialize vmcore modiciations. Let's convert the
> spinlock into a mutex, because some of the operations we'll be
> protecting might sleep (e.g., memory allocations) and might take a bit
> longer.
Could you elaborate on this a little further? E.g., was the concurrent opening of
vmcore spotted before this patchset or has it actually been seen, and in which
place is the memory allocation spotted? Asking this because I'd like to learn and
make clear whether this is an existing issue that needs to be backported into our
old RHEL distros. Thanks in advance.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> fs/proc/vmcore.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index b52d85f8ad59..110ce193d20f 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -62,7 +62,8 @@ core_param(novmcoredd, vmcoredd_disabled, bool, 0);
> /* Device Dump Size */
> static size_t vmcoredd_orig_sz;
>
> -static DEFINE_SPINLOCK(vmcore_cb_lock);
> +static DEFINE_MUTEX(vmcore_mutex);
> +
> DEFINE_STATIC_SRCU(vmcore_cb_srcu);
> /* List of registered vmcore callbacks. */
> static LIST_HEAD(vmcore_cb_list);
> @@ -72,7 +73,7 @@ static bool vmcore_opened;
> void register_vmcore_cb(struct vmcore_cb *cb)
> {
> INIT_LIST_HEAD(&cb->next);
> - spin_lock(&vmcore_cb_lock);
> + mutex_lock(&vmcore_mutex);
> list_add_tail(&cb->next, &vmcore_cb_list);
> /*
> * Registering a vmcore callback after the vmcore was opened is
> @@ -80,13 +81,13 @@ void register_vmcore_cb(struct vmcore_cb *cb)
> */
> if (vmcore_opened)
> pr_warn_once("Unexpected vmcore callback registration\n");
> - spin_unlock(&vmcore_cb_lock);
> + mutex_unlock(&vmcore_mutex);
> }
> EXPORT_SYMBOL_GPL(register_vmcore_cb);
>
> void unregister_vmcore_cb(struct vmcore_cb *cb)
> {
> - spin_lock(&vmcore_cb_lock);
> + mutex_lock(&vmcore_mutex);
> list_del_rcu(&cb->next);
> /*
> * Unregistering a vmcore callback after the vmcore was opened is
> @@ -95,7 +96,7 @@ void unregister_vmcore_cb(struct vmcore_cb *cb)
> */
> if (vmcore_opened)
> pr_warn_once("Unexpected vmcore callback unregistration\n");
> - spin_unlock(&vmcore_cb_lock);
> + mutex_unlock(&vmcore_mutex);
>
> synchronize_srcu(&vmcore_cb_srcu);
> }
> @@ -120,9 +121,9 @@ static bool pfn_is_ram(unsigned long pfn)
>
> static int open_vmcore(struct inode *inode, struct file *file)
> {
> - spin_lock(&vmcore_cb_lock);
> + mutex_lock(&vmcore_mutex);
> vmcore_opened = true;
> - spin_unlock(&vmcore_cb_lock);
> + mutex_unlock(&vmcore_mutex);
>
> return 0;
> }
> --
> 2.46.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 01/11] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
2024-11-15 9:30 ` Baoquan He
@ 2024-11-15 10:03 ` David Hildenbrand
2024-11-20 8:16 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-15 10:03 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 15.11.24 10:30, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>> We want to protect vmcore modifications from concurrent opening of
>> the vmcore, and also serialize vmcore modiciations. Let's convert the
>
>
>> spinlock into a mutex, because some of the operations we'll be
>> protecting might sleep (e.g., memory allocations) and might take a bit
>> longer.
>
> Could you elaborate this a little further. E.g the concurrent opening of
> vmcore is spot before this patchset or have been seen, and in which place
> the memory allocation is spot. Asking this becasue I'd like to learn and
> make clear if this is a existing issue and need be back ported into our
> old RHEL distros. Thanks in advance.
It's a preparation for the other patches, which do what is described here:
a) We can currently modify the vmcore after it was opened. This can
happen if the vmcoredd is added after the vmcore was loaded. Similar
things will happen with the PROC_VMCORE_DEVICE_RAM extension.
b) To handle it cleanly we need to protect the modifications against
concurrent opening. And the modifications end up allocating memory and
cannot easily take the spinlock.
So far a spinlock was sufficient, now a mutex is required.
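As a rough sketch of the pattern the later patches end up with (simplified,
not the exact code; the allocation just stands in for any operation that may
sleep while the lock is held):

	/* Sketch only: serialize a vmcore modification against open_vmcore(). */
	static int vmcore_modify_example(size_t size)
	{
		void *buf;

		mutex_lock(&vmcore_mutex);
		if (vmcore_opened) {
			mutex_unlock(&vmcore_mutex);
			return -EBUSY;
		}
		/* May sleep (GFP_KERNEL) -- not possible under the old spinlock. */
		buf = vzalloc(size);
		if (!buf) {
			mutex_unlock(&vmcore_mutex);
			return -ENOMEM;
		}
		/* ... add buf to the vmcore list / update the ELF core header ... */
		mutex_unlock(&vmcore_mutex);
		return 0;
	}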
Maybe we'd want to backport 1,2,3, but not sure if we consider this
critical enough.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 01/11] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
2024-11-15 10:03 ` David Hildenbrand
@ 2024-11-20 8:16 ` Baoquan He
2024-11-20 8:56 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-20 8:16 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/15/24 at 11:03am, David Hildenbrand wrote:
> On 15.11.24 10:30, Baoquan He wrote:
> > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > We want to protect vmcore modifications from concurrent opening of
> > > the vmcore, and also serialize vmcore modiciations. Let's convert the
> >
> >
> > > spinlock into a mutex, because some of the operations we'll be
> > > protecting might sleep (e.g., memory allocations) and might take a bit
> > > longer.
> >
> > Could you elaborate this a little further. E.g the concurrent opening of
> > vmcore is spot before this patchset or have been seen, and in which place
> > the memory allocation is spot. Asking this becasue I'd like to learn and
> > make clear if this is a existing issue and need be back ported into our
> > old RHEL distros. Thanks in advance.
>
> It's a preparation for the other patches, that do what is described here:
>
> a) We can currently modify the vmcore after it was opened. This can happen
> if the vmcoredd is added after the vmcore was loaded. Similar things will
> happen with the PROC_VMCORE_DEVICE_RAM extension.
>
> b) To handle it cleanly we need to protect the modifications against
> concurrent opening. And the modifcations end up allocating memory and cannot
> easily take the spinlock.
>
> So far a spinlock was sufficient, now a mutex is required.
I see. As we discussed in the patch 2 sub-thread, this information is very
valuable to help people get the background when they read the
code. Let's put it in the patch log. Thanks.
>
> Maybe we'd want to backport 1,2,3, but not sure if we consider this critical
> enough.
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 01/11] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
2024-11-20 8:16 ` Baoquan He
@ 2024-11-20 8:56 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-11-20 8:56 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 20.11.24 09:16, Baoquan He wrote:
> On 11/15/24 at 11:03am, David Hildenbrand wrote:
>> On 15.11.24 10:30, Baoquan He wrote:
>>> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>>>> We want to protect vmcore modifications from concurrent opening of
>>>> the vmcore, and also serialize vmcore modiciations. Let's convert the
>>>
>>>
>>>> spinlock into a mutex, because some of the operations we'll be
>>>> protecting might sleep (e.g., memory allocations) and might take a bit
>>>> longer.
>>>
>>> Could you elaborate this a little further. E.g the concurrent opening of
>>> vmcore is spot before this patchset or have been seen, and in which place
>>> the memory allocation is spot. Asking this becasue I'd like to learn and
>>> make clear if this is a existing issue and need be back ported into our
>>> old RHEL distros. Thanks in advance.
>>
>> It's a preparation for the other patches, that do what is described here:
>>
>> a) We can currently modify the vmcore after it was opened. This can happen
>> if the vmcoredd is added after the vmcore was loaded. Similar things will
>> happen with the PROC_VMCORE_DEVICE_RAM extension.
>>
>> b) To handle it cleanly we need to protect the modifications against
>> concurrent opening. And the modifcations end up allocating memory and cannot
>> easily take the spinlock.
>>
>> So far a spinlock was sufficient, now a mutex is required.
>
> I see, as we talked in patch 2 sub-thread, these information are very
> valuable to help people get the background information when they read
> code. Let's put it in patch log. Thanks.
I'll extend the description if that helps, thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 02/11] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
2024-10-25 15:11 ` [PATCH v1 01/11] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-11-15 9:32 ` Baoquan He
2024-10-25 15:11 ` [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened David Hildenbrand
` (10 subsequent siblings)
12 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's use our new mutex instead.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 110ce193d20f..b91c304463c9 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -53,7 +53,6 @@ static struct proc_dir_entry *proc_vmcore;
#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
/* Device Dump list and mutex to synchronize access to list */
static LIST_HEAD(vmcoredd_list);
-static DEFINE_MUTEX(vmcoredd_mutex);
static bool vmcoredd_disabled;
core_param(novmcoredd, vmcoredd_disabled, bool, 0);
@@ -248,7 +247,7 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
size_t tsz;
char *buf;
- mutex_lock(&vmcoredd_mutex);
+ mutex_lock(&vmcore_mutex);
list_for_each_entry(dump, &vmcoredd_list, list) {
if (start < offset + dump->size) {
tsz = min(offset + (u64)dump->size - start, (u64)size);
@@ -269,7 +268,7 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
}
out_unlock:
- mutex_unlock(&vmcoredd_mutex);
+ mutex_unlock(&vmcore_mutex);
return ret;
}
@@ -283,7 +282,7 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
size_t tsz;
char *buf;
- mutex_lock(&vmcoredd_mutex);
+ mutex_lock(&vmcore_mutex);
list_for_each_entry(dump, &vmcoredd_list, list) {
if (start < offset + dump->size) {
tsz = min(offset + (u64)dump->size - start, (u64)size);
@@ -306,7 +305,7 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
}
out_unlock:
- mutex_unlock(&vmcoredd_mutex);
+ mutex_unlock(&vmcore_mutex);
return ret;
}
#endif /* CONFIG_MMU */
@@ -1517,9 +1516,9 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
dump->size = data_size;
/* Add the dump to driver sysfs list */
- mutex_lock(&vmcoredd_mutex);
+ mutex_lock(&vmcore_mutex);
list_add_tail(&dump->list, &vmcoredd_list);
- mutex_unlock(&vmcoredd_mutex);
+ mutex_unlock(&vmcore_mutex);
vmcoredd_update_size(data_size);
return 0;
@@ -1537,7 +1536,7 @@ EXPORT_SYMBOL(vmcore_add_device_dump);
static void vmcore_free_device_dumps(void)
{
#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
- mutex_lock(&vmcoredd_mutex);
+ mutex_lock(&vmcore_mutex);
while (!list_empty(&vmcoredd_list)) {
struct vmcoredd_node *dump;
@@ -1547,7 +1546,7 @@ static void vmcore_free_device_dumps(void)
vfree(dump->buf);
vfree(dump);
}
- mutex_unlock(&vmcoredd_mutex);
+ mutex_unlock(&vmcore_mutex);
#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
}
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 02/11] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
2024-10-25 15:11 ` [PATCH v1 02/11] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex David Hildenbrand
@ 2024-11-15 9:32 ` Baoquan He
2024-11-15 10:04 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-15 9:32 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> Let's use our new mutex instead.
Is there a reason vmcoredd_mutex needs to be replaced and integrated with
vmcore_mutex? Is the reason that concurrent opening of vmcore could
happen with the old vmcoredd_mutex?
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> fs/proc/vmcore.c | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 110ce193d20f..b91c304463c9 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -53,7 +53,6 @@ static struct proc_dir_entry *proc_vmcore;
> #ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
> /* Device Dump list and mutex to synchronize access to list */
> static LIST_HEAD(vmcoredd_list);
> -static DEFINE_MUTEX(vmcoredd_mutex);
>
> static bool vmcoredd_disabled;
> core_param(novmcoredd, vmcoredd_disabled, bool, 0);
> @@ -248,7 +247,7 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
> size_t tsz;
> char *buf;
>
> - mutex_lock(&vmcoredd_mutex);
> + mutex_lock(&vmcore_mutex);
> list_for_each_entry(dump, &vmcoredd_list, list) {
> if (start < offset + dump->size) {
> tsz = min(offset + (u64)dump->size - start, (u64)size);
> @@ -269,7 +268,7 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
> }
>
> out_unlock:
> - mutex_unlock(&vmcoredd_mutex);
> + mutex_unlock(&vmcore_mutex);
> return ret;
> }
>
> @@ -283,7 +282,7 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
> size_t tsz;
> char *buf;
>
> - mutex_lock(&vmcoredd_mutex);
> + mutex_lock(&vmcore_mutex);
> list_for_each_entry(dump, &vmcoredd_list, list) {
> if (start < offset + dump->size) {
> tsz = min(offset + (u64)dump->size - start, (u64)size);
> @@ -306,7 +305,7 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
> }
>
> out_unlock:
> - mutex_unlock(&vmcoredd_mutex);
> + mutex_unlock(&vmcore_mutex);
> return ret;
> }
> #endif /* CONFIG_MMU */
> @@ -1517,9 +1516,9 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> dump->size = data_size;
>
> /* Add the dump to driver sysfs list */
> - mutex_lock(&vmcoredd_mutex);
> + mutex_lock(&vmcore_mutex);
> list_add_tail(&dump->list, &vmcoredd_list);
> - mutex_unlock(&vmcoredd_mutex);
> + mutex_unlock(&vmcore_mutex);
>
> vmcoredd_update_size(data_size);
> return 0;
> @@ -1537,7 +1536,7 @@ EXPORT_SYMBOL(vmcore_add_device_dump);
> static void vmcore_free_device_dumps(void)
> {
> #ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
> - mutex_lock(&vmcoredd_mutex);
> + mutex_lock(&vmcore_mutex);
> while (!list_empty(&vmcoredd_list)) {
> struct vmcoredd_node *dump;
>
> @@ -1547,7 +1546,7 @@ static void vmcore_free_device_dumps(void)
> vfree(dump->buf);
> vfree(dump);
> }
> - mutex_unlock(&vmcoredd_mutex);
> + mutex_unlock(&vmcore_mutex);
> #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
> }
>
> --
> 2.46.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 02/11] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
2024-11-15 9:32 ` Baoquan He
@ 2024-11-15 10:04 ` David Hildenbrand
2024-11-20 8:14 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-15 10:04 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 15.11.24 10:32, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>> Let's use our new mutex instead.
>
> Is there reason vmcoredd_mutex need be replaced and integrated with the
> vmcore_mutex? Is it the reason the concurrent opening of vmcore could
> happen with the old vmcoredd_mutex?
Yes, see the next patch in this series. But I consider this valuable on
its own: there is no need to have two mutexes.
I can make that clearer in the patch description.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 02/11] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
2024-11-15 10:04 ` David Hildenbrand
@ 2024-11-20 8:14 ` Baoquan He
0 siblings, 0 replies; 46+ messages in thread
From: Baoquan He @ 2024-11-20 8:14 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/15/24 at 11:04am, David Hildenbrand wrote:
> On 15.11.24 10:32, Baoquan He wrote:
> > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > Let's use our new mutex instead.
> >
> > Is there reason vmcoredd_mutex need be replaced and integrated with the
> > vmcore_mutex? Is it the reason the concurrent opening of vmcore could
> > happen with the old vmcoredd_mutex?
>
> Yes, see the next patch in this series. But I consider this valuable on its
> own: there is no need to have two mutexes.
>
> I can make that clearer in the patch description.
That would be great and more helpful. Because I didn't find the reasoning
behind the lock consolidation and the need to avoid concurrent opening of
vmcore in the cover letter or in the logs of the first few patches, I thought
there had been existing problems and that the first few patches were meant to fix them.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
2024-10-25 15:11 ` [PATCH v1 01/11] fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex David Hildenbrand
2024-10-25 15:11 ` [PATCH v1 02/11] fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-11-22 9:16 ` Baoquan He
2024-10-25 15:11 ` [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h David Hildenbrand
` (9 subsequent siblings)
12 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's protect all vmcore modifications by the vmcore_mutex and
disallow vmcore modifications after the vmcore was opened: modifications
would no longer be safe. Properly synchronize against concurrent opening
of the vmcore.
As a nice side-effect, we now properly protect concurrent vmcore
modifications.
No need to grab the mutex during mmap()/read(): after we opened the
vmcore, modifications are impossible.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 42 +++++++++++++++++++-----------------------
1 file changed, 19 insertions(+), 23 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index b91c304463c9..6371dbaa21be 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -243,33 +243,27 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
{
struct vmcoredd_node *dump;
u64 offset = 0;
- int ret = 0;
size_t tsz;
char *buf;
- mutex_lock(&vmcore_mutex);
list_for_each_entry(dump, &vmcoredd_list, list) {
if (start < offset + dump->size) {
tsz = min(offset + (u64)dump->size - start, (u64)size);
buf = dump->buf + start - offset;
- if (copy_to_iter(buf, tsz, iter) < tsz) {
- ret = -EFAULT;
- goto out_unlock;
- }
+ if (copy_to_iter(buf, tsz, iter) < tsz)
+ return -EFAULT;
size -= tsz;
start += tsz;
/* Leave now if buffer filled already */
if (!size)
- goto out_unlock;
+ return 0;
}
offset += dump->size;
}
-out_unlock:
- mutex_unlock(&vmcore_mutex);
- return ret;
+ return 0;
}
#ifdef CONFIG_MMU
@@ -278,20 +272,16 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
{
struct vmcoredd_node *dump;
u64 offset = 0;
- int ret = 0;
size_t tsz;
char *buf;
- mutex_lock(&vmcore_mutex);
list_for_each_entry(dump, &vmcoredd_list, list) {
if (start < offset + dump->size) {
tsz = min(offset + (u64)dump->size - start, (u64)size);
buf = dump->buf + start - offset;
if (remap_vmalloc_range_partial(vma, dst, buf, 0,
- tsz)) {
- ret = -EFAULT;
- goto out_unlock;
- }
+ tsz))
+ return -EFAULT;
size -= tsz;
start += tsz;
@@ -299,14 +289,12 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
/* Leave now if buffer filled already */
if (!size)
- goto out_unlock;
+ return 0;
}
offset += dump->size;
}
-out_unlock:
- mutex_unlock(&vmcore_mutex);
- return ret;
+ return 0;
}
#endif /* CONFIG_MMU */
#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
@@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
return -EINVAL;
}
+ /* We'll recheck under lock later. */
+ if (data_race(vmcore_opened))
+ return -EBUSY;
+
if (!data || !strlen(data->dump_name) ||
!data->vmcoredd_callback || !data->size)
return -EINVAL;
@@ -1515,12 +1507,16 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
dump->buf = buf;
dump->size = data_size;
- /* Add the dump to driver sysfs list */
+ /* Add the dump to driver sysfs list and update the elfcore hdr */
mutex_lock(&vmcore_mutex);
- list_add_tail(&dump->list, &vmcoredd_list);
- mutex_unlock(&vmcore_mutex);
+ if (vmcore_opened) {
+ ret = -EBUSY;
+ goto out_err;
+ }
+ list_add_tail(&dump->list, &vmcoredd_list);
vmcoredd_update_size(data_size);
+ mutex_unlock(&vmcore_mutex);
return 0;
out_err:
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
2024-10-25 15:11 ` [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened David Hildenbrand
@ 2024-11-22 9:16 ` Baoquan He
2024-11-22 9:30 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-22 9:16 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
......snip...
> @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> return -EINVAL;
> }
>
> + /* We'll recheck under lock later. */
> + if (data_race(vmcore_opened))
> + return -EBUSY;
As I commented on patch 7, if the vmcore is opened and then closed after
the check, do we need to give up any chance of adding device dumps,
as below?
fd = open(/proc/vmcore);
...do checking;
close(fd);
quit any device dump adding;
run makedumpfile on s390;
->fd = open(/proc/vmcore);
-> try to dump;
->close(fd);
> +
> if (!data || !strlen(data->dump_name) ||
> !data->vmcoredd_callback || !data->size)
> return -EINVAL;
> @@ -1515,12 +1507,16 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> dump->buf = buf;
> dump->size = data_size;
>
> - /* Add the dump to driver sysfs list */
> + /* Add the dump to driver sysfs list and update the elfcore hdr */
> mutex_lock(&vmcore_mutex);
> - list_add_tail(&dump->list, &vmcoredd_list);
> - mutex_unlock(&vmcore_mutex);
> + if (vmcore_opened) {
> + ret = -EBUSY;
> + goto out_err;
> + }
>
> + list_add_tail(&dump->list, &vmcoredd_list);
> vmcoredd_update_size(data_size);
> + mutex_unlock(&vmcore_mutex);
> return 0;
>
> out_err:
> --
> 2.46.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
2024-11-22 9:16 ` Baoquan He
@ 2024-11-22 9:30 ` David Hildenbrand
2024-11-25 14:41 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-22 9:30 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 22.11.24 10:16, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> ......snip...
>> @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>> return -EINVAL;
>> }
>>
>> + /* We'll recheck under lock later. */
>> + if (data_race(vmcore_opened))
>> + return -EBUSY;
>
Hi,
> As I commented to patch 7, if vmcore is opened and closed after
> checking, do we need to give up any chance to add device dumping
> as below?
>
> fd = open(/proc/vmcore);
> ...do checking;
> close(fd);
>
> quit any device dump adding;
>
> run makedumpfile on s390;
> ->fd = open(/proc/vmcore);
> -> try to dump;
> ->close(fd);
The only reasonable case where this could happen (with virtio_mem) would
be when you hotplug a virtio-mem device into a VM that is currently in
the kdump kernel. However, in this case, the device would not provide
any memory we want to dump:
(1) The memory was not available to the 1st (crashed) kernel, because
the device got hotplugged later.
(2) Hotplugged virtio-mem devices show up with "no plugged memory",
meaning there wouldn't be even something to dump.
Drivers will be loaded (as part of the kernel or as part of the initrd)
before any kdump action is happening. Similarly, just imagine your NIC
driver not being loaded when you start dumping to a network share ...
This should similarly apply to vmcoredd providers.
There is another concern I had at some point with changing the effective
/proc/vmcore size after someone has already opened it and might assume the
size will stay unmodified (IOW, the file was completely static before
vmcoredd showed up).
So unless there is a real use case that requires tracking whether the
file is no longer open, to support modifying the vmcore afterwards, we
should keep it simple.
I am not aware of any such cases, and my experiments with virtio_mem
showed that the driver gets loaded extremely early from the initrd,
compared to when we actually start messing with /proc/vmcore from user
space.
Thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
2024-11-22 9:30 ` David Hildenbrand
@ 2024-11-25 14:41 ` Baoquan He
2024-11-29 10:38 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-25 14:41 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/22/24 at 10:30am, David Hildenbrand wrote:
> On 22.11.24 10:16, Baoquan He wrote:
> > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > ......snip...
> > > @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> > > return -EINVAL;
> > > }
> > > + /* We'll recheck under lock later. */
> > > + if (data_race(vmcore_opened))
> > > + return -EBUSY;
> >
>
> Hi,
>
> > As I commented to patch 7, if vmcore is opened and closed after
> > checking, do we need to give up any chance to add device dumping
> > as below?
> >
> > fd = open(/proc/vmcore);
> > ...do checking;
> > close(fd);
> >
> > quit any device dump adding;
> >
> > run makedumpfile on s390;
> > ->fd = open(/proc/vmcore);
> > -> try to dump;
> > ->close(fd);
>
> The only reasonable case where this could happen (with virtio_mem) would be
> when you hotplug a virtio-mem device into a VM that is currently in the
> kdump kernel. However, in this case, the device would not provide any memory
> we want to dump:
>
> (1) The memory was not available to the 1st (crashed) kernel, because
> the device got hotplugged later.
> (2) Hotplugged virtio-mem devices show up with "no plugged memory",
> meaning there wouldn't be even something to dump.
>
> Drivers will be loaded (as part of the kernel or as part of the initrd)
> before any kdump action is happening. Similarly, just imagine your NIC
> driver not being loaded when you start dumping to a network share ...
>
> This should similarly apply to vmcoredd providers.
>
> There is another concern I had at some point with changing the effective
> /proc/vmcore size after someone already opened it, and might assume the size
> will stay unmodified (IOW, the file was completely static before vmcoredd
> showed up).
>
> So unless there is a real use case that requires tracking whether the file
> is no longer open, to support modifying the vmcore afterwards, we should
> keep it simple.
>
> I am not aware of any such cases, and my experiments with virtio_mem showed
> that the driver get loaded extremely early from the initrd, compared to when
> we actually start messing with /proc/vmcore from user space.
Hmm, thanks for sharing your thoughts. I personally think if we could,
we had better make the code as robust as possible. Here, since we have
already consolidated the locking into one vmcore_mutex, it won't do any harm
whether the vmcoredd is added before or after /proc/vmcore is opened/closed.
The benefit is that it works well with the currently known case, virtio-mem
probing followed by makedumpfile running. And it also works well with other
possible cases, e.g. if you suspect a problem with virtio-mem dumping and
want to debug it, you set rd.break or use any other way to drop into the
emergency shell of the kdump kernel, and then you can poke the virtio-mem
module again and run makedumpfile manually, or reverse the order, or do any
other testing. The kernel implementation should not presume that user space
has to run makedumpfile after the much earlier virtio-mem probing. If we
implement code according to a preset userspace scenario, that limits future
adaptation, IMHO.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
2024-11-25 14:41 ` Baoquan He
@ 2024-11-29 10:38 ` David Hildenbrand
2024-12-03 10:42 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-29 10:38 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 25.11.24 15:41, Baoquan He wrote:
> On 11/22/24 at 10:30am, David Hildenbrand wrote:
>> On 22.11.24 10:16, Baoquan He wrote:
>>> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>>> ......snip...
>>>> @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>>>> return -EINVAL;
>>>> }
>>>> + /* We'll recheck under lock later. */
>>>> + if (data_race(vmcore_opened))
>>>> + return -EBUSY;
>>>
>>
>> Hi,
>>
>>> As I commented to patch 7, if vmcore is opened and closed after
>>> checking, do we need to give up any chance to add device dumping
>>> as below?
>>>
>>> fd = open(/proc/vmcore);
>>> ...do checking;
>>> close(fd);
>>>
>>> quit any device dump adding;
>>>
>>> run makedumpfile on s390;
>>> ->fd = open(/proc/vmcore);
>>> -> try to dump;
>>> ->close(fd);
>>
>> The only reasonable case where this could happen (with virtio_mem) would be
>> when you hotplug a virtio-mem device into a VM that is currently in the
>> kdump kernel. However, in this case, the device would not provide any memory
>> we want to dump:
>>
>> (1) The memory was not available to the 1st (crashed) kernel, because
>> the device got hotplugged later.
>> (2) Hotplugged virtio-mem devices show up with "no plugged memory",
>> meaning there wouldn't be even something to dump.
>>
>> Drivers will be loaded (as part of the kernel or as part of the initrd)
>> before any kdump action is happening. Similarly, just imagine your NIC
>> driver not being loaded when you start dumping to a network share ...
>>
>> This should similarly apply to vmcoredd providers.
>>
>> There is another concern I had at some point with changing the effective
>> /proc/vmcore size after someone already opened it, and might assume the size
>> will stay unmodified (IOW, the file was completely static before vmcoredd
>> showed up).
>>
>> So unless there is a real use case that requires tracking whether the file
>> is no longer open, to support modifying the vmcore afterwards, we should
>> keep it simple.
>>
>> I am not aware of any such cases, and my experiments with virtio_mem showed
>> that the driver get loaded extremely early from the initrd, compared to when
>> we actually start messing with /proc/vmcore from user space.
>
> Hmm, thanks for sharing your thoughts.
To assist the discussion, here is the kdump dmesg output on s390x with the
virtio-mem driver as part of the initrd:
[Fri Nov 29 04:55:28 2024] Run /init as init process
[Fri Nov 29 04:55:28 2024] with arguments:
[Fri Nov 29 04:55:28 2024] /init
[Fri Nov 29 04:55:28 2024] with environment:
[Fri Nov 29 04:55:28 2024] HOME=/
[Fri Nov 29 04:55:28 2024] TERM=linux
[Fri Nov 29 04:55:28 2024] numa=off
[Fri Nov 29 04:55:28 2024] cma=0
[Fri Nov 29 04:55:28 2024] loop: module loaded
[Fri Nov 29 04:55:28 2024] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[Fri Nov 29 04:55:28 2024] loop0: detected capacity change from 0 to 45280
[Fri Nov 29 04:55:28 2024] overlayfs: upper fs does not support RENAME_WHITEOUT.
[Fri Nov 29 04:55:28 2024] overlayfs: failed to set xattr on upper
[Fri Nov 29 04:55:28 2024] overlayfs: ...falling back to uuid=null.
[Fri Nov 29 04:55:28 2024] systemd[1]: Successfully made /usr/ read-only.
[Fri Nov 29 04:55:28 2024] systemd[1]: systemd 256.8-1.fc41 running in system mode [...]
[Fri Nov 29 04:55:28 2024] systemd[1]: Detected virtualization kvm.
[Fri Nov 29 04:55:28 2024] systemd[1]: Detected architecture s390x.
[Fri Nov 29 04:55:28 2024] systemd[1]: Running in initrd.
[Fri Nov 29 04:55:28 2024] systemd[1]: No hostname configured, using default hostname.
[Fri Nov 29 04:55:28 2024] systemd[1]: Hostname set to <localhost>.
[Fri Nov 29 04:55:28 2024] systemd[1]: Queued start job for default target initrd.target.
[Fri Nov 29 04:55:28 2024] systemd[1]: Started systemd-ask-password-console.path - Dispatch Password Requests to Console Directory Watch.
[Fri Nov 29 04:55:28 2024] systemd[1]: Expecting device dev-mapper-sysvg\x2droot.device - /dev/mapper/sysvg-root...
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target initrd-usr-fs.target - Initrd /usr File System.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target paths.target - Path Units.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target slices.target - Slice Units.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target swap.target - Swaps.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target timers.target - Timer Units.
[Fri Nov 29 04:55:28 2024] systemd[1]: Listening on systemd-journald-dev-log.socket - Journal Socket (/dev/log).
[Fri Nov 29 04:55:28 2024] systemd[1]: Listening on systemd-journald.socket - Journal Sockets.
[Fri Nov 29 04:55:28 2024] systemd[1]: Listening on systemd-udevd-control.socket - udev Control Socket.
[Fri Nov 29 04:55:28 2024] systemd[1]: Listening on systemd-udevd-kernel.socket - udev Kernel Socket.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target sockets.target - Socket Units.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting kmod-static-nodes.service - Create List of Static Device Nodes...
[Fri Nov 29 04:55:28 2024] systemd[1]: memstrack.service - Memstrack Anylazing Service was skipped because no trigger condition checks were met.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-journald.service - Journal Service...
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-modules-load.service - Load Kernel Modules...
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup...
[Fri Nov 29 04:55:28 2024] systemd[1]: Finished kmod-static-nodes.service - Create List of Static Device Nodes.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-tmpfiles-setup-dev-early.service - Create Static Device Nodes in /dev gracefully...
[Fri Nov 29 04:55:28 2024] systemd[1]: Finished systemd-modules-load.service - Load Kernel Modules.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-sysctl.service - Apply Kernel Variables...
[Fri Nov 29 04:55:28 2024] systemd[1]: Finished systemd-sysctl.service - Apply Kernel Variables.
[Fri Nov 29 04:55:28 2024] systemd[1]: Finished systemd-tmpfiles-setup-dev-early.service - Create Static Device Nodes in /dev gracefully.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-sysusers.service - Create System Users...
[Fri Nov 29 04:55:28 2024] systemd-journald[205]: Collecting audit messages is disabled.
[Fri Nov 29 04:55:28 2024] systemd[1]: Started systemd-journald.service - Journal Service.
[Fri Nov 29 04:55:28 2024] evm: overlay not supported
[Fri Nov 29 04:55:29 2024] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[Fri Nov 29 04:55:29 2024] device-mapper: uevent: version 1.0.3
[Fri Nov 29 04:55:29 2024] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@lists.linux.dev
[Fri Nov 29 04:55:29 2024] VFIO - User Level meta-driver version: 0.3
[Fri Nov 29 04:55:29 2024] virtio_blk virtio1: 1/0/0 default/read/poll queues
[Fri Nov 29 04:55:29 2024] virtio_blk virtio1: [vda] 35651584 512-byte logical blocks (18.3 GB/17.0 GiB)
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: start address: 0x100000000
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: region size: 0x400000000
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: device block size: 0x100000
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: nid: 0
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: memory hot(un)plug disabled in kdump kernel
[Fri Nov 29 04:55:29 2024] GPT:Primary header thinks Alt. header is not at the end of the disk.
[Fri Nov 29 04:55:29 2024] GPT:14680063 != 35651583
[Fri Nov 29 04:55:29 2024] GPT:Alternate GPT header not at the end of the disk.
[Fri Nov 29 04:55:29 2024] GPT:14680063 != 35651583
[Fri Nov 29 04:55:29 2024] GPT: Use GNU Parted to correct GPT errors.
[Fri Nov 29 04:55:29 2024] vda: vda1 vda2
[Fri Nov 29 04:55:29 2024] SGI XFS with ACLs, security attributes, scrub, quota, no debug enabled
[Fri Nov 29 04:55:29 2024] XFS: attr2 mount option is deprecated.
[Fri Nov 29 04:55:29 2024] XFS (dm-0): Mounting V5 Filesystem 4171e972-6865-4c47-ba7f-20bd52628650
[Fri Nov 29 04:55:29 2024] XFS (dm-0): Starting recovery (logdev: internal)
[Fri Nov 29 04:55:29 2024] XFS (dm-0): Ending recovery (logdev: internal)
Nov 29 04:55:28 localhost systemd-journald[205]: Journal started
Nov 29 04:55:28 localhost systemd-journald[205]: Runtime Journal (/run/log/journal/1377b7cdca054edd8904cb6f80695a23) is 1.4M, max 11.2M, 9.8M free.
Nov 29 04:55:28 localhost systemd-vconsole-setup[207]: /usr/bin/setfont failed with a "system error" (EX_OSERR), ignoring.
Nov 29 04:55:28 localhost systemd-modules-load[206]: Inserted module 'pkey'
Nov 29 04:55:28 localhost systemd-modules-load[206]: Module 'scsi_dh_alua' is built in
Nov 29 04:55:28 localhost systemd-modules-load[206]: Module 'scsi_dh_emc' is built in
Nov 29 04:55:28 localhost systemd-vconsole-setup[210]: setfont: ERROR kdfontop.c:183 put_font_kdfontop: Unable to load such font with such kernel version
Nov 29 04:55:28 localhost systemd-modules-load[206]: Module 'scsi_dh_rdac' is built in
Nov 29 04:55:28 localhost systemd-sysusers[216]: Creating group 'systemd-journal' with GID 190.
Nov 29 04:55:28 localhost systemd[1]: Finished systemd-sysusers.service - Create System Users.
Nov 29 04:55:28 localhost systemd[1]: Starting systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev...
Nov 29 04:55:28 localhost systemd[1]: Finished systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev.
Nov 29 04:55:28 localhost systemd[1]: Reached target local-fs-pre.target - Preparation for Local File Systems.
Nov 29 04:55:28 localhost systemd[1]: Reached target local-fs.target - Local File Systems.
Nov 29 04:55:28 localhost systemd[1]: Starting systemd-tmpfiles-setup.service - Create System Files and Directories...
Nov 29 04:55:28 localhost systemd-tmpfiles[227]: /usr/lib/tmpfiles.d/var.conf:14: Duplicate line for path "/var/log", ignoring.
Nov 29 04:55:28 localhost systemd[1]: Finished systemd-tmpfiles-setup.service - Create System Files and Directories.
Nov 29 04:55:28 localhost systemd-vconsole-setup[207]: Setting source virtual console failed, ignoring remaining ones.
Nov 29 04:55:28 localhost systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup.
Nov 29 04:55:28 localhost systemd[1]: Starting dracut-cmdline-ask.service - dracut ask for additional cmdline parameters...
Nov 29 04:55:28 localhost systemd[1]: Finished dracut-cmdline-ask.service - dracut ask for additional cmdline parameters.
Nov 29 04:55:28 localhost systemd[1]: Starting dracut-cmdline.service - dracut cmdline hook...
Nov 29 04:55:28 localhost dracut-cmdline[244]: dracut-102-3.fc41
Nov 29 04:55:28 localhost dracut-cmdline[244]: Using kernel command line parameters: rd.zfcp.conf=0 rd.lvm.lv=sysvg/root console=tty1 console=ttyS0,115200n8 memhp_default_state=online_movable nr_cpus=1 cgroup_disable=memory numa=off udev.children-max=2 panic=10 transparent_hugepage=never novmcoredd vmcp_cma=0 cma=0 hugetlb_cma=0
Nov 29 04:55:29 localhost dracut-cmdline[244]: cio_ignored disabled on commandline
Nov 29 04:55:29 localhost dracut-cmdline[338]: rm: cannot remove '/etc/zfcp.conf': No such file or directory
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-cmdline.service - dracut cmdline hook.
Nov 29 04:55:29 localhost systemd[1]: Starting dracut-pre-udev.service - dracut pre-udev hook...
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-pre-udev.service - dracut pre-udev hook.
Nov 29 04:55:29 localhost systemd[1]: Starting systemd-udevd.service - Rule-based Manager for Device Events and Files...
Nov 29 04:55:29 localhost systemd-udevd[394]: Using default interface naming scheme 'v255'.
Nov 29 04:55:29 localhost systemd[1]: Started systemd-udevd.service - Rule-based Manager for Device Events and Files.
Nov 29 04:55:29 localhost systemd[1]: dracut-pre-trigger.service - dracut pre-trigger hook was skipped because no trigger condition checks were met.
Nov 29 04:55:29 localhost systemd[1]: Starting systemd-udev-trigger.service - Coldplug All udev Devices...
Nov 29 04:55:29 localhost systemd[1]: Finished systemd-udev-trigger.service - Coldplug All udev Devices.
Nov 29 04:55:29 localhost systemd[1]: Created slice system-modprobe.slice - Slice /system/modprobe.
Nov 29 04:55:29 localhost systemd[1]: Starting dracut-initqueue.service - dracut initqueue hook...
Nov 29 04:55:29 localhost systemd[1]: Starting modprobe@configfs.service - Load Kernel Module configfs...
Nov 29 04:55:29 localhost systemd[1]: modprobe@configfs.service: Deactivated successfully.
Nov 29 04:55:29 localhost systemd[1]: Finished modprobe@configfs.service - Load Kernel Module configfs.
Nov 29 04:55:29 localhost systemd[1]: systemd-vconsole-setup.service: Deactivated successfully.
Nov 29 04:55:29 localhost systemd[1]: Stopped systemd-vconsole-setup.service - Virtual Console Setup.
Nov 29 04:55:29 localhost systemd[1]: Stopping systemd-vconsole-setup.service - Virtual Console Setup...
Nov 29 04:55:29 localhost systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup...
Nov 29 04:55:29 localhost systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup.
Nov 29 04:55:29 localhost dracut-initqueue[485]: Scanning devices vda2 for LVM logical volumes sysvg/root
Nov 29 04:55:29 localhost dracut-initqueue[485]: sysvg/root linear
Nov 29 04:55:29 localhost systemd[1]: Found device dev-mapper-sysvg\x2droot.device - /dev/mapper/sysvg-root.
Nov 29 04:55:29 localhost systemd[1]: Reached target initrd-root-device.target - Initrd Root Device.
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-initqueue.service - dracut initqueue hook.
Nov 29 04:55:29 localhost systemd[1]: Reached target remote-fs-pre.target - Preparation for Remote File Systems.
Nov 29 04:55:29 localhost systemd[1]: Reached target remote-fs.target - Remote File Systems.
Nov 29 04:55:29 localhost systemd[1]: Starting dracut-pre-mount.service - dracut pre-mount hook...
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-pre-mount.service - dracut pre-mount hook.
Nov 29 04:55:29 localhost systemd[1]: Starting systemd-fsck-root.service - File System Check on /dev/mapper/sysvg-root...
Nov 29 04:55:29 localhost systemd-fsck[531]: /usr/sbin/fsck.xfs: XFS file system.
Nov 29 04:55:29 localhost systemd[1]: Finished systemd-fsck-root.service - File System Check on /dev/mapper/sysvg-root.
Nov 29 04:55:29 localhost systemd[1]: Mounting sys-kernel-config.mount - Kernel Configuration File System...
Nov 29 04:55:29 localhost systemd[1]: Mounting sysroot.mount - /sysroot...
Nov 29 04:55:29 localhost systemd[1]: Mounted sys-kernel-config.mount - Kernel Configuration File System.
Nov 29 04:55:29 localhost systemd[1]: Reached target sysinit.target - System Initialization.
Nov 29 04:55:29 localhost systemd[1]: Reached target basic.target - Basic System.
Nov 29 04:55:29 localhost systemd[1]: System is tainted: unmerged-bin
Nov 29 04:55:29 localhost systemd[1]: Mounted sysroot.mount - /sysroot.
Nov 29 04:55:29 localhost systemd[1]: Reached target initrd-root-fs.target - Initrd Root File System.
Nov 29 04:55:29 localhost systemd[1]: initrd-parse-etc.service - Mountpoints Configured in the Real Root was skipped because of an unmet condition check (ConditionPathExists=!/proc/vmcore).
Nov 29 04:55:29 localhost systemd[1]: Reached target initrd-fs.target - Initrd File Systems.
Nov 29 04:55:29 localhost systemd[1]: Reached target initrd.target - Initrd Default Target.
Nov 29 04:55:29 localhost systemd[1]: dracut-mount.service - dracut mount hook was skipped because no trigger condition checks were met.
Nov 29 04:55:29 localhost systemd[1]: Starting dracut-pre-pivot.service - dracut pre-pivot and cleanup hook...
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-pre-pivot.service - dracut pre-pivot and cleanup hook.
Nov 29 04:55:29 localhost systemd[1]: Starting kdump-capture.service - Kdump Vmcore Save Service...
Nov 29 04:55:29 localhost kdump[574]: Kdump is using the default log level(3).
Nov 29 04:55:30 localhost kdump[618]: saving to /sysroot/var/crash/127.0.0.1-2024-11-29-04:55:29/
Nov 29 04:55:30 localhost kdump[623]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2024-11-29-04:55:29/
Nov 29 04:55:30 localhost kdump[629]: saving vmcore-dmesg.txt complete
Nov 29 04:55:30 localhost kdump[631]: saving vmcore
Observe how the virtio_mem driver is loaded along with the other drivers that are essential for
any kdump activity.
> I personally think if we could,
> we had better make code as robust as possible.
And that's where we disagree. We will make it less robust if we don't warn the user that something
very unexpected is happening, treating it as if it were normal.
It isn't normal.
> Here, since we have
> already integrated the lock with one vmcore_mutex, whether the vmcoredd
> is added before or after /proc/vmcore opening/closing, it won't harm.
> The benefit is it works well with the current kwown case, virtio-mem
> probing, makedumpfile running. And it also works well with other
> possible cases, e.g if you doubt virtio-mem dumping and want to debug,
> you set rd.break or take any way to drop into emengency shell of kdump
> kernel, you can play it to poke virtio-mem module again and run makedumpfile
> manually or reversing the order or taking any testing.
Intercepting kdump to trigger it manually will just work as it used to.
Having developed and debugged the virtio_mem driver in kdump mode, I was not even once
tempted to do any of that, and looking back it would not have been helpful at all ;)
Crashing in a VM is easy+cheap, rebooting in a VM is easy+cheap, recompiling virtio_mem +
rebuilding the kdump initrd is easy and cheap.
> Then kernel
> implementation should not preset that user space have to run the
> makedumpfile after the much earlier virtio-mem probing. If we implement
> codes according to preset userspace scenario, that limit the future
> adapttion, IMHO.
It limits the userspace scenario to something that is reasonable and future proof:
start working on the vmcore once the system is sufficiently initialized
(basic drivers loaded).
Then have it be immutable, just like it always was before we started allowing it to be patched.
The only "alterantive" I see is allow modifying the vmcore after it was closed again,
but still issue a warning if that happens.
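Just to illustrate what that alternative would roughly look like (a sketch
only, not something I'm proposing; the open counter is made up for the
example):

	/* Sketch: track openers instead of a sticky vmcore_opened flag. */
	static unsigned int vmcore_open_count;

	static int open_vmcore(struct inode *inode, struct file *file)
	{
		mutex_lock(&vmcore_mutex);
		vmcore_open_count++;
		mutex_unlock(&vmcore_mutex);
		return 0;
	}

	static int release_vmcore(struct inode *inode, struct file *file)
	{
		mutex_lock(&vmcore_mutex);
		vmcore_open_count--;
		mutex_unlock(&vmcore_mutex);
		return 0;
	}

	/*
	 * Modifications would then only be rejected while vmcore_open_count
	 * is non-zero, and we'd pr_warn_once() when modifying after the
	 * vmcore was opened at least once.
	 */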
But again, I don't really see any reasonable use case for that.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
2024-11-29 10:38 ` David Hildenbrand
@ 2024-12-03 10:42 ` Baoquan He
2024-12-03 10:51 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-12-03 10:42 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/29/24 at 11:38am, David Hildenbrand wrote:
> On 25.11.24 15:41, Baoquan He wrote:
> > On 11/22/24 at 10:30am, David Hildenbrand wrote:
> > > On 22.11.24 10:16, Baoquan He wrote:
> > > > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > > ......snip...
> > > > > @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> > > > > return -EINVAL;
> > > > > }
> > > > > + /* We'll recheck under lock later. */
> > > > > + if (data_race(vmcore_opened))
> > > > > + return -EBUSY;
> > > >
> > >
> > > Hi,
> > >
> > > > As I commented to patch 7, if vmcore is opened and closed after
> > > > checking, do we need to give up any chance to add device dumping
> > > > as below?
> > > >
> > > > fd = open(/proc/vmcore);
> > > > ...do checking;
> > > > close(fd);
> > > >
> > > > quit any device dump adding;
> > > >
> > > > run makedumpfile on s390;
> > > > ->fd = open(/proc/vmcore);
> > > > -> try to dump;
> > > > ->close(fd);
> > >
> > > The only reasonable case where this could happen (with virtio_mem) would be
> > > when you hotplug a virtio-mem device into a VM that is currently in the
> > > kdump kernel. However, in this case, the device would not provide any memory
> > > we want to dump:
> > >
> > > (1) The memory was not available to the 1st (crashed) kernel, because
> > > the device got hotplugged later.
> > > (2) Hotplugged virtio-mem devices show up with "no plugged memory",
> > > meaning there wouldn't be even something to dump.
> > >
> > > Drivers will be loaded (as part of the kernel or as part of the initrd)
> > > before any kdump action is happening. Similarly, just imagine your NIC
> > > driver not being loaded when you start dumping to a network share ...
> > >
> > > This should similarly apply to vmcoredd providers.
> > >
> > > There is another concern I had at some point with changing the effective
> > > /proc/vmcore size after someone already opened it, and might assume the size
> > > will stay unmodified (IOW, the file was completely static before vmcoredd
> > > showed up).
> > >
> > > So unless there is a real use case that requires tracking whether the file
> > > is no longer open, to support modifying the vmcore afterwards, we should
> > > keep it simple.
> > >
> > > I am not aware of any such cases, and my experiments with virtio_mem showed
> > > that the driver gets loaded extremely early from the initrd, compared to when
> > > we actually start messing with /proc/vmcore from user space.
It's OK, David, I don't have a strong opinion about the current
implementation. I raised this concern because
1) I saw that the original vmcoredd code only warns when registering if
vmcore_opened is true;
2) in patch 1, it says vmcore_mutex is introduced to protect vmcore
modifications from concurrent opening. If we are confident the old
vmcoredd_mutex can guarantee it, I could be wrong here.
Anyway, it's just a tiny concern, and I believe it won't cause issues at
present. So it's up to you.
Thanks
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
2024-12-03 10:42 ` Baoquan He
@ 2024-12-03 10:51 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-12-03 10:51 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 03.12.24 11:42, Baoquan He wrote:
> On 11/29/24 at 11:38am, David Hildenbrand wrote:
>> On 25.11.24 15:41, Baoquan He wrote:
>>> On 11/22/24 at 10:30am, David Hildenbrand wrote:
>>>> On 22.11.24 10:16, Baoquan He wrote:
>>>>> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>>>>> ......snip...
>>>>>> @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>>>>>> return -EINVAL;
>>>>>> }
>>>>>> + /* We'll recheck under lock later. */
>>>>>> + if (data_race(vmcore_opened))
>>>>>> + return -EBUSY;
>>>>>
>>>>
>>>> Hi,
>>>>
>>>>> As I commented to patch 7, if vmcore is opened and closed after
>>>>> checking, do we need to give up any chance to add device dumping
>>>>> as below?
>>>>>
>>>>> fd = open(/proc/vmcore);
>>>>> ...do checking;
>>>>> close(fd);
>>>>>
>>>>> quit any device dump adding;
>>>>>
>>>>> run makedumpfile on s390;
>>>>> ->fd = open(/proc/vmcore);
>>>>> -> try to dump;
>>>>> ->close(fd);
>>>>
>>>> The only reasonable case where this could happen (with virtio_mem) would be
>>>> when you hotplug a virtio-mem device into a VM that is currently in the
>>>> kdump kernel. However, in this case, the device would not provide any memory
>>>> we want to dump:
>>>>
>>>> (1) The memory was not available to the 1st (crashed) kernel, because
>>>> the device got hotplugged later.
>>>> (2) Hotplugged virtio-mem devices show up with "no plugged memory",
>>>> meaning there wouldn't be even something to dump.
>>>>
>>>> Drivers will be loaded (as part of the kernel or as part of the initrd)
>>>> before any kdump action is happening. Similarly, just imagine your NIC
>>>> driver not being loaded when you start dumping to a network share ...
>>>>
>>>> This should similarly apply to vmcoredd providers.
>>>>
>>>> There is another concern I had at some point with changing the effective
>>>> /proc/vmcore size after someone already opened it, and might assume the size
>>>> will stay unmodified (IOW, the file was completely static before vmcoredd
>>>> showed up).
>>>>
>>>> So unless there is a real use case that requires tracking whether the file
>>>> is no longer open, to support modifying the vmcore afterwards, we should
>>>> keep it simple.
>>>>
>>>> I am not aware of any such cases, and my experiments with virtio_mem showed
>>>> that the driver gets loaded extremely early from the initrd, compared to when
>>>> we actually start messing with /proc/vmcore from user space.
>
> It's OK, David, I don't have a strong opinion about the current
> implementation. I raised this concern because
>
> 1) I saw that the original vmcoredd code only warns when registering if
> vmcore_opened is true;
>
> 2) in patch 1, it says vmcore_mutex is introduced to protect vmcore
> modifications from concurrent opening. If we are confident the old
> vmcoredd_mutex can guarantee it, I could be wrong here.
I don't see how. We're protecting the list, but not the
vmcoredd_update_size(), which modifies sizes, offsets and all that.
>
> Anyway, it's just a tiny concern, I believe it won't cause issue at
> present. So it's up to you.
I can keep allowing stuff to be added after the vmcore was opened and closed
again (but not while it is open). It doesn't make much sense IMHO, but it
seems easy enough to add.
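For illustration only, here is a minimal sketch of that idea; the "vmcore_open"
counter is a placeholder name for this mail, not necessarily what a respin
would use:

/*
 * Illustrative sketch, not the posted patch: track how many times
 * /proc/vmcore is currently open (incremented in open_vmcore(), decremented
 * in a new release handler), in addition to the existing vmcore_opened flag.
 * Modifications are refused while the file is open and merely flagged as
 * unexpected if it was already opened and closed again.
 */
static unsigned int vmcore_open;

static bool vmcore_may_modify(void)
{
	lockdep_assert_held(&vmcore_mutex);

	if (vmcore_open)
		return false;
	if (vmcore_opened)
		pr_warn_once("Unexpected vmcore modification after close\n");
	return true;
}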
Thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (2 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-11-15 9:44 ` Baoquan He
2024-10-25 15:11 ` [PATCH v1 05/11] fs/proc/vmcore: factor out allocating a vmcore memory node David Hildenbrand
` (8 subsequent siblings)
12 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
These defines are not related to /proc/kcore, move them to crash_dump.h
instead. While at it, rename "struct vmcore" to "struct
vmcore_mem_node", which is a more fitting name.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 20 ++++++++++----------
include/linux/crash_dump.h | 13 +++++++++++++
include/linux/kcore.h | 13 -------------
3 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 6371dbaa21be..47652df95202 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -304,10 +304,10 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
*/
static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
{
+ struct vmcore_mem_node *m = NULL;
ssize_t acc = 0, tmp;
size_t tsz;
u64 start;
- struct vmcore *m = NULL;
if (!iov_iter_count(iter) || *fpos >= vmcore_size)
return 0;
@@ -560,8 +560,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
{
size_t size = vma->vm_end - vma->vm_start;
+ struct vmcore_mem_node *m;
u64 start, end, len, tsz;
- struct vmcore *m;
start = (u64)vma->vm_pgoff << PAGE_SHIFT;
end = start + size;
@@ -683,16 +683,16 @@ static const struct proc_ops vmcore_proc_ops = {
.proc_mmap = mmap_vmcore,
};
-static struct vmcore* __init get_new_element(void)
+static struct vmcore_mem_node * __init get_new_element(void)
{
- return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
+ return kzalloc(sizeof(struct vmcore_mem_node), GFP_KERNEL);
}
static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
struct list_head *vc_list)
{
+ struct vmcore_mem_node *m;
u64 size;
- struct vmcore *m;
size = elfsz + elfnotesegsz;
list_for_each_entry(m, vc_list, list) {
@@ -1090,11 +1090,11 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
size_t elfnotes_sz,
struct list_head *vc_list)
{
+ struct vmcore_mem_node *new;
int i;
Elf64_Ehdr *ehdr_ptr;
Elf64_Phdr *phdr_ptr;
loff_t vmcore_off;
- struct vmcore *new;
ehdr_ptr = (Elf64_Ehdr *)elfptr;
phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
@@ -1133,11 +1133,11 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
size_t elfnotes_sz,
struct list_head *vc_list)
{
+ struct vmcore_mem_node *new;
int i;
Elf32_Ehdr *ehdr_ptr;
Elf32_Phdr *phdr_ptr;
loff_t vmcore_off;
- struct vmcore *new;
ehdr_ptr = (Elf32_Ehdr *)elfptr;
phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
@@ -1175,8 +1175,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
static void set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
struct list_head *vc_list)
{
+ struct vmcore_mem_node *m;
loff_t vmcore_off;
- struct vmcore *m;
/* Skip ELF header, program headers and ELF note segment. */
vmcore_off = elfsz + elfnotes_sz;
@@ -1587,9 +1587,9 @@ void vmcore_cleanup(void)
/* clear the vmcore list. */
while (!list_empty(&vmcore_list)) {
- struct vmcore *m;
+ struct vmcore_mem_node *m;
- m = list_first_entry(&vmcore_list, struct vmcore, list);
+ m = list_first_entry(&vmcore_list, struct vmcore_mem_node, list);
list_del(&m->list);
kfree(m);
}
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index acc55626afdc..5e48ab12c12b 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -114,10 +114,23 @@ struct vmcore_cb {
extern void register_vmcore_cb(struct vmcore_cb *cb);
extern void unregister_vmcore_cb(struct vmcore_cb *cb);
+struct vmcore_mem_node {
+ struct list_head list;
+ unsigned long long paddr;
+ unsigned long long size;
+ loff_t offset;
+};
+
#else /* !CONFIG_CRASH_DUMP */
static inline bool is_kdump_kernel(void) { return false; }
#endif /* CONFIG_CRASH_DUMP */
+struct vmcoredd_node {
+ struct list_head list; /* List of dumps */
+ void *buf; /* Buffer containing device's dump */
+ unsigned int size; /* Size of the buffer */
+};
+
/* Device Dump information to be filled by drivers */
struct vmcoredd_data {
char dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Unique name of the dump */
diff --git a/include/linux/kcore.h b/include/linux/kcore.h
index 86c0f1d18998..9a2fa013c91d 100644
--- a/include/linux/kcore.h
+++ b/include/linux/kcore.h
@@ -20,19 +20,6 @@ struct kcore_list {
int type;
};
-struct vmcore {
- struct list_head list;
- unsigned long long paddr;
- unsigned long long size;
- loff_t offset;
-};
-
-struct vmcoredd_node {
- struct list_head list; /* List of dumps */
- void *buf; /* Buffer containing device's dump */
- unsigned int size; /* Size of the buffer */
-};
-
#ifdef CONFIG_PROC_KCORE
void __init kclist_add(struct kcore_list *, void *, size_t, int type);
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
2024-10-25 15:11 ` [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h David Hildenbrand
@ 2024-11-15 9:44 ` Baoquan He
2024-11-15 9:59 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-15 9:44 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> These defines are not related to /proc/kcore, move them to crash_dump.h
> instead. While at it, rename "struct vmcore" to "struct
> vmcore_mem_node", which is a more fitting name.
Agree it's inappropriate to put the definitions in kcore.h. However, for
'struct vmcore', it's only used in fs/proc/vmcore.c from my code
searching; do you think we can put it in fs/proc/vmcore.c directly?
And 'struct vmcoredd_node' too.
And about the renaming, with my understanding each instance of struct
vmcore represents one memory region, isn't it a little confusing to be
called vmcore_mem_node? I understand you probably want to unify the
vmcore and vmcoredd naming. I have to admit I don't know vmcoredd well,
or its naming, while most people already know vmcore as representing a
memory region.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> fs/proc/vmcore.c | 20 ++++++++++----------
> include/linux/crash_dump.h | 13 +++++++++++++
> include/linux/kcore.h | 13 -------------
> 3 files changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 6371dbaa21be..47652df95202 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -304,10 +304,10 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
> */
> static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
> {
> + struct vmcore_mem_node *m = NULL;
> ssize_t acc = 0, tmp;
> size_t tsz;
> u64 start;
> - struct vmcore *m = NULL;
>
> if (!iov_iter_count(iter) || *fpos >= vmcore_size)
> return 0;
> @@ -560,8 +560,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
> static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
> {
> size_t size = vma->vm_end - vma->vm_start;
> + struct vmcore_mem_node *m;
> u64 start, end, len, tsz;
> - struct vmcore *m;
>
> start = (u64)vma->vm_pgoff << PAGE_SHIFT;
> end = start + size;
> @@ -683,16 +683,16 @@ static const struct proc_ops vmcore_proc_ops = {
> .proc_mmap = mmap_vmcore,
> };
>
> -static struct vmcore* __init get_new_element(void)
> +static struct vmcore_mem_node * __init get_new_element(void)
> {
> - return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
> + return kzalloc(sizeof(struct vmcore_mem_node), GFP_KERNEL);
> }
>
> static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
> struct list_head *vc_list)
> {
> + struct vmcore_mem_node *m;
> u64 size;
> - struct vmcore *m;
>
> size = elfsz + elfnotesegsz;
> list_for_each_entry(m, vc_list, list) {
> @@ -1090,11 +1090,11 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> size_t elfnotes_sz,
> struct list_head *vc_list)
> {
> + struct vmcore_mem_node *new;
> int i;
> Elf64_Ehdr *ehdr_ptr;
> Elf64_Phdr *phdr_ptr;
> loff_t vmcore_off;
> - struct vmcore *new;
>
> ehdr_ptr = (Elf64_Ehdr *)elfptr;
> phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
> @@ -1133,11 +1133,11 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> size_t elfnotes_sz,
> struct list_head *vc_list)
> {
> + struct vmcore_mem_node *new;
> int i;
> Elf32_Ehdr *ehdr_ptr;
> Elf32_Phdr *phdr_ptr;
> loff_t vmcore_off;
> - struct vmcore *new;
>
> ehdr_ptr = (Elf32_Ehdr *)elfptr;
> phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
> @@ -1175,8 +1175,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> static void set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
> struct list_head *vc_list)
> {
> + struct vmcore_mem_node *m;
> loff_t vmcore_off;
> - struct vmcore *m;
>
> /* Skip ELF header, program headers and ELF note segment. */
> vmcore_off = elfsz + elfnotes_sz;
> @@ -1587,9 +1587,9 @@ void vmcore_cleanup(void)
>
> /* clear the vmcore list. */
> while (!list_empty(&vmcore_list)) {
> - struct vmcore *m;
> + struct vmcore_mem_node *m;
>
> - m = list_first_entry(&vmcore_list, struct vmcore, list);
> + m = list_first_entry(&vmcore_list, struct vmcore_mem_node, list);
> list_del(&m->list);
> kfree(m);
> }
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index acc55626afdc..5e48ab12c12b 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -114,10 +114,23 @@ struct vmcore_cb {
> extern void register_vmcore_cb(struct vmcore_cb *cb);
> extern void unregister_vmcore_cb(struct vmcore_cb *cb);
>
> +struct vmcore_mem_node {
> + struct list_head list;
> + unsigned long long paddr;
> + unsigned long long size;
> + loff_t offset;
> +};
> +
> #else /* !CONFIG_CRASH_DUMP */
> static inline bool is_kdump_kernel(void) { return false; }
> #endif /* CONFIG_CRASH_DUMP */
>
> +struct vmcoredd_node {
> + struct list_head list; /* List of dumps */
> + void *buf; /* Buffer containing device's dump */
> + unsigned int size; /* Size of the buffer */
> +};
> +
> /* Device Dump information to be filled by drivers */
> struct vmcoredd_data {
> char dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Unique name of the dump */
> diff --git a/include/linux/kcore.h b/include/linux/kcore.h
> index 86c0f1d18998..9a2fa013c91d 100644
> --- a/include/linux/kcore.h
> +++ b/include/linux/kcore.h
> @@ -20,19 +20,6 @@ struct kcore_list {
> int type;
> };
>
> -struct vmcore {
> - struct list_head list;
> - unsigned long long paddr;
> - unsigned long long size;
> - loff_t offset;
> -};
> -
> -struct vmcoredd_node {
> - struct list_head list; /* List of dumps */
> - void *buf; /* Buffer containing device's dump */
> - unsigned int size; /* Size of the buffer */
> -};
> -
> #ifdef CONFIG_PROC_KCORE
> void __init kclist_add(struct kcore_list *, void *, size_t, int type);
>
> --
> 2.46.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
2024-11-15 9:44 ` Baoquan He
@ 2024-11-15 9:59 ` David Hildenbrand
2024-11-20 9:42 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-15 9:59 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 15.11.24 10:44, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>> These defines are not related to /proc/kcore, move them to crash_dump.h
>> instead. While at it, rename "struct vmcore" to "struct
>> vmcore_mem_node", which is a more fitting name.
>
> Agree it's inappropriate to put the definitions in kcore.h. However, for
> 'struct vmcore', it's only used in fs/proc/vmcore.c from my code
> searching; do you think we can put it in fs/proc/vmcore.c directly?
> And 'struct vmcoredd_node' too.
See the next patches and how virtio-mem will make use of the factored
out functions. Not putting them as inline functions into a header will
require exporting symbols just to add a vmcore memory node to the list,
which I want to avoid -- overkill for these simple helpers.
>
> And about the renaming, with my understanding each instance of struct
> vmcore represents one memory region, isn't it a little confusing to be
> called vmcore_mem_node? I understand you probably want to unify the
> vmcore and vmcoredd naming. I have to admit I don't know vmcoredd well,
> or its naming, while most people already know vmcore as representing a
> memory region.
I chose "vmcore_mem_node" because it is a memory range stored in a list.
Note the symmetry with "vmcoredd_node"
If there are strong feelings I can use a different name, but
"vmcore_mem_node" really describes what it actually is. Especially now
that we have different vmcore nodes.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
2024-11-15 9:59 ` David Hildenbrand
@ 2024-11-20 9:42 ` Baoquan He
2024-11-20 10:28 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-20 9:42 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/15/24 at 10:59am, David Hildenbrand wrote:
> On 15.11.24 10:44, Baoquan He wrote:
> > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > These defines are not related to /proc/kcore, move them to crash_dump.h
> > > instead. While at it, rename "struct vmcore" to "struct
> > > vmcore_mem_node", which is a more fitting name.
> >
> > Agree it's inappropriate to put the definitions in kcore.h. However, for
> > 'struct vmcore', it's only used in fs/proc/vmcore.c from my code
> > searching; do you think we can put it in fs/proc/vmcore.c directly?
> > And 'struct vmcoredd_node' too.
>
> See the next patches and how virtio-mem will make use of the factored out
> functions. Not putting them as inline functions into a header will require
> exporting symbols just to add a vmcore memory node to the list, which I want
> to avoid -- overkill for these simple helpers.
I see. It makes sense to put them in crash_dump.h. Thanks for
explanation.
>
> >
> > And about the renaming, with my understanding each instance of struct
> > vmcore represents one memory region, isn't it a little confusing to be
> called vmcore_mem_node? I understand you probably want to unify the
> vmcore and vmcoredd naming. I have to admit I don't know vmcoredd well,
> or its naming, while most people already know vmcore as representing a
> memory region.
>
> I chose "vmcore_mem_node" because it is a memory range stored in a list.
> Note the symmetry with "vmcoredd_node"
I would say the justification of naming "vmcore_mem_node" is to keep
symmetry with "vmcoredd_node". If because it is a memory range, it really
should not be called vmcore_mem_node. As we know, memory node has
specific meaning in kernel, it's the memory range existing on a NUMA node.
And vmcoredd is not a widely used feature. At least in fedora/RHEL, we
leave it to customers themselves to use and handle, we don't support it.
And we add 'novmcoredd' to kdump kernel cmdline by default to disable it
in fedora/RHEL. So a rarely used feature should not be taken to decide
the naming of a mature and widely used feature. My personal
opinion.
>
> If there are strong feelings I can use a different name, but
Yes, I would suggest we keep the old name or take a more
appropriate one if it has to change.
> "vmcore_mem_node" really describes what it actually is. Especially now that
> we have different vmcore nodes.
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
2024-11-20 9:42 ` Baoquan He
@ 2024-11-20 10:28 ` David Hildenbrand
2024-11-21 4:35 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-20 10:28 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 20.11.24 10:42, Baoquan He wrote:
> On 11/15/24 at 10:59am, David Hildenbrand wrote:
>> On 15.11.24 10:44, Baoquan He wrote:
>>> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>>>> These defines are not related to /proc/kcore, move them to crash_dump.h
>>>> instead. While at it, rename "struct vmcore" to "struct
>>>> vmcore_mem_node", which is a more fitting name.
>>>
>>> Agree it's inappropriate to put the definitions in kcore.h. However, for
>>> 'struct vmcore', it's only used in fs/proc/vmcore.c from my code
>>> searching; do you think we can put it in fs/proc/vmcore.c directly?
>>> And 'struct vmcoredd_node' too.
>>
>> See the next patches and how virtio-mem will make use of the factored out
>> functions. Not putting them as inline functions into a header will require
>> exporting symbols just to add a vmcore memory node to the list, which I want
>> to avoid -- overkill for these simple helpers.
>
> I see. It makes sense to put them in crash_dump.h. Thanks for
> explanation.
>
I'll add these details to the description.
>>
>>>
>>> And about the renaming, with my understanding each instance of struct
>>> vmcore represents one memory region, isn't it a little confusing to be
>>> called vmcore_mem_node? I understand you probably want to unify the
>>> vmcore and vmcoredd naming. I have to admit I don't know vmcoredd well,
>>> or its naming, while most people already know vmcore as representing a
>>> memory region.
>>
>> I chose "vmcore_mem_node" because it is a memory range stored in a list.
>> Note the symmetry with "vmcoredd_node"
>
> I would say the justification of naming "vmcore_mem_node" is to keep
> symmetry with "vmcoredd_node". If it is because it is a memory range, it really
> should not be called vmcore_mem_node. As we know, "memory node" has a
> specific meaning in the kernel: the memory range existing on a NUMA node.
>
> And vmcoredd is not a widely used feature. At least in fedora/RHEL, we
> leave it to customers themselves to use and handle, we don't support it.
> And we add 'novmcoredd' to kdump kernel cmdline by default to disable it
> in fedora/RHEL. So a rarely used feature should not be taken to decide
> the naming of a mature and widely used feature. My personal
> opinion.
It's a memory range that gets added to a list. So it's a node in a list
... representing a memory range. :) I don't particularly care about the
"node" part here.
The old "struct vmcore" name is misleading: makes one believe it somehow
represents "/proc/vmcore", but it really doesn't. (see below on function
naming)
>
>>
>> If there are strong feelings I can use a different name, but
>
> Yes, I would suggest we better keep the old name or take a more
> appropriate one if have to change.
In light of patch #5 and #6, really only something like
"vmcore_mem_node" makes sense. Alternatively "vmcore_range" or
"vmcore_mem_range".
Leaving it as "struct vmcore" would mean that we had to do in #5 and #6:
* vmcore_alloc_add_mem_node() -> vmcore_alloc_add()
* vmcore_free_mem_nodes() -> vmcore_free()
Which would *really* be misleading, because we are not "freeing" the vmcore.
Would "vmcore_range" work for you? Then we could do:
* vmcore_alloc_add_mem_node() -> vmcore_alloc_add_range()
* vmcore_free_mem_nodes() -> vmcore_free_ranges()
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
2024-11-20 10:28 ` David Hildenbrand
@ 2024-11-21 4:35 ` Baoquan He
2024-11-21 15:37 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-21 4:35 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/20/24 at 11:28am, David Hildenbrand wrote:
> On 20.11.24 10:42, Baoquan He wrote:
> > On 11/15/24 at 10:59am, David Hildenbrand wrote:
> > > On 15.11.24 10:44, Baoquan He wrote:
> > > > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > > > These defines are not related to /proc/kcore, move them to crash_dump.h
> > > > > instead. While at it, rename "struct vmcore" to "struct
> > > > > vmcore_mem_node", which is a more fitting name.
> > > >
> > > > Agree it's inappropriate to put the definitions in kcore.h. However, for
> > > > 'struct vmcore', it's only used in fs/proc/vmcore.c from my code
> > > > searching; do you think we can put it in fs/proc/vmcore.c directly?
> > > > And 'struct vmcoredd_node' too.
> > >
> > > See the next patches and how virtio-mem will make use of the factored out
> > > functions. Not putting them as inline functions into a header will require
> > > exporting symbols just to add a vmcore memory node to the list, which I want
> > > to avoid -- overkill for these simple helpers.
> >
> > I see. It makes sense to put them in crash_dump.h. Thanks for
> > explanation.
> >
>
> I'll add these details to the description.
Thanks.
>
> > >
> > > >
> > > > And about the renaming, with my understanding each instance of struct
> > > > vmcore represents one memory region, isn't it a little confusing to be
> > > > called vmcore_mem_node? I understand you probably want to unify the
> > > > vmcore and vmcoredd naming. I have to admit I don't know vmcoredd well,
> > > > or its naming, while most people already know vmcore as representing a
> > > > memory region.
> > >
> > > I chose "vmcore_mem_node" because it is a memory range stored in a list.
> > > Note the symmetry with "vmcoredd_node"
> >
> > I would say the justification of naming "vmcore_mem_node" is to keep
> > symmetry with "vmcoredd_node". If it is because it is a memory range, it really
> > should not be called vmcore_mem_node. As we know, "memory node" has a
> > specific meaning in the kernel: the memory range existing on a NUMA node.
> >
> > And vmcoredd is not a widely used feature. At least in fedora/RHEL, we
> > leave it to customers themselves to use and handle, we don't support it.
> > And we add 'novmcoredd' to kdump kernel cmdline by default to disable it
> > in fedora/RHEL. So a rarely used feature should not be taken to decide
> > the naming of a mature and widely used feature. My personal
> > opinion.
>
> It's a memory range that gets added to a list. So it's a node in a list ...
> representing a memory range. :) I don't particularly care about the "node"
> part here.
Ah, I missed that point about the list node. There are list items, list entries
and list nodes; I didn't think of "list node" at that time.
>
> The old "struct vmcore" name is misleading: makes one believe it somehow
> represents "/proc/vmcore", but it really doesn't. (see below on function
> naming)
Yeah, agree. "struct vmcore" sounds like a concept for the whole logical file.
>
> >
> > >
> > > If there are strong feelings I can use a different name, but
> >
> > Yes, I would suggest we keep the old name or take a more
> > appropriate one if it has to change.
>
> In light of patch #5 and #6, really only something like "vmcore_mem_node"
> makes sense. Alternatively "vmcore_range" or "vmcore_mem_range".
>
> Leaving it as "struct vmcore" would mean that we had to do in #5 and #6:
>
> * vmcore_alloc_add_mem_node() -> vmcore_alloc_add()
> * vmcore_free_mem_nodes() -> vmcore_free()
>
> Which would *really* be misleading, because we are not "freeing" the vmcore.
>
> Would "vmcore_range" work for you? Then we could do:
>
> * vmcore_alloc_add_mem_node() -> vmcore_alloc_add_range()
> * vmcore_free_mem_nodes() -> vmcore_free_ranges()
Yeah, vmcore_range is better, which won't cause misunderstanding.
Thanks.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
2024-11-21 4:35 ` Baoquan He
@ 2024-11-21 15:37 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-11-21 15:37 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
>>>> If there are strong feelings I can use a different name, but
>>>
>>> Yes, I would suggest we keep the old name or take a more
>>> appropriate one if it has to change.
>>
>> In light of patch #5 and #6, really only something like "vmcore_mem_node"
>> makes sense. Alternatively "vmcore_range" or "vmcore_mem_range".
>>
>> Leaving it as "struct vmcore" would mean that we had to do in #5 and #6:
>>
>> * vmcore_alloc_add_mem_node() -> vmcore_alloc_add()
>> * vmcore_free_mem_nodes() -> vmcore_free()
>>
>> Which would *really* be misleading, because we are not "freeing" the vmcore.
>>
>> Would "vmcore_range" work for you? Then we could do:
>>
>> * vmcore_alloc_add_mem_node() -> vmcore_alloc_add_range()
>> * vmcore_free_mem_nodes() -> vmcore_free_ranges()
>
> Yeah, vmcore_range is better, which won't cause misunderstanding.
> Thanks.
>
Thanks, I'll use that and adjust patch #5 and #6, keeping your ACKs.
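For reference, a rough sketch of the renamed structure and helpers from
patches #4-#6 with the agreed naming (illustrative only; the respin may
differ in details):

struct vmcore_range {
	struct list_head list;
	unsigned long long paddr;
	unsigned long long size;
	loff_t offset;
};

/* Allocate a vmcore range and add it to the list. */
static inline int vmcore_alloc_add_range(struct list_head *list,
		unsigned long long paddr, unsigned long long size)
{
	struct vmcore_range *m = kzalloc(sizeof(*m), GFP_KERNEL);

	if (!m)
		return -ENOMEM;
	m->paddr = paddr;
	m->size = size;
	list_add_tail(&m->list, list);
	return 0;
}

/* Free a list of vmcore ranges. */
static inline void vmcore_free_ranges(struct list_head *list)
{
	struct vmcore_range *m, *tmp;

	list_for_each_entry_safe(m, tmp, list, list) {
		list_del(&m->list);
		kfree(m);
	}
}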
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 05/11] fs/proc/vmcore: factor out allocating a vmcore memory node
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (3 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 04/11] fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-11-20 9:45 ` Baoquan He
2024-10-25 15:11 ` [PATCH v1 06/11] fs/proc/vmcore: factor out freeing a list of vmcore ranges David Hildenbrand
` (7 subsequent siblings)
12 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 21 ++-------------------
include/linux/crash_dump.h | 14 ++++++++++++++
2 files changed, 16 insertions(+), 19 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 47652df95202..76fdc3fb8c0e 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -683,11 +683,6 @@ static const struct proc_ops vmcore_proc_ops = {
.proc_mmap = mmap_vmcore,
};
-static struct vmcore_mem_node * __init get_new_element(void)
-{
- return kzalloc(sizeof(struct vmcore_mem_node), GFP_KERNEL);
-}
-
static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
struct list_head *vc_list)
{
@@ -1090,7 +1085,6 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
size_t elfnotes_sz,
struct list_head *vc_list)
{
- struct vmcore_mem_node *new;
int i;
Elf64_Ehdr *ehdr_ptr;
Elf64_Phdr *phdr_ptr;
@@ -1113,13 +1107,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
size = end - start;
- /* Add this contiguous chunk of memory to vmcore list.*/
- new = get_new_element();
- if (!new)
+ if (vmcore_alloc_add_mem_node(vc_list, start, size))
return -ENOMEM;
- new->paddr = start;
- new->size = size;
- list_add_tail(&new->list, vc_list);
/* Update the program header offset. */
phdr_ptr->p_offset = vmcore_off + (paddr - start);
@@ -1133,7 +1122,6 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
size_t elfnotes_sz,
struct list_head *vc_list)
{
- struct vmcore_mem_node *new;
int i;
Elf32_Ehdr *ehdr_ptr;
Elf32_Phdr *phdr_ptr;
@@ -1156,13 +1144,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
size = end - start;
- /* Add this contiguous chunk of memory to vmcore list.*/
- new = get_new_element();
- if (!new)
+ if (vmcore_alloc_add_mem_node(vc_list, start, size))
return -ENOMEM;
- new->paddr = start;
- new->size = size;
- list_add_tail(&new->list, vc_list);
/* Update the program header offset */
phdr_ptr->p_offset = vmcore_off + (paddr - start);
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 5e48ab12c12b..ae77049fc023 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -121,6 +121,20 @@ struct vmcore_mem_node {
loff_t offset;
};
+/* Allocate a vmcore memory node and add it to the list. */
+static inline int vmcore_alloc_add_mem_node(struct list_head *list,
+ unsigned long long paddr, unsigned long long size)
+{
+ struct vmcore_mem_node *m = kzalloc(sizeof(*m), GFP_KERNEL);
+
+ if (!m)
+ return -ENOMEM;
+ m->paddr = paddr;
+ m->size = size;
+ list_add_tail(&m->list, list);
+ return 0;
+}
+
#else /* !CONFIG_CRASH_DUMP */
static inline bool is_kdump_kernel(void) { return false; }
#endif /* CONFIG_CRASH_DUMP */
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 05/11] fs/proc/vmcore: factor out allocating a vmcore memory node
2024-10-25 15:11 ` [PATCH v1 05/11] fs/proc/vmcore: factor out allocating a vmcore memory node David Hildenbrand
@ 2024-11-20 9:45 ` Baoquan He
0 siblings, 0 replies; 46+ messages in thread
From: Baoquan He @ 2024-11-20 9:45 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> Let's factor it out into include/linux/crash_dump.h, from where we can
> use it also outside of vmcore.c later.
LGTM,
Acked-by: Baoquan He <bhe@redhat.com>
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> fs/proc/vmcore.c | 21 ++-------------------
> include/linux/crash_dump.h | 14 ++++++++++++++
> 2 files changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 47652df95202..76fdc3fb8c0e 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -683,11 +683,6 @@ static const struct proc_ops vmcore_proc_ops = {
> .proc_mmap = mmap_vmcore,
> };
>
> -static struct vmcore_mem_node * __init get_new_element(void)
> -{
> - return kzalloc(sizeof(struct vmcore_mem_node), GFP_KERNEL);
> -}
> -
> static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
> struct list_head *vc_list)
> {
> @@ -1090,7 +1085,6 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> size_t elfnotes_sz,
> struct list_head *vc_list)
> {
> - struct vmcore_mem_node *new;
> int i;
> Elf64_Ehdr *ehdr_ptr;
> Elf64_Phdr *phdr_ptr;
> @@ -1113,13 +1107,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
> size = end - start;
>
> - /* Add this contiguous chunk of memory to vmcore list.*/
> - new = get_new_element();
> - if (!new)
> + if (vmcore_alloc_add_mem_node(vc_list, start, size))
> return -ENOMEM;
> - new->paddr = start;
> - new->size = size;
> - list_add_tail(&new->list, vc_list);
>
> /* Update the program header offset. */
> phdr_ptr->p_offset = vmcore_off + (paddr - start);
> @@ -1133,7 +1122,6 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> size_t elfnotes_sz,
> struct list_head *vc_list)
> {
> - struct vmcore_mem_node *new;
> int i;
> Elf32_Ehdr *ehdr_ptr;
> Elf32_Phdr *phdr_ptr;
> @@ -1156,13 +1144,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
> end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
> size = end - start;
>
> - /* Add this contiguous chunk of memory to vmcore list.*/
> - new = get_new_element();
> - if (!new)
> + if (vmcore_alloc_add_mem_node(vc_list, start, size))
> return -ENOMEM;
> - new->paddr = start;
> - new->size = size;
> - list_add_tail(&new->list, vc_list);
>
> /* Update the program header offset */
> phdr_ptr->p_offset = vmcore_off + (paddr - start);
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index 5e48ab12c12b..ae77049fc023 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -121,6 +121,20 @@ struct vmcore_mem_node {
> loff_t offset;
> };
>
> +/* Allocate a vmcore memory node and add it to the list. */
> +static inline int vmcore_alloc_add_mem_node(struct list_head *list,
> + unsigned long long paddr, unsigned long long size)
> +{
> + struct vmcore_mem_node *m = kzalloc(sizeof(*m), GFP_KERNEL);
> +
> + if (!m)
> + return -ENOMEM;
> + m->paddr = paddr;
> + m->size = size;
> + list_add_tail(&m->list, list);
> + return 0;
> +}
> +
> #else /* !CONFIG_CRASH_DUMP */
> static inline bool is_kdump_kernel(void) { return false; }
> #endif /* CONFIG_CRASH_DUMP */
> --
> 2.46.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 06/11] fs/proc/vmcore: factor out freeing a list of vmcore ranges
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (4 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 05/11] fs/proc/vmcore: factor out allocating a vmcore memory node David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-11-20 9:46 ` Baoquan He
2024-10-25 15:11 ` [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel David Hildenbrand
` (6 subsequent siblings)
12 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/vmcore.c | 9 +--------
include/linux/crash_dump.h | 11 +++++++++++
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 76fdc3fb8c0e..3e90416ee54e 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -1568,14 +1568,7 @@ void vmcore_cleanup(void)
proc_vmcore = NULL;
}
- /* clear the vmcore list. */
- while (!list_empty(&vmcore_list)) {
- struct vmcore_mem_node *m;
-
- m = list_first_entry(&vmcore_list, struct vmcore_mem_node, list);
- list_del(&m->list);
- kfree(m);
- }
+ vmcore_free_mem_nodes(&vmcore_list);
free_elfcorebuf();
/* clear vmcore device dump list */
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index ae77049fc023..722dbcff7371 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -135,6 +135,17 @@ static inline int vmcore_alloc_add_mem_node(struct list_head *list,
return 0;
}
+/* Free a list of vmcore memory nodes. */
+static inline void vmcore_free_mem_nodes(struct list_head *list)
+{
+ struct vmcore_mem_node *m, *tmp;
+
+ list_for_each_entry_safe(m, tmp, list, list) {
+ list_del(&m->list);
+ kfree(m);
+ }
+}
+
#else /* !CONFIG_CRASH_DUMP */
static inline bool is_kdump_kernel(void) { return false; }
#endif /* CONFIG_CRASH_DUMP */
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 06/11] fs/proc/vmcore: factor out freeing a list of vmcore ranges
2024-10-25 15:11 ` [PATCH v1 06/11] fs/proc/vmcore: factor out freeing a list of vmcore ranges David Hildenbrand
@ 2024-11-20 9:46 ` Baoquan He
0 siblings, 0 replies; 46+ messages in thread
From: Baoquan He @ 2024-11-20 9:46 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> Let's factor it out into include/linux/crash_dump.h, from where we can
> use it also outside of vmcore.c later.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> fs/proc/vmcore.c | 9 +--------
> include/linux/crash_dump.h | 11 +++++++++++
> 2 files changed, 12 insertions(+), 8 deletions(-)
LGTM,
Acked-by: Baoquan He <bhe@redhat.com>
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 76fdc3fb8c0e..3e90416ee54e 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -1568,14 +1568,7 @@ void vmcore_cleanup(void)
> proc_vmcore = NULL;
> }
>
> - /* clear the vmcore list. */
> - while (!list_empty(&vmcore_list)) {
> - struct vmcore_mem_node *m;
> -
> - m = list_first_entry(&vmcore_list, struct vmcore_mem_node, list);
> - list_del(&m->list);
> - kfree(m);
> - }
> + vmcore_free_mem_nodes(&vmcore_list);
> free_elfcorebuf();
>
> /* clear vmcore device dump list */
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index ae77049fc023..722dbcff7371 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -135,6 +135,17 @@ static inline int vmcore_alloc_add_mem_node(struct list_head *list,
> return 0;
> }
>
> +/* Free a list of vmcore memory nodes. */
> +static inline void vmcore_free_mem_nodes(struct list_head *list)
> +{
> + struct vmcore_mem_node *m, *tmp;
> +
> + list_for_each_entry_safe(m, tmp, list, list) {
> + list_del(&m->list);
> + kfree(m);
> + }
> +}
> +
> #else /* !CONFIG_CRASH_DUMP */
> static inline bool is_kdump_kernel(void) { return false; }
> #endif /* CONFIG_CRASH_DUMP */
> --
> 2.46.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (5 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 06/11] fs/proc/vmcore: factor out freeing a list of vmcore ranges David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-11-20 10:13 ` Baoquan He
2024-11-22 7:31 ` Baoquan He
2024-10-25 15:11 ` [PATCH v1 08/11] virtio-mem: mark device ready before registering callbacks in kdump mode David Hildenbrand
` (5 subsequent siblings)
12 siblings, 2 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
the crashed kernel.
RAM provided by memory devices such as virtio-mem can only be detected
using the device driver; when vmcore_init() is called, these device
drivers are usually not loaded yet, or the devices did not get probed
yet. Consequently, on s390 these RAM ranges will not be included in
the crash dump, which makes the dump partially corrupt and is
unfortunate.
Instead of deferring the vmcore_init() call to an (unclear?) later point,
let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
the device drivers probe the device and get access to this information.
Then, we'll add these ranges to the vmcore, adding more PT_LOAD
entries and updating the offsets+vmcore size.
Use Kconfig tricks to include this code automatically only if (a) there is
a device driver compiled that implements the callback
(PROVIDE_PROC_VMCORE_DEVICE_RAM); and (b) the architecture actually needs
this information (NEED_PROC_VMCORE_DEVICE_RAM).
The current target use case is s390, which only creates an elf64
elfcore, so focusing on elf64 is sufficient.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
fs/proc/Kconfig | 25 ++++++
fs/proc/vmcore.c | 156 +++++++++++++++++++++++++++++++++++++
include/linux/crash_dump.h | 9 +++
3 files changed, 190 insertions(+)
diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index d80a1431ef7b..1e11de5f9380 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -61,6 +61,31 @@ config PROC_VMCORE_DEVICE_DUMP
as ELF notes to /proc/vmcore. You can still disable device
dump using the kernel command line option 'novmcoredd'.
+config PROVIDE_PROC_VMCORE_DEVICE_RAM
+ def_bool n
+
+config NEED_PROC_VMCORE_DEVICE_RAM
+ def_bool n
+
+config PROC_VMCORE_DEVICE_RAM
+ def_bool y
+ depends on PROC_VMCORE
+ depends on NEED_PROC_VMCORE_DEVICE_RAM
+ depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
+ help
+ If the elfcore hdr is allocated and prepared by the dump kernel
+ ("2nd kernel") instead of the crashed kernel, RAM provided by memory
+ devices such as virtio-mem will not be included in the dump
+ image, because only the device driver can properly detect them.
+
+ With this config enabled, these RAM ranges will be queried from the
+ device drivers once the device gets probed, so they can be included
+ in the crash dump.
+
+ Relevant architectures should select NEED_PROC_VMCORE_DEVICE_RAM
+ and relevant device drivers should select
+ PROVIDE_PROC_VMCORE_DEVICE_RAM.
+
config PROC_SYSCTL
bool "Sysctl support (/proc/sys)" if EXPERT
depends on PROC_FS
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 3e90416ee54e..c332a9a4920b 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -69,6 +69,8 @@ static LIST_HEAD(vmcore_cb_list);
/* Whether the vmcore has been opened once. */
static bool vmcore_opened;
+static void vmcore_process_device_ram(struct vmcore_cb *cb);
+
void register_vmcore_cb(struct vmcore_cb *cb)
{
INIT_LIST_HEAD(&cb->next);
@@ -80,6 +82,8 @@ void register_vmcore_cb(struct vmcore_cb *cb)
*/
if (vmcore_opened)
pr_warn_once("Unexpected vmcore callback registration\n");
+ else if (cb->get_device_ram)
+ vmcore_process_device_ram(cb);
mutex_unlock(&vmcore_mutex);
}
EXPORT_SYMBOL_GPL(register_vmcore_cb);
@@ -1511,6 +1515,158 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
EXPORT_SYMBOL(vmcore_add_device_dump);
#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
+static int vmcore_realloc_elfcore_buffer_elf64(size_t new_size)
+{
+ char *elfcorebuf_new;
+
+ if (WARN_ON_ONCE(new_size < elfcorebuf_sz))
+ return -EINVAL;
+ if (get_order(elfcorebuf_sz_orig) == get_order(new_size)) {
+ elfcorebuf_sz_orig = new_size;
+ return 0;
+ }
+
+ elfcorebuf_new = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(new_size));
+ if (!elfcorebuf_new)
+ return -ENOMEM;
+ memcpy(elfcorebuf_new, elfcorebuf, elfcorebuf_sz);
+ free_pages((unsigned long)elfcorebuf, get_order(elfcorebuf_sz_orig));
+ elfcorebuf = elfcorebuf_new;
+ elfcorebuf_sz_orig = new_size;
+ return 0;
+}
+
+static void vmcore_reset_offsets_elf64(void)
+{
+ Elf64_Phdr *phdr_start = (Elf64_Phdr *)(elfcorebuf + sizeof(Elf64_Ehdr));
+ loff_t vmcore_off = elfcorebuf_sz + elfnotes_sz;
+ Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfcorebuf;
+ Elf64_Phdr *phdr;
+ int i;
+
+ for (i = 0, phdr = phdr_start; i < ehdr->e_phnum; i++, phdr++) {
+ u64 start, end;
+
+ /*
+ * After merge_note_headers_elf64() we should only have a single
+ * PT_NOTE entry that starts immediately after elfcorebuf_sz.
+ */
+ if (phdr->p_type == PT_NOTE) {
+ phdr->p_offset = elfcorebuf_sz;
+ continue;
+ }
+
+ start = rounddown(phdr->p_offset, PAGE_SIZE);
+ end = roundup(phdr->p_offset + phdr->p_memsz, PAGE_SIZE);
+ phdr->p_offset = vmcore_off + (phdr->p_offset - start);
+ vmcore_off = vmcore_off + end - start;
+ }
+ set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
+}
+
+static int vmcore_add_device_ram_elf64(struct list_head *list, size_t count)
+{
+ Elf64_Phdr *phdr_start = (Elf64_Phdr *)(elfcorebuf + sizeof(Elf64_Ehdr));
+ Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfcorebuf;
+ struct vmcore_mem_node *cur;
+ Elf64_Phdr *phdr;
+ size_t new_size;
+ int rc;
+
+ if ((Elf32_Half)(ehdr->e_phnum + count) != ehdr->e_phnum + count) {
+ pr_err("Kdump: too many device ram ranges\n");
+ return -ENOSPC;
+ }
+
+ /* elfcorebuf_sz must always cover full pages. */
+ new_size = sizeof(Elf64_Ehdr) +
+ (ehdr->e_phnum + count) * sizeof(Elf64_Phdr);
+ new_size = roundup(new_size, PAGE_SIZE);
+
+ /*
+ * Make sure we have sufficient space to include the new PT_LOAD
+ * entries.
+ */
+ rc = vmcore_realloc_elfcore_buffer_elf64(new_size);
+ if (rc) {
+ pr_err("Kdump: resizing elfcore failed\n");
+ return rc;
+ }
+
+ /* Modify our used elfcore buffer size to cover the new entries. */
+ elfcorebuf_sz = new_size;
+
+ /* Fill the added PT_LOAD entries. */
+ phdr = phdr_start + ehdr->e_phnum;
+ list_for_each_entry(cur, list, list) {
+ WARN_ON_ONCE(!IS_ALIGNED(cur->paddr | cur->size, PAGE_SIZE));
+ elfcorehdr_fill_device_ram_ptload_elf64(phdr, cur->paddr, cur->size);
+
+ /* p_offset will be adjusted later. */
+ phdr++;
+ ehdr->e_phnum++;
+ }
+ list_splice_tail(list, &vmcore_list);
+
+ /* We changed elfcorebuf_sz and added new entries; reset all offsets. */
+ vmcore_reset_offsets_elf64();
+
+ /* Finally, recalculate the total vmcore size. */
+ vmcore_size = get_vmcore_size(elfcorebuf_sz, elfnotes_sz,
+ &vmcore_list);
+ proc_vmcore->size = vmcore_size;
+ return 0;
+}
+
+static void vmcore_process_device_ram(struct vmcore_cb *cb)
+{
+ unsigned char *e_ident = (unsigned char *)elfcorebuf;
+ struct vmcore_mem_node *first, *m;
+ LIST_HEAD(list);
+ int count;
+
+ if (cb->get_device_ram(cb, &list)) {
+ pr_err("Kdump: obtaining device ram ranges failed\n");
+ return;
+ }
+ count = list_count_nodes(&list);
+ if (!count)
+ return;
+
+ /* We only support Elf64 dumps for now. */
+ if (WARN_ON_ONCE(e_ident[EI_CLASS] != ELFCLASS64)) {
+ pr_err("Kdump: device ram ranges only support Elf64\n");
+ goto out_free;
+ }
+
+ /*
+ * For some reason these ranges are already known? Might happen
+ * with unusual register->unregister->register sequences; we'll simply
+ * sanity check using the first range.
+ */
+ first = list_first_entry(&list, struct vmcore_mem_node, list);
+ list_for_each_entry(m, &vmcore_list, list) {
+ unsigned long long m_end = m->paddr + m->size;
+ unsigned long long first_end = first->paddr + first->size;
+
+ if (first->paddr < m_end && m->paddr < first_end)
+ goto out_free;
+ }
+
+ /* If adding the mem nodes succeeds, they must not be freed. */
+ if (!vmcore_add_device_ram_elf64(&list, count))
+ return;
+out_free:
+ vmcore_free_mem_nodes(&list);
+}
+#else /* !CONFIG_PROC_VMCORE_DEVICE_RAM */
+static void vmcore_process_device_ram(struct vmcore_cb *cb)
+{
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_RAM */
+
/* Free all dumps in vmcore device dump list */
static void vmcore_free_device_dumps(void)
{
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 722dbcff7371..8e581a053d7f 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -20,6 +20,8 @@ extern int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size);
extern void elfcorehdr_free(unsigned long long addr);
extern ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos);
extern ssize_t elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+void elfcorehdr_fill_device_ram_ptload_elf64(Elf64_Phdr *phdr,
+ unsigned long long paddr, unsigned long long size);
extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
unsigned long from, unsigned long pfn,
unsigned long size, pgprot_t prot);
@@ -99,6 +101,12 @@ static inline void vmcore_unusable(void)
* indicated in the vmcore instead. For example, a ballooned page
* contains no data and reading from such a page will cause high
* load in the hypervisor.
+ * @get_device_ram: query RAM ranges that can only be detected by device
+ * drivers, such as the virtio-mem driver, so they can be included in
+ * the crash dump on architectures that allocate the elfcore hdr in the dump
+ * ("2nd") kernel. Indicated RAM ranges may contain holes to reduce the
+ * total number of ranges; such holes can be detected using the pfn_is_ram
+ * callback just like for other RAM.
* @next: List head to manage registered callbacks internally; initialized by
* register_vmcore_cb().
*
@@ -109,6 +117,7 @@ static inline void vmcore_unusable(void)
*/
struct vmcore_cb {
bool (*pfn_is_ram)(struct vmcore_cb *cb, unsigned long pfn);
+ int (*get_device_ram)(struct vmcore_cb *cb, struct list_head *list);
struct list_head next;
};
extern void register_vmcore_cb(struct vmcore_cb *cb);
--
2.46.1
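For driver authors, a minimal sketch of how a provider could wire up the new
callback (the "foo" driver structure, fields and helper functions here are
hypothetical; only register_vmcore_cb(), struct vmcore_cb and
vmcore_alloc_add_mem_node() come from this series):

/* Sketch only: a hypothetical driver exposing a single device RAM range. */
static int foo_get_device_ram(struct vmcore_cb *cb, struct list_head *list)
{
	struct foo_dev *foo = container_of(cb, struct foo_dev, vmcore_cb);

	/* Report the RAM range owned by this device for inclusion in the vmcore. */
	return vmcore_alloc_add_mem_node(list, foo->base, foo->size);
}

static int foo_probe(struct foo_dev *foo)
{
	foo->vmcore_cb.pfn_is_ram = foo_pfn_is_ram;
#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
	foo->vmcore_cb.get_device_ram = foo_get_device_ram;
#endif
	register_vmcore_cb(&foo->vmcore_cb);
	return 0;
}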
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-10-25 15:11 ` [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel David Hildenbrand
@ 2024-11-20 10:13 ` Baoquan He
2024-11-20 10:48 ` David Hildenbrand
2024-11-22 7:31 ` Baoquan He
1 sibling, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-20 10:13 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
> the crashed kernel.
>
> RAM provided by memory devices such as virtio-mem can only be detected
> using the device driver; when vmcore_init() is called, these device
> drivers are usually not loaded yet, or the devices did not get probed
> yet. Consequently, on s390 these RAM ranges will not be included in
> the crash dump, which makes the dump partially corrupt and is
> unfortunate.
>
> Instead of deferring the vmcore_init() call, to an (unclear?) later point,
> let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
> the device drivers probe the device and get access to this information.
>
> Then, we'll add these ranges to the vmcore, adding more PT_LOAD
> entries and updating the offsets+vmcore size.
>
> Use Kconfig tricks to include this code automatically only if (a) there is
> a device driver compiled that implements the callback
> (PROVIDE_PROC_VMCORE_DEVICE_RAM) and; (b) the architecture actually needs
> this information (NEED_PROC_VMCORE_DEVICE_RAM).
>
> The current target use case is s390, which only creates an elf64
> elfcore, so focusing on elf64 is sufficient.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> fs/proc/Kconfig | 25 ++++++
> fs/proc/vmcore.c | 156 +++++++++++++++++++++++++++++++++++++
> include/linux/crash_dump.h | 9 +++
> 3 files changed, 190 insertions(+)
>
> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
> index d80a1431ef7b..1e11de5f9380 100644
> --- a/fs/proc/Kconfig
> +++ b/fs/proc/Kconfig
> @@ -61,6 +61,31 @@ config PROC_VMCORE_DEVICE_DUMP
> as ELF notes to /proc/vmcore. You can still disable device
> dump using the kernel command line option 'novmcoredd'.
>
> +config PROVIDE_PROC_VMCORE_DEVICE_RAM
> + def_bool n
> +
> +config NEED_PROC_VMCORE_DEVICE_RAM
> + def_bool n
> +
> +config PROC_VMCORE_DEVICE_RAM
> + def_bool y
> + depends on PROC_VMCORE
> + depends on NEED_PROC_VMCORE_DEVICE_RAM
> + depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
Kconfig items are always something I need to learn to master. When I checked
this part, I had to write them down to think them through. I am wondering if
the 'simple version' below works too and is more understandable. Please help
point out what I have missed.
===========simple version======
config PROC_VMCORE_DEVICE_RAM
def_bool y
depends on PROC_VMCORE && VIRTIO_MEM
depends on NEED_PROC_VMCORE_DEVICE_RAM
config S390
select NEED_PROC_VMCORE_DEVICE_RAM
============
======= config items extracted from this patchset====
config PROVIDE_PROC_VMCORE_DEVICE_RAM
def_bool n
config NEED_PROC_VMCORE_DEVICE_RAM
def_bool n
config PROC_VMCORE_DEVICE_RAM
def_bool y
depends on PROC_VMCORE
depends on NEED_PROC_VMCORE_DEVICE_RAM
depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
config VIRTIO_MEM
depends on X86_64 || ARM64 || RISCV
~~~~~ I don't get why VIRTIO_MEM doesn't depend on S390 if
s390 needs PROC_VMCORE_DEVICE_RAM.
......
select PROVIDE_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
config S390
select NEED_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
=================================================
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-11-20 10:13 ` Baoquan He
@ 2024-11-20 10:48 ` David Hildenbrand
2024-11-20 14:05 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-20 10:48 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 20.11.24 11:13, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>> s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
>> the crashed kernel.
>>
>> RAM provided by memory devices such as virtio-mem can only be detected
>> using the device driver; when vmcore_init() is called, these device
>> drivers are usually not loaded yet, or the devices did not get probed
>> yet. Consequently, on s390 these RAM ranges will not be included in
>> the crash dump, which makes the dump partially corrupt and is
>> unfortunate.
>>
>> Instead of deferring the vmcore_init() call, to an (unclear?) later point,
>> let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
>> the device drivers probe the device and get access to this information.
>>
>> Then, we'll add these ranges to the vmcore, adding more PT_LOAD
>> entries and updating the offsets+vmcore size.
>>
>> Use Kconfig tricks to include this code automatically only if (a) there is
>> a device driver compiled that implements the callback
>> (PROVIDE_PROC_VMCORE_DEVICE_RAM) and; (b) the architecture actually needs
>> this information (NEED_PROC_VMCORE_DEVICE_RAM).
>>
>> The current target use case is s390, which only creates an elf64
>> elfcore, so focusing on elf64 is sufficient.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> fs/proc/Kconfig | 25 ++++++
>> fs/proc/vmcore.c | 156 +++++++++++++++++++++++++++++++++++++
>> include/linux/crash_dump.h | 9 +++
>> 3 files changed, 190 insertions(+)
>>
>> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
>> index d80a1431ef7b..1e11de5f9380 100644
>> --- a/fs/proc/Kconfig
>> +++ b/fs/proc/Kconfig
>> @@ -61,6 +61,31 @@ config PROC_VMCORE_DEVICE_DUMP
>> as ELF notes to /proc/vmcore. You can still disable device
>> dump using the kernel command line option 'novmcoredd'.
>>
>> +config PROVIDE_PROC_VMCORE_DEVICE_RAM
>> + def_bool n
>> +
>> +config NEED_PROC_VMCORE_DEVICE_RAM
>> + def_bool n
>> +
>> +config PROC_VMCORE_DEVICE_RAM
>> + def_bool y
>> + depends on PROC_VMCORE
>> + depends on NEED_PROC_VMCORE_DEVICE_RAM
>> + depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
>
> Kconfig item is always a thing I need learn to master.
Yes, it's usually a struggle to get it right. It took me a couple of
iterations to get to this point :)
> When I checked
> this part, I have to write them down to deliberate. I am wondering if
> below 'simple version' works too and more understandable. Please help
> point out what I have missed.
>
> ===========simple version======
> config PROC_VMCORE_DEVICE_RAM
> def_bool y
> depends on PROC_VMCORE && VIRTIO_MEM
> depends on NEED_PROC_VMCORE_DEVICE_RAM
>
> config S390
> select NEED_PROC_VMCORE_DEVICE_RAM
> ============
So the three changes you did are
(a) Remove the config option definitions but still select/depend on them.
(b) Remove the "depends on PROC_VMCORE" from PROC_VMCORE_DEVICE_RAM,
and the "if PROC_VMCORE" from s390.
(c) Remove the PROVIDE_PROC_VMCORE_DEVICE_RAM
Regarding (a), that doesn't work. If you select a config option that
doesn't exist, it is silently dropped. It's always treated as if it
wouldn't be set.
Regarding (b), I think that's an anti-pattern (having config options
enabled that are completely ineffective), and I don't see a benefit in
dropping them.
Regarding (c), it would mean that s390x unconditionally includes that
code even if virtio-mem is not configured in.
So while we could drop PROVIDE_PROC_VMCORE_DEVICE_RAM -- (c) -- it would mean
that we end up including code in configurations that cannot possibly need
it. That's why I included that part.
>
>
> ======= config items extracted from this patchset====
> config PROVIDE_PROC_VMCORE_DEVICE_RAM
> def_bool n
>
> config NEED_PROC_VMCORE_DEVICE_RAM
> def_bool n
>
> config PROC_VMCORE_DEVICE_RAM
> def_bool y
> depends on PROC_VMCORE
> depends on NEED_PROC_VMCORE_DEVICE_RAM
> depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
>
> config VIRTIO_MEM
> depends on X86_64 || ARM64 || RISCV
> ~~~~~ I don't get why VIRTIO_MEM dones't depend on S390 if
> s390 need PROC_VMCORE_DEVICE_RAM.
This series depends on s390 support for virtio-mem, which just went
upstream.
See
commit 38968bcdcc1d46f2fdcd3a72599d5193bf8baf84
Author: David Hildenbrand <david@redhat.com>
Date: Fri Oct 25 16:14:49 2024 +0200
virtio-mem: s390 support
> ......
> select PROVIDE_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
>
> config S390
> select NEED_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
> =================================================
>
Thanks for having a look!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-11-20 10:48 ` David Hildenbrand
@ 2024-11-20 14:05 ` Baoquan He
2024-11-20 14:39 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-20 14:05 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/20/24 at 11:48am, David Hildenbrand wrote:
> On 20.11.24 11:13, Baoquan He wrote:
> > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
> > > the crashed kernel.
> > >
> > > RAM provided by memory devices such as virtio-mem can only be detected
> > > using the device driver; when vmcore_init() is called, these device
> > > drivers are usually not loaded yet, or the devices did not get probed
> > > yet. Consequently, on s390 these RAM ranges will not be included in
> > > the crash dump, which makes the dump partially corrupt and is
> > > unfortunate.
> > >
> > > Instead of deferring the vmcore_init() call, to an (unclear?) later point,
> > > let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
> > > the device drivers probe the device and get access to this information.
> > >
> > > Then, we'll add these ranges to the vmcore, adding more PT_LOAD
> > > entries and updating the offsets+vmcore size.
> > >
> > > Use Kconfig tricks to include this code automatically only if (a) there is
> > > a device driver compiled that implements the callback
> > > (PROVIDE_PROC_VMCORE_DEVICE_RAM) and; (b) the architecture actually needs
> > > this information (NEED_PROC_VMCORE_DEVICE_RAM).
> > >
> > > The current target use case is s390, which only creates an elf64
> > > elfcore, so focusing on elf64 is sufficient.
> > >
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > > fs/proc/Kconfig | 25 ++++++
> > > fs/proc/vmcore.c | 156 +++++++++++++++++++++++++++++++++++++
> > > include/linux/crash_dump.h | 9 +++
> > > 3 files changed, 190 insertions(+)
> > >
> > > diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
> > > index d80a1431ef7b..1e11de5f9380 100644
> > > --- a/fs/proc/Kconfig
> > > +++ b/fs/proc/Kconfig
> > > @@ -61,6 +61,31 @@ config PROC_VMCORE_DEVICE_DUMP
> > > as ELF notes to /proc/vmcore. You can still disable device
> > > dump using the kernel command line option 'novmcoredd'.
> > > +config PROVIDE_PROC_VMCORE_DEVICE_RAM
> > > + def_bool n
> > > +
> > > +config NEED_PROC_VMCORE_DEVICE_RAM
> > > + def_bool n
> > > +
> > > +config PROC_VMCORE_DEVICE_RAM
> > > + def_bool y
> > > + depends on PROC_VMCORE
> > > + depends on NEED_PROC_VMCORE_DEVICE_RAM
> > > + depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
> >
> > Kconfig item is always a thing I need learn to master.
>
> Yes, it's usually a struggle to get it right. It took me a couple of
> iterations to get to this point :)
>
> > When I checked
> > this part, I have to write them down to deliberate. I am wondering if
> > below 'simple version' works too and more understandable. Please help
> > point out what I have missed.
> >
> > ===========simple version======
> > config PROC_VMCORE_DEVICE_RAM
> > def_bool y
> > depends on PROC_VMCORE && VIRTIO_MEM
> > depends on NEED_PROC_VMCORE_DEVICE_RAM
> >
> > config S390
> > select NEED_PROC_VMCORE_DEVICE_RAM
> > ============
Sorry, what I wrote down didn't correctly reflect what I had in mind.
===========simple version======
fs/proc/Kconfig:
config PROC_VMCORE_DEVICE_RAM
def_bool y
depends on PROC_VMCORE && VIRTIO_MEM
depends on NEED_PROC_VMCORE_DEVICE_RAM
arch/s390/Kconfig:
config NEED_PROC_VMCORE_DEVICE_RAM
def y
==================================
>
> So the three changes you did are
>
> (a) Remove the config option but select/depend on them.
>
> (b) Remove the "depends on PROC_VMCORE" from PROC_VMCORE_DEVICE_RAM,
> and the "if PROC_VMCORE" from s390.
>
> (c) Remove the PROVIDE_PROC_VMCORE_DEVICE_RAM
>
>
> Regarding (a), that doesn't work. If you select a config option that doesn't
> exist, it is silently dropped. It's always treated as if it wouldn't be set.
>
> Regarding (b), I think that's an anti-pattern (having config options enabled
> that are completely ineffective) and I don't see a benefit dropping them.
>
> Regarding (c), it would mean that s390x unconditionally includes that code
> even if virtio-mem is not configured in.
>
> So while we could drop PROVIDE_PROC_VMCORE_DEVICE_RAM -- (c), it would that
> we end up including code in configurations that don't possibly need it.
> That's why I included that part.
>
> >
> >
> > ======= config items extracted from this patchset====
> > config PROVIDE_PROC_VMCORE_DEVICE_RAM
> > def_bool n
> >
> > config NEED_PROC_VMCORE_DEVICE_RAM
> > def_bool n
> >
> > config PROC_VMCORE_DEVICE_RAM
> > def_bool y
> > depends on PROC_VMCORE
> > depends on NEED_PROC_VMCORE_DEVICE_RAM
> > depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
> >
> > config VIRTIO_MEM
> > depends on X86_64 || ARM64 || RISCV
> > ~~~~~ I don't get why VIRTIO_MEM dones't depend on S390 if
> > s390 need PROC_VMCORE_DEVICE_RAM.
>
> This series depends on s390 support for virtio-mem, which just went
> upstream.
Got it, I just applied this series on top of the latest mainline
master branch. Thanks for letting me know.
>
>
> commit 38968bcdcc1d46f2fdcd3a72599d5193bf8baf84
> Author: David Hildenbrand <david@redhat.com>
> Date: Fri Oct 25 16:14:49 2024 +0200
>
> virtio-mem: s390 support
>
>
> > ......
> > select PROVIDE_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
> >
> > config S390
> > select NEED_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
> > =================================================
> >
>
> Thanks for having a look!
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-11-20 14:05 ` Baoquan He
@ 2024-11-20 14:39 ` David Hildenbrand
2024-11-21 4:30 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-20 14:39 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 20.11.24 15:05, Baoquan He wrote:
> On 11/20/24 at 11:48am, David Hildenbrand wrote:
>> On 20.11.24 11:13, Baoquan He wrote:
>>> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>>>> s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
>>>> the crashed kernel.
>>>>
>>>> RAM provided by memory devices such as virtio-mem can only be detected
>>>> using the device driver; when vmcore_init() is called, these device
>>>> drivers are usually not loaded yet, or the devices did not get probed
>>>> yet. Consequently, on s390 these RAM ranges will not be included in
>>>> the crash dump, which makes the dump partially corrupt and is
>>>> unfortunate.
>>>>
>>>> Instead of deferring the vmcore_init() call, to an (unclear?) later point,
>>>> let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
>>>> the device drivers probe the device and get access to this information.
>>>>
>>>> Then, we'll add these ranges to the vmcore, adding more PT_LOAD
>>>> entries and updating the offsets+vmcore size.
>>>>
>>>> Use Kconfig tricks to include this code automatically only if (a) there is
>>>> a device driver compiled that implements the callback
>>>> (PROVIDE_PROC_VMCORE_DEVICE_RAM) and; (b) the architecture actually needs
>>>> this information (NEED_PROC_VMCORE_DEVICE_RAM).
>>>>
>>>> The current target use case is s390, which only creates an elf64
>>>> elfcore, so focusing on elf64 is sufficient.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> fs/proc/Kconfig | 25 ++++++
>>>> fs/proc/vmcore.c | 156 +++++++++++++++++++++++++++++++++++++
>>>> include/linux/crash_dump.h | 9 +++
>>>> 3 files changed, 190 insertions(+)
>>>>
>>>> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
>>>> index d80a1431ef7b..1e11de5f9380 100644
>>>> --- a/fs/proc/Kconfig
>>>> +++ b/fs/proc/Kconfig
>>>> @@ -61,6 +61,31 @@ config PROC_VMCORE_DEVICE_DUMP
>>>> as ELF notes to /proc/vmcore. You can still disable device
>>>> dump using the kernel command line option 'novmcoredd'.
>>>> +config PROVIDE_PROC_VMCORE_DEVICE_RAM
>>>> + def_bool n
>>>> +
>>>> +config NEED_PROC_VMCORE_DEVICE_RAM
>>>> + def_bool n
>>>> +
>>>> +config PROC_VMCORE_DEVICE_RAM
>>>> + def_bool y
>>>> + depends on PROC_VMCORE
>>>> + depends on NEED_PROC_VMCORE_DEVICE_RAM
>>>> + depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
>>>
>>> Kconfig item is always a thing I need learn to master.
>>
>> Yes, it's usually a struggle to get it right. It took me a couple of
>> iterations to get to this point :)
>>
>>> When I checked
>>> this part, I have to write them down to deliberate. I am wondering if
>>> below 'simple version' works too and more understandable. Please help
>>> point out what I have missed.
>>>
>>> ===========simple version======
>>> config PROC_VMCORE_DEVICE_RAM
>>> def_bool y
>>> depends on PROC_VMCORE && VIRTIO_MEM
>>> depends on NEED_PROC_VMCORE_DEVICE_RAM
>>>
>>> config S390
>>> select NEED_PROC_VMCORE_DEVICE_RAM
>>> ============
>
> Sorry, things written down didn't correctly reflect them in my mind.
>
> ===========simple version======
> fs/proc/Kconfig:
> config PROC_VMCORE_DEVICE_RAM
> def_bool y
> depends on PROC_VMCORE && VIRTIO_MEM
> depends on NEED_PROC_VMCORE_DEVICE_RAM
>
> arch/s390/Kconfig:
> config NEED_PROC_VMCORE_DEVICE_RAM
> def y
> ==================================
That would work, but I don't completely like it.
(a) I want s390x to select NEED_PROC_VMCORE_DEVICE_RAM instead. Staring
at a bunch of similar cases (git grep "config NEED" | grep Kconfig, git
grep "config ARCH_WANTS" | grep Kconfig), "select" is the common way to
do it.
So unless there is a pretty good reason, I'll keep
NEED_PROC_VMCORE_DEVICE_RAM as is.
(b) In the context of this patch, "depends on VIRTIO_MEM" does not make
sense. We could have an intermediate:
config PROC_VMCORE_DEVICE_RAM
def_bool n
depends on PROC_VMCORE
depends on NEED_PROC_VMCORE_DEVICE_RAM
And change that with VIRTIO_MEM support in the relevant patch.
I faintly remember that we try avoiding such dependencies and prefer
selecting Kconfigs instead. Just look at the SPLIT_PTE_PTLOCKS mess we
still have to clean up. But as we don't expect that many providers for
now, I don't care.
Thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-11-20 14:39 ` David Hildenbrand
@ 2024-11-21 4:30 ` Baoquan He
2024-11-21 19:47 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-21 4:30 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/20/24 at 03:39pm, David Hildenbrand wrote:
> On 20.11.24 15:05, Baoquan He wrote:
> > On 11/20/24 at 11:48am, David Hildenbrand wrote:
> > > On 20.11.24 11:13, Baoquan He wrote:
> > > > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > > > s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
> > > > > the crashed kernel.
> > > > >
> > > > > RAM provided by memory devices such as virtio-mem can only be detected
> > > > > using the device driver; when vmcore_init() is called, these device
> > > > > drivers are usually not loaded yet, or the devices did not get probed
> > > > > yet. Consequently, on s390 these RAM ranges will not be included in
> > > > > the crash dump, which makes the dump partially corrupt and is
> > > > > unfortunate.
> > > > >
> > > > > Instead of deferring the vmcore_init() call, to an (unclear?) later point,
> > > > > let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
> > > > > the device drivers probe the device and get access to this information.
> > > > >
> > > > > Then, we'll add these ranges to the vmcore, adding more PT_LOAD
> > > > > entries and updating the offsets+vmcore size.
> > > > >
> > > > > Use Kconfig tricks to include this code automatically only if (a) there is
> > > > > a device driver compiled that implements the callback
> > > > > (PROVIDE_PROC_VMCORE_DEVICE_RAM) and; (b) the architecture actually needs
> > > > > this information (NEED_PROC_VMCORE_DEVICE_RAM).
> > > > >
> > > > > The current target use case is s390, which only creates an elf64
> > > > > elfcore, so focusing on elf64 is sufficient.
> > > > >
> > > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > > ---
> > > > > fs/proc/Kconfig | 25 ++++++
> > > > > fs/proc/vmcore.c | 156 +++++++++++++++++++++++++++++++++++++
> > > > > include/linux/crash_dump.h | 9 +++
> > > > > 3 files changed, 190 insertions(+)
> > > > >
> > > > > diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
> > > > > index d80a1431ef7b..1e11de5f9380 100644
> > > > > --- a/fs/proc/Kconfig
> > > > > +++ b/fs/proc/Kconfig
> > > > > @@ -61,6 +61,31 @@ config PROC_VMCORE_DEVICE_DUMP
> > > > > as ELF notes to /proc/vmcore. You can still disable device
> > > > > dump using the kernel command line option 'novmcoredd'.
> > > > > +config PROVIDE_PROC_VMCORE_DEVICE_RAM
> > > > > + def_bool n
> > > > > +
> > > > > +config NEED_PROC_VMCORE_DEVICE_RAM
> > > > > + def_bool n
> > > > > +
> > > > > +config PROC_VMCORE_DEVICE_RAM
> > > > > + def_bool y
> > > > > + depends on PROC_VMCORE
> > > > > + depends on NEED_PROC_VMCORE_DEVICE_RAM
> > > > > + depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
> > > >
> > > > Kconfig item is always a thing I need learn to master.
> > >
> > > Yes, it's usually a struggle to get it right. It took me a couple of
> > > iterations to get to this point :)
> > >
> > > > When I checked
> > > > this part, I have to write them down to deliberate. I am wondering if
> > > > below 'simple version' works too and more understandable. Please help
> > > > point out what I have missed.
> > > >
> > > > ===========simple version======
> > > > config PROC_VMCORE_DEVICE_RAM
> > > > def_bool y
> > > > depends on PROC_VMCORE && VIRTIO_MEM
> > > > depends on NEED_PROC_VMCORE_DEVICE_RAM
> > > >
> > > > config S390
> > > > select NEED_PROC_VMCORE_DEVICE_RAM
> > > > ============
> >
> > Sorry, things written down didn't correctly reflect them in my mind.
> >
> > ===========simple version======
> > fs/proc/Kconfig:
> > config PROC_VMCORE_DEVICE_RAM
> > def_bool y
> > depends on PROC_VMCORE && VIRTIO_MEM
> > depends on NEED_PROC_VMCORE_DEVICE_RAM
> > config NEED_PROC_VMCORE_DEVICE_RAM
> > def y
> >
> > arch/s390/Kconfig:
> > config NEED_PROC_VMCORE_DEVICE_RAM
> > def y
> > ==================================
>
> That would work, but I don't completely like it.
>
> (a) I want s390x to select NEED_PROC_VMCORE_DEVICE_RAM instead. Staring at a
> bunch of similar cases (git grep "config NEED" | grep Kconfig, git grep
> "config ARCH_WANTS" | grep Kconfig), "select" is the common way to do it.
>
> So unless there is a pretty good reason, I'll keep
> NEED_PROC_VMCORE_DEVICE_RAM as is.
That's easy to satisfy, see below:
============simple version=====
fs/proc/Kconfig:
config NEED_PROC_VMCORE_DEVICE_RAM
def n
config PROC_VMCORE_DEVICE_RAM
def_bool y
depends on PROC_VMCORE && VIRTIO_MEM
depends on NEED_PROC_VMCORE_DEVICE_RAM
arch/s390/Kconfig:
config S390
select NEED_PROC_VMCORE_DEVICE_RAM
==============================
>
> (b) In the context of this patch, "depends on VIRTIO_MEM" does not make
> sense. We could have an intermediate:
>
> config PROC_VMCORE_DEVICE_RAM
> def_bool n
> depends on PROC_VMCORE
> depends on NEED_PROC_VMCORE_DEVICE_RAM
>
> And change that with VIRTIO_MEM support in the relevant patch.
Oh, it's not a comment on this patch alone; I made the simple version based on
the whole patchset. When I had a glance at this patch, it also took me
several iterations to get it, after I applied the whole patchset and
tried to understand the whole code.
>
>
> I faintly remember that we try avoiding such dependencies and prefer
> selecting Kconfigs instead. Just look at the SPLIT_PTE_PTLOCKS mess we still
> have to clean up. But as we don't expect that many providers for now, I
> don't care.
With the simple version, a Kconfig learner like me can easily understand what
the options are doing. If it took you a couple of iterations to write them, as
you mentioned earlier, and it took me several iterations to understand them,
I believe there must be room to improve the ones presented in this patchset.
This is only my humble opinion, and I am not familiar with virtio-mem at all,
so I'll leave this to you and the other virtio-mem developers to decide what
should be taken. Thanks for your patience and the information you provided;
I learned a lot from this discussion.
===================
fs/proc/Kconfig:
config PROVIDE_PROC_VMCORE_DEVICE_RAM
def_bool n
config NEED_PROC_VMCORE_DEVICE_RAM
def_bool n
config PROC_VMCORE_DEVICE_RAM
def_bool y
depends on PROC_VMCORE
depends on NEED_PROC_VMCORE_DEVICE_RAM
depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
drivers/virtio/Kconfig:
config VIRTIO_MEM
select PROVIDE_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
~~~~~~~~~~~~~~
arch/s390/Kconfig:
config S390
select NEED_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
~~~~~~~~~~~~~~
========================
One last thing I haven't fully understood: if PROC_VMCORE_DEVICE_RAM already
depends on PROC_VMCORE, can we drop the ' if PROC_VMCORE' when selecting
PROVIDE_PROC_VMCORE_DEVICE_RAM and NEED_PROC_VMCORE_DEVICE_RAM?
Thanks
Baoquan
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-11-21 4:30 ` Baoquan He
@ 2024-11-21 19:47 ` David Hildenbrand
2024-11-22 7:51 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-21 19:47 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
>>
>> That would work, but I don't completely like it.
>>
>> (a) I want s390x to select NEED_PROC_VMCORE_DEVICE_RAM instead. Staring at a
>> bunch of similar cases (git grep "config NEED" | grep Kconfig, git grep
>> "config ARCH_WANTS" | grep Kconfig), "select" is the common way to do it.
>>
>> So unless there is a pretty good reason, I'll keep
>> NEED_PROC_VMCORE_DEVICE_RAM as is.
>
> That's easy to satify, see below:
Yes, this is mostly what I have right now, except
>
> ============simple version=====
> fs/proc/Kconfig:
> config NEED_PROC_VMCORE_DEVICE_RAM
> def n
using "bool" here like other code does. (I assume you meant "def_bool n";
"bool" seems to achieve the same thing.)
>
> config PROC_VMCORE_DEVICE_RAM
> def_bool y
> depends on PROC_VMCORE && VIRTIO_MEM
> depends on NEED_PROC_VMCORE_DEVICE_RAM
>
> arch/s390/Kconfig:
> config S390
> select NEED_PROC_VMCORE_DEVICE_RAM
> ==============================
>
>>
>> (b) In the context of this patch, "depends on VIRTIO_MEM" does not make
>> sense. We could have an intermediate:
>>
>> config PROC_VMCORE_DEVICE_RAM
>> def_bool n
>> depends on PROC_VMCORE
>> depends on NEED_PROC_VMCORE_DEVICE_RAM
>>
>> And change that with VIRTIO_MEM support in the relevant patch.
>
> Oh, it's not comment for this patch, I made the simple version based on
> the whole patchset. When I had a glance at this patch, I also took
> several iterations to get it after I applied the whole patchset and
> tried to understand the whole code.
Makes sense, I'm figuring out how I can split that up.
If we can avoid the PROVIDE_* thing for now, great. Not a big fan of
that myself.
>
>>
>>
>> I faintly remember that we try avoiding such dependencies and prefer
>> selecting Kconfigs instead. Just look at the SPLIT_PTE_PTLOCKS mess we still
>> have to clean up. But as we don't expect that many providers for now, I
>> don't care.
>
> With the simple version, Kconfig learner as me can easily understand what
> they are doing. If it took you a couple of iterations to make them as
> you had mentioned earlier, and it took me several iterations to
> understand them, I believe there must be room to improve the presented
> ones in this patchset. These are only my humble opinion, and I am not
> aware of virtio-mem at all, I'll leave this to you and other virtio-mem
> dev to decide what should be taken. Thanks for your patience and
> provided information, I learned a lot from this discussion.
I hope I didn't express myself poorly: thanks a lot for the review and
the discussion! It helped to make the Kconfig stuff better. I'll get rid
of the PROVIDE_* thing for now and just depend on virtio-mem.
>
> ===================
> fs/proc/Kconfig:
> config PROVIDE_PROC_VMCORE_DEVICE_RAM
> def_bool n
>
> config NEED_PROC_VMCORE_DEVICE_RAM
> def_bool n
>
> config PROC_VMCORE_DEVICE_RAM
> def_bool y
> depends on PROC_VMCORE
> depends on NEED_PROC_VMCORE_DEVICE_RAM
> depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
>
> drivers/virtio/Kconfig:
> config VIRTIO_MEM
> select PROVIDE_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
> ~~~~~~~~~~~~~~
>
> arch/s390/Kconfig:
> config S390
> select NEED_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
> ~~~~~~~~~~~~~~
> ========================
>
> One last thing I haven't got well, If PROC_VMCORE_DEVICE_RAM has had
> dependency on PROC_VMCORE, can we take off the ' if PROC_VMCORE' when
> select PROVIDE_PROC_VMCORE_DEVICE_RAM and NEED_PROC_VMCORE_DEVICE_RAM?
We could; it would mean that in a .config file you would end up with
"NEED_PROC_VMCORE_DEVICE_RAM=y" with "#PROC_VMCORE" and no notion of
"PROC_VMCORE_DEVICE_RAM".
I don't particularly like that -- needing something that apparently does
not exist. Not sure if there is a best practice here; staring at some
examples, I don't seem to find a consistent rule. I can just drop it, not
the end of the world.
Did you get to look at the other code changes in this patch set? Your
feedback would be highly appreciated!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-11-21 19:47 ` David Hildenbrand
@ 2024-11-22 7:51 ` Baoquan He
0 siblings, 0 replies; 46+ messages in thread
From: Baoquan He @ 2024-11-22 7:51 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/21/24 at 08:47pm, David Hildenbrand wrote:
> > >
> > > That would work, but I don't completely like it.
> > >
> > > (a) I want s390x to select NEED_PROC_VMCORE_DEVICE_RAM instead. Staring at a
> > > bunch of similar cases (git grep "config NEED" | grep Kconfig, git grep
> > > "config ARCH_WANTS" | grep Kconfig), "select" is the common way to do it.
> > >
> > > So unless there is a pretty good reason, I'll keep
> > > NEED_PROC_VMCORE_DEVICE_RAM as is.
> >
> > That's easy to satify, see below:
>
> Yes, this is mostly what I have right now, except
>
> >
> > ============simple version=====
> > fs/proc/Kconfig:
> > config NEED_PROC_VMCORE_DEVICE_RAM
> > def n
>
> using "bool" here like other code. (I assume you meant "def_bool n", "bool"
> seems to achieve the same thing)
Yes, you are right. I didn't check it carefully.
>
> >
......
> > ===================
> > fs/proc/Kconfig:
> > config PROVIDE_PROC_VMCORE_DEVICE_RAM
> > def_bool n
> >
> > config NEED_PROC_VMCORE_DEVICE_RAM
> > def_bool n
> >
> > config PROC_VMCORE_DEVICE_RAM
> > def_bool y
> > depends on PROC_VMCORE
> > depends on NEED_PROC_VMCORE_DEVICE_RAM
> > depends on PROVIDE_PROC_VMCORE_DEVICE_RAM
> >
> > drivers/virtio/Kconfig:
> > config VIRTIO_MEM
> > select PROVIDE_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
> > ~~~~~~~~~~~~~~
> >
> > arch/s390/Kconfig:
> > config S390
> > select NEED_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
> > ~~~~~~~~~~~~~~
> > ========================
> >
> > One last thing I haven't got well, If PROC_VMCORE_DEVICE_RAM has had
> > dependency on PROC_VMCORE, can we take off the ' if PROC_VMCORE' when
> > select PROVIDE_PROC_VMCORE_DEVICE_RAM and NEED_PROC_VMCORE_DEVICE_RAM?
>
> We could; it would mean that in a .config file you would end up with
> "NEED_PROC_VMCORE_DEVICE_RAM=y" with "#PROC_VMCORE" and no notion of
> "PROC_VMCORE_DEVICE_RAM".
Fair enough. I didn't think of this. Then keeping it is obviously
better. Thanks.
>
> I don't particularly like that -- needing something that apparently does not
> exist. Not sure if there is a best practice here, staring at some examples I
> don't seem to find a consistent rule. I can just drop it, not the end of the
> world.
>
>
> Did you get to look at the other code changes in this patch set? Your
> feedback would be highly appreciated!
Will try, though I may not have valuable input on the virtio-mem code.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-10-25 15:11 ` [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel David Hildenbrand
2024-11-20 10:13 ` Baoquan He
@ 2024-11-22 7:31 ` Baoquan He
2024-11-22 9:47 ` David Hildenbrand
1 sibling, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-22 7:31 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
......snip...
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 3e90416ee54e..c332a9a4920b 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -69,6 +69,8 @@ static LIST_HEAD(vmcore_cb_list);
> /* Whether the vmcore has been opened once. */
> static bool vmcore_opened;
>
> +static void vmcore_process_device_ram(struct vmcore_cb *cb);
> +
> void register_vmcore_cb(struct vmcore_cb *cb)
> {
> INIT_LIST_HEAD(&cb->next);
> @@ -80,6 +82,8 @@ void register_vmcore_cb(struct vmcore_cb *cb)
> */
> if (vmcore_opened)
> pr_warn_once("Unexpected vmcore callback registration\n");
> + else if (cb->get_device_ram)
> + vmcore_process_device_ram(cb);
The global variable 'vmcore_opened' indicates whether /proc/vmcore has been
opened. With &vmcore_mutex, we don't need to worry about concurrent
opening and modification. However, if people just open /proc/vmcore and
close it after checking, then s390 will miss these device RAM ranges in the
vmcore dump; is that acceptable?
> mutex_unlock(&vmcore_mutex);
> }
> EXPORT_SYMBOL_GPL(register_vmcore_cb);
> @@ -1511,6 +1515,158 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
......
> +
> +static void vmcore_process_device_ram(struct vmcore_cb *cb)
> +{
> + unsigned char *e_ident = (unsigned char *)elfcorebuf;
> + struct vmcore_mem_node *first, *m;
> + LIST_HEAD(list);
> + int count;
> +
> + if (cb->get_device_ram(cb, &list)) {
> + pr_err("Kdump: obtaining device ram ranges failed\n");
> + return;
> + }
> + count = list_count_nodes(&list);
> + if (!count)
> + return;
> +
> + /* We only support Elf64 dumps for now. */
> + if (WARN_ON_ONCE(e_ident[EI_CLASS] != ELFCLASS64)) {
> + pr_err("Kdump: device ram ranges only support Elf64\n");
> + goto out_free;
> + }
The Elf64-only check seems to be a basic sanity check; do we need
to put it at the beginning of the function? Otherwise we spend effort
calling cb->get_device_ram(), only to fail afterwards.
> +
> + /*
> + * For some reason these ranges are already know? Might happen
> + * with unusual register->unregister->register sequences; we'll simply
> + * sanity check using the first range.
> + */
> + first = list_first_entry(&list, struct vmcore_mem_node, list);
> + list_for_each_entry(m, &vmcore_list, list) {
> + unsigned long long m_end = m->paddr + m->size;
> + unsigned long long first_end = first->paddr + first->size;
> +
> + if (first->paddr < m_end && m->paddr < first_end)
> + goto out_free;
> + }
> +
> + /* If adding the mem nodes succeeds, they must not be freed. */
> + if (!vmcore_add_device_ram_elf64(&list, count))
> + return;
> +out_free:
> + vmcore_free_mem_nodes(&list);
> +}
> +#else /* !CONFIG_PROC_VMCORE_DEVICE_RAM */
> +static void vmcore_process_device_ram(struct vmcore_cb *cb)
> +{
> +}
> +#endif /* CONFIG_PROC_VMCORE_DEVICE_RAM */
> +
> /* Free all dumps in vmcore device dump list */
> static void vmcore_free_device_dumps(void)
> {
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index 722dbcff7371..8e581a053d7f 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
2024-11-22 7:31 ` Baoquan He
@ 2024-11-22 9:47 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-11-22 9:47 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 22.11.24 08:31, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> ......snip...
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index 3e90416ee54e..c332a9a4920b 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -69,6 +69,8 @@ static LIST_HEAD(vmcore_cb_list);
>> /* Whether the vmcore has been opened once. */
>> static bool vmcore_opened;
>>
>> +static void vmcore_process_device_ram(struct vmcore_cb *cb);
>> +
>> void register_vmcore_cb(struct vmcore_cb *cb)
>> {
>> INIT_LIST_HEAD(&cb->next);
>> @@ -80,6 +82,8 @@ void register_vmcore_cb(struct vmcore_cb *cb)
>> */
>> if (vmcore_opened)
>> pr_warn_once("Unexpected vmcore callback registration\n");
>> + else if (cb->get_device_ram)
>> + vmcore_process_device_ram(cb);
>
> Global variable 'vmcore_opened' is used to indicate if /proc/vmcore is
> opened. With &vmcore_mutex, we don't need to worry about concurrent
> opening and modification. However, if people just open /proc/vmcore and
> close it after checking, then s390 will miss the vmcore dumping, is it
> acceptable?
See my reply to your other mail (patch #3).
>
>> mutex_unlock(&vmcore_mutex);
>> }
>> EXPORT_SYMBOL_GPL(register_vmcore_cb);
>> @@ -1511,6 +1515,158 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> ......
>> +
>> +static void vmcore_process_device_ram(struct vmcore_cb *cb)
>> +{
>> + unsigned char *e_ident = (unsigned char *)elfcorebuf;
>> + struct vmcore_mem_node *first, *m;
>> + LIST_HEAD(list);
>> + int count;
>> +
>> + if (cb->get_device_ram(cb, &list)) {
>> + pr_err("Kdump: obtaining device ram ranges failed\n");
>> + return;
>> + }
>> + count = list_count_nodes(&list);
>> + if (!count)
>> + return;
>> +
>> + /* We only support Elf64 dumps for now. */
>> + if (WARN_ON_ONCE(e_ident[EI_CLASS] != ELFCLASS64)) {
>> + pr_err("Kdump: device ram ranges only support Elf64\n");
>> + goto out_free;
>> + }
>
> Only supporting Elf64 dumps seems to be a basic checking, do we need
> to put it at the beginning of function? Otherwise, we spend efforts to
> call cb->get_device_ram(), then fail.
The idea was that if there is nothing to add, then the ELF class doesn't
matter. But yes, I can move this further up.
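A rough sketch of that reordering (illustration only, not the final patch):

static void vmcore_process_device_ram(struct vmcore_cb *cb)
{
	unsigned char *e_ident = (unsigned char *)elfcorebuf;
	LIST_HEAD(list);
	int count;

	/* Reject unsupported dump formats before querying the device at all. */
	if (WARN_ON_ONCE(e_ident[EI_CLASS] != ELFCLASS64)) {
		pr_err("Kdump: device ram ranges only support Elf64\n");
		return;
	}

	if (cb->get_device_ram(cb, &list)) {
		pr_err("Kdump: obtaining device ram ranges failed\n");
		return;
	}
	count = list_count_nodes(&list);
	if (!count)
		return;

	/* ... overlap sanity check and vmcore_add_device_ram_elf64() as before ... */
}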
Thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 08/11] virtio-mem: mark device ready before registering callbacks in kdump mode
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (6 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 07/11] fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-10-25 15:11 ` [PATCH v1 09/11] virtio-mem: remember usable region size David Hildenbrand
` (4 subsequent siblings)
12 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
After the callbacks are registered we may immediately get a callback. So
mark the device ready before registering the callbacks.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index b0b871441578..126f1d669bb0 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2648,6 +2648,7 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
if (rc)
goto out_unreg_pm;
+ virtio_device_ready(vm->vdev);
return 0;
out_unreg_pm:
unregister_pm_notifier(&vm->pm_notifier);
@@ -2729,6 +2730,8 @@ static bool virtio_mem_vmcore_pfn_is_ram(struct vmcore_cb *cb,
static int virtio_mem_init_kdump(struct virtio_mem *vm)
{
+ /* We must be prepared to receive a callback immediately. */
+ virtio_device_ready(vm->vdev);
#ifdef CONFIG_PROC_VMCORE
dev_info(&vm->vdev->dev, "memory hot(un)plug disabled in kdump kernel\n");
vm->vmcore_cb.pfn_is_ram = virtio_mem_vmcore_pfn_is_ram;
@@ -2870,8 +2873,6 @@ static int virtio_mem_probe(struct virtio_device *vdev)
if (rc)
goto out_del_vq;
- virtio_device_ready(vdev);
-
/* trigger a config update to start processing the requested_size */
if (!vm->in_kdump) {
atomic_set(&vm->config_changed, 1);
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v1 09/11] virtio-mem: remember usable region size
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (7 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 08/11] virtio-mem: mark device ready before registering callbacks in kdump mode David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-10-25 15:11 ` [PATCH v1 10/11] virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM David Hildenbrand
` (3 subsequent siblings)
12 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's remember the usable region size, which will be helpful in kdump
mode next.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 126f1d669bb0..73477d5b79cf 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -133,6 +133,8 @@ struct virtio_mem {
uint64_t addr;
/* Maximum region size in bytes. */
uint64_t region_size;
+ /* Usable region size in bytes. */
+ uint64_t usable_region_size;
/* The parent resource for all memory added via this device. */
struct resource *parent_resource;
@@ -2368,7 +2370,7 @@ static int virtio_mem_cleanup_pending_mb(struct virtio_mem *vm)
static void virtio_mem_refresh_config(struct virtio_mem *vm)
{
const struct range pluggable_range = mhp_get_pluggable_range(true);
- uint64_t new_plugged_size, usable_region_size, end_addr;
+ uint64_t new_plugged_size, end_addr;
/* the plugged_size is just a reflection of what _we_ did previously */
virtio_cread_le(vm->vdev, struct virtio_mem_config, plugged_size,
@@ -2378,8 +2380,8 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
/* calculate the last usable memory block id */
virtio_cread_le(vm->vdev, struct virtio_mem_config,
- usable_region_size, &usable_region_size);
- end_addr = min(vm->addr + usable_region_size - 1,
+ usable_region_size, &vm->usable_region_size);
+ end_addr = min(vm->addr + vm->usable_region_size - 1,
pluggable_range.end);
if (vm->in_sbm) {
@@ -2763,6 +2765,8 @@ static int virtio_mem_init(struct virtio_mem *vm)
virtio_cread_le(vm->vdev, struct virtio_mem_config, addr, &vm->addr);
virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size,
&vm->region_size);
+ virtio_cread_le(vm->vdev, struct virtio_mem_config, usable_region_size,
+ &vm->usable_region_size);
/* Determine the nid for the device based on the lowest address. */
if (vm->nid == NUMA_NO_NODE)
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v1 10/11] virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (8 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 09/11] virtio-mem: remember usable region size David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-10-25 15:11 ` [PATCH v1 11/11] s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM) David Hildenbrand
` (2 subsequent siblings)
12 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's implement the get_device_ram() vmcore callback, so
architectures that select NEED_PROC_VMCORE_DEVICE_RAM, like s390
soon, can include that memory in a crash dump.
Merge ranges, and process ranges that might contain a mixture of plugged
and unplugged memory, to reduce the total number of ranges.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/Kconfig | 1 +
drivers/virtio/virtio_mem.c | 88 +++++++++++++++++++++++++++++++++++++
2 files changed, 89 insertions(+)
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 2eb747311bfd..60fdaf2c2c49 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -128,6 +128,7 @@ config VIRTIO_MEM
depends on MEMORY_HOTREMOVE
depends on CONTIG_ALLOC
depends on EXCLUSIVE_SYSTEM_RAM
+ select PROVIDE_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
help
This driver provides access to virtio-mem paravirtualized memory
devices, allowing to hotplug and hotunplug memory.
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 73477d5b79cf..1ae1199a7617 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2728,6 +2728,91 @@ static bool virtio_mem_vmcore_pfn_is_ram(struct vmcore_cb *cb,
mutex_unlock(&vm->hotplug_mutex);
return is_ram;
}
+
+#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
+static int virtio_mem_vmcore_add_device_ram(struct virtio_mem *vm,
+ struct list_head *list, uint64_t start, uint64_t end)
+{
+ int rc;
+
+ rc = vmcore_alloc_add_mem_node(list, start, end - start);
+ if (rc)
+ dev_err(&vm->vdev->dev,
+ "Error adding device RAM range: %d\n", rc);
+ return rc;
+}
+
+static int virtio_mem_vmcore_get_device_ram(struct vmcore_cb *cb,
+ struct list_head *list)
+{
+ struct virtio_mem *vm = container_of(cb, struct virtio_mem,
+ vmcore_cb);
+ const uint64_t device_start = vm->addr;
+ const uint64_t device_end = vm->addr + vm->usable_region_size;
+ uint64_t chunk_size, cur_start, cur_end, plugged_range_start = 0;
+ LIST_HEAD(tmp_list);
+ int rc;
+
+ if (!vm->plugged_size)
+ return 0;
+
+ /* Process memory sections, unless the device block size is bigger. */
+ chunk_size = max_t(uint64_t, PFN_PHYS(PAGES_PER_SECTION),
+ vm->device_block_size);
+
+ mutex_lock(&vm->hotplug_mutex);
+
+ /*
+ * We process larger chunks and indicate the complete chunk if any
+ * block in there is plugged. This reduces the number of pfn_is_ram()
+ * callbacks and mimics what is effectively being done when the old
+ * kernel would add complete memory sections/blocks to the elfcore hdr.
+ */
+ cur_start = device_start;
+ for (cur_start = device_start; cur_start < device_end; cur_start = cur_end) {
+ cur_end = ALIGN_DOWN(cur_start + chunk_size, chunk_size);
+ cur_end = min_t(uint64_t, cur_end, device_end);
+
+ rc = virtio_mem_send_state_request(vm, cur_start,
+ cur_end - cur_start);
+
+ if (rc < 0) {
+ dev_err(&vm->vdev->dev,
+ "Error querying block states: %d\n", rc);
+ goto out;
+ } else if (rc != VIRTIO_MEM_STATE_UNPLUGGED) {
+ /* Merge ranges with plugged memory. */
+ if (!plugged_range_start)
+ plugged_range_start = cur_start;
+ continue;
+ }
+
+ /* Flush any plugged range. */
+ if (plugged_range_start) {
+ rc = virtio_mem_vmcore_add_device_ram(vm, &tmp_list,
+ plugged_range_start,
+ cur_start);
+ if (rc)
+ goto out;
+ plugged_range_start = 0;
+ }
+ }
+
+ /* Flush any plugged range. */
+ if (plugged_range_start)
+ rc = virtio_mem_vmcore_add_device_ram(vm, &tmp_list,
+ plugged_range_start,
+ cur_start);
+out:
+ mutex_unlock(&vm->hotplug_mutex);
+ if (rc < 0) {
+ vmcore_free_mem_nodes(&tmp_list);
+ return rc;
+ }
+ list_splice_tail(&tmp_list, list);
+ return 0;
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_RAM */
#endif /* CONFIG_PROC_VMCORE */
static int virtio_mem_init_kdump(struct virtio_mem *vm)
@@ -2737,6 +2822,9 @@ static int virtio_mem_init_kdump(struct virtio_mem *vm)
#ifdef CONFIG_PROC_VMCORE
dev_info(&vm->vdev->dev, "memory hot(un)plug disabled in kdump kernel\n");
vm->vmcore_cb.pfn_is_ram = virtio_mem_vmcore_pfn_is_ram;
+#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
+ vm->vmcore_cb.get_device_ram = virtio_mem_vmcore_get_device_ram;
+#endif /* CONFIG_PROC_VMCORE_DEVICE_RAM */
register_vmcore_cb(&vm->vmcore_cb);
return 0;
#else /* CONFIG_PROC_VMCORE */
--
2.46.1
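To illustrate the range merging done by virtio_mem_vmcore_get_device_ram()
above, here is a small standalone userspace sketch (not kernel code; the chunk
size, base address and per-chunk states are made up) that mimics the loop:

#include <stdio.h>

enum chunk_state { UNPLUGGED, PLUGGED, MIXED };

int main(void)
{
	const unsigned long long base = 0x100000000ULL, chunk = 256ULL << 20;
	/* Made-up per-chunk query results, as the device might report them. */
	const enum chunk_state state[] = { PLUGGED, MIXED, UNPLUGGED, UNPLUGGED, PLUGGED };
	const int nr_chunks = 5;
	unsigned long long range_start = 0;
	int have_range = 0;

	for (int i = 0; i <= nr_chunks; i++) {
		int plugged = i < nr_chunks && state[i] != UNPLUGGED;

		if (plugged && !have_range) {
			/* Start a new range covering plugged/mixed chunks. */
			range_start = base + i * chunk;
			have_range = 1;
		} else if (!plugged && have_range) {
			/* Flush the merged range once we hit unplugged memory. */
			printf("device RAM range: 0x%llx-0x%llx\n",
			       range_start, base + i * chunk);
			have_range = 0;
		}
	}
	/* Prints two ranges: chunks 0-1 merged into one, chunk 4 on its own. */
	return 0;
}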
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v1 11/11] s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (9 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 10/11] virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM David Hildenbrand
@ 2024-10-25 15:11 ` David Hildenbrand
2024-11-04 6:21 ` [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 Baoquan He
2024-11-15 8:46 ` Baoquan He
12 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-10-25 15:11 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, linux-s390, virtualization, kvm, linux-fsdevel, kexec,
David Hildenbrand, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Baoquan He, Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
Let's add support for including virtio-mem device RAM in the crash dump,
setting NEED_PROC_VMCORE_DEVICE_RAM, and implementing
elfcorehdr_fill_device_ram_ptload_elf64().
To avoid code duplication, factor out the code to fill a PT_LOAD entry.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/Kconfig | 1 +
arch/s390/kernel/crash_dump.c | 39 ++++++++++++++++++++++++++++-------
2 files changed, 32 insertions(+), 8 deletions(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index d339fe4fdedf..d80450d957a9 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -230,6 +230,7 @@ config S390
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE if PCI
select NEED_PER_CPU_EMBED_FIRST_CHUNK
+ select NEED_PROC_VMCORE_DEVICE_RAM if PROC_VMCORE
select NEED_SG_DMA_LENGTH if PCI
select OLD_SIGACTION
select OLD_SIGSUSPEND3
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index edae13416196..97b9e71b734d 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -497,6 +497,19 @@ static int get_mem_chunk_cnt(void)
return cnt;
}
+static void fill_ptload(Elf64_Phdr *phdr, unsigned long paddr,
+ unsigned long vaddr, unsigned long size)
+{
+ phdr->p_type = PT_LOAD;
+ phdr->p_vaddr = vaddr;
+ phdr->p_offset = paddr;
+ phdr->p_paddr = paddr;
+ phdr->p_filesz = size;
+ phdr->p_memsz = size;
+ phdr->p_flags = PF_R | PF_W | PF_X;
+ phdr->p_align = PAGE_SIZE;
+}
+
/*
* Initialize ELF loads (new kernel)
*/
@@ -509,14 +522,8 @@ static void loads_init(Elf64_Phdr *phdr, bool os_info_has_vm)
if (os_info_has_vm)
old_identity_base = os_info_old_value(OS_INFO_IDENTITY_BASE);
for_each_physmem_range(idx, &oldmem_type, &start, &end) {
- phdr->p_type = PT_LOAD;
- phdr->p_vaddr = old_identity_base + start;
- phdr->p_offset = start;
- phdr->p_paddr = start;
- phdr->p_filesz = end - start;
- phdr->p_memsz = end - start;
- phdr->p_flags = PF_R | PF_W | PF_X;
- phdr->p_align = PAGE_SIZE;
+ fill_ptload(phdr, start, old_identity_base + start,
+ end - start);
phdr++;
}
}
@@ -526,6 +533,22 @@ static bool os_info_has_vm(void)
return os_info_old_value(OS_INFO_KASLR_OFFSET);
}
+#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
+/*
+ * Fill PT_LOAD for a physical memory range owned by a device and detected by
+ * its device driver.
+ */
+void elfcorehdr_fill_device_ram_ptload_elf64(Elf64_Phdr *phdr,
+ unsigned long long paddr, unsigned long long size)
+{
+ unsigned long old_identity_base = 0;
+
+ if (os_info_has_vm())
+ old_identity_base = os_info_old_value(OS_INFO_IDENTITY_BASE);
+ fill_ptload(phdr, paddr, old_identity_base + paddr, size);
+}
+#endif
+
/*
* Prepare PT_LOAD type program header for kernel image region
*/
--
2.46.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (10 preceding siblings ...)
2024-10-25 15:11 ` [PATCH v1 11/11] s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM) David Hildenbrand
@ 2024-11-04 6:21 ` Baoquan He
2024-11-15 8:46 ` Baoquan He
12 siblings, 0 replies; 46+ messages in thread
From: Baoquan He @ 2024-11-04 6:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
> virtio-mem support on s390.
>
> The only "different than everything else" thing about virtio-mem on s390
> is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
> during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
> crash kernel must detect memory ranges of the crashed/panicked kernel to
> include via PT_LOAD in the vmcore.
>
> On other architectures, all RAM regions (boot + hotplugged) can easily be
> observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
> the elfcore hdr.
>
> On s390, information about "ordinary" memory (heh, "storage") can be
> obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
> that information is stored early during boot in the "physmem" memblock
> data structure.
>
> But virtio-mem memory is always detected by its device driver, which is
> usually built as a module. So in the crash kernel, this memory can only be
> properly detected once the virtio-mem driver started up.
>
> The virtio-mem driver already supports the "kdump mode", where it won't
> hotplug any memory but instead queries the device to implement the
> pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
> the vmcore.
>
> With this series, if the virtio-mem driver is included in the kdump
> initrd -- which dracut already takes care of under Fedora/RHEL -- it will
> now detect the device RAM ranges on s390 once it probes the devices, to add
> them to the vmcore using the same callback mechanism we already have for
> pfn_is_ram().
>
> To add these device RAM ranges to the vmcore ("patch the vmcore"), we will
> add new PT_LOAD entries that describe these memory ranges, and update
> all offsets and the vmcore size so it is all consistent.
>
> Note that makedumpfile is shaky with v6.12-rcX, I made the "obvious" things
> (e.g., free page detection) work again while testing as documented in [2].
>
> Creating the dumps using makedumpfile seems to work fine, and the
> dump regions (PT_LOAD) are as expected. I yet have to check in more detail
> if the created dumps are good (IOW, the right memory was dumped, but it
> looks like makedumpfile reads the right memory when interpreting the
> kernel data structures, which is promising).
>
> Patch #1 -- #6 are vmcore preparations and cleanups
Thanks for CC-ing me, I will review patches 1-6, the vmcore part, next
week.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390
2024-10-25 15:11 [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 David Hildenbrand
` (11 preceding siblings ...)
2024-11-04 6:21 ` [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390 Baoquan He
@ 2024-11-15 8:46 ` Baoquan He
2024-11-15 8:55 ` David Hildenbrand
12 siblings, 1 reply; 46+ messages in thread
From: Baoquan He @ 2024-11-15 8:46 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
> virtio-mem support on s390.
>
> The only "different than everything else" thing about virtio-mem on s390
> is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
> during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
> crash kernel must detect memory ranges of the crashed/panicked kernel to
> include via PT_LOAD in the vmcore.
>
> On other architectures, all RAM regions (boot + hotplugged) can easily be
> observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
> the elfcore hdr.
>
> On s390, information about "ordinary" memory (heh, "storage") can be
> obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
> that information is stored early during boot in the "physmem" memblock
> data structure.
>
> But virtio-mem memory is always detected by its device driver, which is
> usually built as a module. So in the crash kernel, this memory can only be
~~~~~~~~~~~
Is it 1st kernel or 2nd kernel?
Usually we call the 1st kernel the panicked/crashed kernel, and the
2nd kernel the kdump kernel.
> properly detected once the virtio-mem driver started up.
>
> The virtio-mem driver already supports the "kdump mode", where it won't
> hotplug any memory but instead queries the device to implement the
> pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
> the vmcore.
>
> With this series, if the virtio-mem driver is included in the kdump
> initrd -- which dracut already takes care of under Fedora/RHEL -- it will
> now detect the device RAM ranges on s390 once it probes the devices, to add
> them to the vmcore using the same callback mechanism we already have for
> pfn_is_ram().
Do you mean that on s390 the virtio-mem memory regions will be detected
and added to the vmcore in the kdump kernel when the virtio-mem driver is
initialized? Not sure if I understand it correctly.
>
> To add these device RAM ranges to the vmcore ("patch the vmcore"), we will
> add new PT_LOAD entries that describe these memory ranges, and update
> all offsets and the vmcore size so it is all consistent.
>
> Note that makedumpfile is shaky with v6.12-rcX, I made the "obvious" things
> (e.g., free page detection) work again while testing as documented in [2].
>
> Creating the dumps using makedumpfile seems to work fine, and the
> dump regions (PT_LOAD) are as expected. I yet have to check in more detail
> if the created dumps are good (IOW, the right memory was dumped, but it
> looks like makedumpfile reads the right memory when interpreting the
> kernel data structures, which is promising).
>
> Patch #1 -- #6 are vmcore preparations and cleanups
> Patch #7 adds the infrastructure for drivers to report device RAM
> Patch #8 + #9 are virtio-mem preparations
> Patch #10 implements virtio-mem support to report device RAM
> Patch #11 activates it for s390, implementing a new function to fill
> PT_LOAD entry for device RAM
>
> [1] https://lkml.kernel.org/r/20241025141453.1210600-1-david@redhat.com
> [2] https://github.com/makedumpfile/makedumpfile/issues/16
>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Cc: "Eugenio Pérez" <eperezma@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Eric Farman <farman@linux.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
>
> David Hildenbrand (11):
> fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
> fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
> fs/proc/vmcore: disallow vmcore modifications after the vmcore was
> opened
> fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
> fs/proc/vmcore: factor out allocating a vmcore memory node
> fs/proc/vmcore: factor out freeing a list of vmcore ranges
> fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM
> ranges in 2nd kernel
> virtio-mem: mark device ready before registering callbacks in kdump
> mode
> virtio-mem: remember usable region size
> virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
> s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
>
> arch/s390/Kconfig | 1 +
> arch/s390/kernel/crash_dump.c | 39 +++--
> drivers/virtio/Kconfig | 1 +
> drivers/virtio/virtio_mem.c | 103 +++++++++++++-
> fs/proc/Kconfig | 25 ++++
> fs/proc/vmcore.c | 258 +++++++++++++++++++++++++---------
> include/linux/crash_dump.h | 47 +++++++
> include/linux/kcore.h | 13 --
> 8 files changed, 396 insertions(+), 91 deletions(-)
>
> --
> 2.46.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390
2024-11-15 8:46 ` Baoquan He
@ 2024-11-15 8:55 ` David Hildenbrand
2024-11-15 9:48 ` Baoquan He
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-11-15 8:55 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 15.11.24 09:46, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>> This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
>> virtio-mem support on s390.
>>
>> The only "different than everything else" thing about virtio-mem on s390
>> is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
>> during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
>> crash kernel must detect memory ranges of the crashed/panicked kernel to
>> include via PT_LOAD in the vmcore.
>>
>> On other architectures, all RAM regions (boot + hotplugged) can easily be
>> observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
>> the elfcore hdr.
>>
>> On s390, information about "ordinary" memory (heh, "storage") can be
>> obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
>> that information is stored early during boot in the "physmem" memblock
>> data structure.
>>
>> But virtio-mem memory is always detected by its device driver, which is
>> usually built as a module. So in the crash kernel, this memory can only be
> ~~~~~~~~~~~
> Is it 1st kernel or 2nd kernel?
> Usually we call the 1st kernel the panicked/crashed kernel, and the
> 2nd kernel the kdump kernel.
It should have been called "kdump (2nd) kernel" here indeed.
>> properly detected once the virtio-mem driver started up.
>>
>> The virtio-mem driver already supports the "kdump mode", where it won't
>> hotplug any memory but instead queries the device to implement the
>> pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
>> the vmcore.
>>
>> With this series, if the virtio-mem driver is included in the kdump
>> initrd -- which dracut already takes care of under Fedora/RHEL -- it will
>> now detect the device RAM ranges on s390 once it probes the devices, to add
>> them to the vmcore using the same callback mechanism we already have for
>> pfn_is_ram().
>
> Do you mean that on s390 the virtio-mem memory regions will be detected
> and added to the vmcore in the kdump kernel when the virtio-mem driver is
> initialized? Not sure if I understand it correctly.
Yes exactly. In the kdump kernel, the driver gets probed and registers
the vmcore callbacks. From there, we detect and add the device regions.
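If it helps to see the flow in code, here is a condensed sketch only --
not the actual virtio-mem code. The "foo" driver and helper names are
made up, and the get_device_ram() prototype is what I assume patch #7
adds to struct vmcore_cb:

	#include <linux/crash_dump.h>

	static struct vmcore_cb foo_vmcore_cb;

	static bool foo_vmcore_pfn_is_ram(struct vmcore_cb *cb,
					  unsigned long pfn)
	{
		/* Ask the device whether the PFN is plugged; always "yes" here. */
		return true;
	}

	#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
	static int foo_vmcore_get_device_ram(struct vmcore_cb *cb,
					     struct list_head *list)
	{
		/* Add device-owned RAM ranges to the list; none in this stub. */
		return 0;
	}
	#endif /* CONFIG_PROC_VMCORE_DEVICE_RAM */

	static int foo_init_kdump(void)
	{
		foo_vmcore_cb.pfn_is_ram = foo_vmcore_pfn_is_ram;
	#ifdef CONFIG_PROC_VMCORE_DEVICE_RAM
		foo_vmcore_cb.get_device_ram = foo_vmcore_get_device_ram;
	#endif
		register_vmcore_cb(&foo_vmcore_cb);
		return 0;
	}

Everything happens from the driver probe path in the kdump kernel; the
arch code only gets involved when fs/proc/vmcore turns the reported
ranges into PT_LOAD entries (patch #11).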
Thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 00/11] fs/proc/vmcore: kdump support for virtio-mem on s390
2024-11-15 8:55 ` David Hildenbrand
@ 2024-11-15 9:48 ` Baoquan He
0 siblings, 0 replies; 46+ messages in thread
From: Baoquan He @ 2024-11-15 9:48 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, linux-s390, virtualization, kvm,
linux-fsdevel, kexec, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Vivek Goyal, Dave Young, Thomas Huth, Cornelia Huck,
Janosch Frank, Claudio Imbrenda, Eric Farman, Andrew Morton
On 11/15/24 at 09:55am, David Hildenbrand wrote:
> On 15.11.24 09:46, Baoquan He wrote:
> > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
> > > virtio-mem support on s390.
> > >
> > > The only "different than everything else" thing about virtio-mem on s390
> > > is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
> > > during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
> > > crash kernel must detect memory ranges of the crashed/panicked kernel to
> > > include via PT_LOAD in the vmcore.
> > >
> > > On other architectures, all RAM regions (boot + hotplugged) can easily be
> > > observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
> > > the elfcore hdr.
> > >
> > > On s390, information about "ordinary" memory (heh, "storage") can be
> > > obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
> > > that information is stored early during boot in the "physmem" memblock
> > > data structure.
> > >
> > > But virtio-mem memory is always detected by its device driver, which is
> > > usually built as a module. So in the crash kernel, this memory can only be
> > ~~~~~~~~~~~
> > Is it 1st kernel or 2nd kernel?
> > Usually we call the 1st kernel the panicked/crashed kernel, and the
> > 2nd kernel the kdump kernel.
>
> It should have been called "kdump (2nd) kernel" here indeed.
>
> > > properly detected once the virtio-mem driver started up.
> > >
> > > The virtio-mem driver already supports the "kdump mode", where it won't
> > > hotplug any memory but instead queries the device to implement the
> > > pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
> > > the vmcore.
> > >
> > > With this series, if the virtio-mem driver is included in the kdump
> > > initrd -- which dracut already takes care of under Fedora/RHEL -- it will
> > > now detect the device RAM ranges on s390 once it probes the devices, to add
> > > them to the vmcore using the same callback mechanism we already have for
> > > pfn_is_ram().
> >
> > Do you mean that on s390 the virtio-mem memory regions will be detected
> > and added to the vmcore in the kdump kernel when the virtio-mem driver is
> > initialized? Not sure if I understand it correctly.
>
> Yes exactly. In the kdump kernel, the driver gets probed and registers the
> vmcore callbacks. From there, we detect and add the device regions.
I see now, thanks for your confirmation.
^ permalink raw reply [flat|nested] 46+ messages in thread