From: Baoquan He <bhe@redhat.com>
To: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
Alison Schofield <alison.schofield@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Borislav Petkov <bp@alien8.de>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"hpa@zytor.com" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
Ira Weiny <ira.weiny@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vishal Verma <vishal.l.verma@intel.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
"x86@kernel.org" <x86@kernel.org>,
"kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [RFC PATCH v3 0/7] device backed vmemmap crash dump support
Date: Thu, 21 Mar 2024 14:17:02 +0800
Message-ID: <ZfvQ3qbRWCZeSb62@MiWiFi-R3L-srv>
In-Reply-To: <92644ab5-6467-484c-b8f3-05cba2164cc1@fujitsu.com>
On 03/21/24 at 05:40am, Zhijian Li (Fujitsu) wrote:
> ping
>
>
> Any comment is welcome.
I will have a look at this from the kdump side. How do you test your code?
By the way, there's an issue reported by the test robot.
Thanks
Baoquan
>
>
> On 06/03/2024 18:28, Li Zhijian wrote:
> > Hello folks,
> >
> > Compared with the V2[1] I posted a long time ago, this is a completely
> > new design.
> >
> > ### Background and motivation overview ###
> > ---
> > Crash dump is an important feature for troubleshooting the kernel. It is the
> > last resort for finding out what happened at a kernel panic, slowdown, and so
> > on. It is one of the most important tools for customer support.
> >
> > Currently, there are 2 syscalls (kexec_file_load(2) and kexec_load(2)) to
> > configure the dumpable regions. Generally, (A) iomem resources registered with
> > flags (IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY) for kexec_file_load(2) or
> > (B) iomem resources registered with the "System RAM" name prefix for
> > kexec_load(2) are dumpable.
> >
> > The pmem use cases, including fsdax and devdax, can map their vmemmap to
> > their own devices. In this case, these parts of the vmemmap will not be dumped
> > when a crash happens, since these regions satisfy neither (A) nor (B) above.
> >
> > In fsdax, the vmemmap (struct page array) becomes very important: it is one of
> > the key data structures for finding the status of the reverse map. Lacking this
> > information may make it difficult to analyze trouble around pmem (especially
> > Filesystem-DAX). That means troubleshooters are unable to check more details
> > about pmem from the dumpfile.
> >
> > ### Proposal ###
> > ---
> > In this proposal, the device backed vmemmap is registered as a separate
> > resource. This resource has its own new flag and name, and kexec_file_load(2)
> > and kexec_load(2) are then taught to mark it as dumpable.
> >
> > Proposed flag: IORESOURCE_DEVICE_BACKED_VMEMMAP
> > Proposed name: "Device Backed Vmemmap"
> >
> > NOTE: crash-utils also needs to adapt to this new name for kexec_load()
> >
> > With the current proposal, /proc/iomem should look as follows for device
> > backed vmemmap:
> > # cat /proc/iomem
> > ...
> > fffc0000-ffffffff : Reserved
> > 100000000-13fffffff : Persistent Memory
> >   100000000-10fffffff : namespace0.0
> >     100000000-1005fffff : Device Backed Vmemmap  # fsdax
> > a80000000-b7fffffff : CXL Window 0
> >   a80000000-affffffff : Persistent Memory
> >     a80000000-affffffff : region1
> >       a80000000-a811fffff : namespace1.0
> >         a80000000-a811fffff : Device Backed Vmemmap  # devdax
> >       a81200000-abfffffff : dax1.0
> > b80000000-c7fffffff : CXL Window 1
> > c80000000-147fffffff : PCI Bus 0000:00
> >   c80000000-c801fffff : PCI Bus 0000:01
> > ...
> >
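To make the name-based selection for kexec_load(2) concrete, here is a minimal
sketch of how a userspace tool might pick dumpable ranges out of /proc/iomem
text. It is only an illustration under assumptions: the function name
dumpable_ranges() and the SAMPLE excerpt are hypothetical, the matching is a
plain name-prefix test, and this is not how any existing tool is implemented.

```python
import re

# Excerpt modelled on the /proc/iomem listing quoted above (annotations dropped).
SAMPLE = """\
fffc0000-ffffffff : Reserved
100000000-13fffffff : Persistent Memory
  100000000-10fffffff : namespace0.0
    100000000-1005fffff : Device Backed Vmemmap
a80000000-b7fffffff : CXL Window 0
  a80000000-affffffff : Persistent Memory
    a80000000-affffffff : region1
      a80000000-a811fffff : namespace1.0
        a80000000-a811fffff : Device Backed Vmemmap
      a81200000-abfffffff : dax1.0
"""

# One /proc/iomem entry: optional indentation, "start-end : name" in hex.
LINE_RE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.*)$")


def dumpable_ranges(iomem_text, names=("System RAM", "Device Backed Vmemmap")):
    """Return (start, end, name) for entries whose name starts with one of names."""
    ranges = []
    for line in iomem_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip "..." and anything that is not a resource line
        name = m.group(3).strip()
        if name.startswith(tuple(names)):
            ranges.append((int(m.group(1), 16), int(m.group(2), 16), name))
    return ranges


if __name__ == "__main__":
    for start, end, name in dumpable_ranges(SAMPLE):
        size_mib = (end - start + 1) // (1 << 20)
        print(f"{start:#x}-{end:#x} {name} ({size_mib} MiB)")
```

Calling dumpable_ranges(SAMPLE, names=("System RAM",)) returns an empty list
for this excerpt, which is the gap described in the cover letter: matching on
"System RAM" alone never sees the device backed vmemmap. On a live system the
text would come from /proc/iomem, which typically needs root to show real
addresses.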
> > ### Kdump service reloading ###
> > ---
> > Once the kdump service is loaded, if changes to CPUs or memory occur,
> > either by hot un/plug or off/onlining, the crash elfcorehdr should also
> > be updated. There are 2 approaches to make the reloading more efficient.
> > 1) Use udev rules to watch CPU and memory events, then reload kdump
> > 2) Enable kernel crash hotplug to automatically reload elfcorehdr (>= 6.5)
> >
> > This reloading is also needed when device backed vmemmap layouts change.
> > Similar to what 1) does now, one could add the following as the first lines of
> > the RHEL udev rule file /usr/lib/udev/rules.d/98-kexec.rules:
> >
> > # namespace updated: watch daxX.Y(devdax) and pfnX.Y(fsdax) of nd
> > SUBSYSTEM=="nd", KERNEL=="[dp][af][xn][0-9].*", ACTION=="bind", GOTO="kdump_reload"
> > SUBSYSTEM=="nd", KERNEL=="[dp][af][xn][0-9].*", ACTION=="unbind", GOTO="kdump_reload"
> > # devdax <-> system-ram updated: watch daxX.Y of dax
> > SUBSYSTEM=="dax", KERNEL=="dax[0-9].*", ACTION=="bind", GOTO="kdump_reload"
> > SUBSYSTEM=="dax", KERNEL=="dax[0-9].*", ACTION=="unbind", GOTO="kdump_reload"
> >
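A quick way to see which device names the KERNEL== patterns above would
select is to try them against sample names. This is a minimal sketch that
assumes udev's glob matching behaves like Python's fnmatch for the constructs
used here (character classes, a literal "." and "*"); the helper matches() and
the device names in the loop are hypothetical examples, not taken from a real
system.

```python
from fnmatch import fnmatchcase

# Patterns copied from the udev rules quoted above.
ND_PATTERN = "[dp][af][xn][0-9].*"   # daxX.Y (devdax) and pfnX.Y (fsdax) of nd
DAX_PATTERN = "dax[0-9].*"           # daxX.Y of dax


def matches(pattern, kernel_name):
    """Case-sensitive glob match of a kernel device name against a pattern."""
    return fnmatchcase(kernel_name, pattern)


if __name__ == "__main__":
    # Hypothetical device names used only to exercise the patterns.
    for name in ("dax0.0", "pfn1.0", "namespace0.0", "dax10.0"):
        print(f"{name:14} nd={matches(ND_PATTERN, name)} "
              f"dax={matches(DAX_PATTERN, name)}")
```

Under that assumption, "[0-9]" consumes exactly one character and "." is
literal, so a name with a two-digit first index such as dax10.0 would not
match either pattern. If such names can occur, a looser pattern may be needed.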
> > Regarding 2), my idea is that it would need to call memory_notify() in
> > devm_memremap_pages_release() and devm_memremap_pages() to trigger the crash
> > hotplug. This part is not yet mature, but it does not affect the whole feature
> > because we can still use method 1) as an alternative.
> >
> > [1] https://lore.kernel.org/lkml/02066f0f-dbc0-0388-4233-8e24b6f8435b@fujitsu.com/T/
> > --------------------------------------------
> > changes from V2[1]
> > - new proposal design
> >
> > CC: Alison Schofield <alison.schofield@intel.com>
> > CC: Andrew Morton <akpm@linux-foundation.org>
> > CC: Baoquan He <bhe@redhat.com>
> > CC: Borislav Petkov <bp@alien8.de>
> > CC: Dan Williams <dan.j.williams@intel.com>
> > CC: Dave Hansen <dave.hansen@linux.intel.com>
> > CC: Dave Jiang <dave.jiang@intel.com>
> > CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > CC: "H. Peter Anvin" <hpa@zytor.com>
> > CC: Ingo Molnar <mingo@redhat.com>
> > CC: Ira Weiny <ira.weiny@intel.com>
> > CC: Thomas Gleixner <tglx@linutronix.de>
> > CC: Vishal Verma <vishal.l.verma@intel.com>
> > CC: linux-cxl@vger.kernel.org
> > CC: linux-mm@kvack.org
> > CC: nvdimm@lists.linux.dev
> > CC: x86@kernel.org
> >
> > Li Zhijian (7):
> > mm: memremap: register/unregister altmap region to a separate resource
> > mm: memremap: add pgmap_parent_resource() helper
> > nvdimm: pmem: assign a parent resource for vmemmap region for the
> > fsdax
> > dax: pmem: assign a parent resource for vmemmap region for the devdax
> > resource: Introduce walk device_backed_vmemmap res() helper
> > x86/crash: make device backed vmemmap dumpable for kexec_file_load
> > nvdimm: set force_raw=1 in kdump kernel
> >
> > arch/x86/kernel/crash.c | 5 +++++
> > drivers/dax/pmem.c | 8 ++++++--
> > drivers/nvdimm/namespace_devs.c | 3 +++
> > drivers/nvdimm/pmem.c | 9 ++++++---
> > include/linux/ioport.h | 4 ++++
> > include/linux/memremap.h | 4 ++++
> > kernel/resource.c | 13 +++++++++++++
> > mm/memremap.c | 30 +++++++++++++++++++++++++++++-
> > 8 files changed, 70 insertions(+), 6 deletions(-)
> >