From: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>
To: Dan Williams <dan.j.williams@intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
"kexec@lists.infradead.org" <kexec@lists.infradead.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
"Xiao Yang (Fujitsu)" <yangx.jy@fujitsu.com>,
"Shiyang Ruan (Fujitsu)" <ruansy.fnst@fujitsu.com>
Subject: Re: [RFC PATCH v2 0/3] pmem memmap dump support
Date: Wed, 10 May 2023 10:41:39 +0000 [thread overview]
Message-ID: <0fe0d69e-e33b-cf45-c957-68a8159d29ab@fujitsu.com> (raw)
In-Reply-To: <774fd596-5481-aeff-aace-8785158728ea@fujitsu.com>
Hi Dan
on 5/8/2023 5:45 PM, Zhijian Li (Fujitsu) wrote:
> Dan,
>
>
> On 29/04/2023 02:59, Dan Williams wrote:
>> Li Zhijian wrote:
>>> Hello folks,
>>>
>>> About 2 months ago, we posted our first RFC[3] and received your kindly feedback. Thank you :)
>>> Now, I'm back with the code.
>>>
>>> Currently, this RFC has already implemented to supported case D*. And the case A&B is disabled
>>> deliberately in makedumpfile. It includes changes in 3 source code as below:
>> I think the reason this patchkit is difficult to follow is that it
>> spends a lot of time describing a chosen solution, but not enough time
>> describing the problem and the tradeoffs.
>>
>> For example why is updating /proc/vmcore with pmem metadata the chosen
>> solution? Why not leave the kernel out of it and have makedumpfile
>> tooling aware of how to parse persistent memory namespace info-blocks
>> and retrieve that dump itself? This is what I proposed here:
>>
>> http://lore.kernel.org/r/641484f7ef780_a52e2940@dwillia2-mobl3.amr.corp.intel.com.notmuch
> Sorry for the late reply. I'm just back from the vacation.
> And sorry again for missing your previous *important* information in V1.
>
> Your proposal also sounds to me with less kernel changes, but more ndctl coupling with makedumpfile tools.
> In my current understanding, it will includes following source changes.
The kernel and makedumpfile has updated. It's still in a early stage, but in order to make sure I'm following your proposal.
i want to share the changes with you early. Alternatively, you are able to refer to my github for the full details.
https://github.com/zhijianli88/makedumpfile/commit/8ebfe38c015cfca0545cb3b1d7a6cc9a58fc9bb3
If I'm going the wrong way, fee free to let me know :)
>
> -----------+-------------------------------------------------------------------+
> Source | changes |
> -----------+-------------------------------------------------------------------+
> I. | 1. enter force_raw in kdump kernel automatically(avoid metadata being updated again)|
kernel should adapt it so that the metadata of pmem will be updated again in the kdump kernel:
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index c60ec0b373c5..2e59be8b9c78 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -8,6 +8,7 @@
#include <linux/slab.h>
#include <linux/list.h>
#include <linux/nd.h>
+#include <linux/crash_dump.h>
#include "nd-core.h"
#include "pmem.h"
#include "pfn.h"
@@ -1504,6 +1505,8 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
return ERR_PTR(-ENODEV);
}
+ if (is_kdump_kernel())
+ ndns->force_raw = true;
return ndns;
}
EXPORT_SYMBOL(nvdimm_namespace_common_probe);
> kernel | |
> | 2. mark the whole pmem's PT_LOAD for kexec_file_load(2) syscall |
> -----------+-------------------------------------------------------------------+
> II. kexec- | 1. mark the whole pmem's PT_LOAD for kexe_load(2) syscall |
> tool | |
> -----------+-------------------------------------------------------------------+
> III. | 1. parse the infoblock and calculate the boundaries of userdata and metadata |
> makedump- | 2. skip pmem userdata region |
> file | 3. exclude pmem metadata region if needed |
> -----------+-------------------------------------------------------------------+
>
> I will try rewrite it with your proposal ASAP
inspect_pmem_namespace() will walk the namespaces and the read its resource.start and infoblock. With this
information, we can calculate the boundaries of userdata and metadata easily. But currently this changes are
strongly coupling with the ndctl/pmem which looks a bit messy and ugly.
============makedumpfile=======
diff --git a/Makefile b/Makefile
index a289e41ef44d..4b4ded639cfd 100644
--- a/Makefile
+++ b/Makefile
@@ -50,7 +50,7 @@ OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c
OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
-LIBS = -ldw -lbz2 -ldl -lelf -lz
+LIBS = -ldw -lbz2 -ldl -lelf -lz -lndctl
ifneq ($(LINKTYPE), dynamic)
LIBS := -static $(LIBS) -llzma
endif
diff --git a/makedumpfile.c b/makedumpfile.c
index 98c3b8c7ced9..db68d05a29f9 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -27,6 +27,8 @@
#include <limits.h>
#include <assert.h>
#include <zlib.h>
+#include <sys/types.h>
+#include <ndctl/libndctl.h>
+
+#define INFOBLOCK_SZ (8192)
+#define SZ_4K (4096)
+#define PFN_SIG_LEN 16
+
+typedef uint64_t u64;
+typedef int64_t s64;
+typedef uint32_t u32;
+typedef int32_t s32;
+typedef uint16_t u16;
+typedef int16_t s16;
+typedef uint8_t u8;
+typedef int8_t s8;
+
+typedef int64_t le64;
+typedef int32_t le32;
+typedef int16_t le16;
+
+struct pfn_sb {
+ u8 signature[PFN_SIG_LEN];
+ u8 uuid[16];
+ u8 parent_uuid[16];
+ le32 flags;
+ le16 version_major;
+ le16 version_minor;
+ le64 dataoff; /* relative to namespace_base + start_pad */
+ le64 npfns;
+ le32 mode;
+ /* minor-version-1 additions for section alignment */
+ le32 start_pad;
+ le32 end_trunc;
+ /* minor-version-2 record the base alignment of the mapping */
+ le32 align;
+ /* minor-version-3 guarantee the padding and flags are zero */
+ /* minor-version-4 record the page size and struct page size */
+ le32 page_size;
+ le16 page_struct_size;
+ u8 padding[3994];
+ le64 checksum;
+};
+
+static int nd_read_infoblock_dataoff(struct ndctl_namespace *ndns)
+{
+ int fd, rc;
+ char path[50];
+ char buf[INFOBLOCK_SZ + 1];
+ struct pfn_sb *pfn_sb = (struct pfn_sb *)(buf + SZ_4K);
+
+ sprintf(path, "/dev/%s", ndctl_namespace_get_block_device(ndns));
+
+ fd = open(path, O_RDONLY|O_EXCL);
+ if (fd < 0)
+ return -1;
+
+
+ rc = read(fd, buf, INFOBLOCK_SZ);
+ if (rc < INFOBLOCK_SZ) {
+ return -1;
+ }
+
+ return pfn_sb->dataoff;
+}
+
+int inspect_pmem_namespace(void)
+{
+ struct ndctl_ctx *ctx;
+ struct ndctl_bus *bus;
+ int rc = -1;
+
+ fprintf(stderr, "\n\ninspect_pmem_namespace!!\n\n");
+ rc = ndctl_new(&ctx);
+ if (rc)
+ return -1;
+
+ ndctl_bus_foreach(ctx, bus) {
+ struct ndctl_region *region;
+
+ ndctl_region_foreach(bus, region) {
+ struct ndctl_namespace *ndns;
+
+ ndctl_namespace_foreach(region, ndns) {
+ enum ndctl_namespace_mode mode;
+ long long start, end_metadata;
+
+ mode = ndctl_namespace_get_mode(ndns);
+ /* kdump kernel should set force_raw, mode become *safe* */
+ if (mode == NDCTL_NS_MODE_SAFE) {
+ fprintf(stderr, "Only raw can be dumpable\n");
+ continue;
+ }
+
+ start = ndctl_namespace_get_resource(ndns);
+ end_metadata = nd_read_infoblock_dataoff(ndns);
+
+ /* metadata really starts from 2M alignment */
+ if (start != ULLONG_MAX && end_metadata > 2 * 1024 * 1024) // 2M
+ pmem_add_next(start, end_metadata);
+ }
+ }
+ }
+
+ ndctl_unref(ctx);
+ return 0;
+}
+
Thanks
Zhijian
>
> Thanks again
>
> Thanks
> Zhijian
>
>> ...but never got an answer, or I missed the answer.
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
next prev parent reply other threads:[~2023-05-10 10:42 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-27 10:18 [RFC PATCH v2 0/3] pmem memmap dump support Li Zhijian
2023-04-27 10:18 ` [RFC PATCH v2 1/3] crash: export dev memmap header to vmcoreinfo Li Zhijian
2023-04-27 10:18 ` [RFC PATCH v2 2/3] drivers/nvdimm: export memmap of namespace " Li Zhijian
2023-04-27 22:50 ` Ira Weiny
2023-04-28 7:01 ` Zhijian Li (Fujitsu)
2023-04-27 10:18 ` [RFC PATCH v2 3/3] resource, crash: Make kexec_file_load support pmem Li Zhijian
2023-04-27 11:39 ` Greg Kroah-Hartman
2023-04-28 7:36 ` Zhijian Li (Fujitsu)
2023-04-27 20:41 ` Jane Chu
2023-04-28 7:10 ` Zhijian Li (Fujitsu)
2023-04-27 10:18 ` [RFC PATCH v2 kexec-tools] kexec: Add and mark pmem region into PT_LOADs Li Zhijian
2023-04-27 10:18 ` [RFC PATCH v2 makedumpfile 1/3] elf_info.c: Introduce is_pmem_pt_load_range Li Zhijian
2023-04-27 10:18 ` [RFC PATCH v2 makedumpfile 2/3] makedumpfile.c: Exclude all pmem pages Li Zhijian
2023-04-27 10:18 ` [RFC PATCH v2 makedumpfile 3/3] makedumpfile.c: Allow excluding metadata of pmem region Li Zhijian
2023-04-28 18:59 ` [RFC PATCH v2 0/3] pmem memmap dump support Dan Williams
2023-05-08 9:45 ` Zhijian Li (Fujitsu)
2023-05-10 10:41 ` Zhijian Li (Fujitsu) [this message]
2023-05-25 5:36 ` Li, Zhijian
-- strict thread matches above, loose matches on Subject: below --
2023-04-27 10:11 Li Zhijian
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0fe0d69e-e33b-cf45-c957-68a8159d29ab@fujitsu.com \
--to=lizhijian@fujitsu.com \
--cc=dan.j.williams@intel.com \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nvdimm@lists.linux.dev \
--cc=ruansy.fnst@fujitsu.com \
--cc=x86@kernel.org \
--cc=y-goto@fujitsu.com \
--cc=yangx.jy@fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox