* [RFC PATCH 1/4] kernel/reboot: Introduce pre_restart notifiers
2024-06-18 15:41 [RFC PATCH 0/4] Flush nvdimm/pmem to memory before machine restart Mathieu Desnoyers
@ 2024-06-18 15:41 ` Mathieu Desnoyers
2024-06-18 15:41 ` [RFC PATCH 2/4] nvdimm/pmem: Flush to memory before machine restart Mathieu Desnoyers
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Mathieu Desnoyers @ 2024-06-18 15:41 UTC
To: Dan Williams, Steven Rostedt
Cc: linux-kernel, Mathieu Desnoyers, Vishal Verma, Dave Jiang,
Ira Weiny, nvdimm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Catalin Marinas, Will Deacon,
linux-arm-kernel
Introduce a new pre_restart notifier chain for callbacks that need to
be executed after the system has been made quiescent with
syscore_shutdown(), before machine restart.
This pre_restart notifier chain should be invoked on machine restart and
on emergency machine restart.
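To illustrate the semantics of the new chain, here is a userspace model (not kernel code) of what register_pre_restart_handler(), unregister_pre_restart_handler() and do_kernel_pre_restart() do on top of an atomic notifier chain. The model_* names and the stripped-down struct notifier_block are hypothetical stand-ins. The RCU and spinlock protection of the real atomic chain is omitted. The model assumes the kernel's ordering rule: handlers run in descending priority order, and handlers with equal priority run in registration order.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000

/* Stripped-down stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
	int priority;
};

/* Head of the (hypothetical) pre-restart chain. */
static struct notifier_block *pre_restart_head;

static int model_register_pre_restart_handler(struct notifier_block *nb)
{
	struct notifier_block **p = &pre_restart_head;

	assert(nb && nb->notifier_call);
	/* Skip entries of higher or equal priority: equal priorities keep
	 * registration order. */
	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
	return 0;
}

static int model_unregister_pre_restart_handler(struct notifier_block *nb)
{
	struct notifier_block **p;

	for (p = &pre_restart_head; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			return 0;
		}
	}
	return -ENOENT;
}

static void model_do_kernel_pre_restart(unsigned long mode, char *cmd)
{
	struct notifier_block *nb = pre_restart_head;

	while (nb) {
		/* Fetch next first: a handler may unregister itself. */
		struct notifier_block *next = nb->next;

		if (nb->notifier_call(nb, mode, cmd) & NOTIFY_STOP_MASK)
			break;
		nb = next;
	}
}

/* Example handler: records each handler's priority in call order. */
static int call_log[4];
static int call_count;

static int log_handler(struct notifier_block *nb, unsigned long action,
		       void *data)
{
	(void)action;
	(void)data;
	if (call_count < 4)
		call_log[call_count++] = nb->priority;
	return NOTIFY_DONE;
}
```

A driver would register once at probe time and unregister at removal. Patch 2/4 follows this pattern for pmem.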
The use-case for this new notifier chain is to preserve tracing data
within pmem areas on systems where the BIOS does not clear memory across
warm reboots.
Why do we need a new notifier chain?
1) The reboot and restart_prepare notifiers are called too early in the
reboot sequence: they are invoked before syscore_shutdown(), while other
CPUs are still actively running threads.
2) The "restart" notifier is meant to trigger the actual machine
restart, not to run as a last step immediately before restart. It is
also not always invoked: some architecture code chooses to bypass the
restart notifier and reboot directly from the architecture code.
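The ordering argument above can be sketched as a simplified event log of the restart path. The step names are hypothetical labels, not kernel functions. Many real steps are omitted, such as device shutdown, migration to the reboot CPU, and kmsg dumping. The point is only where the new chain sits relative to syscore_shutdown() and to the architecture code that stops the other CPUs.

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 8

static const char *steps[MAX_STEPS];
static int nr_steps;

static void step(const char *name)
{
	assert(nr_steps < MAX_STEPS);
	steps[nr_steps++] = name;
}

/* Simplified restart path with the proposed pre_restart hook. */
static void model_kernel_restart(void)
{
	step("reboot_notifiers");          /* other CPUs still run threads */
	step("restart_prepare_notifiers"); /* same problem: too early */
	step("syscore_shutdown");
	/* Architecture machine_restart() from here on. */
	step("stop_other_cpus");           /* e.g. smp_send_stop() on arm64 */
	step("pre_restart_notifiers");     /* new chain: system is quiescent */
	step("restart");                   /* restart handlers or direct arch reset */
}

/* Returns the position of a step in the log, or -1 if absent. */
static int step_index(const char *name)
{
	for (int i = 0; i < nr_steps; i++)
		if (strcmp(steps[i], name) == 0)
			return i;
	return -1;
}
```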
Wiring up the architecture code to call this notifier chain is left to
follow-up arch-specific patches.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: nvdimm@lists.linux.dev
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
---
include/linux/reboot.h | 4 ++++
kernel/reboot.c | 51 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+)
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index abcdde4df697..c7f340e81451 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -50,6 +50,10 @@ extern int register_restart_handler(struct notifier_block *);
extern int unregister_restart_handler(struct notifier_block *);
extern void do_kernel_restart(char *cmd);
+extern int register_pre_restart_handler(struct notifier_block *);
+extern int unregister_pre_restart_handler(struct notifier_block *);
+extern void do_kernel_pre_restart(char *cmd);
+
/*
* Architecture-specific implementations of sys_reboot commands.
*/
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 22c16e2564cc..b7287dd48d35 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -235,6 +235,57 @@ void do_kernel_restart(char *cmd)
atomic_notifier_call_chain(&restart_handler_list, reboot_mode, cmd);
}
+/*
+ * Notifier list for kernel code which wants to be called immediately
+ * before restarting the system.
+ */
+static ATOMIC_NOTIFIER_HEAD(pre_restart_handler_list);
+
+/**
+ * register_pre_restart_handler - Register function to be called in preparation
+ *	for restarting the system
+ * @nb: Info about handler function to be called
+ *
+ * Registers a function to be called in preparation for restarting
+ * the system.
+ *
+ * Currently always returns zero, as atomic_notifier_chain_register()
+ * always returns zero.
+ */
+int register_pre_restart_handler(struct notifier_block *nb)
+{
+ return atomic_notifier_chain_register(&pre_restart_handler_list, nb);
+}
+EXPORT_SYMBOL(register_pre_restart_handler);
+
+/**
+ * unregister_pre_restart_handler - Unregister previously registered
+ * pre-restart handler
+ * @nb: Hook to be unregistered
+ *
+ * Unregisters a previously registered pre-restart handler function.
+ *
+ * Returns zero on success, or %-ENOENT on failure.
+ */
+int unregister_pre_restart_handler(struct notifier_block *nb)
+{
+ return atomic_notifier_chain_unregister(&pre_restart_handler_list, nb);
+}
+EXPORT_SYMBOL(unregister_pre_restart_handler);
+
+/**
+ * do_kernel_pre_restart - Execute kernel pre-restart handler call chain
+ *
+ * Calls functions registered with register_pre_restart_handler.
+ *
+ * Expected to be called from machine_restart and
+ * machine_emergency_restart before invoking the restart handlers.
+ */
+void do_kernel_pre_restart(char *cmd)
+{
+ atomic_notifier_call_chain(&pre_restart_handler_list, reboot_mode, cmd);
+}
+
void migrate_to_reboot_cpu(void)
{
/* The boot cpu is always logical cpu 0 */
--
2.39.2
* [RFC PATCH 2/4] nvdimm/pmem: Flush to memory before machine restart
From: Mathieu Desnoyers @ 2024-06-18 15:41 UTC
To: Dan Williams, Steven Rostedt
Cc: linux-kernel, Mathieu Desnoyers, Vishal Verma, Dave Jiang,
Ira Weiny, nvdimm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Catalin Marinas, Will Deacon,
linux-arm-kernel
Register pre-restart notifiers to flush pmem areas from CPU data cache
to memory on reboot, immediately before restarting the machine. This
ensures all other CPUs are quiescent before the pmem data is flushed to
memory.
I did an earlier POC that flushed caches on panic/die oops notifiers [1],
but it did not cover the reboot case. I've been made aware that some
distribution vendors have started shipping their own modified version of
my earlier POC patch. This makes a strong argument for upstreaming this
work.
Use the newly introduced "pre-restart" notifiers to flush pmem data to
memory immediately before machine restart.
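The pmem handler relies on the notifier_block being embedded in struct pmem_device, so container_of() can recover the device from the pointer passed to the callback. A userspace sketch of that pattern follows. Several pieces are assumptions made so the example runs outside the kernel: the trimmed struct, the local container_of() macro, and model_arch_wb_cache_pmem(), which only records the range instead of writing back cache lines.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define NOTIFY_DONE 0x0000

struct notifier_block {
	int (*notifier_call)(struct notifier_block *, unsigned long, void *);
	struct notifier_block *next;
	int priority;
};

/* Trimmed-down model of struct pmem_device (only the fields used here). */
struct pmem_device {
	void *virt_addr;
	size_t size;
	struct notifier_block pre_restart_notifier;
};

/* Hypothetical stub for arch_wb_cache_pmem(): records the range. */
static void *last_wb_addr;
static size_t last_wb_size;

static void model_arch_wb_cache_pmem(void *addr, size_t size)
{
	last_wb_addr = addr;
	last_wb_size = size;
}

static int pmem_pre_restart_handler(struct notifier_block *self,
				    unsigned long ev, void *unused)
{
	/* Recover the enclosing device from the embedded notifier_block. */
	struct pmem_device *pmem =
		container_of(self, struct pmem_device, pre_restart_notifier);

	(void)ev;
	(void)unused;
	assert(pmem->virt_addr != NULL);
	model_arch_wb_cache_pmem(pmem->virt_addr, pmem->size);
	return NOTIFY_DONE;
}
```

Embedding the notifier_block avoids a separate allocation and lookup table. It also ties the notifier's lifetime to the device, which is why it must be unregistered before the device goes away.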
Delta from my POC patch [1]:
The panic() code invokes emergency_restart() to restart the machine,
which in turn invokes the new pre-restart notifiers. There is therefore
no need to hook into panic handlers explicitly.
The die notifiers, on the other hand, do not actually trigger a machine
restart, so flushing pmem to memory there does not appear to be relevant.
I must admit I originally looked at how ftrace hooks into panic/die-oops
handlers for its ring buffers, but the use-case is different here: we
only want to cover machine restart.
Link: https://lore.kernel.org/linux-kernel/f6067e3e-a2bc-483d-b214-6e3fe6691279@efficios.com/ [1]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: nvdimm@lists.linux.dev
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
---
drivers/nvdimm/pmem.c | 29 ++++++++++++++++++++++++++++-
drivers/nvdimm/pmem.h | 2 ++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 598fe2e89bda..bf1d187a9dca 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -26,12 +26,16 @@
#include <linux/dax.h>
#include <linux/nd.h>
#include <linux/mm.h>
+#include <linux/reboot.h>
#include <asm/cacheflush.h>
#include "pmem.h"
#include "btt.h"
#include "pfn.h"
#include "nd.h"
+static int pmem_pre_restart_handler(struct notifier_block *self,
+ unsigned long ev, void *unused);
+
static struct device *to_dev(struct pmem_device *pmem)
{
/*
@@ -423,6 +427,7 @@ static void pmem_release_disk(void *__pmem)
{
struct pmem_device *pmem = __pmem;
+ unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
dax_remove_host(pmem->disk);
kill_dax(pmem->dax_dev);
put_dax(pmem->dax_dev);
@@ -575,9 +580,14 @@ static int pmem_attach_disk(struct device *dev,
goto out_cleanup_dax;
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
}
- rc = device_add_disk(dev, disk, pmem_attribute_groups);
+ pmem->pre_restart_notifier.notifier_call = pmem_pre_restart_handler;
+ pmem->pre_restart_notifier.priority = 0;
+ rc = register_pre_restart_notifier(&pmem->pre_restart_notifier);
if (rc)
goto out_remove_host;
+ rc = device_add_disk(dev, disk, pmem_attribute_groups);
+ if (rc)
+ goto out_unregister_reboot;
if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
return -ENOMEM;
@@ -589,6 +599,8 @@ static int pmem_attach_disk(struct device *dev,
dev_warn(dev, "'badblocks' notification disabled\n");
return 0;
+out_unregister_pre_restart:
+ unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
out_remove_host:
dax_remove_host(pmem->disk);
out_cleanup_dax:
@@ -751,6 +763,21 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
}
}
+/*
+ * For volatile memory use-cases where explicit flushing of the data cache is
+ * not useful after stores, the pmem pre-restart notifier is called in
+ * preparation for restart to make sure the content of the pmem memory area is
+ * flushed from data cache to memory, so it can be preserved across warm reboot.
+ */
+static int pmem_pre_restart_handler(struct notifier_block *self,
+ unsigned long ev, void *unused)
+{
+ struct pmem_device *pmem = container_of(self, struct pmem_device, pre_restart_notifier);
+
+ arch_wb_cache_pmem(pmem->virt_addr, pmem->size);
+ return NOTIFY_DONE;
+}
+
MODULE_ALIAS("pmem");
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_IO);
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_PMEM);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 392b0b38acb9..b8a2a518cf82 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -4,6 +4,7 @@
#include <linux/page-flags.h>
#include <linux/badblocks.h>
#include <linux/memremap.h>
+#include <linux/notifier.h>
#include <linux/types.h>
#include <linux/pfn_t.h>
#include <linux/fs.h>
@@ -27,6 +28,7 @@ struct pmem_device {
struct dax_device *dax_dev;
struct gendisk *disk;
struct dev_pagemap pgmap;
+ struct notifier_block pre_restart_notifier;
};
long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
--
2.39.2
* Re: [RFC PATCH 2/4] nvdimm/pmem: Flush to memory before machine restart
From: kernel test robot @ 2024-06-19 15:35 UTC
To: Mathieu Desnoyers; +Cc: llvm, oe-kbuild-all
Hi Mathieu,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on nvdimm/libnvdimm-for-next]
[also build test ERROR on arm64/for-next/core linus/master v6.10-rc4 next-20240618]
[cannot apply to nvdimm/dax-misc tip/x86/core rostedt-trace/for-next rostedt-trace/for-next-urgent]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/kernel-reboot-Introduce-pre_restart-notifiers/20240618-235520
base: https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
patch link: https://lore.kernel.org/r/20240618154157.334602-3-mathieu.desnoyers%40efficios.com
patch subject: [RFC PATCH 2/4] nvdimm/pmem: Flush to memory before machine restart
config: x86_64-randconfig-014-20240619 (https://download.01.org/0day-ci/archive/20240619/202406192330.5Z2ExUMs-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240619/202406192330.5Z2ExUMs-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406192330.5Z2ExUMs-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
>> drivers/nvdimm/pmem.c:430:2: error: call to undeclared function 'unregister_pre_restart_notifier'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
430 | unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
| ^
drivers/nvdimm/pmem.c:430:2: note: did you mean 'unregister_pre_restart_handler'?
include/linux/reboot.h:54:12: note: 'unregister_pre_restart_handler' declared here
54 | extern int unregister_pre_restart_handler(struct notifier_block *);
| ^
>> drivers/nvdimm/pmem.c:585:7: error: call to undeclared function 'register_pre_restart_notifier'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
585 | rc = register_pre_restart_notifier(&pmem->pre_restart_notifier);
| ^
drivers/nvdimm/pmem.c:585:7: note: did you mean 'register_pre_restart_handler'?
include/linux/reboot.h:53:12: note: 'register_pre_restart_handler' declared here
53 | extern int register_pre_restart_handler(struct notifier_block *);
| ^
drivers/nvdimm/pmem.c:603:2: error: call to undeclared function 'unregister_pre_restart_notifier'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
603 | unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
| ^
>> drivers/nvdimm/pmem.c:590:8: error: use of undeclared label 'out_unregister_reboot'
590 | goto out_unregister_reboot;
| ^
>> drivers/nvdimm/pmem.c:602:1: warning: unused label 'out_unregister_pre_restart' [-Wunused-label]
602 | out_unregister_pre_restart:
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning and 4 errors generated.
vim +/unregister_pre_restart_notifier +430 drivers/nvdimm/pmem.c
425
426 static void pmem_release_disk(void *__pmem)
427 {
428 struct pmem_device *pmem = __pmem;
429
> 430 unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
431 dax_remove_host(pmem->disk);
432 kill_dax(pmem->dax_dev);
433 put_dax(pmem->dax_dev);
434 del_gendisk(pmem->disk);
435
436 put_disk(pmem->disk);
437 }
438
439 static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap,
440 unsigned long pfn, unsigned long nr_pages, int mf_flags)
441 {
442 struct pmem_device *pmem =
443 container_of(pgmap, struct pmem_device, pgmap);
444 u64 offset = PFN_PHYS(pfn) - pmem->phys_addr - pmem->data_offset;
445 u64 len = nr_pages << PAGE_SHIFT;
446
447 return dax_holder_notify_failure(pmem->dax_dev, offset, len, mf_flags);
448 }
449
450 static const struct dev_pagemap_ops fsdax_pagemap_ops = {
451 .memory_failure = pmem_pagemap_memory_failure,
452 };
453
454 static int pmem_attach_disk(struct device *dev,
455 struct nd_namespace_common *ndns)
456 {
457 struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
458 struct nd_region *nd_region = to_nd_region(dev->parent);
459 struct queue_limits lim = {
460 .logical_block_size = pmem_sector_size(ndns),
461 .physical_block_size = PAGE_SIZE,
462 .max_hw_sectors = UINT_MAX,
463 };
464 int nid = dev_to_node(dev), fua;
465 struct resource *res = &nsio->res;
466 struct range bb_range;
467 struct nd_pfn *nd_pfn = NULL;
468 struct dax_device *dax_dev;
469 struct nd_pfn_sb *pfn_sb;
470 struct pmem_device *pmem;
471 struct request_queue *q;
472 struct gendisk *disk;
473 void *addr;
474 int rc;
475
476 pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
477 if (!pmem)
478 return -ENOMEM;
479
480 rc = devm_namespace_enable(dev, ndns, nd_info_block_reserve());
481 if (rc)
482 return rc;
483
484 /* while nsio_rw_bytes is active, parse a pfn info block if present */
485 if (is_nd_pfn(dev)) {
486 nd_pfn = to_nd_pfn(dev);
487 rc = nvdimm_setup_pfn(nd_pfn, &pmem->pgmap);
488 if (rc)
489 return rc;
490 }
491
492 /* we're attaching a block device, disable raw namespace access */
493 devm_namespace_disable(dev, ndns);
494
495 dev_set_drvdata(dev, pmem);
496 pmem->phys_addr = res->start;
497 pmem->size = resource_size(res);
498 fua = nvdimm_has_flush(nd_region);
499 if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) || fua < 0) {
500 dev_warn(dev, "unable to guarantee persistence of writes\n");
501 fua = 0;
502 }
503
504 if (!devm_request_mem_region(dev, res->start, resource_size(res),
505 dev_name(&ndns->dev))) {
506 dev_warn(dev, "could not reserve region %pR\n", res);
507 return -EBUSY;
508 }
509
510 disk = blk_alloc_disk(&lim, nid);
511 if (IS_ERR(disk))
512 return PTR_ERR(disk);
513 q = disk->queue;
514
515 pmem->disk = disk;
516 pmem->pgmap.owner = pmem;
517 pmem->pfn_flags = PFN_DEV;
518 if (is_nd_pfn(dev)) {
519 pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
520 pmem->pgmap.ops = &fsdax_pagemap_ops;
521 addr = devm_memremap_pages(dev, &pmem->pgmap);
522 pfn_sb = nd_pfn->pfn_sb;
523 pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
524 pmem->pfn_pad = resource_size(res) -
525 range_len(&pmem->pgmap.range);
526 pmem->pfn_flags |= PFN_MAP;
527 bb_range = pmem->pgmap.range;
528 bb_range.start += pmem->data_offset;
529 } else if (pmem_should_map_pages(dev)) {
530 pmem->pgmap.range.start = res->start;
531 pmem->pgmap.range.end = res->end;
532 pmem->pgmap.nr_range = 1;
533 pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
534 pmem->pgmap.ops = &fsdax_pagemap_ops;
535 addr = devm_memremap_pages(dev, &pmem->pgmap);
536 pmem->pfn_flags |= PFN_MAP;
537 bb_range = pmem->pgmap.range;
538 } else {
539 addr = devm_memremap(dev, pmem->phys_addr,
540 pmem->size, ARCH_MEMREMAP_PMEM);
541 bb_range.start = res->start;
542 bb_range.end = res->end;
543 }
544
545 if (IS_ERR(addr)) {
546 rc = PTR_ERR(addr);
547 goto out;
548 }
549 pmem->virt_addr = addr;
550
551 blk_queue_write_cache(q, true, fua);
552 blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
553 blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
554 if (pmem->pfn_flags & PFN_MAP)
555 blk_queue_flag_set(QUEUE_FLAG_DAX, q);
556
557 disk->fops = &pmem_fops;
558 disk->private_data = pmem;
559 nvdimm_namespace_disk_name(ndns, disk->disk_name);
560 set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
561 / 512);
562 if (devm_init_badblocks(dev, &pmem->bb))
563 return -ENOMEM;
564 nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
565 disk->bb = &pmem->bb;
566
567 dax_dev = alloc_dax(pmem, &pmem_dax_ops);
568 if (IS_ERR(dax_dev)) {
569 rc = PTR_ERR(dax_dev);
570 if (rc != -EOPNOTSUPP)
571 goto out;
572 } else {
573 set_dax_nocache(dax_dev);
574 set_dax_nomc(dax_dev);
575 if (is_nvdimm_sync(nd_region))
576 set_dax_synchronous(dax_dev);
577 pmem->dax_dev = dax_dev;
578 rc = dax_add_host(dax_dev, disk);
579 if (rc)
580 goto out_cleanup_dax;
581 dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
582 }
583 pmem->pre_restart_notifier.notifier_call = pmem_pre_restart_handler;
584 pmem->pre_restart_notifier.priority = 0;
> 585 rc = register_pre_restart_notifier(&pmem->pre_restart_notifier);
586 if (rc)
587 goto out_remove_host;
588 rc = device_add_disk(dev, disk, pmem_attribute_groups);
589 if (rc)
> 590 goto out_unregister_reboot;
591 if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
592 return -ENOMEM;
593
594 nvdimm_check_and_set_ro(disk);
595
596 pmem->bb_state = sysfs_get_dirent(disk_to_dev(disk)->kobj.sd,
597 "badblocks");
598 if (!pmem->bb_state)
599 dev_warn(dev, "'badblocks' notification disabled\n");
600 return 0;
601
> 602 out_unregister_pre_restart:
603 unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
604 out_remove_host:
605 dax_remove_host(pmem->disk);
606 out_cleanup_dax:
607 kill_dax(pmem->dax_dev);
608 put_dax(pmem->dax_dev);
609 out:
610 put_disk(pmem->disk);
611 return rc;
612 }
613
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
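The errors above come from name mismatches between patches 1/4 and 2/4. First, pmem.c calls register_pre_restart_notifier() and unregister_pre_restart_notifier(), while patch 1/4 exports register_pre_restart_handler() and unregister_pre_restart_handler(). Second, the error path jumps to out_unregister_reboot, while the label is named out_unregister_pre_restart. The userspace sketch below models the tail of pmem_attach_disk() with those names reconciled, presumably as intended. It checks that a device_add_disk() failure unwinds in reverse order. All model_* functions and the struct model_pmem flags are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flags standing in for the resources acquired by the real driver. */
struct model_pmem {
	bool dax_host_added;
	bool notifier_registered;
	bool disk_added;
};

/* Test knob: make model_device_add_disk() fail. */
static bool fail_device_add_disk;

static int model_register_pre_restart_handler(struct model_pmem *p)
{
	p->notifier_registered = true;
	return 0;
}

static void model_unregister_pre_restart_handler(struct model_pmem *p)
{
	assert(p->notifier_registered);
	p->notifier_registered = false;
}

static int model_device_add_disk(struct model_pmem *p)
{
	if (fail_device_add_disk)
		return -ENOMEM;
	p->disk_added = true;
	return 0;
}

static void model_dax_remove_host(struct model_pmem *p)
{
	p->dax_host_added = false;
}

/* Tail of pmem_attach_disk(), starting after dax_add_host() succeeded. */
static int model_pmem_attach_tail(struct model_pmem *p)
{
	int rc;

	p->dax_host_added = true;

	rc = model_register_pre_restart_handler(p);
	if (rc)
		goto out_remove_host;
	rc = model_device_add_disk(p);
	if (rc)
		goto out_unregister_pre_restart;	/* matches the label below */
	return 0;

out_unregister_pre_restart:
	model_unregister_pre_restart_handler(p);
out_remove_host:
	model_dax_remove_host(p);
	return rc;
}
```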
* Re: [RFC PATCH 2/4] nvdimm/pmem: Flush to memory before machine restart
From: kernel test robot @ 2024-06-19 17:46 UTC
To: Mathieu Desnoyers; +Cc: oe-kbuild-all
Hi Mathieu,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on nvdimm/libnvdimm-for-next]
[also build test ERROR on arm64/for-next/core linus/master v6.10-rc4 next-20240618]
[cannot apply to nvdimm/dax-misc tip/x86/core rostedt-trace/for-next rostedt-trace/for-next-urgent]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/kernel-reboot-Introduce-pre_restart-notifiers/20240618-235520
base: https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
patch link: https://lore.kernel.org/r/20240618154157.334602-3-mathieu.desnoyers%40efficios.com
patch subject: [RFC PATCH 2/4] nvdimm/pmem: Flush to memory before machine restart
config: x86_64-buildonly-randconfig-003-20240619 (https://download.01.org/0day-ci/archive/20240620/202406200157.cl4DjMkB-lkp@intel.com/config)
compiler: gcc-11 (Ubuntu 11.4.0-4ubuntu1) 11.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240620/202406200157.cl4DjMkB-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406200157.cl4DjMkB-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
drivers/nvdimm/pmem.c: In function 'pmem_release_disk':
>> drivers/nvdimm/pmem.c:430:9: error: implicit declaration of function 'unregister_pre_restart_notifier'; did you mean 'unregister_pre_restart_handler'? [-Werror=implicit-function-declaration]
430 | unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| unregister_pre_restart_handler
drivers/nvdimm/pmem.c: In function 'pmem_attach_disk':
>> drivers/nvdimm/pmem.c:585:14: error: implicit declaration of function 'register_pre_restart_notifier'; did you mean 'register_pre_restart_handler'? [-Werror=implicit-function-declaration]
585 | rc = register_pre_restart_notifier(&pmem->pre_restart_notifier);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| register_pre_restart_handler
>> drivers/nvdimm/pmem.c:602:1: warning: label 'out_unregister_pre_restart' defined but not used [-Wunused-label]
602 | out_unregister_pre_restart:
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/nvdimm/pmem.c:590:17: error: label 'out_unregister_reboot' used but not defined
590 | goto out_unregister_reboot;
| ^~~~
cc1: some warnings being treated as errors
* [RFC PATCH 3/4] arm64: Invoke pre_restart notifiers
From: Mathieu Desnoyers @ 2024-06-18 15:41 UTC
To: Dan Williams, Steven Rostedt
Cc: linux-kernel, Mathieu Desnoyers, Vishal Verma, Dave Jiang,
Ira Weiny, nvdimm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Catalin Marinas, Will Deacon,
linux-arm-kernel
Invoke the pre_restart notifiers after shutdown, before machine restart.
This allows preserving pmem memory across warm reboots.
Invoke the pre_restart notifiers before emergency machine restart as
well to cover the panic() scenario.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: nvdimm@lists.linux.dev
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
---
arch/arm64/kernel/process.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4ae31b7af6c3..4a27397617fb 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -129,6 +129,8 @@ void machine_restart(char *cmd)
local_irq_disable();
smp_send_stop();
+ do_kernel_pre_restart(cmd);
+
/*
* UpdateCapsule() depends on the system being reset via
* ResetSystem().
--
2.39.2
* [RFC PATCH 4/4] x86: Invoke pre_restart notifiers
From: Mathieu Desnoyers @ 2024-06-18 15:41 UTC
To: Dan Williams, Steven Rostedt
Cc: linux-kernel, Mathieu Desnoyers, Vishal Verma, Dave Jiang,
Ira Weiny, nvdimm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Catalin Marinas, Will Deacon,
linux-arm-kernel
Invoke the pre_restart notifiers after shutdown, before machine restart.
This allows preserving pmem memory across warm reboots.
Also invoke the pre_restart notifiers in native_machine_emergency_restart()
when reboot_emergency is set, to cover the panic() scenario.
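The following userspace sketch models the x86 call flow after this patch. It checks that the pre-restart chain runs exactly once on both the normal and the emergency (panic) path. The model_* functions mirror, in simplified form, the x86 functions touched by this patch. model_machine_emergency_restart() assumes that x86 machine_emergency_restart() calls __machine_emergency_restart(1). The actual reset, tboot and virtualization handling are reduced to comments.

```c
#include <assert.h>
#include <stddef.h>

static int pre_restart_calls;
static int reboot_emergency;

static void model_do_kernel_pre_restart(char *cmd)
{
	(void)cmd;
	pre_restart_calls++;
}

/* Mirrors native_machine_emergency_restart() after this patch. */
static void model_native_machine_emergency_restart(void)
{
	if (reboot_emergency) {
		model_do_kernel_pre_restart(NULL);
		/* emergency_reboot_disable_virtualization() would run here */
	}
	/* tboot_shutdown() and the actual reset are omitted */
}

/* Mirrors __machine_emergency_restart(). */
static void model_emergency_restart_common(int emergency)
{
	assert(emergency == 0 || emergency == 1);
	reboot_emergency = emergency;
	model_native_machine_emergency_restart();
}

/* Mirrors native_machine_restart() after this patch. */
static void model_native_machine_restart(char *cmd)
{
	/* machine_shutdown() would quiesce the other CPUs here */
	model_do_kernel_pre_restart(cmd);
	model_emergency_restart_common(0);
}

/* Assumed panic path: emergency_restart() -> machine_emergency_restart(). */
static void model_machine_emergency_restart(void)
{
	model_emergency_restart_common(1);
}
```

native_machine_restart() reaches the emergency restart code through __machine_emergency_restart(0). The reboot_emergency check is therefore what keeps the chain from running a second time on a normal restart.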
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: nvdimm@lists.linux.dev
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/kernel/reboot.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f3130f762784..222619fa63c6 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -631,8 +631,10 @@ static void native_machine_emergency_restart(void)
int orig_reboot_type = reboot_type;
unsigned short mode;
- if (reboot_emergency)
+ if (reboot_emergency) {
+ do_kernel_pre_restart(NULL);
emergency_reboot_disable_virtualization();
+ }
tboot_shutdown(TB_SHUTDOWN_REBOOT);
@@ -760,12 +762,13 @@ static void __machine_emergency_restart(int emergency)
machine_ops.emergency_restart();
}
-static void native_machine_restart(char *__unused)
+static void native_machine_restart(char *cmd)
{
pr_notice("machine restart\n");
if (!reboot_force)
machine_shutdown();
+ do_kernel_pre_restart(cmd);
__machine_emergency_restart(0);
}
--
2.39.2