* [PATCH v2 00/20] libnd: non-volatile memory device support
@ 2015-04-28 18:24 Dan Williams
From: Dan Williams @ 2015-04-28 18:24 UTC
To: linux-nvdimm
Cc: Boaz Harrosh, Neil Brown, Dave Chinner, H. Peter Anvin,
Ingo Molnar, Rafael J. Wysocki, Robert Moore, Christoph Hellwig,
linux-acpi, Jeff Moyer, Nicholas Moulin, Matthew Wilcox,
Ross Zwisler, Vishal Verma, Jens Axboe, Borislav Petkov,
Thomas Gleixner, Greg KH, linux-kernel, Andy Lutomirski,
Andrew Morton, Linus Torvalds
Changes since v1 [1]: Incorporates feedback received prior to April 24.
1/ Ingo said [2]:
"So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??"
Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation. That is fixed
now with all NFIT specifics factored out into acpi.c. The NFIT is no
longer required reading to review libnd. Only three concepts are
needed:
i/ PMEM - contiguous memory range where cpu stores are
persistent once they are flushed through the memory
controller.
ii/ BLK - mmio apertures (sliding windows) that can be
programmed to access an aperture's-worth of persistent
media at a time.
iii/ DPA - "dimm-physical-address", address space local to a
dimm. A dimm may provide both PMEM-mode and BLK-mode
access to a range of DPA. libnd manages allocation of DPA
to either PMEM or BLK-namespaces to resolve this aliasing.
The v1..v2 diffstat below shows the migration of nfit-specifics to
acpi.c and the new state of libnd being nfit-free. "nd" now only
refers to "non-volatile devices". Note, reworked documentation will
return once the review has settled.
Documentation/blockdev/nd.txt | 867 ---------------------
MAINTAINERS | 34 +-
arch/ia64/kernel/efi.c | 5 +-
arch/x86/kernel/e820.c | 11 +-
arch/x86/kernel/pmem.c | 2 +-
drivers/block/Makefile | 2 +-
drivers/block/nd/Kconfig | 135 ++--
drivers/block/nd/Makefile | 32 +-
drivers/block/nd/acpi.c | 1506 +++++++++++++++++++++++++++++++------
drivers/block/nd/acpi_nfit.h | 321 ++++++++
drivers/block/nd/blk.c | 27 +-
drivers/block/nd/btt.c | 6 +-
drivers/block/nd/btt_devs.c | 8 +-
drivers/block/nd/bus.c | 337 +++++----
drivers/block/nd/core.c | 574 +-------------
drivers/block/nd/dimm.c | 11 -
drivers/block/nd/dimm_devs.c | 292 ++-----
drivers/block/nd/e820.c | 100 +++
drivers/block/nd/libnd.h | 122 +++
drivers/block/nd/namespace_devs.c | 10 +-
drivers/block/nd/nd-private.h | 107 +--
drivers/block/nd/nd.h | 91 +--
drivers/block/nd/nfit.h | 238 ------
drivers/block/nd/pmem.c | 56 +-
drivers/block/nd/region.c | 78 +-
drivers/block/nd/region_devs.c | 783 +++----------------
drivers/block/nd/test/iomap.c | 86 +--
drivers/block/nd/test/nfit.c | 1115 +++++++++++++++------------
drivers/block/nd/test/nfit_test.h | 15 +-
include/uapi/linux/ndctl.h | 130 ++--
30 files changed, 3166 insertions(+), 3935 deletions(-)
delete mode 100644 Documentation/blockdev/nd.txt
create mode 100644 drivers/block/nd/acpi_nfit.h
create mode 100644 drivers/block/nd/e820.c
create mode 100644 drivers/block/nd/libnd.h
delete mode 100644 drivers/block/nd/nfit.h
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html
2/ Christoph asked for the pmem ida conversion to be moved to its own patch
(done), and to consider leaving the current pmem.c in drivers/block/.
Instead, I converted the e820-type-12 enabling to be the first
non-ACPI-NFIT based consumer of libnd. The new nd_e820 driver simply
registers e820-type-12 ranges as libnd PMEM regions. Among other
things this conversion enables BTT for these ranges. The alternative
is to move drivers/block/nd/nd.h internals out to include/linux/
which I think is worse.
3/ Toshi reported that the NFIT parsing fails to handle the case of a
PMEM range with a single-dimm (non-aliasing) interleave description.
Support for this case was added and is tested by default by the
nfit_test.1 configuration.
4/ Toshi reported that we should not be treating a missing _STA property
as a "dimm disabled by firmware" case. (fixed).
5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
arch code. It is gone for now and we'll revisit when adding cached
mappings back to the PMEM driver.
6/ Toshi mentioned that the presence of two different nd_bus_probe()
functions was confusing. (cleaned up).
7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).
8/ Linda asked for nfit_test to honor dynamic cma reservations via the
cma= command line (done). The cma requirements have also been
reduced to 128M as only the simulated DAX regions need CMA. The rest
can use vmalloc().
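For illustration only (this is not code from the series): the DPA
aliasing described under item 1/ can be pictured as a per-dimm address
space in which every range is claimed by exactly one of PMEM or BLK. The
user-space sketch below assumes hypothetical names (dimm_dpa,
dpa_reserve, MAX_CLAIMS) and a fixed-size claim table; it only shows the
"one owner per DPA range" rule, not how libnd tracks allocations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access modes a DPA range can be assigned to. */
enum dpa_mode { DPA_PMEM = 1, DPA_BLK = 2 };

/* One claimed range of dimm-physical-address space. */
struct dpa_claim {
	uint64_t start;
	uint64_t len;
	enum dpa_mode mode;
};

#define MAX_CLAIMS 8	/* arbitrary limit for the sketch */

/* Hypothetical per-dimm bookkeeping: total DPA size plus claims so far. */
struct dimm_dpa {
	uint64_t size;
	struct dpa_claim claims[MAX_CLAIMS];
	size_t nclaims;
};

/*
 * Reserve [start, start + len) for one mode. Returns 0 on success, or
 * -1 if the range is empty, out of bounds, overlaps an existing claim
 * (in either mode), or the claim table is full.
 */
static int dpa_reserve(struct dimm_dpa *d, uint64_t start, uint64_t len,
		enum dpa_mode mode)
{
	size_t i;

	if (len == 0 || start >= d->size || len > d->size - start)
		return -1;

	for (i = 0; i < d->nclaims; i++) {
		const struct dpa_claim *c = &d->claims[i];

		/* half-open interval overlap test */
		if (start < c->start + c->len && c->start < start + len)
			return -1;
	}

	if (d->nclaims == MAX_CLAIMS)
		return -1;

	d->claims[d->nclaims].start = start;
	d->claims[d->nclaims].len = len;
	d->claims[d->nclaims].mode = mode;
	d->nclaims++;
	return 0;
}
```

With a 1024-byte DPA space, reserving the first half for PMEM and the
second half for BLK succeeds, while a BLK request that overlaps the PMEM
half is refused.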
---
Available here:
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm nd-v2
---
Dan Williams (18):
e820, efi: add ACPI 6.0 persistent memory types
libnd, nd_acpi: initial libnd infrastructure and NFIT support
nd_acpi, nfit-test: manufactured NFITs for interface development
libnd: ndctl class device, and nd bus attributes
libnd, nd_acpi: dimm/memory-devices
libnd: ndctl.h, the nd ioctl abi
libnd, nd_dimm: dimm driver and base libnd device-driver infrastructure
libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
libnd: support for legacy (non-aliasing) nvdimms
pmem: use ida
libnd, nd_pmem: add libnd support to the pmem driver
libnd, nd_acpi: add interleave-set state-tracking infrastructure
libnd: namespace indices: read and validate
libnd: pmem label sets and namespace instantiation.
libnd: blk labels and namespace instantiation
libnd: write pmem label set
libnd: write blk label set
libnd: infrastructure for btt devices
Ross Zwisler (1):
libnd, nd_acpi, nd_blk: driver for BLK-mode access persistent memory
Vishal Verma (1):
nd_btt: atomic sector updates
Documentation/blockdev/btt.txt | 273 ++++++
arch/arm64/kernel/efi.c | 1
arch/ia64/kernel/efi.c | 4
arch/x86/boot/compressed/eboot.c | 4
arch/x86/include/uapi/asm/e820.h | 1
arch/x86/kernel/e820.c | 26 +
arch/x86/kernel/pmem.c | 2
arch/x86/platform/efi/efi.c | 3
drivers/block/Kconfig | 13
drivers/block/Makefile | 2
drivers/block/nd/Kconfig | 129 +++
drivers/block/nd/Makefile | 41 +
drivers/block/nd/acpi.c | 1505 +++++++++++++++++++++++++++++++++
drivers/block/nd/acpi_nfit.h | 321 +++++++
drivers/block/nd/blk.c | 264 ++++++
drivers/block/nd/btt.c | 1423 +++++++++++++++++++++++++++++++
drivers/block/nd/btt.h | 185 ++++
drivers/block/nd/btt_devs.c | 443 ++++++++++
drivers/block/nd/bus.c | 770 +++++++++++++++++
drivers/block/nd/core.c | 471 ++++++++++
drivers/block/nd/dimm.c | 115 +++
drivers/block/nd/dimm_devs.c | 507 +++++++++++
drivers/block/nd/e820.c | 100 ++
drivers/block/nd/label.c | 925 ++++++++++++++++++++
drivers/block/nd/label.h | 143 +++
drivers/block/nd/libnd.h | 122 +++
drivers/block/nd/namespace_devs.c | 1701 +++++++++++++++++++++++++++++++++++++
drivers/block/nd/nd-private.h | 114 ++
drivers/block/nd/nd.h | 261 ++++++
drivers/block/nd/pmem.c | 114 ++
drivers/block/nd/region.c | 159 +++
drivers/block/nd/region_devs.c | 637 ++++++++++++++
drivers/block/nd/test/Makefile | 5
drivers/block/nd/test/iomap.c | 151 +++
drivers/block/nd/test/nfit.c | 1131 +++++++++++++++++++++++++
drivers/block/nd/test/nfit_test.h | 26 +
include/linux/efi.h | 3
include/linux/nd.h | 98 ++
include/uapi/linux/Kbuild | 1
include/uapi/linux/ndctl.h | 199 ++++
40 files changed, 12345 insertions(+), 48 deletions(-)
create mode 100644 Documentation/blockdev/btt.txt
create mode 100644 drivers/block/nd/Kconfig
create mode 100644 drivers/block/nd/Makefile
create mode 100644 drivers/block/nd/acpi.c
create mode 100644 drivers/block/nd/acpi_nfit.h
create mode 100644 drivers/block/nd/blk.c
create mode 100644 drivers/block/nd/btt.c
create mode 100644 drivers/block/nd/btt.h
create mode 100644 drivers/block/nd/btt_devs.c
create mode 100644 drivers/block/nd/bus.c
create mode 100644 drivers/block/nd/core.c
create mode 100644 drivers/block/nd/dimm.c
create mode 100644 drivers/block/nd/dimm_devs.c
create mode 100644 drivers/block/nd/e820.c
create mode 100644 drivers/block/nd/label.c
create mode 100644 drivers/block/nd/label.h
create mode 100644 drivers/block/nd/libnd.h
create mode 100644 drivers/block/nd/namespace_devs.c
create mode 100644 drivers/block/nd/nd-private.h
create mode 100644 drivers/block/nd/nd.h
rename drivers/block/{pmem.c => nd/pmem.c} (68%)
create mode 100644 drivers/block/nd/region.c
create mode 100644 drivers/block/nd/region_devs.c
create mode 100644 drivers/block/nd/test/Makefile
create mode 100644 drivers/block/nd/test/iomap.c
create mode 100644 drivers/block/nd/test/nfit.c
create mode 100644 drivers/block/nd/test/nfit_test.h
create mode 100644 include/linux/nd.h
create mode 100644 include/uapi/linux/ndctl.h
* [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
From: Dan Williams @ 2015-04-28 18:24 UTC
To: linux-nvdimm; +Cc: linux-acpi, Rafael J. Wysocki, Robert Moore, linux-kernel
1/ Autodetect an NFIT table for the ACPI namespace device with _HID of
"ACPI0012"
2/ libnd bus registration
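As a rough user-space illustration of the table handling in acpi.c (a
sketch, not the kernel code): the NFIT is accepted when all of its bytes
sum to zero modulo 256, and its sub-tables are walked by advancing by
the length field of each type/length header. The names
subtable_header, table_checksum_ok and count_subtables are hypothetical.
The bounds and zero-length checks are additions for the sketch. The
header is read in host byte order, so a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 4-byte header that starts each sub-table. */
struct subtable_header {
	uint16_t type;
	uint16_t length;	/* size of the whole sub-table, header included */
};

/* ACPI-style checksum: valid when every byte of the table sums to 0 mod 256. */
static int table_checksum_ok(const uint8_t *data, size_t len)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = (uint8_t)(sum + data[i]);
	return sum == 0;
}

/*
 * Walk the sub-tables packed in [data, data + len). Returns how many
 * were visited, or -1 if a header is truncated, a length is smaller
 * than the header itself, or a sub-table runs past the buffer.
 */
static int count_subtables(const uint8_t *data, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct subtable_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* memcpy avoids unaligned access; host byte order assumed */
		memcpy(&hdr, data + off, sizeof(hdr));
		if (hdr.length < sizeof(hdr) || hdr.length > len - off)
			return -1;
		off += hdr.length;
		count++;
	}
	return count;
}
```

Rejecting a zero length matters in a walk like this because the cursor
would otherwise never advance.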
The NFIT provided by ACPI is one possible method by which platforms will
discover NVDIMM resources. However, the nd_bus_descriptor abstraction
hides "provider" specific details, leaving libnd independent of the
specific NVDIMM resource discovery mechanism. This flexibility is
exploited later to implement custom-defined nd buses.
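To make the provider split concrete, here is a small user-space mock (an
assumption-laden sketch, not the libnd API): a provider fills in a
descriptor with a name and a control callback, and the bus core only
ever touches the descriptor. All mock_* names and stub_ctl are
hypothetical; only the shape (provider name plus control callback,
-ENOTTY for unhandled commands) mirrors what this patch does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct mock_bus_descriptor;

/* Hypothetical control hook a provider supplies to the bus core. */
typedef int (*mock_ctl_fn)(struct mock_bus_descriptor *desc,
		unsigned int cmd, void *buf, unsigned int buf_len);

/* Provider-facing description of a bus: everything the core may rely on. */
struct mock_bus_descriptor {
	const char *provider_name;
	mock_ctl_fn ctl;
};

/* Core-owned bus object; it knows nothing about how resources were found. */
struct mock_bus {
	int id;
	struct mock_bus_descriptor *desc;
};

static int next_bus_id;

/* Register a bus for any provider; returns NULL on bad input or no memory. */
static struct mock_bus *mock_bus_register(struct mock_bus_descriptor *desc)
{
	struct mock_bus *bus;

	if (!desc || !desc->provider_name)
		return NULL;
	bus = calloc(1, sizeof(*bus));
	if (!bus)
		return NULL;
	bus->id = next_bus_id++;
	bus->desc = desc;
	return bus;
}

static void mock_bus_unregister(struct mock_bus *bus)
{
	free(bus);
}

/* The core forwards commands to the provider, or reports "not supported". */
static int mock_bus_ioctl(struct mock_bus *bus, unsigned int cmd, void *buf,
		unsigned int buf_len)
{
	if (!bus->desc->ctl)
		return -ENOTTY;
	return bus->desc->ctl(bus->desc, cmd, buf, buf_len);
}

/* A provider that, like this patch's control stub, handles no commands yet. */
static int stub_ctl(struct mock_bus_descriptor *desc, unsigned int cmd,
		void *buf, unsigned int buf_len)
{
	(void)desc; (void)cmd; (void)buf; (void)buf_len;
	return -ENOTTY;
}
```

Two providers with different names can register against the same core
without the core gaining any knowledge of how either one discovered its
resources.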
Cc: <linux-acpi@vger.kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/block/Kconfig | 2
drivers/block/Makefile | 1
drivers/block/nd/Kconfig | 40 +++
drivers/block/nd/Makefile | 6 +
drivers/block/nd/acpi.c | 475 +++++++++++++++++++++++++++++++++++++++++
drivers/block/nd/acpi_nfit.h | 254 ++++++++++++++++++++++
drivers/block/nd/core.c | 67 ++++++
drivers/block/nd/libnd.h | 33 +++
drivers/block/nd/nd-private.h | 23 ++
9 files changed, 901 insertions(+)
create mode 100644 drivers/block/nd/Kconfig
create mode 100644 drivers/block/nd/Makefile
create mode 100644 drivers/block/nd/acpi.c
create mode 100644 drivers/block/nd/acpi_nfit.h
create mode 100644 drivers/block/nd/core.c
create mode 100644 drivers/block/nd/libnd.h
create mode 100644 drivers/block/nd/nd-private.h
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index eb1fed5bd516..dfe40e5ca9bd 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -321,6 +321,8 @@ config BLK_DEV_NVME
To compile this driver as a module, choose M here: the
module will be called nvme.
+source "drivers/block/nd/Kconfig"
+
config BLK_DEV_SKD
tristate "STEC S1120 Block Driver"
depends on PCI
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 9cc6c18a1c7e..07a6acecf4d8 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
obj-$(CONFIG_MG_DISK) += mg_disk.o
obj-$(CONFIG_SUNVDC) += sunvdc.o
obj-$(CONFIG_BLK_DEV_NVME) += nvme.o
+obj-$(CONFIG_ND_DEVICES) += nd/
obj-$(CONFIG_BLK_DEV_SKD) += skd.o
obj-$(CONFIG_BLK_DEV_OSD) += osdblk.o
diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig
new file mode 100644
index 000000000000..6d5d6b732f82
--- /dev/null
+++ b/drivers/block/nd/Kconfig
@@ -0,0 +1,40 @@
+menuconfig ND_DEVICES
+ bool "NVDIMM Support"
+ depends on PHYS_ADDR_T_64BIT
+ help
+ Generic support for non-volatile memory devices including
+ ACPI-6-NFIT defined resources. On platforms that define an
+ NFIT, or otherwise can discover NVDIMM resources, a libnd
+ bus is registered to advertise PMEM (persistent memory)
+ namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
+ namespaces (/dev/ndX). A PMEM namespace refers to a memory
+ resource that may span multiple DIMMs and support DAX (see
+ CONFIG_DAX). A BLK namespace refers to an NVDIMM control
+ region which exposes an mmio register set for windowed
+ access mode to non-volatile memory.
+
+if ND_DEVICES
+
+config LIBND
+ tristate "LIBND: libnd device driver support"
+ help
+ Platform agnostic device model for a libnd bus. Publishes
+ resources for a PMEM (persistent-memory) driver and/or BLK
+ (sliding mmio window(s)) driver to attach. Exposes a device
+ topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl
+ message passing interface, and a "/dev/nmemX" dimm-ioctl
+ message interface for each memory device registered on the
+ bus instance. A userspace library "ndctl" provides an API
+ to enumerate/manage this subsystem.
+
+config ND_ACPI
+ tristate "ACPI: NFIT to libnd bus support"
+ select LIBND
+ depends on ACPI
+ help
+ Infrastructure to probe ACPI 6 compliant platforms for
+ NVDIMMs (NFIT) and register a libnd device tree. In
+ addition to storage devices this also enables libnd to craft
+ ACPI._DSM messages for platform/dimm configuration.
+
+endif
diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
new file mode 100644
index 000000000000..944b5947c0cb
--- /dev/null
+++ b/drivers/block/nd/Makefile
@@ -0,0 +1,6 @@
+obj-$(CONFIG_LIBND) += libnd.o
+obj-$(CONFIG_ND_ACPI) += nd_acpi.o
+
+nd_acpi-y := acpi.o
+
+libnd-y := core.o
diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
new file mode 100644
index 000000000000..9f0b24390d1b
--- /dev/null
+++ b/drivers/block/nd/acpi.c
@@ -0,0 +1,475 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/list_sort.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/acpi.h>
+#include "acpi_nfit.h"
+#include "libnd.h"
+
+static bool warn_checksum;
+module_param(warn_checksum, bool, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(warn_checksum, "Turn checksum errors into warnings");
+
+enum {
+ NFIT_ACPI_NOTIFY_TABLE = 0x80,
+};
+
+static int nd_acpi_ctl(struct nd_bus_descriptor *nd_desc,
+ struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ return -ENOTTY;
+}
+
+static const char *spa_type_name(u16 type)
+{
+ switch (type) {
+ case NFIT_SPA_VOLATILE: return "volatile";
+ case NFIT_SPA_PM: return "pmem";
+ case NFIT_SPA_DCR: return "dimm-control-region";
+ case NFIT_SPA_BDW: return "block-data-window";
+ default: return "unknown";
+ }
+}
+
+static int nfit_spa_type(struct acpi_nfit_spa *spa)
+{
+ if (memcmp(&nfit_spa_uuid_volatile, spa->type_uuid, 16) == 0)
+ return NFIT_SPA_VOLATILE;
+
+ if (memcmp(&nfit_spa_uuid_pm, spa->type_uuid, 16) == 0)
+ return NFIT_SPA_PM;
+
+ if (memcmp(&nfit_spa_uuid_dcr, spa->type_uuid, 16) == 0)
+ return NFIT_SPA_DCR;
+
+ if (memcmp(&nfit_spa_uuid_bdw, spa->type_uuid, 16) == 0)
+ return NFIT_SPA_BDW;
+
+ if (memcmp(&nfit_spa_uuid_vdisk, spa->type_uuid, 16) == 0)
+ return NFIT_SPA_VDISK;
+
+ if (memcmp(&nfit_spa_uuid_vcd, spa->type_uuid, 16) == 0)
+ return NFIT_SPA_VCD;
+
+ if (memcmp(&nfit_spa_uuid_pdisk, spa->type_uuid, 16) == 0)
+ return NFIT_SPA_PDISK;
+
+ if (memcmp(&nfit_spa_uuid_pcd, spa->type_uuid, 16) == 0)
+ return NFIT_SPA_PCD;
+
+ return -1;
+}
+
+struct nfit_table_header {
+ __le16 type;
+ __le16 length;
+};
+
+static void *add_table(struct acpi_nfit_desc *acpi_desc, void *table, const void *end)
+{
+ struct device *dev = acpi_desc->dev;
+ struct nfit_table_header *hdr;
+ void *err = ERR_PTR(-ENOMEM);
+
+ if (table >= end)
+ return NULL;
+
+ hdr = (struct nfit_table_header *) table;
+ switch (hdr->type) {
+ case NFIT_TABLE_SPA: {
+ struct nfit_spa *nfit_spa = devm_kzalloc(dev, sizeof(*nfit_spa),
+ GFP_KERNEL);
+ struct acpi_nfit_spa *spa = table;
+
+ if (!nfit_spa)
+ return err;
+ INIT_LIST_HEAD(&nfit_spa->list);
+ nfit_spa->spa = spa;
+ list_add_tail(&nfit_spa->list, &acpi_desc->spas);
+ dev_dbg(dev, "%s: spa index: %d type: %s\n", __func__,
+ spa->spa_index,
+ spa_type_name(nfit_spa_type(spa)));
+ break;
+ }
+ case NFIT_TABLE_MEM: {
+ struct nfit_memdev *nfit_memdev = devm_kzalloc(dev,
+ sizeof(*nfit_memdev), GFP_KERNEL);
+ struct acpi_nfit_memdev *memdev = table;
+
+ if (!nfit_memdev)
+ return err;
+ INIT_LIST_HEAD(&nfit_memdev->list);
+ nfit_memdev->memdev = memdev;
+ list_add_tail(&nfit_memdev->list, &acpi_desc->memdevs);
+ dev_dbg(dev, "%s: memdev handle: %#x spa: %d dcr: %d\n",
+ __func__, memdev->nfit_handle, memdev->spa_index,
+ memdev->dcr_index);
+ break;
+ }
+ case NFIT_TABLE_DCR: {
+ struct nfit_dcr *nfit_dcr = devm_kzalloc(dev, sizeof(*nfit_dcr),
+ GFP_KERNEL);
+ struct acpi_nfit_dcr *dcr = table;
+
+ if (!nfit_dcr)
+ return err;
+ INIT_LIST_HEAD(&nfit_dcr->list);
+ nfit_dcr->dcr = dcr;
+ list_add_tail(&nfit_dcr->list, &acpi_desc->dcrs);
+ dev_dbg(dev, "%s: dcr index: %d num_bcw: %d\n", __func__,
+ dcr->dcr_index, dcr->num_bcw);
+ break;
+ }
+ case NFIT_TABLE_BDW: {
+ struct nfit_bdw *nfit_bdw = devm_kzalloc(dev, sizeof(*nfit_bdw),
+ GFP_KERNEL);
+ struct acpi_nfit_bdw *bdw = table;
+
+ if (!nfit_bdw)
+ return err;
+ INIT_LIST_HEAD(&nfit_bdw->list);
+ nfit_bdw->bdw = bdw;
+ list_add_tail(&nfit_bdw->list, &acpi_desc->bdws);
+ dev_dbg(dev, "%s: bdw dcr: %d num_bdw: %d\n", __func__,
+ bdw->dcr_index, bdw->num_bdw);
+ break;
+ }
+ /* TODO */
+ case NFIT_TABLE_IDT:
+ dev_dbg(dev, "%s: idt\n", __func__);
+ break;
+ case NFIT_TABLE_FLUSH:
+ dev_dbg(dev, "%s: flush\n", __func__);
+ break;
+ case NFIT_TABLE_SMBIOS:
+ dev_dbg(dev, "%s: smbios\n", __func__);
+ break;
+ default:
+ dev_err(dev, "unknown table '%d' parsing nfit\n", hdr->type);
+ return ERR_PTR(-ENXIO);
+ }
+
+ return table + hdr->length;
+}
+
+static void nfit_mem_find_spa_bdw(struct acpi_nfit_desc *acpi_desc,
+ struct nfit_mem *nfit_mem)
+{
+ u32 nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
+ u16 dcr_index = nfit_mem->dcr->dcr_index;
+ struct nfit_spa *nfit_spa;
+
+ list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+ u16 spa_index = nfit_spa->spa->spa_index;
+ int type = nfit_spa_type(nfit_spa->spa);
+ struct nfit_memdev *nfit_memdev;
+
+ if (type != NFIT_SPA_BDW)
+ continue;
+
+ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+ if (nfit_memdev->memdev->spa_index != spa_index)
+ continue;
+ if (nfit_memdev->memdev->nfit_handle != nfit_handle)
+ continue;
+ if (nfit_memdev->memdev->dcr_index != dcr_index)
+ continue;
+
+ nfit_mem->spa_bdw = nfit_spa->spa;
+ return;
+ }
+ }
+
+ dev_dbg(acpi_desc->dev, "SPA-BDW not found for SPA-DCR %d\n",
+ nfit_mem->spa_dcr->spa_index);
+ nfit_mem->bdw = NULL;
+}
+
+static int nfit_mem_add(struct acpi_nfit_desc *acpi_desc,
+ struct nfit_mem *nfit_mem, struct acpi_nfit_spa *spa)
+{
+ u16 dcr_index = __to_nfit_memdev(nfit_mem)->dcr_index;
+ struct nfit_dcr *nfit_dcr;
+ struct nfit_bdw *nfit_bdw;
+
+ list_for_each_entry(nfit_dcr, &acpi_desc->dcrs, list) {
+ if (nfit_dcr->dcr->dcr_index != dcr_index)
+ continue;
+ nfit_mem->dcr = nfit_dcr->dcr;
+ break;
+ }
+
+ if (!nfit_mem->dcr) {
+ dev_dbg(acpi_desc->dev, "SPA %d missing:%s%s\n", spa->spa_index,
+ __to_nfit_memdev(nfit_mem) ? "" : " MEMDEV",
+ nfit_mem->dcr ? "" : " DCR");
+ return -ENODEV;
+ }
+
+ /*
+ * We've found enough to create an nd_dimm, optionally
+ * find an associated BDW
+ */
+ list_add(&nfit_mem->list, &acpi_desc->dimms);
+
+ list_for_each_entry(nfit_bdw, &acpi_desc->bdws, list) {
+ if (nfit_bdw->bdw->dcr_index != dcr_index)
+ continue;
+ nfit_mem->bdw = nfit_bdw->bdw;
+ break;
+ }
+
+ if (!nfit_mem->bdw)
+ return 0;
+
+ nfit_mem_find_spa_bdw(acpi_desc, nfit_mem);
+ return 0;
+}
+
+static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
+ struct acpi_nfit_spa *spa)
+{
+ struct nfit_mem *nfit_mem, *found;
+ struct nfit_memdev *nfit_memdev;
+ int type = nfit_spa_type(spa);
+ u16 dcr_index;
+
+ switch (type) {
+ case NFIT_SPA_DCR:
+ case NFIT_SPA_PM:
+ break;
+ default:
+ return 0;
+ }
+
+ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+ int rc;
+
+ if (nfit_memdev->memdev->spa_index != spa->spa_index)
+ continue;
+ found = NULL;
+ dcr_index = nfit_memdev->memdev->dcr_index;
+ list_for_each_entry(nfit_mem, &acpi_desc->dimms, list)
+ if (__to_nfit_memdev(nfit_mem)->dcr_index == dcr_index) {
+ found = nfit_mem;
+ break;
+ }
+
+ if (found)
+ nfit_mem = found;
+ else {
+ nfit_mem = devm_kzalloc(acpi_desc->dev,
+ sizeof(*nfit_mem), GFP_KERNEL);
+ if (!nfit_mem)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&nfit_mem->list);
+ }
+
+ if (type == NFIT_SPA_DCR) {
+ /* multiple dimms may share a SPA when interleaved */
+ nfit_mem->spa_dcr = spa;
+ nfit_mem->memdev_dcr = nfit_memdev->memdev;
+ } else {
+ /*
+ * A single dimm may belong to multiple SPA-PM
+ * ranges, record at least one in addition to
+ * any SPA-DCR range.
+ */
+ nfit_mem->memdev_pmem = nfit_memdev->memdev;
+ }
+
+ if (found)
+ continue;
+
+ rc = nfit_mem_add(acpi_desc, nfit_mem, spa);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+static int nfit_mem_cmp(void *priv, struct list_head *__a, struct list_head *__b)
+{
+ struct nfit_mem *a = container_of(__a, typeof(*a), list);
+ struct nfit_mem *b = container_of(__b, typeof(*b), list);
+ u32 handleA, handleB;
+
+ handleA = __to_nfit_memdev(a)->nfit_handle;
+ handleB = __to_nfit_memdev(b)->nfit_handle;
+ if (handleA < handleB)
+ return -1;
+ else if (handleA > handleB)
+ return 1;
+ return 0;
+}
+
+static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nfit_spa *nfit_spa;
+
+ /*
+ * For each SPA-DCR or SPA-PMEM address range find its
+ * corresponding MEMDEV(s). From each MEMDEV find the
+ * corresponding DCR. Then, if we're operating on a SPA-DCR,
+ * try to find a SPA-BDW and a corresponding BDW that references
+ * the DCR. Throw it all into an nfit_mem object. Note, that
+ * BDWs are optional.
+ */
+ list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+ int rc;
+
+ rc = nfit_mem_dcr_init(acpi_desc, nfit_spa->spa);
+ if (rc)
+ return rc;
+ }
+
+ list_sort(NULL, &acpi_desc->dimms, nfit_mem_cmp);
+
+ return 0;
+}
+
+static int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
+{
+ struct device *dev = acpi_desc->dev;
+ const void *end;
+ u8 *data, sum;
+ acpi_size i;
+
+ INIT_LIST_HEAD(&acpi_desc->spas);
+ INIT_LIST_HEAD(&acpi_desc->dcrs);
+ INIT_LIST_HEAD(&acpi_desc->bdws);
+ INIT_LIST_HEAD(&acpi_desc->memdevs);
+ INIT_LIST_HEAD(&acpi_desc->dimms);
+
+ data = (u8 *) acpi_desc->nfit;
+ for (i = 0, sum = 0; i < sz; i++)
+ sum += readb(data + i);
+ if (sum != 0 && !warn_checksum) {
+ dev_dbg(dev, "%s: nfit checksum failure\n", __func__);
+ return -ENXIO;
+ }
+ WARN_TAINT_ONCE(sum != 0, TAINT_FIRMWARE_WORKAROUND,
+ "nfit checksum failure, continuing...\n");
+
+ end = data + sz;
+ data += sizeof(struct acpi_nfit);
+ while (!IS_ERR_OR_NULL(data))
+ data = add_table(acpi_desc, data, end);
+
+ if (IS_ERR(data)) {
+ dev_dbg(dev, "%s: nfit table parsing error: %ld\n", __func__,
+ PTR_ERR(data));
+ return PTR_ERR(data);
+ }
+
+ if (nfit_mem_init(acpi_desc) != 0)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static int nd_acpi_add(struct acpi_device *adev)
+{
+ struct nd_bus_descriptor *nd_desc;
+ struct acpi_nfit_desc *acpi_desc;
+ struct device *dev = &adev->dev;
+ struct acpi_table_header *tbl;
+ acpi_status status = AE_OK;
+ acpi_size sz;
+ int rc;
+
+ status = acpi_get_table_with_size("NFIT", 0, &tbl, &sz);
+ if (ACPI_FAILURE(status)) {
+ dev_err(dev, "failed to find NFIT\n");
+ return -ENXIO;
+ }
+
+ acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
+ if (!acpi_desc)
+ return -ENOMEM;
+
+ dev_set_drvdata(dev, acpi_desc);
+ acpi_desc->dev = dev;
+ acpi_desc->nfit = (struct acpi_nfit *) tbl;
+ nd_desc = &acpi_desc->nd_desc;
+ nd_desc->provider_name = "ACPI.NFIT";
+ nd_desc->ndctl = nd_acpi_ctl;
+
+ acpi_desc->nd_bus = nd_bus_register(dev, nd_desc);
+ if (!acpi_desc->nd_bus)
+ return -ENXIO;
+
+ rc = nd_acpi_nfit_init(acpi_desc, sz);
+ if (rc) {
+ nd_bus_unregister(acpi_desc->nd_bus);
+ return rc;
+ }
+ return 0;
+}
+
+static int nd_acpi_remove(struct acpi_device *adev)
+{
+ struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(&adev->dev);
+
+ nd_bus_unregister(acpi_desc->nd_bus);
+ return 0;
+}
+
+static void nd_acpi_notify(struct acpi_device *adev, u32 event)
+{
+ /* TODO: handle ACPI_NOTIFY_BUS_CHECK notification */
+ dev_dbg(&adev->dev, "%s: event: %d\n", __func__, event);
+}
+
+static const struct acpi_device_id nd_acpi_ids[] = {
+ { "ACPI0012", 0 },
+ { "", 0 },
+};
+MODULE_DEVICE_TABLE(acpi, nd_acpi_ids);
+
+static struct acpi_driver nd_acpi_driver = {
+ .name = KBUILD_MODNAME,
+ .ids = nd_acpi_ids,
+ .flags = ACPI_DRIVER_ALL_NOTIFY_EVENTS,
+ .ops = {
+ .add = nd_acpi_add,
+ .remove = nd_acpi_remove,
+ .notify = nd_acpi_notify
+ },
+};
+
+static __init int nd_acpi_init(void)
+{
+ BUILD_BUG_ON(sizeof(struct acpi_nfit) != 40);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_spa) != 56);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_memdev) != 48);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_idt) != 16);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_smbios) != 8);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_dcr) != 80);
+ BUILD_BUG_ON(sizeof(struct acpi_nfit_bdw) != 40);
+
+ return acpi_bus_register_driver(&nd_acpi_driver);
+}
+
+static __exit void nd_acpi_exit(void)
+{
+ acpi_bus_unregister_driver(&nd_acpi_driver);
+}
+
+module_init(nd_acpi_init);
+module_exit(nd_acpi_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/block/nd/acpi_nfit.h b/drivers/block/nd/acpi_nfit.h
new file mode 100644
index 000000000000..e0b0f12736bf
--- /dev/null
+++ b/drivers/block/nd/acpi_nfit.h
@@ -0,0 +1,254 @@
+/*
+ * NVDIMM Firmware Interface Table - NFIT
+ *
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __NFIT_H__
+#define __NFIT_H__
+#include <linux/types.h>
+#include <linux/uuid.h>
+#include <linux/acpi.h>
+#include "libnd.h"
+
+static const uuid_le nfit_spa_uuid_volatile __maybe_unused = UUID_LE(0x7305944f,
+ 0xfdda, 0x44e3, 0xb1, 0x6c, 0x3f, 0x22, 0xd2, 0x52, 0xe5, 0xd0);
+
+static const uuid_le nfit_spa_uuid_pm __maybe_unused = UUID_LE(0x66f0d379,
+ 0xb4f3, 0x4074, 0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb);
+
+static const uuid_le nfit_spa_uuid_dcr __maybe_unused = UUID_LE(0x92f701f6,
+ 0x13b4, 0x405d, 0x91, 0x0b, 0x29, 0x93, 0x67, 0xe8, 0x23, 0x4c);
+
+static const uuid_le nfit_spa_uuid_bdw __maybe_unused = UUID_LE(0x91af0530,
+ 0x5d86, 0x470e, 0xa6, 0xb0, 0x0a, 0x2d, 0xb9, 0x40, 0x82, 0x49);
+
+static const uuid_le nfit_spa_uuid_vdisk __maybe_unused = UUID_LE(0x77ab535a,
+ 0x45fc, 0x624b, 0x55, 0x60, 0xf7, 0xb2, 0x81, 0xd1, 0xf9, 0x6e);
+
+static const uuid_le nfit_spa_uuid_vcd __maybe_unused = UUID_LE(0x3d5abd30,
+ 0x4175, 0x87ce, 0x6d, 0x64, 0xd2, 0xad, 0xe5, 0x23, 0xc4, 0xbb);
+
+static const uuid_le nfit_spa_uuid_pdisk __maybe_unused = UUID_LE(0x5cea02c9,
+ 0x4d07, 0x69d3, 0x26, 0x9f, 0x44, 0x96, 0xfb, 0xe0, 0x96, 0xf9);
+
+static const uuid_le nfit_spa_uuid_pcd __maybe_unused = UUID_LE(0x08018188,
+ 0x42cd, 0xbb48, 0x10, 0x0f, 0x53, 0x87, 0xd5, 0x3d, 0xed, 0x3d);
+
+enum {
+ NFIT_TABLE_SPA = 0,
+ NFIT_TABLE_MEM = 1,
+ NFIT_TABLE_IDT = 2,
+ NFIT_TABLE_SMBIOS = 3,
+ NFIT_TABLE_DCR = 4,
+ NFIT_TABLE_BDW = 5,
+ NFIT_TABLE_FLUSH = 6,
+ NFIT_SPA_VOLATILE = 0,
+ NFIT_SPA_PM = 1,
+ NFIT_SPA_DCR = 2,
+ NFIT_SPA_BDW = 3,
+ NFIT_SPA_VDISK = 4,
+ NFIT_SPA_VCD = 5,
+ NFIT_SPA_PDISK = 6,
+ NFIT_SPA_PCD = 7,
+ NFIT_SPAF_DCR_HOT_ADD = 1 << 0,
+ NFIT_SPAF_PDVALID = 1 << 1,
+ NFIT_MEMF_SAVE_FAIL = 1 << 0,
+ NFIT_MEMF_RESTORE_FAIL = 1 << 1,
+ NFIT_MEMF_FLUSH_FAIL = 1 << 2,
+ NFIT_MEMF_UNARMED = 1 << 3,
+ NFIT_MEMF_NOTIFY_SMART = 1 << 4,
+ NFIT_MEMF_SMART_READY = 1 << 5,
+ NFIT_DCRF_BUFFERED = 1 << 0,
+};
+
+/**
+ * struct acpi_nfit - Nvdimm Firmware Interface Table
+ * @signature: "NFIT"
+ * @length: sum of size of this table plus all appended subtables
+ */
+struct acpi_nfit {
+ u8 signature[4];
+ u32 length;
+ u8 revision;
+ u8 checksum;
+ u8 oemid[6];
+ u64 oem_tbl_id;
+ u32 oem_revision;
+ u32 creator_id;
+ u32 creator_revision;
+ u32 reserved;
+};
+
+/**
+ * struct acpi_nfit_spa - System Physical Address Range Descriptor Table
+ */
+struct acpi_nfit_spa {
+ u16 type;
+ u16 length;
+ u16 spa_index;
+ u16 flags;
+ u32 reserved;
+ u32 proximity_domain;
+ u8 type_uuid[16];
+ u64 spa_base;
+ u64 spa_length;
+ u64 mem_attr;
+};
+
+/**
+ * struct acpi_nfit_memdev - Memory Device to SPA Mapping Table
+ */
+struct acpi_nfit_memdev {
+ u16 type;
+ u16 length;
+ u32 nfit_handle;
+ u16 phys_id;
+ u16 region_id;
+ u16 spa_index;
+ u16 dcr_index;
+ u64 region_len;
+ u64 region_spa_offset;
+ u64 region_dpa;
+ u16 idt_index;
+ u16 interleave_ways;
+ u16 flags;
+ u16 reserved;
+};
+
+/**
+ * struct acpi_nfit_idt - Interleave Description Table
+ */
+struct acpi_nfit_idt {
+ u16 type;
+ u16 length;
+ u16 idt_index;
+ u16 reserved;
+ u32 num_lines;
+ u32 line_size;
+ u32 line_offset[0];
+};
+
+/**
+ * struct acpi_nfit_smbios - SMBIOS Management Information Table
+ */
+struct acpi_nfit_smbios {
+ u16 type;
+ u16 length;
+ u32 reserved;
+ u8 data[0];
+};
+
+/**
+ * struct acpi_nfit_dcr - NVDIMM Control Region Table
+ * @fic: Format Interface Code
+ * @cmd_offset: command registers relative to block control window
+ * @status_offset: status registers relative to block control window
+ */
+struct acpi_nfit_dcr {
+ u16 type;
+ u16 length;
+ u16 dcr_index;
+ u16 vendor_id;
+ u16 device_id;
+ u16 revision_id;
+ u16 sub_vendor_id;
+ u16 sub_device_id;
+ u16 sub_revision_id;
+ u8 reserved[6];
+ u32 serial_number;
+ u16 fic;
+ u16 num_bcw;
+ u64 bcw_size;
+ u64 cmd_offset;
+ u64 cmd_size;
+ u64 status_offset;
+ u64 status_size;
+ u16 flags;
+ u8 reserved2[6];
+};
+
+/**
+ * struct acpi_nfit_bdw - NVDIMM Block Data Window Region Table
+ */
+struct acpi_nfit_bdw {
+ u16 type;
+ u16 length;
+ u16 dcr_index;
+ u16 num_bdw;
+ u64 bdw_offset;
+ u64 bdw_size;
+ u64 blk_capacity;
+ u64 blk_offset;
+};
+
+/**
+ * struct acpi_nfit_flush - Flush Hint Address Structure
+ */
+struct acpi_nfit_flush {
+ u16 type;
+ u16 length;
+ u32 nfit_handle;
+ u16 num_hints;
+ u8 reserved[6];
+ u64 hint_addr[0];
+};
+
+struct nfit_spa {
+ struct acpi_nfit_spa *spa;
+ struct list_head list;
+};
+
+struct nfit_dcr {
+ struct acpi_nfit_dcr *dcr;
+ struct list_head list;
+};
+
+struct nfit_bdw {
+ struct acpi_nfit_bdw *bdw;
+ struct list_head list;
+};
+
+struct nfit_memdev {
+ struct acpi_nfit_memdev *memdev;
+ struct list_head list;
+};
+
+/* assembled tables for a given dimm/memory-device */
+struct nfit_mem {
+ struct acpi_nfit_memdev *memdev_dcr;
+ struct acpi_nfit_memdev *memdev_pmem;
+ struct acpi_nfit_dcr *dcr;
+ struct acpi_nfit_bdw *bdw;
+ struct acpi_nfit_spa *spa_dcr;
+ struct acpi_nfit_spa *spa_bdw;
+ struct list_head list;
+};
+
+struct acpi_nfit_desc {
+ struct nd_bus_descriptor nd_desc;
+ struct acpi_nfit *nfit;
+ struct list_head memdevs;
+ struct list_head dimms;
+ struct list_head spas;
+ struct list_head dcrs;
+ struct list_head bdws;
+ struct nd_bus *nd_bus;
+ struct device *dev;
+};
+
+static inline struct acpi_nfit_memdev *__to_nfit_memdev(struct nfit_mem *nfit_mem)
+{
+ if (nfit_mem->memdev_dcr)
+ return nfit_mem->memdev_dcr;
+ return nfit_mem->memdev_pmem;
+}
+#endif /* __NFIT_H__ */
diff --git a/drivers/block/nd/core.c b/drivers/block/nd/core.c
new file mode 100644
index 000000000000..3cccdbc0f3b7
--- /dev/null
+++ b/drivers/block/nd/core.c
@@ -0,0 +1,67 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/export.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include "nd-private.h"
+#include "libnd.h"
+
+static DEFINE_IDA(nd_ida);
+
+static void nd_bus_release(struct device *dev)
+{
+ struct nd_bus *nd_bus = container_of(dev, struct nd_bus, dev);
+
+ ida_simple_remove(&nd_ida, nd_bus->id);
+ kfree(nd_bus);
+}
+
+struct nd_bus *nd_bus_register(struct device *parent,
+ struct nd_bus_descriptor *nd_desc)
+{
+ struct nd_bus *nd_bus = kzalloc(sizeof(*nd_bus), GFP_KERNEL);
+ int rc;
+
+ if (!nd_bus)
+ return NULL;
+ nd_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
+ if (nd_bus->id < 0) {
+ kfree(nd_bus);
+ return NULL;
+ }
+ nd_bus->nd_desc = nd_desc;
+ nd_bus->dev.parent = parent;
+ nd_bus->dev.release = nd_bus_release;
+ dev_set_name(&nd_bus->dev, "ndbus%d", nd_bus->id);
+ rc = device_register(&nd_bus->dev);
+ if (rc) {
+ dev_dbg(&nd_bus->dev, "device registration failed: %d\n", rc);
+ put_device(&nd_bus->dev);
+ return NULL;
+ }
+
+ return nd_bus;
+}
+EXPORT_SYMBOL_GPL(nd_bus_register);
+
+void nd_bus_unregister(struct nd_bus *nd_bus)
+{
+ if (!nd_bus)
+ return;
+ device_unregister(&nd_bus->dev);
+}
+EXPORT_SYMBOL_GPL(nd_bus_unregister);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/block/nd/libnd.h b/drivers/block/nd/libnd.h
new file mode 100644
index 000000000000..163832937e9c
--- /dev/null
+++ b/drivers/block/nd/libnd.h
@@ -0,0 +1,33 @@
+/*
+ * libnd - Non-volatile-memory Devices Subsystem
+ *
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __LIBND_H__
+#define __LIBND_H__
+struct nd_dimm;
+struct nd_bus_descriptor;
+typedef int (*ndctl_fn)(struct nd_bus_descriptor *nd_desc,
+ struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
+ unsigned int buf_len);
+
+struct nd_bus_descriptor {
+ unsigned long dsm_mask;
+ char *provider_name;
+ ndctl_fn ndctl;
+};
+
+struct nd_bus;
+struct nd_bus *nd_bus_register(struct device *parent,
+ struct nd_bus_descriptor *nd_desc);
+void nd_bus_unregister(struct nd_bus *nd_bus);
+#endif /* __LIBND_H__ */
diff --git a/drivers/block/nd/nd-private.h b/drivers/block/nd/nd-private.h
new file mode 100644
index 000000000000..3dbab29fa0f9
--- /dev/null
+++ b/drivers/block/nd/nd-private.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __ND_PRIVATE_H__
+#define __ND_PRIVATE_H__
+#include <linux/device.h>
+#include "libnd.h"
+
+struct nd_bus {
+ struct nd_bus_descriptor *nd_desc;
+ struct device dev;
+ int id;
+};
+#endif /* __ND_PRIVATE_H__ */
* [PATCH v2 03/20] nd_acpi, nfit-test: manufactured NFITs for interface development
2015-04-28 18:24 [PATCH v2 00/20] libnd: non-volatile memory device support Dan Williams
2015-04-28 18:24 ` [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support Dan Williams
@ 2015-04-28 18:24 ` Dan Williams
2015-05-15 20:25 ` [Linux-nvdimm] " Jeff Moyer
2015-04-28 18:24 ` [PATCH v2 04/20] libnd: ndctl class device, and nd bus attributes Dan Williams
` (7 subsequent siblings)
9 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-04-28 18:24 UTC (permalink / raw)
To: linux-nvdimm; +Cc: linux-acpi, Rafael J. Wysocki, Robert Moore, linux-kernel
Manually create and register NFITs to describe 2 topologies. Topology1
is an advanced plausible configuration for BLK/PMEM aliased NVDIMMs.
Topology2 is an example configuration for current platforms that only
ship with a persistent address range.
Kernel provider "nfit_test.0" produces an NFIT with the following attributes:
(a) (b) DIMM BLK-REGION
+-------------------+--------+--------+--------+
+------+ | pm0.0 | blk2.0 | pm1.0 | blk2.1 | 0 region2
| imc0 +--+- - - region0- - - +--------+ +--------+
+--+---+ | pm0.0 | blk3.0 | pm1.0 | blk3.1 | 1 region3
| +-------------------+--------v v--------+
+--+---+ | |
| cpu0 | region1
+--+---+ | |
| +----------------------------^ ^--------+
+--+---+ | blk4.0 | pm1.0 | blk4.0 | 2 region4
| imc1 +--+----------------------------| +--------+
+------+ | blk5.0 | pm1.0 | blk5.0 | 3 region5
+----------------------------+--------+--------+
*) In this layout we have four dimms and two memory controllers in one
socket. Each unique interface ("block" or "pmem") to DPA space
is identified by a region device with a dynamically assigned id.
*) The first portions of dimm0 and dimm1 are interleaved as REGION0.
A single "pmem" namespace is created in the REGION0-"spa"-range
that spans dimm0 and dimm1 with a user-specified name of "pm0.0".
Some of that interleaved "spa" range is reclaimed as "bdw"
accessed space starting at offset (a) into each dimm. In that
reclaimed space we create two "bdw" "namespaces" from REGION2 and
REGION3 where "blk2.0" and "blk3.0" are just human readable names
that could be set to any user-desired name in the label.
*) In the last portion of dimm0 and dimm1 we have an interleaved
"spa" range, REGION1, that spans those two dimms as well as dimm2
and dimm3. Some of REGION1 is allocated to a "pmem" namespace named
"pm1.0"; the rest is reclaimed in 4 "bdw" namespaces (one for each
dimm in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
"blk5.0".
*) The portions of dimm2 and dimm3 that do not participate in the
REGION1 interleaved "spa" range (i.e. the DPA addresses below
offset (b)) are also included in the "blk4.0" and "blk5.0"
namespaces. Note that this example shows that "bdw" namespaces
don't need to be contiguous in DPA-space.
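The per-dimm DPA split described above can be sketched in userspace C. This
is only an illustration, assuming the sizes from the enum in test/nfit.c
(DIMM_SIZE, SPA0_SIZE, SPA1_SIZE) and the region_dpa/region_len values from
the memdev entries; struct dpa_extent and the three helpers are hypothetical
names and are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes copied from the enum in test/nfit.c */
#define SZ_32M     (32u << 20)
#define DIMM_SIZE  SZ_32M
#define SPA0_SIZE  DIMM_SIZE        /* region0: 2-way interleave */
#define SPA1_SIZE  (DIMM_SIZE * 2)  /* region1: 4-way interleave */
#define NUM_DIMMS  4

/* hypothetical helper type: a half-open [start, end) range of DPA */
struct dpa_extent {
	uint64_t start;
	uint64_t end;
};

/* DPA that a dimm contributes to region0; empty for dimm2 and dimm3 */
static struct dpa_extent region0_extent(unsigned int dimm)
{
	struct dpa_extent e = { 0, 0 };

	assert(dimm < NUM_DIMMS);
	if (dimm < 2)
		e.end = SPA0_SIZE / 2;
	return e;
}

/* DPA that a dimm contributes to region1; every dimm participates */
static struct dpa_extent region1_extent(unsigned int dimm)
{
	struct dpa_extent e;

	assert(dimm < NUM_DIMMS);
	e.start = SPA0_SIZE / 2;
	e.end = e.start + SPA1_SIZE / 4;
	return e;
}

/* bytes of DPA on a dimm that sit outside both interleaved ranges */
static uint64_t blk_only_bytes(unsigned int dimm)
{
	struct dpa_extent r0 = region0_extent(dimm);
	struct dpa_extent r1 = region1_extent(dimm);

	return DIMM_SIZE - (r0.end - r0.start) - (r1.end - r1.start);
}
```

With these sizes dimm0 and dimm1 are fully covered by the two interleaved
ranges, while dimm2 and dimm3 each keep their first 16MB outside any
interleave set, which is the capacity that ends up in "blk4.0" and "blk5.0".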
Kernel provider "nfit_test.1" produces an NFIT with the following attributes:
region2
+---------------------+
|---------------------|
|| pm2.0 ||
|---------------------|
+---------------------+
*) Describes a simple system-physical-address range with no backing
dimm or interleave description.
Cc: <linux-acpi@vger.kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
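For reference when reading the handle[] table in test/nfit.c:
NFIT_DIMM_HANDLE() packs node, socket, memory controller, channel and dimm
number into one 32-bit value. Below is a userspace sketch of that encoding
and a matching decode, assuming the field layout given by the macro.
dimm_handle(), struct dimm_location, decode_handle() and check_test_handles()
are illustrative names, not code from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* same field layout as NFIT_DIMM_HANDLE() in acpi_nfit.h */
static uint32_t dimm_handle(uint32_t node, uint32_t socket, uint32_t imc,
		uint32_t chan, uint32_t dimm)
{
	return ((node & 0xfff) << 16) | ((socket & 0xf) << 12)
		| ((imc & 0xf) << 8) | ((chan & 0xf) << 4) | (dimm & 0xf);
}

/* illustrative decoded form; not a structure from the patch */
struct dimm_location {
	uint32_t node, socket, imc, chan, dimm;
};

/* inverse of dimm_handle(): each shift mirrors the one used to encode */
static struct dimm_location decode_handle(uint32_t handle)
{
	struct dimm_location loc = {
		.node = handle >> 16 & 0xfff,
		.socket = handle >> 12 & 0xf,
		.imc = handle >> 8 & 0xf,
		.chan = handle >> 4 & 0xf,
		.dimm = handle & 0xf,
	};

	return loc;
}

/* self-check against the four handles nfit_test.0 assigns to its dimms */
static void check_test_handles(void)
{
	assert(dimm_handle(0, 0, 0, 0, 0) == 0x000);
	assert(dimm_handle(0, 0, 0, 0, 1) == 0x001);
	assert(dimm_handle(0, 0, 1, 0, 0) == 0x100);
	assert(dimm_handle(0, 0, 1, 0, 1) == 0x101);
	assert(decode_handle(0x101).imc == 1);
	assert(decode_handle(0x101).chan == 0);
}
```

The two dimms behind imc0 therefore get handles 0x000 and 0x001, and the two
behind imc1 get 0x100 and 0x101.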
drivers/block/nd/Kconfig | 19 +
drivers/block/nd/Makefile | 15 +
drivers/block/nd/acpi.c | 3
drivers/block/nd/acpi_nfit.h | 11
drivers/block/nd/test/Makefile | 5
drivers/block/nd/test/iomap.c | 151 +++++
drivers/block/nd/test/nfit.c | 1025 +++++++++++++++++++++++++++++++++++++
drivers/block/nd/test/nfit_test.h | 26 +
8 files changed, 1254 insertions(+), 1 deletion(-)
create mode 100644 drivers/block/nd/test/Makefile
create mode 100644 drivers/block/nd/test/iomap.c
create mode 100644 drivers/block/nd/test/nfit.c
create mode 100644 drivers/block/nd/test/nfit_test.h
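test/nfit.c also carries nfit_checksum(), which returns the negated byte sum
of a buffer. Storing that value in a table's checksum byte (with the byte
zeroed while summing) makes the whole table sum to zero modulo 256, which is
the usual ACPI table rule. A userspace sketch of the same arithmetic follows;
table_checksum() mirrors the function in the patch, while seal_and_verify()
and its csum_off parameter are hypothetical and only exist for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* same arithmetic as nfit_checksum() in test/nfit.c */
static uint8_t table_checksum(const void *buf, size_t size)
{
	const uint8_t *data = buf;
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < size; i++)
		sum += data[i];
	return 0 - sum;
}

/*
 * hypothetical helper: write the checksum into byte csum_off, then
 * return 1 if every byte of the table now sums to zero modulo 256
 */
static int seal_and_verify(uint8_t *table, size_t size, size_t csum_off)
{
	uint8_t sum = 0;
	size_t i;

	assert(csum_off < size);
	table[csum_off] = 0;	/* checksum byte must not count itself */
	table[csum_off] = table_checksum(table, size);
	for (i = 0; i < size; i++)
		sum += table[i];
	return sum == 0;
}
```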
diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig
index 6d5d6b732f82..09f0135147ca 100644
--- a/drivers/block/nd/Kconfig
+++ b/drivers/block/nd/Kconfig
@@ -37,4 +37,23 @@ config ND_ACPI
addition to storage devices this also enables libnd craft
ACPI._DSM messages for platform/dimm configuration.
+config NFIT_TEST
+ tristate "NFIT TEST: Manufactured NFIT for interface testing"
+ depends on DMA_CMA
+ depends on LIBND=m
+ depends on ND_ACPI
+ depends on m
+ help
+ For development purposes register a manufactured
+ NFIT table to verify the resulting device model topology.
+ Note, this module arranges for ioremap_cache() to be
+ overridden locally to allow simulation of system-memory as an
+ io-memory-resource.
+
+ Note, this test expects to be able to find at least 256MB of
+ CMA space (CONFIG_CMA_SIZE_MBYTES, cma=) or it will fail to
+ load.
+
+ Say N unless you are doing development of the 'nd' subsystem.
+
endif
diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
index 944b5947c0cb..cf064db92589 100644
--- a/drivers/block/nd/Makefile
+++ b/drivers/block/nd/Makefile
@@ -1,5 +1,20 @@
+ifdef CONFIG_NFIT_TEST
+# This obviously will cause symbol collisions if another
+# driver/sub-system attempts a similar mocked io-memory implementation.
+# When that happens we can either add a 'choice' kconfig option to
+# select one mocked instance at a time, or push for the linker to
+# include an option of the form "--wrap-prefix=<prefix>" to allow for
+# separate namespaces of mocked functions.
+ldflags-y += --wrap=ioremap_cache
+ldflags-y += --wrap=ioremap_nocache
+ldflags-y += --wrap=iounmap
+ldflags-y += --wrap=__request_region
+ldflags-y += --wrap=__release_region
+endif
+
obj-$(CONFIG_LIBND) += libnd.o
obj-$(CONFIG_ND_ACPI) += nd_acpi.o
+obj-$(CONFIG_NFIT_TEST) += test/
nd_acpi-y := acpi.o
diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
index 9f0b24390d1b..54344ef9c837 100644
--- a/drivers/block/nd/acpi.c
+++ b/drivers/block/nd/acpi.c
@@ -341,7 +341,7 @@ static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
return 0;
}
-static int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
+int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
const void *end;
@@ -380,6 +380,7 @@ static int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
return 0;
}
+EXPORT_SYMBOL_GPL(nd_acpi_nfit_init);
static int nd_acpi_add(struct acpi_device *adev)
{
diff --git a/drivers/block/nd/acpi_nfit.h b/drivers/block/nd/acpi_nfit.h
index e0b0f12736bf..a26f69e32244 100644
--- a/drivers/block/nd/acpi_nfit.h
+++ b/drivers/block/nd/acpi_nfit.h
@@ -124,6 +124,15 @@ struct acpi_nfit_memdev {
u16 reserved;
};
+#define NFIT_DIMM_HANDLE(node, socket, imc, chan, dimm) \
+ ((((node) & 0xfff) << 16) | (((socket) & 0xf) << 12) \
+ | (((imc) & 0xf) << 8) | (((chan) & 0xf) << 4) | ((dimm) & 0xf))
+#define NFIT_DIMM_NODE(handle) ((handle) >> 16 & 0xfff)
+#define NFIT_DIMM_SOCKET(handle) ((handle) >> 12 & 0xf)
+#define NFIT_DIMM_CHAN(handle) ((handle) >> 4 & 0xf)
+#define NFIT_DIMM_IMC(handle) ((handle) >> 8 & 0xf)
+#define NFIT_DIMM_DIMM(handle) ((handle) & 0xf)
+
/**
* struct acpi_nfit_idt - Interleave description Table
*/
@@ -251,4 +260,6 @@ static inline struct acpi_nfit_memdev *__to_nfit_memdev(struct nfit_mem *nfit_me
return nfit_mem->memdev_dcr;
return nfit_mem->memdev_pmem;
}
+
+int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz);
#endif /* __NFIT_H__ */
diff --git a/drivers/block/nd/test/Makefile b/drivers/block/nd/test/Makefile
new file mode 100644
index 000000000000..c7f319cbd082
--- /dev/null
+++ b/drivers/block/nd/test/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_NFIT_TEST) += nfit_test.o
+obj-$(CONFIG_NFIT_TEST) += nfit_test_iomap.o
+
+nfit_test-y := nfit.o
+nfit_test_iomap-y := iomap.o
diff --git a/drivers/block/nd/test/iomap.c b/drivers/block/nd/test/iomap.c
new file mode 100644
index 000000000000..c85a6f6ba559
--- /dev/null
+++ b/drivers/block/nd/test/iomap.c
@@ -0,0 +1,151 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/rculist.h>
+#include <linux/export.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/io.h>
+#include "nfit_test.h"
+
+static LIST_HEAD(iomap_head);
+
+static struct iomap_ops {
+ nfit_test_lookup_fn nfit_test_lookup;
+ struct list_head list;
+} iomap_ops = {
+ .list = LIST_HEAD_INIT(iomap_ops.list),
+};
+
+void nfit_test_setup(nfit_test_lookup_fn lookup)
+{
+ iomap_ops.nfit_test_lookup = lookup;
+ list_add_rcu(&iomap_ops.list, &iomap_head);
+}
+EXPORT_SYMBOL(nfit_test_setup);
+
+void nfit_test_teardown(void)
+{
+ list_del_rcu(&iomap_ops.list);
+ synchronize_rcu();
+}
+EXPORT_SYMBOL(nfit_test_teardown);
+
+static struct nfit_test_resource *get_nfit_res(resource_size_t resource)
+{
+ struct iomap_ops *ops;
+
+ ops = list_first_or_null_rcu(&iomap_head, typeof(*ops), list);
+ if (ops)
+ return ops->nfit_test_lookup(resource);
+ return NULL;
+}
+
+void __iomem *__nfit_test_ioremap(resource_size_t offset, unsigned long size,
+ void __iomem *(*fallback_fn)(resource_size_t, unsigned long))
+{
+ struct nfit_test_resource *nfit_res;
+
+ rcu_read_lock();
+ nfit_res = get_nfit_res(offset);
+ rcu_read_unlock();
+ if (nfit_res)
+ return (void __iomem *) nfit_res->buf + offset
+ - nfit_res->res->start;
+ return fallback_fn(offset, size);
+}
+
+void __iomem *__wrap_ioremap_cache(resource_size_t offset, unsigned long size)
+{
+ return __nfit_test_ioremap(offset, size, ioremap_cache);
+}
+EXPORT_SYMBOL(__wrap_ioremap_cache);
+
+void __iomem *__wrap_ioremap_nocache(resource_size_t offset, unsigned long size)
+{
+ return __nfit_test_ioremap(offset, size, ioremap_nocache);
+}
+EXPORT_SYMBOL(__wrap_ioremap_nocache);
+
+void __wrap_iounmap(volatile void __iomem *addr)
+{
+ struct nfit_test_resource *nfit_res;
+
+ rcu_read_lock();
+ nfit_res = get_nfit_res((unsigned long) addr);
+ rcu_read_unlock();
+ if (nfit_res)
+ return;
+ iounmap(addr);
+}
+EXPORT_SYMBOL(__wrap_iounmap);
+
+struct resource *__wrap___request_region(struct resource *parent,
+ resource_size_t start, resource_size_t n, const char *name,
+ int flags)
+{
+ struct nfit_test_resource *nfit_res;
+
+ if (parent == &iomem_resource) {
+ rcu_read_lock();
+ nfit_res = get_nfit_res(start);
+ rcu_read_unlock();
+ if (nfit_res) {
+ struct resource *res = nfit_res->res + 1;
+
+ if (start + n > nfit_res->res->start
+ + resource_size(nfit_res->res)) {
+ pr_debug("%s: start: %llx n: %llx overflow: %pr\n",
+ __func__, start, n,
+ nfit_res->res);
+ return NULL;
+ }
+
+ res->start = start;
+ res->end = start + n - 1;
+ res->name = name;
+ res->flags = resource_type(parent);
+ res->flags |= IORESOURCE_BUSY | flags;
+ pr_debug("%s: %pr\n", __func__, res);
+ return res;
+ }
+ }
+ return __request_region(parent, start, n, name, flags);
+}
+EXPORT_SYMBOL(__wrap___request_region);
+
+void __wrap___release_region(struct resource *parent, resource_size_t start,
+ resource_size_t n)
+{
+ struct nfit_test_resource *nfit_res;
+
+ if (parent == &iomem_resource) {
+ rcu_read_lock();
+ nfit_res = get_nfit_res(start);
+ rcu_read_unlock();
+ if (nfit_res) {
+ struct resource *res = nfit_res->res + 1;
+
+ if (start != res->start || resource_size(res) != n)
+ pr_info("%s: start: %llx n: %llx mismatch: %pr\n",
+ __func__, start, n, res);
+ else
+ memset(res, 0, sizeof(*res));
+ return;
+ }
+ }
+ __release_region(parent, start, n);
+}
+EXPORT_SYMBOL(__wrap___release_region);
+
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/block/nd/test/nfit.c b/drivers/block/nd/test/nfit.c
new file mode 100644
index 000000000000..8691a903515b
--- /dev/null
+++ b/drivers/block/nd/test/nfit.c
@@ -0,0 +1,1025 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include "nfit_test.h"
+
+#include "../acpi_nfit.h"
+#include "../libnd.h"
+
+/*
+ * Generate an NFIT table to describe the following topology:
+ *
+ * BUS0: Interleaved PMEM regions, and aliasing with BLK regions
+ *
+ * (a) (b) DIMM BLK-REGION
+ * +----------+--------------+----------+---------+
+ * +------+ | blk2.0 | pm0.0 | blk2.1 | pm1.0 | 0 region2
+ * | imc0 +--+- - - - - region0 - - - -+----------+ +
+ * +--+---+ | blk3.0 | pm0.0 | blk3.1 | pm1.0 | 1 region3
+ * | +----------+--------------v----------v v
+ * +--+---+ | |
+ * | cpu0 | region1
+ * +--+---+ | |
+ * | +-------------------------^----------^ ^
+ * +--+---+ | blk4.0 | pm1.0 | 2 region4
+ * | imc1 +--+-------------------------+----------+ +
+ * +------+ | blk5.0 | pm1.0 | 3 region5
+ * +-------------------------+----------+-+-------+
+ *
+ * *) In this layout we have four dimms and two memory controllers in one
+ * socket. Each unique interface (BLK or PMEM) to DPA space
+ * is identified by a region device with a dynamically assigned id.
+ *
+ * *) The first portion of dimm0 and dimm1 are interleaved as REGION0.
+ * A single PMEM namespace "pm0.0" is created using half of the
+ * REGION0 SPA-range. REGION0 spans dimm0 and dimm1. PMEM namespaces
+ * allocate from the bottom of a region. The unallocated
+ * portion of REGION0 aliases with REGION2 and REGION3. That
+ * unallocated capacity is reclaimed as BLK namespaces ("blk2.0" and
+ * "blk3.0") starting at the base of each DIMM to offset (a) in those
+ * DIMMs. "pm0.0", "blk2.0" and "blk3.0" are free-form readable
+ * names that can be assigned to a namespace.
+ *
+ * *) In the last portion of dimm0 and dimm1 we have an interleaved
+ * SPA range, REGION1, that spans those two dimms as well as dimm2
+ * and dimm3. Some of REGION1 is allocated to a PMEM namespace named
+ * "pm1.0"; the rest is reclaimed in 4 BLK namespaces (one for each
+ * dimm in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
+ * "blk5.0".
+ *
+ * *) The portions of dimm2 and dimm3 that do not participate in the
+ * REGION1 interleaved SPA range (i.e. the DPA addresses below offset
+ * (b)) are also included in the "blk4.0" and "blk5.0" namespaces.
+ * Note that BLK namespaces need not be contiguous in DPA-space, and
+ * can consume aliased capacity from multiple interleave sets.
+ *
+ * BUS1: Legacy NVDIMM (single contiguous range)
+ *
+ * region2
+ * +---------------------+
+ * |---------------------|
+ * || pm2.0 ||
+ * |---------------------|
+ * +---------------------+
+ *
+ * *) An NFIT-table may describe a simple system-physical-address range
+ * with no BLK aliasing. This type of region may optionally
+ * reference an NVDIMM.
+ */
+enum {
+ NUM_PM = 2,
+ NUM_DCR = 4,
+ NUM_BDW = NUM_DCR,
+ NUM_SPA = NUM_PM + NUM_DCR + NUM_BDW,
+ NUM_MEM = NUM_DCR + NUM_BDW + 2 /* spa0 iset */ + 4 /* spa1 iset */,
+ DIMM_SIZE = SZ_32M,
+ LABEL_SIZE = SZ_128K,
+ SPA0_SIZE = DIMM_SIZE,
+ SPA1_SIZE = DIMM_SIZE*2,
+ SPA2_SIZE = DIMM_SIZE,
+ BDW_SIZE = 64 << 8,
+ DCR_SIZE = 12,
+ NUM_NFITS = 2, /* permit testing multiple NFITs per system */
+};
+
+struct nfit_test_dcr {
+ __le64 bdw_addr;
+ __le32 bdw_status;
+ __u8 aperature[BDW_SIZE];
+};
+
+static u32 handle[NUM_DCR] = {
+ [0] = NFIT_DIMM_HANDLE(0, 0, 0, 0, 0),
+ [1] = NFIT_DIMM_HANDLE(0, 0, 0, 0, 1),
+ [2] = NFIT_DIMM_HANDLE(0, 0, 1, 0, 0),
+ [3] = NFIT_DIMM_HANDLE(0, 0, 1, 0, 1),
+};
+
+struct nfit_test {
+ struct acpi_nfit_desc acpi_desc;
+ struct platform_device pdev;
+ struct list_head resources;
+ void *nfit_buf;
+ dma_addr_t nfit_dma;
+ size_t nfit_size;
+ int num_dcr;
+ int num_pm;
+ void **dimm;
+ dma_addr_t *dimm_dma;
+ void **label;
+ dma_addr_t *label_dma;
+ void **spa_set;
+ dma_addr_t *spa_set_dma;
+ struct nfit_test_dcr **dcr;
+ dma_addr_t *dcr_dma;
+ int (*alloc)(struct nfit_test *t);
+ void (*setup)(struct nfit_test *t);
+};
+
+static struct nfit_test *to_nfit_test(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+
+ return container_of(pdev, struct nfit_test, pdev);
+}
+
+static int nfit_test_ctl(struct nd_bus_descriptor *nd_desc,
+ struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ return -ENOTTY;
+}
+
+static DEFINE_SPINLOCK(nfit_test_lock);
+static struct nfit_test *instances[NUM_NFITS];
+
+static void release_nfit_res(void *data)
+{
+ struct nfit_test_resource *nfit_res = data;
+ struct resource *res = nfit_res->res;
+
+ spin_lock(&nfit_test_lock);
+ list_del(&nfit_res->list);
+ spin_unlock(&nfit_test_lock);
+
+ if (is_vmalloc_addr(nfit_res->buf))
+ vfree(nfit_res->buf);
+ else
+ dma_free_coherent(nfit_res->dev, resource_size(res),
+ nfit_res->buf, res->start);
+ kfree(res);
+ kfree(nfit_res);
+}
+
+static void *__test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma,
+ void *buf)
+{
+ struct device *dev = &t->pdev.dev;
+ struct resource *res = kzalloc(sizeof(*res) * 2, GFP_KERNEL);
+ struct nfit_test_resource *nfit_res = kzalloc(sizeof(*nfit_res),
+ GFP_KERNEL);
+ int rc;
+
+ if (!res || !buf || !nfit_res)
+ goto err;
+ rc = devm_add_action(dev, release_nfit_res, nfit_res);
+ if (rc)
+ goto err;
+ INIT_LIST_HEAD(&nfit_res->list);
+ memset(buf, 0, size);
+ nfit_res->dev = dev;
+ nfit_res->buf = buf;
+ nfit_res->res = res;
+ res->start = *dma;
+ res->end = *dma + size - 1;
+ res->name = "NFIT";
+ spin_lock(&nfit_test_lock);
+ list_add(&nfit_res->list, &t->resources);
+ spin_unlock(&nfit_test_lock);
+
+ return nfit_res->buf;
+ err:
+ if (buf && !is_vmalloc_addr(buf))
+ dma_free_coherent(dev, size, buf, *dma);
+ else if (buf)
+ vfree(buf);
+ kfree(res);
+ kfree(nfit_res);
+ return NULL;
+}
+
+static void *test_alloc(struct nfit_test *t, size_t size, dma_addr_t *dma)
+{
+ void *buf = vmalloc(size);
+
+ *dma = (unsigned long) buf;
+ return __test_alloc(t, size, dma, buf);
+}
+
+static void *test_alloc_coherent(struct nfit_test *t, size_t size, dma_addr_t *dma)
+{
+ struct device *dev = &t->pdev.dev;
+ void *buf = dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
+
+ return __test_alloc(t, size, dma, buf);
+}
+
+static struct nfit_test_resource *nfit_test_lookup(resource_size_t addr)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(instances); i++) {
+ struct nfit_test_resource *n, *nfit_res = NULL;
+ struct nfit_test *t = instances[i];
+
+ if (!t)
+ continue;
+ spin_lock(&nfit_test_lock);
+ list_for_each_entry(n, &t->resources, list) {
+ if (addr >= n->res->start && (addr < n->res->start
+ + resource_size(n->res))) {
+ nfit_res = n;
+ break;
+ } else if (addr >= (unsigned long) n->buf
+ && (addr < (unsigned long) n->buf
+ + resource_size(n->res))) {
+ nfit_res = n;
+ break;
+ }
+ }
+ spin_unlock(&nfit_test_lock);
+ if (nfit_res)
+ return nfit_res;
+ }
+
+ return NULL;
+}
+
+static int nfit_test0_alloc(struct nfit_test *t)
+{
+ size_t nfit_size = sizeof(struct acpi_nfit)
+ + sizeof(struct acpi_nfit_spa) * NUM_SPA
+ + sizeof(struct acpi_nfit_memdev) * NUM_MEM
+ + sizeof(struct acpi_nfit_dcr) * NUM_DCR
+ + sizeof(struct acpi_nfit_bdw) * NUM_BDW;
+ int i;
+
+ t->nfit_buf = test_alloc(t, nfit_size, &t->nfit_dma);
+ if (!t->nfit_buf)
+ return -ENOMEM;
+ t->nfit_size = nfit_size;
+
+ t->spa_set[0] = test_alloc_coherent(t, SPA0_SIZE, &t->spa_set_dma[0]);
+ if (!t->spa_set[0])
+ return -ENOMEM;
+
+ t->spa_set[1] = test_alloc_coherent(t, SPA1_SIZE, &t->spa_set_dma[1]);
+ if (!t->spa_set[1])
+ return -ENOMEM;
+
+ for (i = 0; i < NUM_DCR; i++) {
+ t->dimm[i] = test_alloc(t, DIMM_SIZE, &t->dimm_dma[i]);
+ if (!t->dimm[i])
+ return -ENOMEM;
+
+ t->label[i] = test_alloc(t, LABEL_SIZE, &t->label_dma[i]);
+ if (!t->label[i])
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < NUM_DCR; i++) {
+ t->dcr[i] = test_alloc(t, LABEL_SIZE, &t->dcr_dma[i]);
+ if (!t->dcr[i])
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static u8 nfit_checksum(void *buf, size_t size)
+{
+ u8 sum, *data = buf;
+ size_t i;
+
+ for (sum = 0, i = 0; i < size; i++)
+ sum += data[i];
+ return 0 - sum;
+}
+
+static int nfit_test1_alloc(struct nfit_test *t)
+{
+ size_t nfit_size = sizeof(struct acpi_nfit)
+ + sizeof(struct acpi_nfit_spa) + sizeof(struct acpi_nfit_memdev)
+ + sizeof(struct acpi_nfit_dcr);
+
+ t->nfit_buf = test_alloc(t, nfit_size, &t->nfit_dma);
+ if (!t->nfit_buf)
+ return -ENOMEM;
+ t->nfit_size = nfit_size;
+
+ t->spa_set[0] = test_alloc_coherent(t, SPA2_SIZE, &t->spa_set_dma[0]);
+ if (!t->spa_set[0])
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void nfit_test0_setup(struct nfit_test *t)
+{
+ struct nd_bus_descriptor *nd_desc;
+ struct acpi_nfit_memdev *memdev;
+ void *nfit_buf = t->nfit_buf;
+ size_t size = t->nfit_size;
+ struct acpi_nfit_spa *spa;
+ struct acpi_nfit_dcr *dcr;
+ struct acpi_nfit_bdw *bdw;
+ struct acpi_nfit *nfit;
+ unsigned int offset;
+
+ /* nfit header */
+ nfit = nfit_buf;
+ memcpy(nfit->signature, "NFIT", 4);
+ nfit->length = size;
+ nfit->revision = 1;
+ memcpy(nfit->oemid, "NDTEST", 6);
+ nfit->oem_tbl_id = 0x1234;
+ nfit->oem_revision = 1;
+ nfit->creator_id = 0xabcd0000;
+ nfit->creator_revision = 1;
+
+ /*
+ * spa0 (interleave first half of dimm0 and dimm1, note storage
+ * does not actually alias the related block-data-window
+ * regions)
+ */
+ spa = nfit_buf + sizeof(*nfit);
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_pm, 16);
+ spa->spa_index = 0+1;
+ spa->spa_base = t->spa_set_dma[0];
+ spa->spa_length = SPA0_SIZE;
+
+ /*
+ * spa1 (interleave last half of the 4 DIMMS, note storage
+ * does not actually alias the related block-data-window
+ * regions)
+ */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa);
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_pm, 16);
+ spa->spa_index = 1+1;
+ spa->spa_base = t->spa_set_dma[1];
+ spa->spa_length = SPA1_SIZE;
+
+ /* spa2 (dcr0) dimm0 */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa) * 2;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_dcr, 16);
+ spa->spa_index = 2+1;
+ spa->spa_base = t->dcr_dma[0];
+ spa->spa_length = DCR_SIZE;
+
+ /* spa3 (dcr1) dimm1 */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa) * 3;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_dcr, 16);
+ spa->spa_index = 3+1;
+ spa->spa_base = t->dcr_dma[1];
+ spa->spa_length = DCR_SIZE;
+
+ /* spa4 (dcr2) dimm2 */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa) * 4;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_dcr, 16);
+ spa->spa_index = 4+1;
+ spa->spa_base = t->dcr_dma[2];
+ spa->spa_length = DCR_SIZE;
+
+ /* spa5 (dcr3) dimm3 */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa) * 5;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_dcr, 16);
+ spa->spa_index = 5+1;
+ spa->spa_base = t->dcr_dma[3];
+ spa->spa_length = DCR_SIZE;
+
+ /* spa6 (bdw for dcr0) dimm0 */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa) * 6;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_bdw, 16);
+ spa->spa_index = 6+1;
+ spa->spa_base = t->dimm_dma[0];
+ spa->spa_length = DIMM_SIZE;
+
+ /* spa7 (bdw for dcr1) dimm1 */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa) * 7;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_bdw, 16);
+ spa->spa_index = 7+1;
+ spa->spa_base = t->dimm_dma[1];
+ spa->spa_length = DIMM_SIZE;
+
+ /* spa8 (bdw for dcr2) dimm2 */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa) * 8;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_bdw, 16);
+ spa->spa_index = 8+1;
+ spa->spa_base = t->dimm_dma[2];
+ spa->spa_length = DIMM_SIZE;
+
+ /* spa9 (bdw for dcr3) dimm3 */
+ spa = nfit_buf + sizeof(*nfit) + sizeof(*spa) * 9;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_bdw, 16);
+ spa->spa_index = 9+1;
+ spa->spa_base = t->dimm_dma[3];
+ spa->spa_length = DIMM_SIZE;
+
+ offset = sizeof(*nfit) + sizeof(*spa) * 10;
+ /* mem-region0 (spa0, dimm0) */
+ memdev = nfit_buf + offset;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[0];
+ memdev->phys_id = 0;
+ memdev->region_id = 0;
+ memdev->spa_index = 0+1;
+ memdev->dcr_index = 0+1;
+ memdev->region_len = SPA0_SIZE/2;
+ memdev->region_spa_offset = t->spa_set_dma[0];
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 2;
+
+ /* mem-region1 (spa0, dimm1) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev);
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[1];
+ memdev->phys_id = 1;
+ memdev->region_id = 0;
+ memdev->spa_index = 0+1;
+ memdev->dcr_index = 1+1;
+ memdev->region_len = SPA0_SIZE/2;
+ memdev->region_spa_offset = t->spa_set_dma[0] + SPA0_SIZE/2;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 2;
+
+ /* mem-region2 (spa1, dimm0) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 2;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[0];
+ memdev->phys_id = 0;
+ memdev->region_id = 1;
+ memdev->spa_index = 1+1;
+ memdev->dcr_index = 0+1;
+ memdev->region_len = SPA1_SIZE/4;
+ memdev->region_spa_offset = t->spa_set_dma[1];
+ memdev->region_dpa = SPA0_SIZE/2;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 4;
+
+ /* mem-region3 (spa1, dimm1) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 3;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[1];
+ memdev->phys_id = 1;
+ memdev->region_id = 1;
+ memdev->spa_index = 1+1;
+ memdev->dcr_index = 1+1;
+ memdev->region_len = SPA1_SIZE/4;
+ memdev->region_spa_offset = t->spa_set_dma[1] + SPA1_SIZE/4;
+ memdev->region_dpa = SPA0_SIZE/2;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 4;
+
+ /* mem-region4 (spa1, dimm2) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 4;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[2];
+ memdev->phys_id = 2;
+ memdev->region_id = 0;
+ memdev->spa_index = 1+1;
+ memdev->dcr_index = 2+1;
+ memdev->region_len = SPA1_SIZE/4;
+ memdev->region_spa_offset = t->spa_set_dma[1] + 2*SPA1_SIZE/4;
+ memdev->region_dpa = SPA0_SIZE/2;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 4;
+
+ /* mem-region5 (spa1, dimm3) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 5;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[3];
+ memdev->phys_id = 3;
+ memdev->region_id = 0;
+ memdev->spa_index = 1+1;
+ memdev->dcr_index = 3+1;
+ memdev->region_len = SPA1_SIZE/4;
+ memdev->region_spa_offset = t->spa_set_dma[1] + 3*SPA1_SIZE/4;
+ memdev->region_dpa = SPA0_SIZE/2;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 4;
+
+ /* mem-region6 (spa/dcr0, dimm0) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 6;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[0];
+ memdev->phys_id = 0;
+ memdev->region_id = 0;
+ memdev->spa_index = 2+1;
+ memdev->dcr_index = 0+1;
+ memdev->region_len = 0;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ /* mem-region7 (spa/dcr1, dimm1) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 7;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[1];
+ memdev->phys_id = 1;
+ memdev->region_id = 0;
+ memdev->spa_index = 3+1;
+ memdev->dcr_index = 1+1;
+ memdev->region_len = 0;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ /* mem-region8 (spa/dcr2, dimm2) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 8;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[2];
+ memdev->phys_id = 2;
+ memdev->region_id = 0;
+ memdev->spa_index = 4+1;
+ memdev->dcr_index = 2+1;
+ memdev->region_len = 0;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ /* mem-region9 (spa/dcr3, dimm3) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 9;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[3];
+ memdev->phys_id = 3;
+ memdev->region_id = 0;
+ memdev->spa_index = 5+1;
+ memdev->dcr_index = 3+1;
+ memdev->region_len = 0;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ /* mem-region10 (spa/bdw0, dimm0) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 10;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[0];
+ memdev->phys_id = 0;
+ memdev->region_id = 0;
+ memdev->spa_index = 6+1;
+ memdev->dcr_index = 0+1;
+ memdev->region_len = 0;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ /* mem-region11 (spa/bdw1, dimm1) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 11;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[1];
+ memdev->phys_id = 1;
+ memdev->region_id = 0;
+ memdev->spa_index = 7+1;
+ memdev->dcr_index = 1+1;
+ memdev->region_len = 0;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ /* mem-region12 (spa/bdw2, dimm2) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 12;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[2];
+ memdev->phys_id = 2;
+ memdev->region_id = 0;
+ memdev->spa_index = 8+1;
+ memdev->dcr_index = 2+1;
+ memdev->region_len = 0;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ /* mem-region13 (spa/bdw3, dimm3) */
+ memdev = nfit_buf + offset + sizeof(struct acpi_nfit_memdev) * 13;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = handle[3];
+ memdev->phys_id = 3;
+ memdev->region_id = 0;
+ memdev->spa_index = 9+1;
+ memdev->dcr_index = 3+1;
+ memdev->region_len = 0;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ offset = offset + sizeof(struct acpi_nfit_memdev) * 14;
+ /* dcr-descriptor0 */
+ dcr = nfit_buf + offset;
+ dcr->type = NFIT_TABLE_DCR;
+ dcr->length = sizeof(struct acpi_nfit_dcr);
+ dcr->dcr_index = 0+1;
+ dcr->vendor_id = 0xabcd;
+ dcr->device_id = 0;
+ dcr->revision_id = 1;
+ dcr->serial_number = ~handle[0];
+ dcr->num_bcw = 1;
+ dcr->bcw_size = DCR_SIZE;
+ dcr->cmd_offset = 0;
+ dcr->cmd_size = 8;
+ dcr->status_offset = 8;
+ dcr->status_size = 4;
+
+ /* dcr-descriptor1 */
+ dcr = nfit_buf + offset + sizeof(struct acpi_nfit_dcr);
+ dcr->type = NFIT_TABLE_DCR;
+ dcr->length = sizeof(struct acpi_nfit_dcr);
+ dcr->dcr_index = 1+1;
+ dcr->vendor_id = 0xabcd;
+ dcr->device_id = 0;
+ dcr->revision_id = 1;
+ dcr->serial_number = ~handle[1];
+ dcr->num_bcw = 1;
+ dcr->bcw_size = DCR_SIZE;
+ dcr->cmd_offset = 0;
+ dcr->cmd_size = 8;
+ dcr->status_offset = 8;
+ dcr->status_size = 4;
+
+ /* dcr-descriptor2 */
+ dcr = nfit_buf + offset + sizeof(struct acpi_nfit_dcr) * 2;
+ dcr->type = NFIT_TABLE_DCR;
+ dcr->length = sizeof(struct acpi_nfit_dcr);
+ dcr->dcr_index = 2+1;
+ dcr->vendor_id = 0xabcd;
+ dcr->device_id = 0;
+ dcr->revision_id = 1;
+ dcr->serial_number = ~handle[2];
+ dcr->num_bcw = 1;
+ dcr->bcw_size = DCR_SIZE;
+ dcr->cmd_offset = 0;
+ dcr->cmd_size = 8;
+ dcr->status_offset = 8;
+ dcr->status_size = 4;
+
+ /* dcr-descriptor3 */
+ dcr = nfit_buf + offset + sizeof(struct acpi_nfit_dcr) * 3;
+ dcr->type = NFIT_TABLE_DCR;
+ dcr->length = sizeof(struct acpi_nfit_dcr);
+ dcr->dcr_index = 3+1;
+ dcr->vendor_id = 0xabcd;
+ dcr->device_id = 0;
+ dcr->revision_id = 1;
+ dcr->serial_number = ~handle[3];
+ dcr->num_bcw = 1;
+ dcr->bcw_size = DCR_SIZE;
+ dcr->cmd_offset = 0;
+ dcr->cmd_size = 8;
+ dcr->status_offset = 8;
+ dcr->status_size = 4;
+
+ offset = offset + sizeof(struct acpi_nfit_dcr) * 4;
+ /* bdw0 (spa/dcr0, dimm0) */
+ bdw = nfit_buf + offset;
+ bdw->type = NFIT_TABLE_BDW;
+ bdw->length = sizeof(struct acpi_nfit_bdw);
+ bdw->dcr_index = 0+1;
+ bdw->num_bdw = 1;
+ bdw->bdw_offset = 0;
+ bdw->bdw_size = BDW_SIZE;
+ bdw->blk_capacity = DIMM_SIZE;
+ bdw->blk_offset = 0;
+
+ /* bdw1 (spa/dcr1, dimm1) */
+ bdw = nfit_buf + offset + sizeof(struct acpi_nfit_bdw);
+ bdw->type = NFIT_TABLE_BDW;
+ bdw->length = sizeof(struct acpi_nfit_bdw);
+ bdw->dcr_index = 1+1;
+ bdw->num_bdw = 1;
+ bdw->bdw_offset = 0;
+ bdw->bdw_size = BDW_SIZE;
+ bdw->blk_capacity = DIMM_SIZE;
+ bdw->blk_offset = 0;
+
+ /* bdw2 (spa/dcr2, dimm2) */
+ bdw = nfit_buf + offset + sizeof(struct acpi_nfit_bdw) * 2;
+ bdw->type = NFIT_TABLE_BDW;
+ bdw->length = sizeof(struct acpi_nfit_bdw);
+ bdw->dcr_index = 2+1;
+ bdw->num_bdw = 1;
+ bdw->bdw_offset = 0;
+ bdw->bdw_size = BDW_SIZE;
+ bdw->blk_capacity = DIMM_SIZE;
+ bdw->blk_offset = 0;
+
+ /* bdw3 (spa/dcr3, dimm3) */
+ bdw = nfit_buf + offset + sizeof(struct acpi_nfit_bdw) * 3;
+ bdw->type = NFIT_TABLE_BDW;
+ bdw->length = sizeof(struct acpi_nfit_bdw);
+ bdw->dcr_index = 3+1;
+ bdw->num_bdw = 1;
+ bdw->bdw_offset = 0;
+ bdw->bdw_size = BDW_SIZE;
+ bdw->blk_capacity = DIMM_SIZE;
+ bdw->blk_offset = 0;
+
+ nfit->checksum = nfit_checksum(nfit_buf, size);
+
+ nd_desc = &t->acpi_desc.nd_desc;
+ nd_desc->ndctl = nfit_test_ctl;
+}
+
+static void nfit_test1_setup(struct nfit_test *t)
+{
+ size_t size = t->nfit_size, offset;
+ void *nfit_buf = t->nfit_buf;
+ struct acpi_nfit_memdev *memdev;
+ struct acpi_nfit_dcr *dcr;
+ struct acpi_nfit_spa *spa;
+ struct acpi_nfit *nfit;
+
+ /* nfit header */
+ nfit = nfit_buf;
+ memcpy(nfit->signature, "NFIT", 4);
+ nfit->length = size;
+ nfit->revision = 1;
+ memcpy(nfit->oemid, "NDTEST", 6);
+ nfit->oem_tbl_id = 0x1234;
+ nfit->oem_revision = 1;
+ nfit->creator_id = 0xabcd0000;
+ nfit->creator_revision = 1;
+
+ offset = sizeof(*nfit);
+ /* spa0 (flat range with no bdw aliasing) */
+ spa = nfit_buf + offset;
+ spa->type = NFIT_TABLE_SPA;
+ spa->length = sizeof(*spa);
+ memcpy(spa->type_uuid, &nfit_spa_uuid_pm, 16);
+ spa->spa_index = 0+1;
+ spa->spa_base = t->spa_set_dma[0];
+ spa->spa_length = SPA2_SIZE;
+
+ offset += sizeof(*spa);
+ /* mem-region0 (spa0, dimm0) */
+ memdev = nfit_buf + offset;
+ memdev->type = NFIT_TABLE_MEM;
+ memdev->length = sizeof(*memdev);
+ memdev->nfit_handle = 0;
+ memdev->phys_id = 0;
+ memdev->region_id = 0;
+ memdev->spa_index = 0+1;
+ memdev->dcr_index = 0+1;
+ memdev->region_len = SPA2_SIZE;
+ memdev->region_spa_offset = 0;
+ memdev->region_dpa = 0;
+ memdev->idt_index = 0;
+ memdev->interleave_ways = 1;
+
+ offset += sizeof(*memdev);
+ /* dcr-descriptor0 */
+ dcr = nfit_buf + offset;
+ dcr->type = NFIT_TABLE_DCR;
+ dcr->length = sizeof(struct acpi_nfit_dcr);
+ dcr->dcr_index = 0+1;
+ dcr->vendor_id = 0xabcd;
+ dcr->device_id = 0;
+ dcr->revision_id = 1;
+ dcr->serial_number = ~0;
+ dcr->num_bcw = 0;
+ dcr->bcw_size = 0;
+ dcr->cmd_offset = 0;
+ dcr->cmd_size = 0;
+ dcr->status_offset = 0;
+ dcr->status_size = 0;
+
+ nfit->checksum = nfit_checksum(nfit_buf, size);
+}
+
+static int nfit_test_probe(struct platform_device *pdev)
+{
+ struct nd_bus_descriptor *nd_desc;
+ struct acpi_nfit_desc *acpi_desc;
+ struct device *dev = &pdev->dev;
+ struct nfit_test *nfit_test;
+ int rc;
+
+ nfit_test = to_nfit_test(&pdev->dev);
+
+ /* common alloc */
+ if (nfit_test->num_dcr) {
+ int num = nfit_test->num_dcr;
+
+ nfit_test->dimm = devm_kcalloc(dev, num, sizeof(void *), GFP_KERNEL);
+ nfit_test->dimm_dma = devm_kcalloc(dev, num, sizeof(dma_addr_t), GFP_KERNEL);
+ nfit_test->label = devm_kcalloc(dev, num, sizeof(void *), GFP_KERNEL);
+ nfit_test->label_dma = devm_kcalloc(dev, num, sizeof(dma_addr_t), GFP_KERNEL);
+ nfit_test->dcr = devm_kcalloc(dev, num, sizeof(struct nfit_test_dcr *), GFP_KERNEL);
+ nfit_test->dcr_dma = devm_kcalloc(dev, num, sizeof(dma_addr_t), GFP_KERNEL);
+ if (nfit_test->dimm && nfit_test->dimm_dma && nfit_test->label
+ && nfit_test->label_dma && nfit_test->dcr
+ && nfit_test->dcr_dma)
+ /* pass */;
+ else
+ return -ENOMEM;
+ }
+
+ if (nfit_test->num_pm) {
+ int num = nfit_test->num_pm;
+
+ nfit_test->spa_set = devm_kcalloc(dev, num, sizeof(void *), GFP_KERNEL);
+ nfit_test->spa_set_dma = devm_kcalloc(dev, num,
+ sizeof(dma_addr_t), GFP_KERNEL);
+ if (nfit_test->spa_set && nfit_test->spa_set_dma)
+ /* pass */;
+ else
+ return -ENOMEM;
+ }
+
+ /* per-nfit specific alloc */
+ if (nfit_test->alloc(nfit_test))
+ return -ENOMEM;
+
+ nfit_test->setup(nfit_test);
+ acpi_desc = &nfit_test->acpi_desc;
+ acpi_desc->dev = &pdev->dev;
+ acpi_desc->nfit = nfit_test->nfit_buf;
+ nd_desc = &acpi_desc->nd_desc;
+ acpi_desc->nd_bus = nd_bus_register(&pdev->dev, nd_desc);
+ if (!acpi_desc->nd_bus)
+ return -ENXIO;
+
+ rc = nd_acpi_nfit_init(acpi_desc, nfit_test->nfit_size);
+ if (rc) {
+ nd_bus_unregister(acpi_desc->nd_bus);
+ return rc;
+ }
+
+ return 0;
+}
+
+static int nfit_test_remove(struct platform_device *pdev)
+{
+ struct nfit_test *nfit_test = to_nfit_test(&pdev->dev);
+ struct acpi_nfit_desc *acpi_desc = &nfit_test->acpi_desc;
+
+ nd_bus_unregister(acpi_desc->nd_bus);
+
+ return 0;
+}
+
+static void nfit_test_release(struct device *dev)
+{
+ struct nfit_test *nfit_test = to_nfit_test(dev);
+
+ kfree(nfit_test);
+}
+
+static const struct platform_device_id nfit_test_id[] = {
+ { KBUILD_MODNAME },
+ { },
+};
+
+static struct platform_driver nfit_test_driver = {
+ .probe = nfit_test_probe,
+ .remove = nfit_test_remove,
+ .driver = {
+ .name = KBUILD_MODNAME,
+ },
+ .id_table = nfit_test_id,
+};
+
+#ifdef CONFIG_CMA_SIZE_MBYTES
+#define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
+#else
+#define CMA_SIZE_MBYTES 0
+#endif
+
+static __init int nfit_test_init(void)
+{
+ int rc, i;
+
+ nfit_test_setup(nfit_test_lookup);
+
+ for (i = 0; i < NUM_NFITS; i++) {
+ struct nfit_test *nfit_test;
+ struct platform_device *pdev;
+ static int once;
+
+ nfit_test = kzalloc(sizeof(*nfit_test), GFP_KERNEL);
+ if (!nfit_test) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+ INIT_LIST_HEAD(&nfit_test->resources);
+ switch (i) {
+ case 0:
+ nfit_test->num_pm = NUM_PM;
+ nfit_test->num_dcr = NUM_DCR;
+ nfit_test->alloc = nfit_test0_alloc;
+ nfit_test->setup = nfit_test0_setup;
+ break;
+ case 1:
+ nfit_test->num_pm = 1;
+ nfit_test->alloc = nfit_test1_alloc;
+ nfit_test->setup = nfit_test1_setup;
+ break;
+ default:
+ rc = -EINVAL;
+ goto err_register;
+ }
+ pdev = &nfit_test->pdev;
+ pdev->name = KBUILD_MODNAME;
+ pdev->id = i;
+ pdev->dev.release = nfit_test_release;
+ rc = platform_device_register(pdev);
+ if (rc) {
+ put_device(&pdev->dev);
+ goto err_register;
+ }
+
+ instances[i] = nfit_test;
+
+ rc = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+ if (rc)
+ goto err_register;
+
+ if (!once++) {
+ dma_addr_t dma;
+ void *buf;
+
+ buf = dma_alloc_coherent(&pdev->dev, SZ_128M, &dma,
+ GFP_KERNEL);
+ if (!buf) {
+ rc = -ENOMEM;
+ dev_warn(&pdev->dev, "need 128M of free cma\n");
+ goto err_register;
+ }
+ dma_free_coherent(&pdev->dev, SZ_128M, buf, dma);
+ }
+ }
+
+ rc = platform_driver_register(&nfit_test_driver);
+ if (rc)
+ goto err_register;
+ return 0;
+
+ err_register:
+ for (i = 0; i < NUM_NFITS; i++)
+ if (instances[i])
+ platform_device_unregister(&instances[i]->pdev);
+ nfit_test_teardown();
+ return rc;
+}
+
+static __exit void nfit_test_exit(void)
+{
+ int i;
+
+ platform_driver_unregister(&nfit_test_driver);
+ for (i = 0; i < NUM_NFITS; i++)
+ platform_device_unregister(&instances[i]->pdev);
+ nfit_test_teardown();
+}
+
+module_init(nfit_test_init);
+module_exit(nfit_test_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/block/nd/test/nfit_test.h b/drivers/block/nd/test/nfit_test.h
new file mode 100644
index 000000000000..7b071478eb94
--- /dev/null
+++ b/drivers/block/nd/test/nfit_test.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __NFIT_TEST_H__
+#define __NFIT_TEST_H__
+
+struct nfit_test_resource {
+ struct list_head list;
+ struct resource *res;
+ struct device *dev;
+ void *buf;
+};
+
+typedef struct nfit_test_resource *(*nfit_test_lookup_fn)(resource_size_t);
+void nfit_test_setup(nfit_test_lookup_fn fn);
+void nfit_test_teardown(void);
+#endif
* [PATCH v2 04/20] libnd: ndctl class device, and nd bus attributes
From: Dan Williams @ 2015-04-28 18:24 UTC (permalink / raw)
To: linux-nvdimm
Cc: Neil Brown, Greg KH, Rafael J. Wysocki, Robert Moore,
linux-kernel, linux-acpi
This is the position (device topology) independent method to find all
the libnd buses in the system. The expectation is that there will only
ever be one "nd" bus discovered via /sys/class/nd/ndctl0. However, we
allow for the possibility of multiple buses and they will be listed in
discovery order as ndctl0...ndctlN. This character device hosts the
ioctl for passing control messages (inspired by the ACPI-NFIT DSM
interface commands).
Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
a subsequent patch.
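
For illustration only, a minimal userspace sketch of the discovery scheme described above: walk the class directory, pick out the ndctlN entries, and print each bus's provider. Nothing in it is part of this patch. The helper names ndctl_name_to_id() and nd_list_buses() are hypothetical, and the sketch assumes that the class directory is /sys/class/nd and that the "device" link of each ndctlN entry leads to the parent bus device carrying the "provider" attribute.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse "ndctl<N>" and return N, or -1 if the name does not match. */
static int ndctl_name_to_id(const char *name)
{
	char *end;
	long id;

	assert(name != NULL);
	if (strncmp(name, "ndctl", 5) != 0 || name[5] < '0' || name[5] > '9')
		return -1;
	id = strtol(name + 5, &end, 10);
	if (*end != '\0' || id > 0x7fffffff)
		return -1;
	return (int)id;
}

/*
 * Scan class_dir (assumed to be "/sys/class/nd") and print one line per
 * bus. Returns the number of buses found, or -1 if the directory cannot
 * be opened.
 */
static int nd_list_buses(const char *class_dir)
{
	struct dirent *de;
	int count = 0;
	DIR *dir;

	assert(class_dir != NULL);
	dir = opendir(class_dir);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char path[512], provider[128] = "unknown";
		int id = ndctl_name_to_id(de->d_name);
		FILE *f;

		if (id < 0)
			continue;

		/* assumption: "device" links to the bus device with "provider" */
		snprintf(path, sizeof(path), "%s/%s/device/provider",
				class_dir, de->d_name);
		f = fopen(path, "r");
		if (f) {
			if (fgets(provider, sizeof(provider), f))
				provider[strcspn(provider, "\n")] = '\0';
			fclose(f);
		}
		printf("bus %d: provider=%s\n", id, provider);
		count++;
	}
	closedir(dir);

	return count;
}
```

A caller would invoke nd_list_buses("/sys/class/nd") and treat a return of -1 as "libnd not loaded". Note that readdir() order is not guaranteed to match discovery order, so a tool that cares about ordering would sort by the parsed id.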
Cc: Neil Brown <neilb@suse.de>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: <linux-acpi@vger.kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/block/nd/Makefile | 1
drivers/block/nd/acpi.c | 29 ++++++++++++++
drivers/block/nd/acpi_nfit.h | 5 ++
drivers/block/nd/bus.c | 83 +++++++++++++++++++++++++++++++++++++++
drivers/block/nd/core.c | 87 ++++++++++++++++++++++++++++++++++++++++-
drivers/block/nd/libnd.h | 5 ++
drivers/block/nd/nd-private.h | 6 +++
drivers/block/nd/test/nfit.c | 3 +
8 files changed, 217 insertions(+), 2 deletions(-)
create mode 100644 drivers/block/nd/bus.c
diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
index cf064db92589..7defe18ed009 100644
--- a/drivers/block/nd/Makefile
+++ b/drivers/block/nd/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_NFIT_TEST) += test/
nd_acpi-y := acpi.o
libnd-y := core.o
+libnd-y += bus.o
diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
index 54344ef9c837..dd8505f766ed 100644
--- a/drivers/block/nd/acpi.c
+++ b/drivers/block/nd/acpi.c
@@ -341,6 +341,34 @@ static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
return 0;
}
+static ssize_t revision_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_bus *nd_bus = to_nd_bus(dev);
+ struct nd_bus_descriptor *nd_desc = to_nd_desc(nd_bus);
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+
+ return sprintf(buf, "%d\n", acpi_desc->nfit->revision);
+}
+static DEVICE_ATTR_RO(revision);
+
+static struct attribute *nd_acpi_attributes[] = {
+ &dev_attr_revision.attr,
+ NULL,
+};
+
+static struct attribute_group nd_acpi_attribute_group = {
+ .name = "nfit",
+ .attrs = nd_acpi_attributes,
+};
+
+const struct attribute_group *nd_acpi_attribute_groups[] = {
+ &nd_bus_attribute_group,
+ &nd_acpi_attribute_group,
+ NULL,
+};
+EXPORT_SYMBOL_GPL(nd_acpi_attribute_groups);
+
int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
@@ -408,6 +436,7 @@ static int nd_acpi_add(struct acpi_device *adev)
nd_desc = &acpi_desc->nd_desc;
nd_desc->provider_name = "ACPI.NFIT";
nd_desc->ndctl = nd_acpi_ctl;
+ nd_desc->attr_groups = nd_acpi_attribute_groups;
acpi_desc->nd_bus = nd_bus_register(dev, nd_desc);
if (!acpi_desc->nd_bus)
diff --git a/drivers/block/nd/acpi_nfit.h b/drivers/block/nd/acpi_nfit.h
index a26f69e32244..b65745ca3cbc 100644
--- a/drivers/block/nd/acpi_nfit.h
+++ b/drivers/block/nd/acpi_nfit.h
@@ -261,5 +261,10 @@ static inline struct acpi_nfit_memdev *__to_nfit_memdev(struct nfit_mem *nfit_me
return nfit_mem->memdev_pmem;
}
+static inline struct acpi_nfit_desc *to_acpi_desc(struct nd_bus_descriptor *nd_desc)
+{
+ return container_of(nd_desc, struct acpi_nfit_desc, nd_desc);
+}
+
int nd_acpi_nfit_init(struct acpi_nfit_desc *nfit, acpi_size sz);
#endif /* __NFIT_H__ */
diff --git a/drivers/block/nd/bus.c b/drivers/block/nd/bus.c
new file mode 100644
index 000000000000..635f2e926426
--- /dev/null
+++ b/drivers/block/nd/bus.c
@@ -0,0 +1,83 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/uaccess.h>
+#include <linux/fcntl.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+#include "nd-private.h"
+
+static int nd_bus_major;
+static struct class *nd_class;
+
+int nd_bus_create_ndctl(struct nd_bus *nd_bus)
+{
+ dev_t devt = MKDEV(nd_bus_major, nd_bus->id);
+ struct device *dev;
+
+ dev = device_create(nd_class, &nd_bus->dev, devt, nd_bus, "ndctl%d",
+ nd_bus->id);
+
+ if (IS_ERR(dev)) {
+ dev_dbg(&nd_bus->dev, "failed to register ndctl%d: %ld\n",
+ nd_bus->id, PTR_ERR(dev));
+ return PTR_ERR(dev);
+ }
+ return 0;
+}
+
+void nd_bus_destroy_ndctl(struct nd_bus *nd_bus)
+{
+ device_destroy(nd_class, MKDEV(nd_bus_major, nd_bus->id));
+}
+
+static long nd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ return -ENXIO;
+}
+
+static const struct file_operations nd_bus_fops = {
+ .owner = THIS_MODULE,
+ .open = nonseekable_open,
+ .unlocked_ioctl = nd_ioctl,
+ .compat_ioctl = nd_ioctl,
+ .llseek = noop_llseek,
+};
+
+int __init nd_bus_init(void)
+{
+ int rc;
+
+ rc = register_chrdev(0, "ndctl", &nd_bus_fops);
+ if (rc < 0)
+ return rc;
+ nd_bus_major = rc;
+
+ nd_class = class_create(THIS_MODULE, "nd");
+ rc = PTR_ERR_OR_ZERO(nd_class);
+ if (rc)
+ goto err_class;
+
+ return 0;
+
+ err_class:
+ unregister_chrdev(nd_bus_major, "ndctl");
+ return rc;
+}
+
+void __exit nd_bus_exit(void)
+{
+ class_destroy(nd_class);
+ unregister_chrdev(nd_bus_major, "ndctl");
+}
diff --git a/drivers/block/nd/core.c b/drivers/block/nd/core.c
index 3cccdbc0f3b7..55603ff264ff 100644
--- a/drivers/block/nd/core.c
+++ b/drivers/block/nd/core.c
@@ -13,10 +13,13 @@
#include <linux/export.h>
#include <linux/module.h>
#include <linux/device.h>
+#include <linux/mutex.h>
#include <linux/slab.h>
#include "nd-private.h"
#include "libnd.h"
+LIST_HEAD(nd_bus_list);
+DEFINE_MUTEX(nd_bus_list_mutex);
static DEFINE_IDA(nd_ida);
static void nd_bus_release(struct device *dev)
@@ -27,6 +30,54 @@ static void nd_bus_release(struct device *dev)
kfree(nd_bus);
}
+struct nd_bus *to_nd_bus(struct device *dev)
+{
+ struct nd_bus *nd_bus = container_of(dev, struct nd_bus, dev);
+
+ WARN_ON(nd_bus->dev.release != nd_bus_release);
+ return nd_bus;
+}
+EXPORT_SYMBOL_GPL(to_nd_bus);
+
+struct nd_bus_descriptor *to_nd_desc(struct nd_bus *nd_bus)
+{
+ /* struct nd_bus definition is private to libnd */
+ return nd_bus->nd_desc;
+}
+EXPORT_SYMBOL_GPL(to_nd_desc);
+
+static const char *nd_bus_provider(struct nd_bus *nd_bus)
+{
+ struct nd_bus_descriptor *nd_desc = nd_bus->nd_desc;
+ struct device *parent = nd_bus->dev.parent;
+
+ if (nd_desc->provider_name)
+ return nd_desc->provider_name;
+ else if (parent)
+ return dev_name(parent);
+ else
+ return "unknown";
+}
+
+static ssize_t provider_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_bus *nd_bus = to_nd_bus(dev);
+
+ return sprintf(buf, "%s\n", nd_bus_provider(nd_bus));
+}
+static DEVICE_ATTR_RO(provider);
+
+static struct attribute *nd_bus_attributes[] = {
+ &dev_attr_provider.attr,
+ NULL,
+};
+
+struct attribute_group nd_bus_attribute_group = {
+ .attrs = nd_bus_attributes,
+};
+EXPORT_SYMBOL_GPL(nd_bus_attribute_group);
+
struct nd_bus *nd_bus_register(struct device *parent,
struct nd_bus_descriptor *nd_desc)
{
@@ -35,6 +86,7 @@ struct nd_bus *nd_bus_register(struct device *parent,
if (!nd_bus)
return NULL;
+ INIT_LIST_HEAD(&nd_bus->list);
nd_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
if (nd_bus->id < 0) {
kfree(nd_bus);
@@ -43,15 +95,26 @@ struct nd_bus *nd_bus_register(struct device *parent,
nd_bus->nd_desc = nd_desc;
nd_bus->dev.parent = parent;
nd_bus->dev.release = nd_bus_release;
+ nd_bus->dev.groups = nd_desc->attr_groups;
dev_set_name(&nd_bus->dev, "ndbus%d", nd_bus->id);
rc = device_register(&nd_bus->dev);
if (rc) {
dev_dbg(&nd_bus->dev, "device registration failed: %d\n", rc);
- put_device(&nd_bus->dev);
- return NULL;
+ goto err;
}
+ rc = nd_bus_create_ndctl(nd_bus);
+ if (rc)
+ goto err;
+
+ mutex_lock(&nd_bus_list_mutex);
+ list_add_tail(&nd_bus->list, &nd_bus_list);
+ mutex_unlock(&nd_bus_list_mutex);
+
return nd_bus;
+ err:
+ put_device(&nd_bus->dev);
+ return NULL;
}
EXPORT_SYMBOL_GPL(nd_bus_register);
@@ -59,9 +122,29 @@ void nd_bus_unregister(struct nd_bus *nd_bus)
{
if (!nd_bus)
return;
+
+ mutex_lock(&nd_bus_list_mutex);
+ list_del_init(&nd_bus->list);
+ mutex_unlock(&nd_bus_list_mutex);
+
+ nd_bus_destroy_ndctl(nd_bus);
+
device_unregister(&nd_bus->dev);
}
EXPORT_SYMBOL_GPL(nd_bus_unregister);
+static __init int libnd_init(void)
+{
+ return nd_bus_init();
+}
+
+static __exit void libnd_exit(void)
+{
+ WARN_ON(!list_empty(&nd_bus_list));
+ nd_bus_exit();
+}
+
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Intel Corporation");
+module_init(libnd_init);
+module_exit(libnd_exit);
diff --git a/drivers/block/nd/libnd.h b/drivers/block/nd/libnd.h
index 163832937e9c..86cf3e0573b0 100644
--- a/drivers/block/nd/libnd.h
+++ b/drivers/block/nd/libnd.h
@@ -14,6 +14,8 @@
*/
#ifndef __LIBND_H__
#define __LIBND_H__
+extern struct attribute_group nd_bus_attribute_group;
+
struct nd_dimm;
struct nd_bus_descriptor;
typedef int (*ndctl_fn)(struct nd_bus_descriptor *nd_desc,
@@ -21,6 +23,7 @@ typedef int (*ndctl_fn)(struct nd_bus_descriptor *nd_desc,
unsigned int buf_len);
struct nd_bus_descriptor {
+ const struct attribute_group **attr_groups;
unsigned long dsm_mask;
char *provider_name;
ndctl_fn ndctl;
@@ -30,4 +33,6 @@ struct nd_bus;
struct nd_bus *nd_bus_register(struct device *parent,
struct nd_bus_descriptor *nfit_desc);
void nd_bus_unregister(struct nd_bus *nd_bus);
+struct nd_bus *to_nd_bus(struct device *dev);
+struct nd_bus_descriptor *to_nd_desc(struct nd_bus *nd_bus);
#endif /* __LIBND_H__ */
diff --git a/drivers/block/nd/nd-private.h b/drivers/block/nd/nd-private.h
index 3dbab29fa0f9..960dd2f29cdd 100644
--- a/drivers/block/nd/nd-private.h
+++ b/drivers/block/nd/nd-private.h
@@ -17,7 +17,13 @@
struct nd_bus {
struct nd_bus_descriptor *nd_desc;
+ struct list_head list;
struct device dev;
int id;
};
+
+int __init nd_bus_init(void);
+void __exit nd_bus_exit(void);
+int nd_bus_create_ndctl(struct nd_bus *nd_bus);
+void nd_bus_destroy_ndctl(struct nd_bus *nd_bus);
#endif /* __ND_PRIVATE_H__ */
diff --git a/drivers/block/nd/test/nfit.c b/drivers/block/nd/test/nfit.c
index 8691a903515b..1c79f32376fc 100644
--- a/drivers/block/nd/test/nfit.c
+++ b/drivers/block/nd/test/nfit.c
@@ -833,6 +833,8 @@ static void nfit_test1_setup(struct nfit_test *t)
nfit->checksum = nfit_checksum(nfit_buf, size);
}
+extern const struct attribute_group *nd_acpi_attribute_groups[];
+
static int nfit_test_probe(struct platform_device *pdev)
{
struct nd_bus_descriptor *nd_desc;
@@ -882,6 +884,7 @@ static int nfit_test_probe(struct platform_device *pdev)
acpi_desc->dev = &pdev->dev;
acpi_desc->nfit = nfit_test->nfit_buf;
nd_desc = &acpi_desc->nd_desc;
+ nd_desc->attr_groups = nd_acpi_attribute_groups;
acpi_desc->nd_bus = nd_bus_register(&pdev->dev, nd_desc);
if (!acpi_desc->nd_bus)
return -ENXIO;
* [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices
From: Dan Williams @ 2015-04-28 18:24 UTC (permalink / raw)
To: linux-nvdimm
Cc: Neil Brown, Greg KH, Rafael J. Wysocki, Robert Moore,
linux-kernel, linux-acpi
Register the memory devices described in the nfit as libnd 'dimm'
devices on an nd bus. The kernel-assigned device id for dimms is
dynamic. If userspace needs a more static identifier, it should consult
a provider-specific attribute. In the case where NFIT is the provider,
the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used
for this purpose.
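
As a rough userspace sketch of how the stable identifier might be consumed (not part of this patch): the attribute text is produced with sprintf("%#x\n"), so it reads back as "0x..." for non-zero values and as a bare "0" for zero. The helper names nd_parse_hex_attr() and nd_read_dimm_handle() are hypothetical, and the caller is assumed to already know the sysfs directory of the nmemX device, since this patch does not fix that location.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse attribute text as emitted by sprintf("%#x\n"): "0x1f\n" for
 * non-zero values, "0\n" for zero. Returns 0 on success, -1 on error.
 */
static int nd_parse_hex_attr(const char *text, unsigned long *val)
{
	unsigned long v;
	char *end;

	assert(text && val);
	errno = 0;
	v = strtoul(text, &end, 16);
	if (end == text || errno)
		return -1;
	while (*end == '\n' || *end == ' ')
		end++;
	if (*end != '\0')
		return -1;
	*val = v;
	return 0;
}

/*
 * Read "<dimm_dir>/nfit/handle". dimm_dir is the sysfs directory of an
 * nmemX device, supplied by the caller (its location is an assumption
 * left to the caller). Returns 0 on success, -1 on error.
 */
static int nd_read_dimm_handle(const char *dimm_dir, unsigned long *handle)
{
	char path[512], text[64];
	int rc = -1;
	FILE *f;

	assert(dimm_dir && handle);
	snprintf(path, sizeof(path), "%s/nfit/handle", dimm_dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(text, sizeof(text), f))
		rc = nd_parse_hex_attr(text, handle);
	fclose(f);

	return rc;
}
```

The same parser would work for 'nmemX/nfit/serial', since serial_show() uses the same format string. Because the nfit group is hidden when a dimm has no DCR, a failed open here is an expected outcome rather than an error worth logging.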
Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/block/nd/Makefile | 1
drivers/block/nd/acpi.c | 160 +++++++++++++++++++++++++++++++++++++++++
drivers/block/nd/acpi_nfit.h | 1
drivers/block/nd/bus.c | 14 +++-
drivers/block/nd/core.c | 29 +++++++
drivers/block/nd/dimm_devs.c | 92 ++++++++++++++++++++++++
drivers/block/nd/libnd.h | 11 +++
drivers/block/nd/nd-private.h | 12 +++
8 files changed, 318 insertions(+), 2 deletions(-)
create mode 100644 drivers/block/nd/dimm_devs.c
diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
index 7defe18ed009..35e4c1a7a8ff 100644
--- a/drivers/block/nd/Makefile
+++ b/drivers/block/nd/Makefile
@@ -20,3 +20,4 @@ nd_acpi-y := acpi.o
libnd-y := core.o
libnd-y += bus.o
+libnd-y += dimm_devs.o
diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
index dd8505f766ed..af6684341c9b 100644
--- a/drivers/block/nd/acpi.c
+++ b/drivers/block/nd/acpi.c
@@ -369,6 +369,164 @@ const struct attribute_group *nd_acpi_attribute_groups[] = {
};
EXPORT_SYMBOL_GPL(nd_acpi_attribute_groups);
+static struct acpi_nfit_memdev *to_nfit_memdev(struct device *dev)
+{
+ struct nd_dimm *nd_dimm = to_nd_dimm(dev);
+ struct nfit_mem *nfit_mem = nd_dimm_provider_data(nd_dimm);
+
+ return __to_nfit_memdev(nfit_mem);
+}
+
+static struct acpi_nfit_dcr *to_nfit_dcr(struct device *dev)
+{
+ struct nd_dimm *nd_dimm = to_nd_dimm(dev);
+ struct nfit_mem *nfit_mem = nd_dimm_provider_data(nd_dimm);
+
+ return nfit_mem->dcr;
+}
+
+static ssize_t handle_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_memdev *memdev = to_nfit_memdev(dev);
+
+ return sprintf(buf, "%#x\n", memdev->nfit_handle);
+}
+static DEVICE_ATTR_RO(handle);
+
+static ssize_t phys_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_memdev *memdev = to_nfit_memdev(dev);
+
+ return sprintf(buf, "%#x\n", memdev->phys_id);
+}
+static DEVICE_ATTR_RO(phys_id);
+
+static ssize_t vendor_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_dcr *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->vendor_id);
+}
+static DEVICE_ATTR_RO(vendor);
+
+static ssize_t rev_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_dcr *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->revision_id);
+}
+static DEVICE_ATTR_RO(rev_id);
+
+static ssize_t device_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_dcr *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->device_id);
+}
+static DEVICE_ATTR_RO(device);
+
+static ssize_t format_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_dcr *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->fic);
+}
+static DEVICE_ATTR_RO(format);
+
+static ssize_t serial_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct acpi_nfit_dcr *dcr = to_nfit_dcr(dev);
+
+ return sprintf(buf, "%#x\n", dcr->serial_number);
+}
+static DEVICE_ATTR_RO(serial);
+
+static struct attribute *nd_acpi_dimm_attributes[] = {
+ &dev_attr_handle.attr,
+ &dev_attr_phys_id.attr,
+ &dev_attr_vendor.attr,
+ &dev_attr_device.attr,
+ &dev_attr_format.attr,
+ &dev_attr_serial.attr,
+ &dev_attr_rev_id.attr,
+ NULL,
+};
+
+static umode_t nd_acpi_dimm_attr_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, struct device, kobj);
+
+ if (to_nfit_dcr(dev))
+ return a->mode;
+ else
+ return 0;
+}
+
+static struct attribute_group nd_acpi_dimm_attribute_group = {
+ .name = "nfit",
+ .attrs = nd_acpi_dimm_attributes,
+ .is_visible = nd_acpi_dimm_attr_visible,
+};
+
+static const struct attribute_group *nd_acpi_dimm_attribute_groups[] = {
+ &nd_acpi_dimm_attribute_group,
+ NULL,
+};
+
+static struct nd_dimm *nd_acpi_dimm_by_handle(struct acpi_nfit_desc *acpi_desc,
+ u32 nfit_handle)
+{
+ struct nfit_mem *nfit_mem;
+
+ list_for_each_entry(nfit_mem, &acpi_desc->dimms, list)
+ if (__to_nfit_memdev(nfit_mem)->nfit_handle == nfit_handle)
+ return nfit_mem->nd_dimm;
+
+ return NULL;
+}
+
+static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nfit_mem *nfit_mem;
+
+ list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
+ struct nd_dimm *nd_dimm;
+ unsigned long flags = 0;
+ u32 nfit_handle;
+
+ nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
+ nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
+ if (nd_dimm) {
+ /*
+ * If for some reason we find multiple DCRs the
+ * first one wins
+ */
+ dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
+ nd_dimm_name(nd_dimm));
+ continue;
+ }
+
+ if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+ flags |= NDD_ALIASING;
+
+ nd_dimm = nd_dimm_create(acpi_desc->nd_bus, nfit_mem,
+ nd_acpi_dimm_attribute_groups, flags);
+ if (!nd_dimm)
+ return -ENOMEM;
+
+ nfit_mem->nd_dimm = nd_dimm;
+ }
+
+ return 0;
+}
+
int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
@@ -406,7 +564,7 @@ int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
if (nfit_mem_init(acpi_desc) != 0)
return -ENOMEM;
- return 0;
+ return nd_acpi_register_dimms(acpi_desc);
}
EXPORT_SYMBOL_GPL(nd_acpi_nfit_init);
diff --git a/drivers/block/nd/acpi_nfit.h b/drivers/block/nd/acpi_nfit.h
index b65745ca3cbc..00aaaee5953b 100644
--- a/drivers/block/nd/acpi_nfit.h
+++ b/drivers/block/nd/acpi_nfit.h
@@ -233,6 +233,7 @@ struct nfit_memdev {
/* assembled tables for a given dimm/memory-device */
struct nfit_mem {
+ struct nd_dimm *nd_dimm;
struct acpi_nfit_memdev *memdev_dcr;
struct acpi_nfit_memdev *memdev_pmem;
struct acpi_nfit_dcr *dcr;
diff --git a/drivers/block/nd/bus.c b/drivers/block/nd/bus.c
index 635f2e926426..ee56aa1ab2ad 100644
--- a/drivers/block/nd/bus.c
+++ b/drivers/block/nd/bus.c
@@ -13,6 +13,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/uaccess.h>
#include <linux/fcntl.h>
+#include <linux/async.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/io.h>
@@ -21,6 +22,10 @@
static int nd_bus_major;
static struct class *nd_class;
+struct bus_type nd_bus_type = {
+ .name = "nd",
+};
+
int nd_bus_create_ndctl(struct nd_bus *nd_bus)
{
dev_t devt = MKDEV(nd_bus_major, nd_bus->id);
@@ -59,9 +64,13 @@ int __init nd_bus_init(void)
{
int rc;
+ rc = bus_register(&nd_bus_type);
+ if (rc)
+ return rc;
+
rc = register_chrdev(0, "ndctl", &nd_bus_fops);
if (rc < 0)
- return rc;
+ goto err_chrdev;
nd_bus_major = rc;
nd_class = class_create(THIS_MODULE, "nd");
@@ -72,6 +81,8 @@ int __init nd_bus_init(void)
err_class:
unregister_chrdev(nd_bus_major, "ndctl");
+ err_chrdev:
+ bus_unregister(&nd_bus_type);
return rc;
}
@@ -80,4 +91,5 @@ void __exit nd_bus_exit(void)
{
class_destroy(nd_class);
unregister_chrdev(nd_bus_major, "ndctl");
+ bus_unregister(&nd_bus_type);
}
diff --git a/drivers/block/nd/core.c b/drivers/block/nd/core.c
index 55603ff264ff..24046db897c1 100644
--- a/drivers/block/nd/core.c
+++ b/drivers/block/nd/core.c
@@ -46,6 +46,19 @@ struct nd_bus_descriptor *to_nd_desc(struct nd_bus *nd_bus)
}
EXPORT_SYMBOL_GPL(to_nd_desc);
+struct nd_bus *walk_to_nd_bus(struct device *nd_dev)
+{
+ struct device *dev;
+
+ for (dev = nd_dev; dev; dev = dev->parent)
+ if (dev->release == nd_bus_release)
+ break;
+ dev_WARN_ONCE(nd_dev, !dev, "invalid dev, not on nd bus\n");
+ if (dev)
+ return to_nd_bus(dev);
+ return NULL;
+}
+
static const char *nd_bus_provider(struct nd_bus *nd_bus)
{
struct nd_bus_descriptor *nd_desc = nd_bus->nd_desc;
@@ -118,6 +131,21 @@ struct nd_bus *nd_bus_register(struct device *parent,
}
EXPORT_SYMBOL_GPL(nd_bus_register);
+static int child_unregister(struct device *dev, void *data)
+{
+ /*
+ * the singular ndctl class device per bus needs to be
+ * "device_destroy"ed, so skip it here
+ *
+ * i.e. remove classless children
+ */
+ if (dev->class)
+ /* pass */;
+ else
+ device_unregister(dev);
+ return 0;
+}
+
void nd_bus_unregister(struct nd_bus *nd_bus)
{
if (!nd_bus)
@@ -127,6 +155,7 @@ void nd_bus_unregister(struct nd_bus *nd_bus)
list_del_init(&nd_bus->list);
mutex_unlock(&nd_bus_list_mutex);
+ device_for_each_child(&nd_bus->dev, NULL, child_unregister);
nd_bus_destroy_ndctl(nd_bus);
device_unregister(&nd_bus->dev);
diff --git a/drivers/block/nd/dimm_devs.c b/drivers/block/nd/dimm_devs.c
new file mode 100644
index 000000000000..19b081392f2f
--- /dev/null
+++ b/drivers/block/nd/dimm_devs.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include "nd-private.h"
+
+static DEFINE_IDA(dimm_ida);
+
+static void nd_dimm_release(struct device *dev)
+{
+ struct nd_dimm *nd_dimm = to_nd_dimm(dev);
+
+ ida_simple_remove(&dimm_ida, nd_dimm->id);
+ kfree(nd_dimm);
+}
+
+static struct device_type nd_dimm_device_type = {
+ .name = "nd_dimm",
+ .release = nd_dimm_release,
+};
+
+static bool is_nd_dimm(struct device *dev)
+{
+ return dev->type == &nd_dimm_device_type;
+}
+
+struct nd_dimm *to_nd_dimm(struct device *dev)
+{
+ struct nd_dimm *nd_dimm = container_of(dev, struct nd_dimm, dev);
+
+ WARN_ON(!is_nd_dimm(dev));
+ return nd_dimm;
+}
+EXPORT_SYMBOL_GPL(to_nd_dimm);
+
+const char *nd_dimm_name(struct nd_dimm *nd_dimm)
+{
+ return dev_name(&nd_dimm->dev);
+}
+EXPORT_SYMBOL_GPL(nd_dimm_name);
+
+void *nd_dimm_provider_data(struct nd_dimm *nd_dimm)
+{
+ return nd_dimm->provider_data;
+}
+EXPORT_SYMBOL_GPL(nd_dimm_provider_data);
+
+struct nd_dimm *nd_dimm_create(struct nd_bus *nd_bus, void *provider_data,
+ const struct attribute_group **groups, unsigned long flags)
+{
+ struct nd_dimm *nd_dimm = kzalloc(sizeof(*nd_dimm), GFP_KERNEL);
+ struct device *dev;
+
+ if (!nd_dimm)
+ return NULL;
+
+ nd_dimm->id = ida_simple_get(&dimm_ida, 0, 0, GFP_KERNEL);
+ if (nd_dimm->id < 0) {
+ kfree(nd_dimm);
+ return NULL;
+ }
+ nd_dimm->provider_data = provider_data;
+ nd_dimm->flags = flags;
+
+ dev = &nd_dimm->dev;
+ dev_set_name(dev, "nmem%d", nd_dimm->id);
+ dev->parent = &nd_bus->dev;
+ dev->type = &nd_dimm_device_type;
+ dev->bus = &nd_bus_type;
+ dev->groups = groups;
+ if (device_register(dev) != 0) {
+ put_device(dev);
+ return NULL;
+ }
+
+ return nd_dimm;
+}
+EXPORT_SYMBOL_GPL(nd_dimm_create);
diff --git a/drivers/block/nd/libnd.h b/drivers/block/nd/libnd.h
index 86cf3e0573b0..4f8f2ebbcf7b 100644
--- a/drivers/block/nd/libnd.h
+++ b/drivers/block/nd/libnd.h
@@ -14,6 +14,12 @@
*/
#ifndef __LIBND_H__
#define __LIBND_H__
+
+enum {
+ /* when a dimm supports both PMEM and BLK access a label is required */
+ NDD_ALIASING = 1 << 0,
+};
+
extern struct attribute_group nd_bus_attribute_group;
struct nd_dimm;
@@ -34,5 +40,10 @@ struct nd_bus *nd_bus_register(struct device *parent,
struct nd_bus_descriptor *nfit_desc);
void nd_bus_unregister(struct nd_bus *nd_bus);
struct nd_bus *to_nd_bus(struct device *dev);
+struct nd_dimm *to_nd_dimm(struct device *dev);
struct nd_bus_descriptor *to_nd_desc(struct nd_bus *nd_bus);
+const char *nd_dimm_name(struct nd_dimm *nd_dimm);
+void *nd_dimm_provider_data(struct nd_dimm *nd_dimm);
+struct nd_dimm *nd_dimm_create(struct nd_bus *nd_bus, void *provider_data,
+ const struct attribute_group **groups, unsigned long flags);
#endif /* __LIBND_H__ */
diff --git a/drivers/block/nd/nd-private.h b/drivers/block/nd/nd-private.h
index 960dd2f29cdd..cfb5a7241ed1 100644
--- a/drivers/block/nd/nd-private.h
+++ b/drivers/block/nd/nd-private.h
@@ -15,6 +15,10 @@
#include <linux/device.h>
#include "libnd.h"
+extern struct list_head nd_bus_list;
+extern struct mutex nd_bus_list_mutex;
+extern struct bus_type nd_bus_type;
+
struct nd_bus {
struct nd_bus_descriptor *nd_desc;
struct list_head list;
@@ -22,6 +26,14 @@ struct nd_bus {
int id;
};
+struct nd_dimm {
+ unsigned long flags;
+ void *provider_data;
+ struct device dev;
+ int id;
+};
+
+struct nd_bus *walk_to_nd_bus(struct device *nd_dev);
int __init nd_bus_init(void);
void __exit nd_bus_exit(void);
int nd_bus_create_ndctl(struct nd_bus *nd_bus);
* [PATCH v2 06/20] libnd: ndctl.h, the nd ioctl abi
2015-04-28 18:24 [PATCH v2 00/20] libnd: non-volatile memory device support Dan Williams
` (3 preceding siblings ...)
2015-04-28 18:24 ` [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices Dan Williams
@ 2015-04-28 18:24 ` Dan Williams
2015-04-28 18:24 ` [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory) Dan Williams
` (4 subsequent siblings)
9 siblings, 0 replies; 47+ messages in thread
From: Dan Williams @ 2015-04-28 18:24 UTC (permalink / raw)
To: linux-nvdimm
Cc: linux-acpi, Rafael J. Wysocki, Robert Moore, Nicholas Moulin,
linux-kernel
Most configuration of the nd-subsystem is done via nd-sysfs attributes.
However, some nd buses, particularly the ACPI.NFIT bus, define a small
set of messages that can be passed to the platform. For convenience we
derive the initial nd-ioctl-command formats directly from the NFIT DSM
formats.
ND_CMD_SMART: media health and diagnostics
ND_CMD_GET_CONFIG_SIZE: size of the label space
ND_CMD_GET_CONFIG_DATA: read label space
ND_CMD_SET_CONFIG_DATA: write label space
ND_CMD_VENDOR: vendor-specific command passthrough
ND_CMD_ARS_CAP: report address-range-scrubbing capabilities
ND_CMD_ARS_START: initiate scrubbing
ND_CMD_ARS_QUERY: report on scrubbing state
ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events
If a platform later defines commands outside this set, it is
straightforward to extend support to those formats.
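As a rough illustration of what a command "format" amounts to, each
command can be described by a list of field sizes, with a sentinel
marking a field whose length is only known at run time. The sketch
below is a simplified userspace stand-in, not the code from this patch:
ex_cmd_desc, ex_set_config and ex_in_len are hypothetical names, and the
variable length is passed in by the caller rather than read from a
header.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_FIELDS 4

/*
 * Hypothetical, simplified stand-in for a command descriptor.
 * UINT_MAX marks a field whose size is only known at run time.
 */
struct ex_cmd_desc {
	int in_num;
	uint32_t in_sizes[EX_MAX_FIELDS];
};

/*
 * Same input shape as the ND_CMD_SET_CONFIG_DATA entry in bus.c:
 * two 4-byte fields followed by a variable-length payload.
 */
static const struct ex_cmd_desc ex_set_config = {
	.in_num = 3,
	.in_sizes = { 4, 4, UINT_MAX },
};

/*
 * Sum the input field sizes, substituting var_len for every
 * variable-length field, to get the total input envelope length.
 */
static uint32_t ex_in_len(const struct ex_cmd_desc *desc, uint32_t var_len)
{
	uint32_t len = 0;
	int i;

	for (i = 0; i < desc->in_num; i++) {
		uint32_t sz = desc->in_sizes[i];

		if (sz == UINT_MAX)
			sz = var_len;
		len += sz;
	}
	return len;
}
```

Under these assumptions, supporting a new platform command would mostly
mean adding another descriptor entry plus a rule for sizing any
variable-length field.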
Most of the commands target a specific dimm. However, the
address-range-scrubbing commands target the bus. The 'commands' sysfs
attribute of an nd-bus or an nd-dimm enumerates the supported commands
for that object.
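The enumeration can be pictured as walking a bitmask of supported
commands and printing one name per set bit. The following is a minimal
userspace sketch of that idea only: the name table and ex_commands_show
are hypothetical, and the strings are placeholders rather than the
names the kernel reports.

```c
#include <stddef.h>
#include <stdio.h>

/* Placeholder names indexed by command number (assumed for this example) */
static const char *const ex_cmd_names[] = {
	[1] = "smart",
	[2] = "get_size",
	[3] = "get_data",
};

#define EX_NUM_CMDS (sizeof(ex_cmd_names) / sizeof(ex_cmd_names[0]))

/*
 * Write the name of every command whose bit is set in mask, each
 * followed by a space, then a newline. Returns the number of bytes
 * written (excluding the terminating NUL), or -1 if buf is too small.
 */
static int ex_commands_show(unsigned long mask, char *buf, size_t size)
{
	size_t len = 0;
	unsigned int cmd;
	int n;

	for (cmd = 0; cmd < EX_NUM_CMDS; cmd++) {
		if (!(mask & (1UL << cmd)) || !ex_cmd_names[cmd])
			continue;
		n = snprintf(buf + len, size - len, "%s ", ex_cmd_names[cmd]);
		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += (size_t)n;
	}

	n = snprintf(buf + len, size - len, "\n");
	if (n < 0 || (size_t)n >= size - len)
		return -1;
	len += (size_t)n;

	return (int)len;
}
```

The trailing space before the newline mirrors the "%s " format used by
the commands_show() implementations in this patch; unlike sysfs, this
sketch takes an explicit buffer size and reports truncation.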
Cc: <linux-acpi@vger.kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/block/nd/Kconfig | 12 ++
drivers/block/nd/acpi.c | 237 ++++++++++++++++++++++++++++++
drivers/block/nd/acpi_nfit.h | 3
drivers/block/nd/bus.c | 324 ++++++++++++++++++++++++++++++++++++++++-
drivers/block/nd/core.c | 16 ++
drivers/block/nd/dimm_devs.c | 38 ++++-
drivers/block/nd/libnd.h | 25 +++
drivers/block/nd/nd-private.h | 3
drivers/block/nd/test/nfit.c | 78 ++++++++++
include/uapi/linux/Kbuild | 1
include/uapi/linux/ndctl.h | 178 +++++++++++++++++++++++
11 files changed, 903 insertions(+), 12 deletions(-)
create mode 100644 include/uapi/linux/ndctl.h
diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig
index 09f0135147ca..d2d84451e82c 100644
--- a/drivers/block/nd/Kconfig
+++ b/drivers/block/nd/Kconfig
@@ -37,6 +37,18 @@ config ND_ACPI
addition to storage devices this also enables libnd craft
ACPI._DSM messages for platform/dimm configuration.
+config ND_ACPI_DEBUG
+ bool "ACPI: Extra nd_acpi debugging"
+ depends on ND_ACPI
+ depends on DYNAMIC_DEBUG
+ default n
+ help
+ Enabling this option causes the nd_acpi driver to dump the
+ input and output buffers of _DSM operations on the ACPI0012
+ device and its children. This can be very verbose, so leave
+ it disabled unless you are debugging a hardware / firmware
+ issue.
+
config NFIT_TEST
tristate "NFIT TEST: Manufactured NFIT for interface testing"
depends on DMA_CMA
diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
index af6684341c9b..c46e166695f7 100644
--- a/drivers/block/nd/acpi.c
+++ b/drivers/block/nd/acpi.c
@@ -12,6 +12,7 @@
*/
#include <linux/list_sort.h>
#include <linux/module.h>
+#include <linux/ndctl.h>
#include <linux/list.h>
#include <linux/acpi.h>
#include "acpi_nfit.h"
@@ -25,11 +26,160 @@ enum {
NFIT_ACPI_NOTIFY_TABLE = 0x80,
};
+static u8 nd_acpi_uuids[2][16]; /* initialized at nd_acpi_init */
+
+static u8 *nd_acpi_bus_uuid(void)
+{
+ return nd_acpi_uuids[0];
+}
+
+static u8 *nd_acpi_dimm_uuid(void)
+{
+ return nd_acpi_uuids[1];
+}
+
+static struct acpi_nfit_desc *to_acpi_nfit_desc(struct nd_bus_descriptor *nd_desc)
+{
+ return container_of(nd_desc, struct acpi_nfit_desc, nd_desc);
+}
+
+static struct acpi_device *to_acpi_dev(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nd_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
+
+ /*
+ * If provider == 'ACPI.NFIT' we can assume 'dev' is a struct
+ * acpi_device.
+ */
+ if (!nd_desc->provider_name
+ || strcmp(nd_desc->provider_name, "ACPI.NFIT") != 0)
+ return NULL;
+
+ return to_acpi_device(acpi_desc->dev);
+}
+
static int nd_acpi_ctl(struct nd_bus_descriptor *nd_desc,
struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
unsigned int buf_len)
{
- return -ENOTTY;
+ struct acpi_nfit_desc *acpi_desc = to_acpi_nfit_desc(nd_desc);
+ const struct nd_cmd_desc *desc = NULL;
+ union acpi_object in_obj, in_buf, *out_obj;
+ struct device *dev = acpi_desc->dev;
+ const char *cmd_name, *dimm_name;
+ unsigned long dsm_mask;
+ acpi_handle handle;
+ u32 offset;
+ int rc, i;
+ u8 *uuid;
+
+ if (nd_dimm) {
+ struct nfit_mem *nfit_mem = nd_dimm_provider_data(nd_dimm);
+ struct acpi_device *adev = nfit_mem->adev;
+
+ if (!adev)
+ return -ENOTTY;
+ dimm_name = dev_name(&adev->dev);
+ cmd_name = nd_dimm_cmd_name(cmd);
+ dsm_mask = nfit_mem->dsm_mask;
+ desc = nd_cmd_dimm_desc(cmd);
+ uuid = nd_acpi_dimm_uuid();
+ handle = adev->handle;
+ } else {
+ struct acpi_device *adev = to_acpi_dev(acpi_desc);
+
+ cmd_name = nd_bus_cmd_name(cmd);
+ dsm_mask = nd_desc->dsm_mask;
+ desc = nd_cmd_bus_desc(cmd);
+ uuid = nd_acpi_bus_uuid();
+ handle = adev->handle;
+ dimm_name = "bus";
+ }
+
+ if (!desc || (cmd && (desc->out_num + desc->in_num == 0)))
+ return -ENOTTY;
+
+ if (!test_bit(cmd, &dsm_mask))
+ return -ENOTTY;
+
+ in_obj.type = ACPI_TYPE_PACKAGE;
+ in_obj.package.count = 1;
+ in_obj.package.elements = &in_buf;
+ in_buf.type = ACPI_TYPE_BUFFER;
+ in_buf.buffer.pointer = buf;
+ in_buf.buffer.length = 0;
+
+ /* libnd has already validated the input envelope */
+ for (i = 0; i < desc->in_num; i++)
+ in_buf.buffer.length += nd_cmd_in_size(nd_dimm, cmd, desc, i, buf);
+
+ dev_dbg(dev, "%s:%s cmd: %s input length: %d\n", __func__, dimm_name,
+ cmd_name, in_buf.buffer.length);
+ if (IS_ENABLED(CONFIG_ND_ACPI_DEBUG))
+ print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4,
+ 4, in_buf.buffer.pointer, min_t(u32, 128,
+ in_buf.buffer.length), true);
+
+ out_obj = acpi_evaluate_dsm(handle, uuid, 1, cmd, &in_obj);
+ if (!out_obj) {
+ dev_dbg(dev, "%s:%s _DSM failed cmd: %s\n", __func__, dimm_name,
+ cmd_name);
+ return -EINVAL;
+ }
+
+ if (out_obj->package.type != ACPI_TYPE_BUFFER) {
+ dev_dbg(dev, "%s:%s unexpected output object type cmd: %s type: %d\n",
+ __func__, dimm_name, cmd_name, out_obj->type);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ dev_dbg(dev, "%s:%s cmd: %s output length: %d\n", __func__, dimm_name,
+ cmd_name, out_obj->buffer.length);
+ if (IS_ENABLED(CONFIG_ND_ACPI_DEBUG))
+ print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4,
+ 4, out_obj->buffer.pointer, min_t(u32, 128,
+ out_obj->buffer.length), true);
+
+ for (i = 0, offset = 0; i < desc->out_num; i++) {
+ u32 out_size = nd_cmd_out_size(nd_dimm, cmd, desc, i, buf,
+ (u32 *) out_obj->buffer.pointer);
+
+ if (offset + out_size > out_obj->buffer.length) {
+ dev_dbg(dev, "%s:%s output object underflow cmd: %s field: %d\n",
+ __func__, dimm_name, cmd_name, i);
+ break;
+ }
+
+ if (in_buf.buffer.length + offset + out_size > buf_len) {
+ dev_dbg(dev, "%s:%s output overrun cmd: %s field: %d\n",
+ __func__, dimm_name, cmd_name, i);
+ rc = -ENXIO;
+ goto out;
+ }
+ memcpy(buf + in_buf.buffer.length + offset,
+ out_obj->buffer.pointer + offset, out_size);
+ offset += out_size;
+ }
+ if (offset + in_buf.buffer.length < buf_len) {
+ if (i >= 1) {
+ /*
+ * status valid, return the number of bytes left
+ * unfilled in the output buffer
+ */
+ rc = buf_len - offset - in_buf.buffer.length;
+ } else {
+ dev_err(dev, "%s:%s underrun cmd: %s buf_len: %d out_len: %d\n",
+ __func__, dimm_name, cmd_name, buf_len, offset);
+ rc = -ENXIO;
+ }
+ } else
+ rc = 0;
+
+ out:
+ ACPI_FREE(out_obj);
+
+ return rc;
}
static const char *spa_type_name(u16 type)
@@ -476,6 +626,7 @@ static struct attribute_group nd_acpi_dimm_attribute_group = {
};
static const struct attribute_group *nd_acpi_dimm_attribute_groups[] = {
+ &nd_dimm_attribute_group,
&nd_acpi_dimm_attribute_group,
NULL,
};
@@ -492,6 +643,50 @@ static struct nd_dimm *nd_acpi_dimm_by_handle(struct acpi_nfit_desc *acpi_desc,
return NULL;
}
+static int nd_acpi_add_dimm(struct acpi_nfit_desc *acpi_desc,
+ struct nfit_mem *nfit_mem, u32 nfit_handle)
+{
+ struct acpi_device *adev, *adev_dimm;
+ struct device *dev = acpi_desc->dev;
+ u8 *uuid = nd_acpi_dimm_uuid();
+ unsigned long long sta;
+ int i, rc = -ENODEV;
+ acpi_status status;
+
+ nfit_mem->dsm_mask = acpi_desc->dimm_dsm_force_en;
+ adev = to_acpi_dev(acpi_desc);
+ if (!adev)
+ return 0;
+
+ adev_dimm = acpi_find_child_device(adev, nfit_handle, false);
+ nfit_mem->adev = adev_dimm;
+ if (!adev_dimm) {
+ dev_err(dev, "no ACPI.NFIT device with _ADR %#x, disabling...\n",
+ nfit_handle);
+ return -ENODEV;
+ }
+
+ status = acpi_evaluate_integer(adev_dimm->handle, "_STA", NULL, &sta);
+ if (status == AE_NOT_FOUND) {
+ dev_dbg(dev, "%s missing _STA, assuming enabled...\n",
+ dev_name(&adev_dimm->dev));
+ rc = 0;
+ } else if (ACPI_FAILURE(status))
+ dev_err(dev, "%s failed to retrieve _STA, disabling...\n",
+ dev_name(&adev_dimm->dev));
+ else if ((sta & ACPI_STA_DEVICE_ENABLED) == 0)
+ dev_info(dev, "%s disabled by firmware\n",
+ dev_name(&adev_dimm->dev));
+ else
+ rc = 0;
+
+ for (i = ND_CMD_SMART; i <= ND_CMD_VENDOR; i++)
+ if (acpi_check_dsm(adev_dimm->handle, uuid, 1, 1ULL << i))
+ set_bit(i, &nfit_mem->dsm_mask);
+
+ return rc;
+}
+
static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
{
struct nfit_mem *nfit_mem;
@@ -500,6 +695,7 @@ static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
struct nd_dimm *nd_dimm;
unsigned long flags = 0;
u32 nfit_handle;
+ int rc;
nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
@@ -516,8 +712,13 @@ static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
if (nfit_mem->bdw && nfit_mem->memdev_pmem)
flags |= NDD_ALIASING;
+ rc = nd_acpi_add_dimm(acpi_desc, nfit_mem, nfit_handle);
+ if (rc)
+ continue;
+
nd_dimm = nd_dimm_create(acpi_desc->nd_bus, nfit_mem,
- nd_acpi_dimm_attribute_groups, flags);
+ nd_acpi_dimm_attribute_groups,
+ flags, &nfit_mem->dsm_mask);
if (!nd_dimm)
return -ENOMEM;
@@ -527,6 +728,22 @@ static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
return 0;
}
+static void nd_acpi_init_dsms(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nd_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
+ u8 *uuid = nd_acpi_bus_uuid();
+ struct acpi_device *adev;
+ int i;
+
+ adev = to_acpi_dev(acpi_desc);
+ if (!adev)
+ return;
+
+ for (i = ND_CMD_ARS_CAP; i <= ND_CMD_ARS_QUERY; i++)
+ if (acpi_check_dsm(adev->handle, uuid, 1, 1ULL << i))
+ set_bit(i, &nd_desc->dsm_mask);
+}
+
int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
@@ -564,6 +781,8 @@ int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
if (nfit_mem_init(acpi_desc) != 0)
return -ENOMEM;
+ nd_acpi_init_dsms(acpi_desc);
+
return nd_acpi_register_dimms(acpi_desc);
}
EXPORT_SYMBOL_GPL(nd_acpi_nfit_init);
@@ -641,6 +860,14 @@ static struct acpi_driver nd_acpi_driver = {
static __init int nd_acpi_init(void)
{
+ char *uuids[] = {
+ /* bus interface */
+ "2f10e7a4-9e91-11e4-89d3-123b93f75cba",
+ /* per-dimm interface */
+ "4309ac30-0d11-11e4-9191-0800200c9a66",
+ };
+ int i;
+
BUILD_BUG_ON(sizeof(struct acpi_nfit) != 40);
BUILD_BUG_ON(sizeof(struct acpi_nfit_spa) != 56);
BUILD_BUG_ON(sizeof(struct acpi_nfit_memdev) != 48);
@@ -649,6 +876,12 @@ static __init int nd_acpi_init(void)
BUILD_BUG_ON(sizeof(struct acpi_nfit_dcr) != 80);
BUILD_BUG_ON(sizeof(struct acpi_nfit_bdw) != 40);
+ for (i = 0; i < ARRAY_SIZE(uuids); i++)
+ if (acpi_str_to_uuid(uuids[i], nd_acpi_uuids[i]) != AE_OK) {
+ WARN_ON_ONCE(1);
+ return -ENXIO;
+ }
+
return acpi_bus_register_driver(&nd_acpi_driver);
}
diff --git a/drivers/block/nd/acpi_nfit.h b/drivers/block/nd/acpi_nfit.h
index 00aaaee5953b..2faac336c07d 100644
--- a/drivers/block/nd/acpi_nfit.h
+++ b/drivers/block/nd/acpi_nfit.h
@@ -241,6 +241,8 @@ struct nfit_mem {
struct acpi_nfit_spa *spa_dcr;
struct acpi_nfit_spa *spa_bdw;
struct list_head list;
+ struct acpi_device *adev;
+ unsigned long dsm_mask;
};
struct acpi_nfit_desc {
@@ -253,6 +255,7 @@ struct acpi_nfit_desc {
struct list_head bdws;
struct nd_bus *nd_bus;
struct device *dev;
+ unsigned long dimm_dsm_force_en;
};
static inline struct acpi_nfit_memdev *__to_nfit_memdev(struct nfit_mem *nfit_mem)
diff --git a/drivers/block/nd/bus.c b/drivers/block/nd/bus.c
index ee56aa1ab2ad..a271e01af4a9 100644
--- a/drivers/block/nd/bus.c
+++ b/drivers/block/nd/bus.c
@@ -11,14 +11,18 @@
* General Public License for more details.
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/vmalloc.h>
#include <linux/uaccess.h>
#include <linux/fcntl.h>
#include <linux/async.h>
+#include <linux/ndctl.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/io.h>
+#include <linux/mm.h>
#include "nd-private.h"
+int nd_dimm_major;
static int nd_bus_major;
static struct class *nd_class;
@@ -47,19 +51,323 @@ void nd_bus_destroy_ndctl(struct nd_bus *nd_bus)
device_destroy(nd_class, MKDEV(nd_bus_major, nd_bus->id));
}
+static const struct nd_cmd_desc __nd_cmd_dimm_descs[] = {
+ [ND_CMD_IMPLEMENTED] = { },
+ [ND_CMD_SMART] = {
+ .out_num = 2,
+ .out_sizes = { 4, 8, },
+ },
+ [ND_CMD_SMART_THRESHOLD] = {
+ .out_num = 2,
+ .out_sizes = { 4, 8, },
+ },
+ [ND_CMD_DIMM_FLAGS] = {
+ .out_num = 2,
+ .out_sizes = { 4, 4 },
+ },
+ [ND_CMD_GET_CONFIG_SIZE] = {
+ .out_num = 3,
+ .out_sizes = { 4, 4, 4, },
+ },
+ [ND_CMD_GET_CONFIG_DATA] = {
+ .in_num = 2,
+ .in_sizes = { 4, 4, },
+ .out_num = 2,
+ .out_sizes = { 4, UINT_MAX, },
+ },
+ [ND_CMD_SET_CONFIG_DATA] = {
+ .in_num = 3,
+ .in_sizes = { 4, 4, UINT_MAX, },
+ .out_num = 1,
+ .out_sizes = { 4, },
+ },
+ [ND_CMD_VENDOR] = {
+ .in_num = 3,
+ .in_sizes = { 4, 4, UINT_MAX, },
+ .out_num = 3,
+ .out_sizes = { 4, 4, UINT_MAX, },
+ },
+};
+
+struct nd_cmd_desc const *nd_cmd_dimm_desc(int cmd)
+{
+ if (cmd < ARRAY_SIZE(__nd_cmd_dimm_descs))
+ return &__nd_cmd_dimm_descs[cmd];
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(nd_cmd_dimm_desc);
+
+static const struct nd_cmd_desc __nd_cmd_bus_descs[] = {
+ [ND_CMD_IMPLEMENTED] = { },
+ [ND_CMD_ARS_CAP] = {
+ .in_num = 2,
+ .in_sizes = { 8, 8, },
+ .out_num = 2,
+ .out_sizes = { 4, 4, },
+ },
+ [ND_CMD_ARS_START] = {
+ .in_num = 4,
+ .in_sizes = { 8, 8, 2, 6, },
+ .out_num = 1,
+ .out_sizes = { 4, },
+ },
+ [ND_CMD_ARS_QUERY] = {
+ .out_num = 2,
+ .out_sizes = { 4, UINT_MAX, },
+ },
+};
+
+const struct nd_cmd_desc *nd_cmd_bus_desc(int cmd)
+{
+ if (cmd < ARRAY_SIZE(__nd_cmd_bus_descs))
+ return &__nd_cmd_bus_descs[cmd];
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(nd_cmd_bus_desc);
+
+u32 nd_cmd_in_size(struct nd_dimm *nd_dimm, int cmd,
+ const struct nd_cmd_desc *desc, int idx, void *buf)
+{
+ if (idx >= desc->in_num)
+ return UINT_MAX;
+
+ if (desc->in_sizes[idx] < UINT_MAX)
+ return desc->in_sizes[idx];
+
+ if (nd_dimm && cmd == ND_CMD_SET_CONFIG_DATA && idx == 2) {
+ struct nd_cmd_set_config_hdr *hdr = buf;
+
+ return hdr->in_length;
+ } else if (nd_dimm && cmd == ND_CMD_VENDOR && idx == 2) {
+ struct nd_cmd_vendor_hdr *hdr = buf;
+
+ return hdr->in_length;
+ }
+
+ return UINT_MAX;
+}
+EXPORT_SYMBOL_GPL(nd_cmd_in_size);
+
+u32 nd_cmd_out_size(struct nd_dimm *nd_dimm, int cmd,
+ const struct nd_cmd_desc *desc, int idx, const u32 *in_field,
+ const u32 *out_field)
+{
+ if (idx >= desc->out_num)
+ return UINT_MAX;
+
+ if (desc->out_sizes[idx] < UINT_MAX)
+ return desc->out_sizes[idx];
+
+ if (nd_dimm && cmd == ND_CMD_GET_CONFIG_DATA && idx == 1)
+ return in_field[1];
+ else if (nd_dimm && cmd == ND_CMD_VENDOR && idx == 2)
+ return out_field[1];
+ else if (!nd_dimm && cmd == ND_CMD_ARS_QUERY && idx == 1)
+ return ND_CMD_ARS_QUERY_MAX;
+
+ return UINT_MAX;
+}
+EXPORT_SYMBOL_GPL(nd_cmd_out_size);
+
+static int __nd_ioctl(struct nd_bus *nd_bus, struct nd_dimm *nd_dimm,
+ int read_only, unsigned int ioctl_cmd, unsigned long arg)
+{
+ struct nd_bus_descriptor *nd_desc = nd_bus->nd_desc;
+ size_t buf_len = 0, in_len = 0, out_len = 0;
+ static char out_env[ND_CMD_MAX_ENVELOPE];
+ static char in_env[ND_CMD_MAX_ENVELOPE];
+ const struct nd_cmd_desc *desc = NULL;
+ unsigned int cmd = _IOC_NR(ioctl_cmd);
+ void __user *p = (void __user *) arg;
+ struct device *dev = &nd_bus->dev;
+ const char *cmd_name, *dimm_name;
+ unsigned long dsm_mask;
+ void *buf;
+ int rc, i;
+
+ if (nd_dimm) {
+ desc = nd_cmd_dimm_desc(cmd);
+ cmd_name = nd_dimm_cmd_name(cmd);
+ dsm_mask = nd_dimm->dsm_mask ? *(nd_dimm->dsm_mask) : 0;
+ dimm_name = dev_name(&nd_dimm->dev);
+ } else {
+ desc = nd_cmd_bus_desc(cmd);
+ cmd_name = nd_bus_cmd_name(cmd);
+ dsm_mask = nd_desc->dsm_mask;
+ dimm_name = "bus";
+ }
+
+ if (!desc || (desc->out_num + desc->in_num == 0) ||
+ !test_bit(cmd, &dsm_mask))
+ return -ENOTTY;
+
+ /* fail write commands (when read-only) */
+ if (read_only)
+ switch (ioctl_cmd) {
+ case ND_IOCTL_VENDOR:
+ case ND_IOCTL_SET_CONFIG_DATA:
+ case ND_IOCTL_ARS_START:
+ dev_dbg(&nd_bus->dev, "'%s' command while read-only.\n",
+ nd_dimm ? nd_dimm_cmd_name(cmd)
+ : nd_bus_cmd_name(cmd));
+ return -EPERM;
+ default:
+ break;
+ }
+
+ /* process an input envelope */
+ for (i = 0; i < desc->in_num; i++) {
+ u32 in_size, copy;
+
+ in_size = nd_cmd_in_size(nd_dimm, cmd, desc, i, in_env);
+ if (in_size == UINT_MAX) {
+ dev_err(dev, "%s:%s unknown input size cmd: %s field: %d\n",
+ __func__, dimm_name, cmd_name, i);
+ return -ENXIO;
+ }
+ if (!access_ok(VERIFY_READ, p + in_len, in_size))
+ return -EFAULT;
+ if (in_len < sizeof(in_env))
+ copy = min_t(u32, sizeof(in_env) - in_len, in_size);
+ else
+ copy = 0;
+ if (copy && copy_from_user(&in_env[in_len], p + in_len, copy))
+ return -EFAULT;
+ in_len += in_size;
+ }
+
+ /* process an output envelope */
+ for (i = 0; i < desc->out_num; i++) {
+ u32 out_size = nd_cmd_out_size(nd_dimm, cmd, desc, i,
+ (u32 *) in_env, (u32 *) out_env);
+ u32 copy;
+
+ if (out_size == UINT_MAX) {
+ dev_dbg(dev, "%s:%s unknown output size cmd: %s field: %d\n",
+ __func__, dimm_name, cmd_name, i);
+ return -EFAULT;
+ }
+ if (!access_ok(VERIFY_WRITE, p + in_len + out_len, out_size))
+ return -EFAULT;
+ if (out_len < sizeof(out_env))
+ copy = min_t(u32, sizeof(out_env) - out_len, out_size);
+ else
+ copy = 0;
+ if (copy && copy_from_user(&out_env[out_len], p + in_len + out_len,
+ copy))
+ return -EFAULT;
+ out_len += out_size;
+ }
+
+ buf_len = out_len + in_len;
+ if (!access_ok(VERIFY_WRITE, p, sizeof(buf_len)))
+ return -EFAULT;
+
+ if (buf_len > ND_IOCTL_MAX_BUFLEN) {
+ dev_dbg(dev, "%s:%s cmd: %s buf_len: %zd > %d\n", __func__,
+ dimm_name, cmd_name, buf_len,
+ ND_IOCTL_MAX_BUFLEN);
+ return -EINVAL;
+ }
+
+ buf = vmalloc(buf_len);
+ if (!buf)
+ return -ENOMEM;
+
+ if (copy_from_user(buf, p, buf_len)) {
+ rc = -EFAULT;
+ goto out;
+ }
+
+ rc = nd_desc->ndctl(nd_desc, nd_dimm, cmd, buf, buf_len);
+ if (rc < 0)
+ goto out;
+ if (copy_to_user(p, buf, buf_len))
+ rc = -EFAULT;
+ out:
+ vfree(buf);
+ return rc;
+}
+
static long nd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
- return -ENXIO;
+ long id = (long) file->private_data;
+ int rc = -ENXIO, read_only;
+ struct nd_bus *nd_bus;
+
+ read_only = (O_RDWR != (file->f_flags & O_ACCMODE));
+ mutex_lock(&nd_bus_list_mutex);
+ list_for_each_entry(nd_bus, &nd_bus_list, list) {
+ if (nd_bus->id == id) {
+ rc = __nd_ioctl(nd_bus, NULL, read_only, cmd, arg);
+ break;
+ }
+ }
+ mutex_unlock(&nd_bus_list_mutex);
+
+ return rc;
+}
+
+static int match_dimm(struct device *dev, void *data)
+{
+ long id = (long) data;
+
+ if (is_nd_dimm(dev)) {
+ struct nd_dimm *nd_dimm = to_nd_dimm(dev);
+
+ return nd_dimm->id == id;
+ }
+
+ return 0;
+}
+
+static long nd_dimm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ int rc = -ENXIO, read_only;
+ struct nd_bus *nd_bus;
+
+ read_only = (O_RDWR != (file->f_flags & O_ACCMODE));
+ mutex_lock(&nd_bus_list_mutex);
+ list_for_each_entry(nd_bus, &nd_bus_list, list) {
+ struct device *dev = device_find_child(&nd_bus->dev,
+ file->private_data, match_dimm);
+
+ if (!dev)
+ continue;
+
+ rc = __nd_ioctl(nd_bus, to_nd_dimm(dev), read_only, cmd, arg);
+ put_device(dev);
+ break;
+ }
+ mutex_unlock(&nd_bus_list_mutex);
+
+ return rc;
+}
+
+static int nd_open(struct inode *inode, struct file *file)
+{
+ long minor = iminor(inode);
+
+ file->private_data = (void *) minor;
+ return 0;
}
static const struct file_operations nd_bus_fops = {
.owner = THIS_MODULE,
- .open = nonseekable_open,
+ .open = nd_open,
.unlocked_ioctl = nd_ioctl,
.compat_ioctl = nd_ioctl,
.llseek = noop_llseek,
};
+static const struct file_operations nd_dimm_fops = {
+ .owner = THIS_MODULE,
+ .open = nd_open,
+ .unlocked_ioctl = nd_dimm_ioctl,
+ .compat_ioctl = nd_dimm_ioctl,
+ .llseek = noop_llseek,
+};
+
int __init nd_bus_init(void)
{
int rc;
@@ -70,9 +378,14 @@ int __init nd_bus_init(void)
rc = register_chrdev(0, "ndctl", &nd_bus_fops);
if (rc < 0)
- goto err_chrdev;
+ goto err_bus_chrdev;
nd_bus_major = rc;
+ rc = register_chrdev(0, "dimmctl", &nd_dimm_fops);
+ if (rc < 0)
+ goto err_dimm_chrdev;
+ nd_dimm_major = rc;
+
nd_class = class_create(THIS_MODULE, "nd");
if (IS_ERR(nd_class))
goto err_class;
@@ -80,8 +393,10 @@ int __init nd_bus_init(void)
return 0;
err_class:
+ unregister_chrdev(nd_dimm_major, "dimmctl");
+ err_dimm_chrdev:
unregister_chrdev(nd_bus_major, "ndctl");
- err_chrdev:
+ err_bus_chrdev:
bus_unregister(&nd_bus_type);
return rc;
@@ -91,5 +406,6 @@ void __exit nd_bus_exit(void)
{
class_destroy(nd_class);
unregister_chrdev(nd_bus_major, "ndctl");
+ unregister_chrdev(nd_dimm_major, "dimmctl");
bus_unregister(&nd_bus_type);
}
diff --git a/drivers/block/nd/core.c b/drivers/block/nd/core.c
index 24046db897c1..b14bf47dc292 100644
--- a/drivers/block/nd/core.c
+++ b/drivers/block/nd/core.c
@@ -13,6 +13,7 @@
#include <linux/export.h>
#include <linux/module.h>
#include <linux/device.h>
+#include <linux/ndctl.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include "nd-private.h"
@@ -59,6 +60,20 @@ struct nd_bus *walk_to_nd_bus(struct device *nd_dev)
return NULL;
}
+static ssize_t commands_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int cmd, len = 0;
+ struct nd_bus *nd_bus = to_nd_bus(dev);
+ struct nd_bus_descriptor *nd_desc = nd_bus->nd_desc;
+
+ for_each_set_bit(cmd, &nd_desc->dsm_mask, BITS_PER_LONG)
+ len += sprintf(buf + len, "%s ", nd_bus_cmd_name(cmd));
+ len += sprintf(buf + len, "\n");
+ return len;
+}
+static DEVICE_ATTR_RO(commands);
+
static const char *nd_bus_provider(struct nd_bus *nd_bus)
{
struct nd_bus_descriptor *nd_desc = nd_bus->nd_desc;
@@ -82,6 +97,7 @@ static ssize_t provider_show(struct device *dev,
static DEVICE_ATTR_RO(provider);
static struct attribute *nd_bus_attributes[] = {
+ &dev_attr_commands.attr,
&dev_attr_provider.attr,
NULL,
};
diff --git a/drivers/block/nd/dimm_devs.c b/drivers/block/nd/dimm_devs.c
index 19b081392f2f..3fa26f61c3db 100644
--- a/drivers/block/nd/dimm_devs.c
+++ b/drivers/block/nd/dimm_devs.c
@@ -12,6 +12,7 @@
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/device.h>
+#include <linux/ndctl.h>
#include <linux/slab.h>
#include <linux/io.h>
#include <linux/fs.h>
@@ -33,7 +34,7 @@ static struct device_type nd_dimm_device_type = {
.release = nd_dimm_release,
};
-static bool is_nd_dimm(struct device *dev)
+bool is_nd_dimm(struct device *dev)
{
return dev->type == &nd_dimm_device_type;
}
@@ -55,12 +56,41 @@ EXPORT_SYMBOL_GPL(nd_dimm_name);
void *nd_dimm_provider_data(struct nd_dimm *nd_dimm)
{
- return nd_dimm->provider_data;
+ if (nd_dimm)
+ return nd_dimm->provider_data;
+ return NULL;
}
EXPORT_SYMBOL_GPL(nd_dimm_provider_data);
+static ssize_t commands_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_dimm *nd_dimm = to_nd_dimm(dev);
+ int cmd, len = 0;
+
+ if (!nd_dimm->dsm_mask)
+ return sprintf(buf, "\n");
+
+ for_each_set_bit(cmd, nd_dimm->dsm_mask, BITS_PER_LONG)
+ len += sprintf(buf + len, "%s ", nd_dimm_cmd_name(cmd));
+ len += sprintf(buf + len, "\n");
+ return len;
+}
+static DEVICE_ATTR_RO(commands);
+
+static struct attribute *nd_dimm_attributes[] = {
+ &dev_attr_commands.attr,
+ NULL,
+};
+
+struct attribute_group nd_dimm_attribute_group = {
+ .attrs = nd_dimm_attributes,
+};
+EXPORT_SYMBOL_GPL(nd_dimm_attribute_group);
+
struct nd_dimm *nd_dimm_create(struct nd_bus *nd_bus, void *provider_data,
- const struct attribute_group **groups, unsigned long flags)
+ const struct attribute_group **groups, unsigned long flags,
+ unsigned long *dsm_mask)
{
struct nd_dimm *nd_dimm = kzalloc(sizeof(*nd_dimm), GFP_KERNEL);
struct device *dev;
@@ -75,12 +105,14 @@ struct nd_dimm *nd_dimm_create(struct nd_bus *nd_bus, void *provider_data,
}
nd_dimm->provider_data = provider_data;
nd_dimm->flags = flags;
+ nd_dimm->dsm_mask = dsm_mask;
dev = &nd_dimm->dev;
dev_set_name(dev, "nmem%d", nd_dimm->id);
dev->parent = &nd_bus->dev;
dev->type = &nd_dimm_device_type;
dev->bus = &nd_bus_type;
+ dev->devt = MKDEV(nd_dimm_major, nd_dimm->id);
dev->groups = groups;
if (device_register(dev) != 0) {
put_device(dev);
diff --git a/drivers/block/nd/libnd.h b/drivers/block/nd/libnd.h
index 4f8f2ebbcf7b..23dae8c1e65b 100644
--- a/drivers/block/nd/libnd.h
+++ b/drivers/block/nd/libnd.h
@@ -14,13 +14,21 @@
*/
#ifndef __LIBND_H__
#define __LIBND_H__
+#include <linux/sizes.h>
enum {
/* when a dimm supports both PMEM and BLK access a label is required */
NDD_ALIASING = 1 << 0,
+
+ /* need to set a limit somewhere, but yes, this is likely overkill */
+ ND_IOCTL_MAX_BUFLEN = SZ_4M,
+ ND_CMD_MAX_ELEM = 4,
+ ND_CMD_MAX_ENVELOPE = 16,
+ ND_CMD_ARS_QUERY_MAX = SZ_4K,
};
extern struct attribute_group nd_bus_attribute_group;
+extern struct attribute_group nd_dimm_attribute_group;
struct nd_dimm;
struct nd_bus_descriptor;
@@ -35,6 +43,13 @@ struct nd_bus_descriptor {
ndctl_fn ndctl;
};
+struct nd_cmd_desc {
+ int in_num;
+ int out_num;
+ u32 in_sizes[ND_CMD_MAX_ELEM];
+ int out_sizes[ND_CMD_MAX_ELEM];
+};
+
struct nd_bus;
struct nd_bus *nd_bus_register(struct device *parent,
struct nd_bus_descriptor *nfit_desc);
@@ -45,5 +60,13 @@ struct nd_bus_descriptor *to_nd_desc(struct nd_bus *nd_bus);
const char *nd_dimm_name(struct nd_dimm *nd_dimm);
void *nd_dimm_provider_data(struct nd_dimm *nd_dimm);
struct nd_dimm *nd_dimm_create(struct nd_bus *nd_bus, void *provider_data,
- const struct attribute_group **groups, unsigned long flags);
+ const struct attribute_group **groups, unsigned long flags,
+ unsigned long *dsm_mask);
+const struct nd_cmd_desc const *nd_cmd_dimm_desc(int cmd);
+const struct nd_cmd_desc const *nd_cmd_bus_desc(int cmd);
+u32 nd_cmd_in_size(struct nd_dimm *nd_dimm, int cmd,
+ const struct nd_cmd_desc *desc, int idx, void *buf);
+u32 nd_cmd_out_size(struct nd_dimm *nd_dimm, int cmd,
+ const struct nd_cmd_desc *desc, int idx, const u32 *in_field,
+ const u32 *out_field);
#endif /* __LIBND_H__ */
diff --git a/drivers/block/nd/nd-private.h b/drivers/block/nd/nd-private.h
index cfb5a7241ed1..968e5089f72c 100644
--- a/drivers/block/nd/nd-private.h
+++ b/drivers/block/nd/nd-private.h
@@ -18,6 +18,7 @@
extern struct list_head nd_bus_list;
extern struct mutex nd_bus_list_mutex;
extern struct bus_type nd_bus_type;
+extern int nd_dimm_major;
struct nd_bus {
struct nd_bus_descriptor *nd_desc;
@@ -29,10 +30,12 @@ struct nd_bus {
struct nd_dimm {
unsigned long flags;
void *provider_data;
+ unsigned long *dsm_mask;
struct device dev;
int id;
};
+bool is_nd_dimm(struct device *dev);
struct nd_bus *walk_to_nd_bus(struct device *nd_dev);
int __init nd_bus_init(void);
void __exit nd_bus_exit(void);
diff --git a/drivers/block/nd/test/nfit.c b/drivers/block/nd/test/nfit.c
index 1c79f32376fc..50916f0ca901 100644
--- a/drivers/block/nd/test/nfit.c
+++ b/drivers/block/nd/test/nfit.c
@@ -14,6 +14,7 @@
#include <linux/platform_device.h>
#include <linux/dma-mapping.h>
#include <linux/module.h>
+#include <linux/ndctl.h>
#include <linux/sizes.h>
#include <linux/slab.h>
#include "nfit_test.h"
@@ -142,7 +143,74 @@ static int nfit_test_ctl(struct nd_bus_descriptor *nd_desc,
struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
unsigned int buf_len)
{
- return -ENOTTY;
+ struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+ struct nfit_test *t = container_of(acpi_desc, typeof(*t), acpi_desc);
+ struct nfit_mem *nfit_mem = nd_dimm_provider_data(nd_dimm);
+ int i, rc;
+
+ if (!nfit_mem || !test_bit(cmd, &nfit_mem->dsm_mask))
+ return -ENXIO;
+
+ /* lookup label space for the given dimm */
+ for (i = 0; i < ARRAY_SIZE(handle); i++)
+ if (__to_nfit_memdev(nfit_mem)->nfit_handle == handle[i])
+ break;
+ if (i >= ARRAY_SIZE(handle))
+ return -ENXIO;
+
+ switch (cmd) {
+ case ND_CMD_GET_CONFIG_SIZE: {
+ struct nd_cmd_get_config_size *nd_cmd = buf;
+
+ if (buf_len < sizeof(*nd_cmd))
+ return -EINVAL;
+ nd_cmd->status = 0;
+ nd_cmd->config_size = LABEL_SIZE;
+ nd_cmd->max_xfer = SZ_4K;
+ rc = 0;
+ break;
+ }
+ case ND_CMD_GET_CONFIG_DATA: {
+ struct nd_cmd_get_config_data_hdr *nd_cmd = buf;
+ unsigned int len, offset = nd_cmd->in_offset;
+
+ if (buf_len < sizeof(*nd_cmd))
+ return -EINVAL;
+ if (offset >= LABEL_SIZE)
+ return -EINVAL;
+ if (nd_cmd->in_length + sizeof(*nd_cmd) > buf_len)
+ return -EINVAL;
+
+ nd_cmd->status = 0;
+ len = min(nd_cmd->in_length, LABEL_SIZE - offset);
+ memcpy(nd_cmd->out_buf, t->label[i] + offset, len);
+ rc = buf_len - sizeof(*nd_cmd) - len;
+ break;
+ }
+ case ND_CMD_SET_CONFIG_DATA: {
+ struct nd_cmd_set_config_hdr *nd_cmd = buf;
+ unsigned int len, offset = nd_cmd->in_offset;
+ u32 *status;
+
+ if (buf_len < sizeof(*nd_cmd))
+ return -EINVAL;
+ if (offset >= LABEL_SIZE)
+ return -EINVAL;
+ if (nd_cmd->in_length + sizeof(*nd_cmd) + 4 > buf_len)
+ return -EINVAL;
+
+ status = buf + nd_cmd->in_length + sizeof(*nd_cmd);
+ *status = 0;
+ len = min(nd_cmd->in_length, LABEL_SIZE - offset);
+ memcpy(t->label[i] + offset, nd_cmd->in_buf, len);
+ rc = buf_len - sizeof(*nd_cmd) - (len + 4);
+ break;
+ }
+ default:
+ return -ENOTTY;
+ }
+
+ return rc;
}
static DEFINE_SPINLOCK(nfit_test_lock);
@@ -280,6 +348,7 @@ static int nfit_test0_alloc(struct nfit_test *t)
t->label[i] = test_alloc(t, LABEL_SIZE, &t->label_dma[i]);
if (!t->label[i])
return -ENOMEM;
+ sprintf(t->label[i], "label%d", i);
}
for (i = 0; i < NUM_DCR; i++) {
@@ -322,6 +391,7 @@ static int nfit_test1_alloc(struct nfit_test *t)
static void nfit_test0_setup(struct nfit_test *t)
{
struct nd_bus_descriptor *nd_desc;
+ struct acpi_nfit_desc *acpi_desc;
struct acpi_nfit_memdev *memdev;
void *nfit_buf = t->nfit_buf;
size_t size = t->nfit_size;
@@ -763,7 +833,11 @@ static void nfit_test0_setup(struct nfit_test *t)
nfit->checksum = nfit_checksum(nfit_buf, size);
- nd_desc = &t->acpi_desc.nd_desc;
+ acpi_desc = &t->acpi_desc;
+ set_bit(ND_CMD_GET_CONFIG_SIZE, &acpi_desc->dimm_dsm_force_en);
+ set_bit(ND_CMD_GET_CONFIG_DATA, &acpi_desc->dimm_dsm_force_en);
+ set_bit(ND_CMD_SET_CONFIG_DATA, &acpi_desc->dimm_dsm_force_en);
+ nd_desc = &acpi_desc->nd_desc;
nd_desc->ndctl = nfit_test_ctl;
}
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 68ceb97c458c..384e8d212b04 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -270,6 +270,7 @@ header-y += ncp_fs.h
header-y += ncp.h
header-y += ncp_mount.h
header-y += ncp_no.h
+header-y += ndctl.h
header-y += neighbour.h
header-y += netconf.h
header-y += netdevice.h
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
new file mode 100644
index 000000000000..62c01bf76198
--- /dev/null
+++ b/include/uapi/linux/ndctl.h
@@ -0,0 +1,178 @@
+/*
+ * Copyright (c) 2014-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU Lesser General Public License,
+ * version 2.1, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT ANY
+ * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
+ * more details.
+ */
+#ifndef __NDCTL_H__
+#define __NDCTL_H__
+
+#include <linux/types.h>
+
+struct nd_cmd_smart {
+ __u32 status;
+ __u8 data[128];
+} __packed;
+
+struct nd_cmd_smart_threshold {
+ __u32 status;
+ __u8 data[8];
+} __packed;
+
+struct nd_cmd_dimm_flags {
+ __u32 status;
+ __u32 flags;
+} __packed;
+
+struct nd_cmd_get_config_size {
+ __u32 status;
+ __u32 config_size;
+ __u32 max_xfer;
+} __packed;
+
+struct nd_cmd_get_config_data_hdr {
+ __u32 in_offset;
+ __u32 in_length;
+ __u32 status;
+ __u8 out_buf[0];
+} __packed;
+
+struct nd_cmd_set_config_hdr {
+ __u32 in_offset;
+ __u32 in_length;
+ __u8 in_buf[0];
+} __packed;
+
+struct nd_cmd_vendor_hdr {
+ __u32 opcode;
+ __u32 in_length;
+ __u8 in_buf[0];
+} __packed;
+
+struct nd_cmd_vendor_tail {
+ __u32 status;
+ __u32 out_length;
+ __u8 out_buf[0];
+} __packed;
+
+struct nd_cmd_ars_cap {
+ __u64 address;
+ __u64 length;
+ __u32 status;
+ __u32 max_ars_out;
+} __packed;
+
+struct nd_cmd_ars_start {
+ __u64 address;
+ __u64 length;
+ __u16 type;
+ __u8 reserved[6];
+ __u32 status;
+} __packed;
+
+struct nd_cmd_ars_query {
+ __u32 status;
+ __u32 out_length;
+ __u64 address;
+ __u64 length;
+ __u16 type;
+ __u32 num_records;
+ struct nd_ars_record {
+ __u32 handle;
+ __u32 flags;
+ __u64 err_address;
+ __u64 mask;
+ } __packed records[0];
+} __packed;
+
+enum {
+ ND_CMD_IMPLEMENTED = 0,
+
+ /* bus commands */
+ ND_CMD_ARS_CAP = 1,
+ ND_CMD_ARS_START = 2,
+ ND_CMD_ARS_QUERY = 3,
+
+ /* per-dimm commands */
+ ND_CMD_SMART = 1,
+ ND_CMD_SMART_THRESHOLD = 2,
+ ND_CMD_DIMM_FLAGS = 3,
+ ND_CMD_GET_CONFIG_SIZE = 4,
+ ND_CMD_GET_CONFIG_DATA = 5,
+ ND_CMD_SET_CONFIG_DATA = 6,
+ ND_CMD_VENDOR_EFFECT_LOG_SIZE = 7,
+ ND_CMD_VENDOR_EFFECT_LOG = 8,
+ ND_CMD_VENDOR = 9,
+};
+
+static inline const char *nd_bus_cmd_name(unsigned cmd)
+{
+ static const char * const names[] = {
+ [ND_CMD_ARS_CAP] = "ars_cap",
+ [ND_CMD_ARS_START] = "ars_start",
+ [ND_CMD_ARS_QUERY] = "ars_query",
+ };
+
+ if (cmd < ARRAY_SIZE(names) && names[cmd])
+ return names[cmd];
+ return "unknown";
+}
+
+static inline const char *nd_dimm_cmd_name(unsigned cmd)
+{
+ static const char * const names[] = {
+ [ND_CMD_SMART] = "smart",
+ [ND_CMD_SMART_THRESHOLD] = "smart_thresh",
+ [ND_CMD_DIMM_FLAGS] = "flags",
+ [ND_CMD_GET_CONFIG_SIZE] = "get_size",
+ [ND_CMD_GET_CONFIG_DATA] = "get_data",
+ [ND_CMD_SET_CONFIG_DATA] = "set_data",
+ [ND_CMD_VENDOR_EFFECT_LOG_SIZE] = "effect_size",
+ [ND_CMD_VENDOR_EFFECT_LOG] = "effect_log",
+ [ND_CMD_VENDOR] = "vendor",
+ };
+
+ if (cmd < ARRAY_SIZE(names) && names[cmd])
+ return names[cmd];
+ return "unknown";
+}
+
+#define ND_IOCTL 'N'
+
+#define ND_IOCTL_SMART _IOWR(ND_IOCTL, ND_CMD_SMART,\
+ struct nd_cmd_smart)
+
+#define ND_IOCTL_SMART_THRESHOLD _IOWR(ND_IOCTL, ND_CMD_SMART_THRESHOLD,\
+ struct nd_cmd_smart_threshold)
+
+#define ND_IOCTL_DIMM_FLAGS _IOWR(ND_IOCTL, ND_CMD_DIMM_FLAGS,\
+ struct nd_cmd_dimm_flags)
+
+#define ND_IOCTL_GET_CONFIG_SIZE _IOWR(ND_IOCTL, ND_CMD_GET_CONFIG_SIZE,\
+ struct nd_cmd_get_config_size)
+
+#define ND_IOCTL_GET_CONFIG_DATA _IOWR(ND_IOCTL, ND_CMD_GET_CONFIG_DATA,\
+ struct nd_cmd_get_config_data_hdr)
+
+#define ND_IOCTL_SET_CONFIG_DATA _IOWR(ND_IOCTL, ND_CMD_SET_CONFIG_DATA,\
+ struct nd_cmd_set_config_hdr)
+
+#define ND_IOCTL_VENDOR _IOWR(ND_IOCTL, ND_CMD_VENDOR,\
+ struct nd_cmd_vendor_hdr)
+
+#define ND_IOCTL_ARS_CAP _IOWR(ND_IOCTL, ND_CMD_ARS_CAP,\
+ struct nd_cmd_ars_cap)
+
+#define ND_IOCTL_ARS_START _IOWR(ND_IOCTL, ND_CMD_ARS_START,\
+ struct nd_cmd_ars_start)
+
+#define ND_IOCTL_ARS_QUERY _IOWR(ND_IOCTL, ND_CMD_ARS_QUERY,\
+ struct nd_cmd_ars_query)
+
+#endif /* __NDCTL_H__ */
* [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-04-28 18:24 [PATCH v2 00/20] libnd: non-volatile memory device support Dan Williams
` (4 preceding siblings ...)
2015-04-28 18:24 ` [PATCH v2 06/20] libnd: ndctl.h, the nd ioctl abi Dan Williams
@ 2015-04-28 18:24 ` Dan Williams
2015-04-29 15:53 ` [Linux-nvdimm] " Elliott, Robert (Server Storage)
2015-05-04 20:26 ` Toshi Kani
2015-04-28 18:25 ` [PATCH v2 12/20] libnd, nd_acpi: add interleave-set state-tracking infrastructure Dan Williams
` (3 subsequent siblings)
9 siblings, 2 replies; 47+ messages in thread
From: Dan Williams @ 2015-04-28 18:24 UTC (permalink / raw)
To: linux-nvdimm
Cc: Neil Brown, Greg KH, Rafael J. Wysocki, Robert Moore,
linux-kernel, linux-acpi
A "region" device represents the maximum capacity of a BLK range (mmio
block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
volatile memory), without regard for aliasing. Aliasing, in the
dimm-local address space (DPA), is resolved by metadata on a dimm to
designate which exclusive interface will access the aliased DPA ranges.
Support for the per-dimm metadata/label arrives in a subsequent
patch.
The name format of "region" devices is "regionN" where, like dimms, N is
a global ida index assigned at discovery time. This id is not reliable
across reboots nor in the presence of hotplug. Look to attributes of
the region or static id-data of the sub-namespace to generate a
persistent name.
A "region" has two generic attribute types, "size" and "mappingN", where:
- size: the block-data-window accessible capacity or the span of the
spa-range in the case of pm.
- mappingN: a tuple describing a dimm's contribution to the region's
capacity in the format (<nmemX>,<dpa>,<size>). For a
PMEM-region there will be at least one mapping per dimm in the interleave
set. For a BLK-region there is only "mapping0" listing the starting dimm
offset of the block-data-window and the available capacity of that
window (matches "size" above).
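As an illustration of the tuple format above, here is a minimal userspace sketch that parses one "mappingN" line. It assumes the decimal "%s,%llu,%llu" layout that mappingN() emits in this patch. The struct mapping_tuple type and the parse_mapping() helper are hypothetical names made up for this example; they are not part of libnd or of any ndctl library.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace view of one "mappingN" line; not a kernel type. */
struct mapping_tuple {
	char dimm[32];		/* dimm device name, e.g. "nmem0" */
	uint64_t dpa;		/* starting dimm-physical-address */
	uint64_t size;		/* bytes this dimm contributes to the region */
};

/*
 * Parse "<nmemX>,<dpa>,<size>" as printed by the mappingN attribute.
 * Returns 0 on success, -1 if the line does not carry all three fields.
 */
static int parse_mapping(const char *line, struct mapping_tuple *m)
{
	int fields = sscanf(line, "%31[^,],%" SCNu64 ",%" SCNu64,
			m->dimm, &m->dpa, &m->size);

	return fields == 3 ? 0 : -1;
}
```

A caller would read the text of, say, a region's mapping0 attribute into a buffer and pass it to parse_mapping(); the trailing newline is ignored by sscanf(). Since region and dimm ids are not stable across reboots, the parsed dimm name should only be used to correlate devices within one boot.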
The maximum number of mappings per "region" is hard coded (ND_MAX_MAPPINGS,
currently 32) because sysfs attribute groups are statically defined. That
said, the number of mappings per region should never exceed the maximum
number of possible dimms in the system. If the current number turns out not
to be enough, the "mappings" attribute clarifies how many there are supposed
to be. "32 should be enough for anybody...".
Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/block/nd/Makefile | 1
drivers/block/nd/acpi.c | 130 ++++++++++++++++++
drivers/block/nd/libnd.h | 25 +++
drivers/block/nd/nd-private.h | 3
drivers/block/nd/nd.h | 11 +
drivers/block/nd/region_devs.c | 294 ++++++++++++++++++++++++++++++++++++++++
6 files changed, 463 insertions(+), 1 deletion(-)
create mode 100644 drivers/block/nd/region_devs.c
diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
index 842ba13253fd..6010469c4d4c 100644
--- a/drivers/block/nd/Makefile
+++ b/drivers/block/nd/Makefile
@@ -22,3 +22,4 @@ libnd-y := core.o
libnd-y += bus.o
libnd-y += dimm_devs.o
libnd-y += dimm.o
+libnd-y += region_devs.o
diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
index bb0c2c764e78..41d0bb732b3e 100644
--- a/drivers/block/nd/acpi.c
+++ b/drivers/block/nd/acpi.c
@@ -751,12 +751,136 @@ static void nd_acpi_init_dsms(struct acpi_nfit_desc *acpi_desc)
set_bit(i, &nd_desc->dsm_mask);
}
+static ssize_t spa_index_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ struct nfit_spa *nfit_spa = nd_region_provider_data(nd_region);
+
+ return sprintf(buf, "%d\n", nfit_spa->spa->spa_index);
+}
+static DEVICE_ATTR_RO(spa_index);
+
+static struct attribute *nd_acpi_region_attributes[] = {
+ &dev_attr_spa_index.attr,
+ NULL,
+};
+
+static struct attribute_group nd_acpi_region_attribute_group = {
+ .name = "nfit",
+ .attrs = nd_acpi_region_attributes,
+};
+
+static const struct attribute_group *nd_acpi_region_attribute_groups[] = {
+ &nd_region_attribute_group,
+ &nd_mapping_attribute_group,
+ &nd_acpi_region_attribute_group,
+ NULL,
+};
+
+static int nd_acpi_register_region(struct acpi_nfit_desc *acpi_desc,
+ struct nfit_spa *nfit_spa)
+{
+ static struct nd_mapping nd_mappings[ND_MAX_MAPPINGS];
+ struct acpi_nfit_spa *spa = nfit_spa->spa;
+ struct nfit_memdev *nfit_memdev;
+ struct nd_region_desc ndr_desc;
+ int spa_type, count = 0;
+ struct resource res;
+ u16 spa_index;
+
+ spa_type = nfit_spa_type(spa);
+ spa_index = spa->spa_index;
+ if (spa_index == 0) {
+ dev_dbg(acpi_desc->dev, "%s: detected invalid spa index\n",
+ __func__);
+ return 0;
+ }
+
+ memset(&res, 0, sizeof(res));
+ memset(&nd_mappings, 0, sizeof(nd_mappings));
+ memset(&ndr_desc, 0, sizeof(ndr_desc));
+ res.start = spa->spa_base;
+ res.end = res.start + spa->spa_length - 1;
+ ndr_desc.res = &res;
+ ndr_desc.provider_data = nfit_spa;
+ ndr_desc.attr_groups = nd_acpi_region_attribute_groups;
+ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+ struct acpi_nfit_memdev *memdev = nfit_memdev->memdev;
+ struct nd_mapping *nd_mapping;
+ struct nd_dimm *nd_dimm;
+
+ if (memdev->spa_index != spa_index)
+ continue;
+ if (count >= ND_MAX_MAPPINGS) {
+ dev_err(acpi_desc->dev, "spa%d exceeds max mappings %d\n",
+ spa_index, ND_MAX_MAPPINGS);
+ return -ENXIO;
+ }
+ nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, memdev->nfit_handle);
+ if (!nd_dimm) {
+ dev_err(acpi_desc->dev, "spa%d dimm: %#x not found\n",
+ spa_index, memdev->nfit_handle);
+ return -ENODEV;
+ }
+ nd_mapping = &nd_mappings[count++];
+ nd_mapping->nd_dimm = nd_dimm;
+ if (spa_type == NFIT_SPA_PM || spa_type == NFIT_SPA_VOLATILE) {
+ nd_mapping->start = memdev->region_dpa;
+ nd_mapping->size = memdev->region_len;
+ } else if (spa_type == NFIT_SPA_DCR) {
+ struct nfit_mem *nfit_mem;
+ int blk_valid = 1;
+
+ nfit_mem = nd_dimm_provider_data(nd_dimm);
+ if (!nfit_mem || !nfit_mem->bdw) {
+ dev_dbg(acpi_desc->dev, "%s: spa%d missing bdw\n",
+ nd_dimm_name(nd_dimm), spa_index);
+ blk_valid = 0;
+ } else {
+ nd_mapping->size = nfit_mem->bdw->blk_capacity;
+ nd_mapping->start = nfit_mem->bdw->blk_offset;
+ }
+
+ ndr_desc.nd_mapping = nd_mapping;
+ ndr_desc.num_mappings = blk_valid;
+ if (!nd_blk_region_create(acpi_desc->nd_bus, &ndr_desc))
+ return -ENOMEM;
+ }
+ }
+
+ ndr_desc.nd_mapping = nd_mappings;
+ ndr_desc.num_mappings = count;
+ if (spa_type == NFIT_SPA_PM) {
+ if (!nd_pmem_region_create(acpi_desc->nd_bus, &ndr_desc))
+ return -ENOMEM;
+ } else if (spa_type == NFIT_SPA_VOLATILE) {
+ if (!nd_volatile_region_create(acpi_desc->nd_bus, &ndr_desc))
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static int nd_acpi_register_regions(struct acpi_nfit_desc *acpi_desc)
+{
+ struct nfit_spa *nfit_spa;
+
+ list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+ int rc = nd_acpi_register_region(acpi_desc, nfit_spa);
+
+ if (rc)
+ return rc;
+ }
+ return 0;
+}
+
int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
{
struct device *dev = acpi_desc->dev;
const void *end;
u8 *data, sum;
acpi_size i;
+ int rc;
INIT_LIST_HEAD(&acpi_desc->spas);
INIT_LIST_HEAD(&acpi_desc->dcrs);
@@ -790,7 +914,11 @@ int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
nd_acpi_init_dsms(acpi_desc);
- return nd_acpi_register_dimms(acpi_desc);
+ rc = nd_acpi_register_dimms(acpi_desc);
+ if (rc)
+ return rc;
+
+ return nd_acpi_register_regions(acpi_desc);
}
EXPORT_SYMBOL_GPL(nd_acpi_nfit_init);
diff --git a/drivers/block/nd/libnd.h b/drivers/block/nd/libnd.h
index 3b91df914b76..630a703d1316 100644
--- a/drivers/block/nd/libnd.h
+++ b/drivers/block/nd/libnd.h
@@ -25,11 +25,14 @@ enum {
ND_CMD_MAX_ELEM = 4,
ND_CMD_MAX_ENVELOPE = 16,
ND_CMD_ARS_QUERY_MAX = SZ_4K,
+ ND_MAX_MAPPINGS = 32,
};
extern struct attribute_group nd_bus_attribute_group;
extern struct attribute_group nd_dimm_attribute_group;
extern struct attribute_group nd_device_attribute_group;
+extern struct attribute_group nd_region_attribute_group;
+extern struct attribute_group nd_mapping_attribute_group;
struct nd_dimm;
struct nd_bus_descriptor;
@@ -37,6 +40,12 @@ typedef int (*ndctl_fn)(struct nd_bus_descriptor *nd_desc,
struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
unsigned int buf_len);
+struct nd_mapping {
+ struct nd_dimm *nd_dimm;
+ u64 start;
+ u64 size;
+};
+
struct nd_bus_descriptor {
const struct attribute_group **attr_groups;
unsigned long dsm_mask;
@@ -51,15 +60,25 @@ struct nd_cmd_desc {
int out_sizes[ND_CMD_MAX_ELEM];
};
+struct nd_region_desc {
+ struct resource *res;
+ struct nd_mapping *nd_mapping;
+ u16 num_mappings;
+ const struct attribute_group **attr_groups;
+ void *provider_data;
+};
+
struct nd_bus;
struct nd_bus *nd_bus_register(struct device *parent,
struct nd_bus_descriptor *nfit_desc);
void nd_bus_unregister(struct nd_bus *nd_bus);
struct nd_bus *to_nd_bus(struct device *dev);
struct nd_dimm *to_nd_dimm(struct device *dev);
+struct nd_region *to_nd_region(struct device *dev);
struct nd_bus_descriptor *to_nd_desc(struct nd_bus *nd_bus);
const char *nd_dimm_name(struct nd_dimm *nd_dimm);
void *nd_dimm_provider_data(struct nd_dimm *nd_dimm);
+void *nd_region_provider_data(struct nd_region *nd_region);
struct nd_dimm *nd_dimm_create(struct nd_bus *nd_bus, void *provider_data,
const struct attribute_group **groups, unsigned long flags,
unsigned long *dsm_mask);
@@ -71,4 +90,10 @@ u32 nd_cmd_out_size(struct nd_dimm *nd_dimm, int cmd,
const struct nd_cmd_desc *desc, int idx, const u32 *in_field,
const u32 *out_field);
int nd_bus_validate_dimm_count(struct nd_bus *nd_bus, int dimm_count);
+struct nd_region *nd_pmem_region_create(struct nd_bus *nd_bus,
+ struct nd_region_desc *ndr_desc);
+struct nd_region *nd_blk_region_create(struct nd_bus *nd_bus,
+ struct nd_region_desc *ndr_desc);
+struct nd_region *nd_volatile_region_create(struct nd_bus *nd_bus,
+ struct nd_region_desc *ndr_desc);
#endif /* __LIBND_H__ */
diff --git a/drivers/block/nd/nd-private.h b/drivers/block/nd/nd-private.h
index 35ab0d476d15..838b6f958c00 100644
--- a/drivers/block/nd/nd-private.h
+++ b/drivers/block/nd/nd-private.h
@@ -42,5 +42,8 @@ void __exit nd_dimm_exit(void);
int nd_bus_create_ndctl(struct nd_bus *nd_bus);
void nd_bus_destroy_ndctl(struct nd_bus *nd_bus);
void nd_synchronize(void);
+int nd_bus_register_dimms(struct nd_bus *nd_bus);
+int nd_bus_register_regions(struct nd_bus *nd_bus);
+int nd_match_dimm(struct device *dev, void *data);
bool is_nd_dimm(struct device *dev);
#endif /* __ND_PRIVATE_H__ */
diff --git a/drivers/block/nd/nd.h b/drivers/block/nd/nd.h
index 1a5a081ce640..cae83de12c45 100644
--- a/drivers/block/nd/nd.h
+++ b/drivers/block/nd/nd.h
@@ -15,6 +15,7 @@
#include <linux/device.h>
#include <linux/mutex.h>
#include <linux/ndctl.h>
+#include "libnd.h"
struct nd_dimm_drvdata {
struct device *dev;
@@ -22,6 +23,16 @@ struct nd_dimm_drvdata {
void *data;
};
+struct nd_region {
+ struct device dev;
+ u16 ndr_mappings;
+ u64 ndr_size;
+ u64 ndr_start;
+ int id;
+ void *provider_data;
+ struct nd_mapping mapping[0];
+};
+
enum nd_async_mode {
ND_SYNC,
ND_ASYNC,
diff --git a/drivers/block/nd/region_devs.c b/drivers/block/nd/region_devs.c
new file mode 100644
index 000000000000..12a5415acfcc
--- /dev/null
+++ b/drivers/block/nd/region_devs.c
@@ -0,0 +1,294 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/slab.h>
+#include <linux/io.h>
+#include "nd-private.h"
+#include "nd.h"
+
+static DEFINE_IDA(region_ida);
+
+static void nd_region_release(struct device *dev)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ u16 i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nd_dimm *nd_dimm = nd_mapping->nd_dimm;
+
+ put_device(&nd_dimm->dev);
+ }
+ ida_simple_remove(&region_ida, nd_region->id);
+ kfree(nd_region);
+}
+
+static struct device_type nd_blk_device_type = {
+ .name = "nd_blk",
+ .release = nd_region_release,
+};
+
+static struct device_type nd_pmem_device_type = {
+ .name = "nd_pmem",
+ .release = nd_region_release,
+};
+
+static struct device_type nd_volatile_device_type = {
+ .name = "nd_volatile",
+ .release = nd_region_release,
+};
+
+static bool is_nd_pmem(struct device *dev)
+{
+ return dev ? dev->type == &nd_pmem_device_type : false;
+}
+
+struct nd_region *to_nd_region(struct device *dev)
+{
+ struct nd_region *nd_region = container_of(dev, struct nd_region, dev);
+
+ WARN_ON(dev->type->release != nd_region_release);
+ return nd_region;
+}
+EXPORT_SYMBOL_GPL(to_nd_region);
+
+static ssize_t size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ unsigned long long size = 0;
+
+ if (is_nd_pmem(dev)) {
+ size = nd_region->ndr_size;
+ } else if (nd_region->ndr_mappings == 1) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[0];
+
+ size = nd_mapping->size;
+ }
+
+ return sprintf(buf, "%llu\n", size);
+}
+static DEVICE_ATTR_RO(size);
+
+static ssize_t mappings_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+
+ return sprintf(buf, "%d\n", nd_region->ndr_mappings);
+}
+static DEVICE_ATTR_RO(mappings);
+
+static struct attribute *nd_region_attributes[] = {
+ &dev_attr_size.attr,
+ &dev_attr_mappings.attr,
+ NULL,
+};
+
+struct attribute_group nd_region_attribute_group = {
+ .attrs = nd_region_attributes,
+};
+EXPORT_SYMBOL_GPL(nd_region_attribute_group);
+
+static ssize_t mappingN(struct device *dev, char *buf, int n)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ struct nd_mapping *nd_mapping;
+ struct nd_dimm *nd_dimm;
+
+ if (n >= nd_region->ndr_mappings)
+ return -ENXIO;
+ nd_mapping = &nd_region->mapping[n];
+ nd_dimm = nd_mapping->nd_dimm;
+
+ return sprintf(buf, "%s,%llu,%llu\n", dev_name(&nd_dimm->dev),
+ nd_mapping->start, nd_mapping->size);
+}
+
+#define REGION_MAPPING(idx) \
+static ssize_t mapping##idx##_show(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ return mappingN(dev, buf, idx); \
+} \
+static DEVICE_ATTR_RO(mapping##idx)
+
+/*
+ * 32 should be enough for a while, even in the presence of socket
+ * interleave a 32-way interleave set is a degenerate case.
+ */
+REGION_MAPPING(0);
+REGION_MAPPING(1);
+REGION_MAPPING(2);
+REGION_MAPPING(3);
+REGION_MAPPING(4);
+REGION_MAPPING(5);
+REGION_MAPPING(6);
+REGION_MAPPING(7);
+REGION_MAPPING(8);
+REGION_MAPPING(9);
+REGION_MAPPING(10);
+REGION_MAPPING(11);
+REGION_MAPPING(12);
+REGION_MAPPING(13);
+REGION_MAPPING(14);
+REGION_MAPPING(15);
+REGION_MAPPING(16);
+REGION_MAPPING(17);
+REGION_MAPPING(18);
+REGION_MAPPING(19);
+REGION_MAPPING(20);
+REGION_MAPPING(21);
+REGION_MAPPING(22);
+REGION_MAPPING(23);
+REGION_MAPPING(24);
+REGION_MAPPING(25);
+REGION_MAPPING(26);
+REGION_MAPPING(27);
+REGION_MAPPING(28);
+REGION_MAPPING(29);
+REGION_MAPPING(30);
+REGION_MAPPING(31);
+
+static umode_t nd_mapping_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, struct device, kobj);
+ struct nd_region *nd_region = to_nd_region(dev);
+
+ if (n < nd_region->ndr_mappings)
+ return a->mode;
+ return 0;
+}
+
+static struct attribute *nd_mapping_attributes[] = {
+ &dev_attr_mapping0.attr,
+ &dev_attr_mapping1.attr,
+ &dev_attr_mapping2.attr,
+ &dev_attr_mapping3.attr,
+ &dev_attr_mapping4.attr,
+ &dev_attr_mapping5.attr,
+ &dev_attr_mapping6.attr,
+ &dev_attr_mapping7.attr,
+ &dev_attr_mapping8.attr,
+ &dev_attr_mapping9.attr,
+ &dev_attr_mapping10.attr,
+ &dev_attr_mapping11.attr,
+ &dev_attr_mapping12.attr,
+ &dev_attr_mapping13.attr,
+ &dev_attr_mapping14.attr,
+ &dev_attr_mapping15.attr,
+ &dev_attr_mapping16.attr,
+ &dev_attr_mapping17.attr,
+ &dev_attr_mapping18.attr,
+ &dev_attr_mapping19.attr,
+ &dev_attr_mapping20.attr,
+ &dev_attr_mapping21.attr,
+ &dev_attr_mapping22.attr,
+ &dev_attr_mapping23.attr,
+ &dev_attr_mapping24.attr,
+ &dev_attr_mapping25.attr,
+ &dev_attr_mapping26.attr,
+ &dev_attr_mapping27.attr,
+ &dev_attr_mapping28.attr,
+ &dev_attr_mapping29.attr,
+ &dev_attr_mapping30.attr,
+ &dev_attr_mapping31.attr,
+ NULL,
+};
+
+struct attribute_group nd_mapping_attribute_group = {
+ .is_visible = nd_mapping_visible,
+ .attrs = nd_mapping_attributes,
+};
+EXPORT_SYMBOL_GPL(nd_mapping_attribute_group);
+
+void *nd_region_provider_data(struct nd_region *nd_region)
+{
+ return nd_region->provider_data;
+}
+EXPORT_SYMBOL_GPL(nd_region_provider_data);
+
+static noinline struct nd_region *nd_region_create(struct nd_bus *nd_bus,
+ struct nd_region_desc *ndr_desc, struct device_type *dev_type)
+{
+ struct nd_region *nd_region;
+ struct device *dev;
+ u16 i;
+
+ for (i = 0; i < ndr_desc->num_mappings; i++) {
+ struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
+ struct nd_dimm *nd_dimm = nd_mapping->nd_dimm;
+
+ if ((nd_mapping->start | nd_mapping->size) % SZ_4K) {
+ dev_err(&nd_bus->dev, "%pf: %s mapping%d is not 4K aligned\n",
+ __builtin_return_address(0),
+ dev_name(&nd_dimm->dev), i);
+
+ return NULL;
+ }
+ }
+
+ nd_region = kzalloc(sizeof(struct nd_region)
+ + sizeof(struct nd_mapping) * ndr_desc->num_mappings,
+ GFP_KERNEL);
+ if (!nd_region)
+ return NULL;
+ nd_region->id = ida_simple_get(&region_ida, 0, 0, GFP_KERNEL);
+ if (nd_region->id < 0) {
+ kfree(nd_region);
+ return NULL;
+ }
+
+ memcpy(nd_region->mapping, ndr_desc->nd_mapping,
+ sizeof(struct nd_mapping) * ndr_desc->num_mappings);
+ for (i = 0; i < ndr_desc->num_mappings; i++) {
+ struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
+ struct nd_dimm *nd_dimm = nd_mapping->nd_dimm;
+
+ get_device(&nd_dimm->dev);
+ }
+ nd_region->ndr_mappings = ndr_desc->num_mappings;
+ nd_region->provider_data = ndr_desc->provider_data;
+ dev = &nd_region->dev;
+ dev_set_name(dev, "region%d", nd_region->id);
+ dev->parent = &nd_bus->dev;
+ dev->type = dev_type;
+ dev->groups = ndr_desc->attr_groups;
+ nd_region->ndr_size = resource_size(ndr_desc->res);
+ nd_region->ndr_start = ndr_desc->res->start;
+ nd_device_register(dev);
+
+ return nd_region;
+}
+
+struct nd_region *nd_pmem_region_create(struct nd_bus *nd_bus,
+ struct nd_region_desc *ndr_desc)
+{
+ return nd_region_create(nd_bus, ndr_desc, &nd_pmem_device_type);
+}
+EXPORT_SYMBOL_GPL(nd_pmem_region_create);
+
+struct nd_region *nd_blk_region_create(struct nd_bus *nd_bus,
+ struct nd_region_desc *ndr_desc)
+{
+ if (ndr_desc->num_mappings > 1)
+ return NULL;
+ return nd_region_create(nd_bus, ndr_desc, &nd_blk_device_type);
+}
+EXPORT_SYMBOL_GPL(nd_blk_region_create);
+
+struct nd_region *nd_volatile_region_create(struct nd_bus *nd_bus,
+ struct nd_region_desc *ndr_desc)
+{
+ return nd_region_create(nd_bus, ndr_desc, &nd_volatile_device_type);
+}
+EXPORT_SYMBOL_GPL(nd_volatile_region_create);
* [PATCH v2 12/20] libnd, nd_acpi: add interleave-set state-tracking infrastructure
2015-04-28 18:24 [PATCH v2 00/20] libnd: non-volatile memory device support Dan Williams
` (5 preceding siblings ...)
2015-04-28 18:24 ` [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory) Dan Williams
@ 2015-04-28 18:25 ` Dan Williams
2015-04-28 20:52 ` [PATCH v2 00/20] libnd: non-volatile memory device support Andy Lutomirski
` (2 subsequent siblings)
9 siblings, 0 replies; 47+ messages in thread
From: Dan Williams @ 2015-04-28 18:25 UTC (permalink / raw)
To: linux-nvdimm
Cc: Neil Brown, Greg KH, Rafael J. Wysocki, Robert Moore,
linux-kernel, linux-acpi
On platforms that have firmware support for reading/writing per-dimm
label space, a portion of the dimm may be accessible via an interleave
set PMEM mapping in addition to the dimm's BLK (block-data-window
aperture(s)) interface. A label, stored in a "configuration data
region" on the dimm, disambiguates which dimm addresses are accessed
through which exclusive interface.
Add infrastructure that allows the kernel to block modifications to a
label in the set while any member dimm is active. Note that this is
meant only for enforcing "no modifications of active labels" via the
coarse ioctl command. Adding/deleting namespaces from an active
interleave set will only be possible via sysfs.
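As a rough illustration of the "no modifications of active labels" rule, here is a minimal userspace sketch. It assumes a per-dimm activity counter that is raised when a region driver binds and lowered when it unbinds. `struct dimm_model` and the three helper names are hypothetical stand-ins, and the sketch leaves out the bus lock and the wait for in-flight probes that the patch also performs.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct nd_dimm: only the activity counter. */
struct dimm_model {
	atomic_int busy;	/* number of enabled regions that map this dimm */
};

/* A region driver bound successfully: every member dimm becomes active. */
static void region_probe_ok(struct dimm_model **dimms, int n)
{
	for (int i = 0; i < n; i++)
		atomic_fetch_add(&dimms[i]->busy, 1);
}

/* A region driver unbound successfully: drop the references again. */
static void region_remove_ok(struct dimm_model **dimms, int n)
{
	for (int i = 0; i < n; i++)
		atomic_fetch_sub(&dimms[i]->busy, 1);
}

/* Gate for a label write: refuse while any region still uses the dimm. */
static int set_config_clear_to_send(struct dimm_model *dimm)
{
	return atomic_load(&dimm->busy) ? -EBUSY : 0;
}
```

With two dimms in one set, a label write to either dimm would be refused with -EBUSY between `region_probe_ok()` and `region_remove_ok()`, and allowed otherwise.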
Another aspect of tracking interleave sets is tracking their integrity
when DIMMs in a set are physically re-ordered. For this purpose we
generate an "interleave-set cookie" that can be recorded in a label and
validated against the current configuration. It is the bus provider
implementation's responsibility to calculate the interleave set cookie
and attach it to a given region.
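The cookie idea can be sketched in userspace as follows. This is only a sketch under stated assumptions: `struct set_info_map`, `fletcher64()` and `interleave_set_cookie()` are illustrative names, the comparison is a plain numeric compare on the offset, and the checksum uses the conventional Fletcher running sum (`lo32 += word`), which is a choice of this sketch and not a transcription of `nd_fletcher64()` in the diff below.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the per-mapping data hashed into the cookie. */
struct set_info_map {
	uint64_t region_offset;	/* position of this dimm within the set */
	uint32_t serial_number;	/* identifies the physical dimm */
	uint32_t pad;		/* explicit, so no uninitialized padding is hashed */
};

/* Order mappings by their offset in the region. */
static int cmp_map(const void *a, const void *b)
{
	const struct set_info_map *m0 = a, *m1 = b;

	if (m0->region_offset < m1->region_offset)
		return -1;
	return m0->region_offset > m1->region_offset;
}

/* Fletcher-style checksum over native-endian 32-bit words. */
static uint64_t fletcher64(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t lo32 = 0;
	uint64_t hi32 = 0;
	size_t i;

	for (i = 0; i + 4 <= len; i += 4) {
		uint32_t word;

		memcpy(&word, p + i, sizeof(word));
		lo32 += word;
		hi32 += lo32;
	}
	return hi32 << 32 | lo32;
}

/* Sort first, so the result does not depend on enumeration order. */
static uint64_t interleave_set_cookie(struct set_info_map *maps, size_t n)
{
	qsort(maps, n, sizeof(*maps), cmp_map);
	return fletcher64(maps, n * sizeof(*maps));
}
```

Sorting by offset makes the cookie independent of the order in which the mappings are enumerated, while swapping two dimms between slots pairs each serial number with a different offset and so yields a different cookie.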
Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/block/nd/acpi.c | 90 ++++++++++++++++++++++++++++++++++++++++
drivers/block/nd/bus.c | 41 ++++++++++++++++++
drivers/block/nd/core.c | 47 +++++++++++++++++++++
drivers/block/nd/dimm_devs.c | 19 ++++++++
drivers/block/nd/libnd.h | 6 +++
drivers/block/nd/nd-private.h | 11 ++++-
drivers/block/nd/nd.h | 4 ++
drivers/block/nd/region_devs.c | 85 ++++++++++++++++++++++++++++++++++++++
8 files changed, 299 insertions(+), 4 deletions(-)
diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
index c3dda74f73d7..d34cefe38e2f 100644
--- a/drivers/block/nd/acpi.c
+++ b/drivers/block/nd/acpi.c
@@ -15,6 +15,7 @@
#include <linux/ndctl.h>
#include <linux/list.h>
#include <linux/acpi.h>
+#include <linux/sort.h>
#include "acpi_nfit.h"
#include "libnd.h"
@@ -779,6 +780,90 @@ static const struct attribute_group *nd_acpi_region_attribute_groups[] = {
NULL,
};
+/* enough info to uniquely specify an interleave set */
+struct nfit_set_info {
+ struct nfit_set_info_map {
+ u64 region_spa_offset;
+ u32 serial_number;
+ u32 pad;
+ } mapping[0];
+};
+
+static size_t sizeof_nfit_set_info(int num_mappings)
+{
+ return sizeof(struct nfit_set_info)
+ + num_mappings * sizeof(struct nfit_set_info_map);
+}
+
+static int cmp_map(const void *m0, const void *m1)
+{
+ const struct nfit_set_info_map *map0 = m0;
+ const struct nfit_set_info_map *map1 = m1;
+
+ return memcmp(&map0->region_spa_offset, &map1->region_spa_offset,
+ sizeof(u64));
+}
+
+/* Retrieve the nth entry referencing this spa */
+static struct acpi_nfit_memdev *memdev_from_spa(
+ struct acpi_nfit_desc *acpi_desc, u16 spa_index, int n)
+{
+ struct nfit_memdev *nfit_memdev;
+
+ list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list)
+ if (nfit_memdev->memdev->spa_index == spa_index)
+ if (n-- == 0)
+ return nfit_memdev->memdev;
+ return NULL;
+}
+
+static int nd_acpi_init_interleave_set(struct acpi_nfit_desc *acpi_desc,
+ struct nd_region_desc *ndr_desc, struct acpi_nfit_spa *spa)
+{
+ u16 num_mappings = ndr_desc->num_mappings;
+ int i, spa_type = nfit_spa_type(spa);
+ struct device *dev = acpi_desc->dev;
+ struct nd_interleave_set *nd_set;
+ struct nfit_set_info *info;
+
+ if (spa_type == NFIT_SPA_PM || spa_type == NFIT_SPA_VOLATILE)
+ /* pass */;
+ else
+ return 0;
+
+ nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL);
+ if (!nd_set)
+ return -ENOMEM;
+
+ info = devm_kzalloc(dev, sizeof_nfit_set_info(num_mappings), GFP_KERNEL);
+ if (!info)
+ return -ENOMEM;
+ for (i = 0; i < num_mappings; i++) {
+ struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
+ struct nfit_set_info_map *map = &info->mapping[i];
+ struct nd_dimm *nd_dimm = nd_mapping->nd_dimm;
+ struct nfit_mem *nfit_mem = nd_dimm_provider_data(nd_dimm);
+ struct acpi_nfit_memdev *memdev = memdev_from_spa(acpi_desc,
+ spa->spa_index, i);
+
+ if (!memdev || !nfit_mem->dcr) {
+ dev_err(dev, "%s: failed to find DCR\n", __func__);
+ return -ENODEV;
+ }
+
+ map->region_spa_offset = memdev->region_spa_offset;
+ map->serial_number = nfit_mem->dcr->serial_number;
+ }
+
+ sort(&info->mapping[0], num_mappings, sizeof(struct nfit_set_info_map),
+ cmp_map, NULL);
+ nd_set->cookie = nd_fletcher64(info, sizeof_nfit_set_info(num_mappings));
+ ndr_desc->nd_set = nd_set;
+ devm_kfree(dev, info);
+
+ return 0;
+}
+
static int nd_acpi_register_region(struct acpi_nfit_desc *acpi_desc,
struct nfit_spa *nfit_spa)
{
@@ -786,7 +871,7 @@ static int nd_acpi_register_region(struct acpi_nfit_desc *acpi_desc,
struct acpi_nfit_spa *spa = nfit_spa->spa;
struct nfit_memdev *nfit_memdev;
struct nd_region_desc ndr_desc;
- int spa_type, count = 0;
+ int spa_type, count = 0, rc;
struct resource res;
u16 spa_index;
@@ -852,6 +937,9 @@ static int nd_acpi_register_region(struct acpi_nfit_desc *acpi_desc,
ndr_desc.nd_mapping = nd_mappings;
ndr_desc.num_mappings = count;
+ rc = nd_acpi_init_interleave_set(acpi_desc, &ndr_desc, spa);
+ if (rc)
+ return rc;
if (spa_type == NFIT_SPA_PM) {
if (!nd_pmem_region_create(acpi_desc->nd_bus, &ndr_desc))
return -ENOMEM;
diff --git a/drivers/block/nd/bus.c b/drivers/block/nd/bus.c
index 46568d182559..8afb8d4a7e81 100644
--- a/drivers/block/nd/bus.c
+++ b/drivers/block/nd/bus.c
@@ -78,7 +78,10 @@ static int nd_bus_probe(struct device *dev)
if (!try_module_get(provider))
return -ENXIO;
+ nd_region_probe_start(nd_bus, dev);
rc = nd_drv->probe(dev);
+ nd_region_probe_end(nd_bus, dev, rc);
+
dev_dbg(&nd_bus->dev, "%s.probe(%s) = %d\n", dev->driver->name,
dev_name(dev), rc);
if (rc != 0)
@@ -94,6 +97,8 @@ static int nd_bus_remove(struct device *dev)
int rc;
rc = nd_drv->remove(dev);
+ nd_region_notify_remove(nd_bus, dev, rc);
+
dev_dbg(&nd_bus->dev, "%s.remove(%s) = %d\n", dev->driver->name,
dev_name(dev), rc);
module_put(provider);
@@ -359,6 +364,33 @@ u32 nd_cmd_out_size(struct nd_dimm *nd_dimm, int cmd,
}
EXPORT_SYMBOL_GPL(nd_cmd_out_size);
+static void wait_nd_bus_probe_idle(struct nd_bus *nd_bus)
+{
+ do {
+ if (nd_bus->probe_active == 0)
+ break;
+ nd_bus_unlock(&nd_bus->dev);
+ wait_event(nd_bus->probe_wait, nd_bus->probe_active == 0);
+ nd_bus_lock(&nd_bus->dev);
+ } while (true);
+}
+
+/* set_config requires an idle interleave set */
+static int nd_cmd_clear_to_send(struct nd_dimm *nd_dimm, unsigned int cmd)
+{
+ struct nd_bus *nd_bus;
+
+ if (!nd_dimm || cmd != ND_CMD_SET_CONFIG_DATA)
+ return 0;
+
+ nd_bus = walk_to_nd_bus(&nd_dimm->dev);
+ wait_nd_bus_probe_idle(nd_bus);
+
+ if (atomic_read(&nd_dimm->busy))
+ return -EBUSY;
+ return 0;
+}
+
static int __nd_ioctl(struct nd_bus *nd_bus, struct nd_dimm *nd_dimm,
int read_only, unsigned int ioctl_cmd, unsigned long arg)
{
@@ -469,11 +501,18 @@ static int __nd_ioctl(struct nd_bus *nd_bus, struct nd_dimm *nd_dimm,
goto out;
}
+ nd_bus_lock(&nd_bus->dev);
+ rc = nd_cmd_clear_to_send(nd_dimm, cmd);
+ if (rc)
+ goto out_unlock;
+
rc = nd_desc->ndctl(nd_desc, nd_dimm, cmd, buf, buf_len);
if (rc < 0)
- goto out;
+ goto out_unlock;
if (copy_to_user(p, buf, buf_len))
rc = -EFAULT;
+ out_unlock:
+ nd_bus_unlock(&nd_bus->dev);
out:
vfree(buf);
return rc;
diff --git a/drivers/block/nd/core.c b/drivers/block/nd/core.c
index 646e424ae36c..603970d0ef3a 100644
--- a/drivers/block/nd/core.c
+++ b/drivers/block/nd/core.c
@@ -24,6 +24,51 @@ LIST_HEAD(nd_bus_list);
DEFINE_MUTEX(nd_bus_list_mutex);
static DEFINE_IDA(nd_ida);
+void nd_bus_lock(struct device *dev)
+{
+ struct nd_bus *nd_bus = walk_to_nd_bus(dev);
+
+ if (!nd_bus)
+ return;
+ mutex_lock(&nd_bus->reconfig_mutex);
+}
+EXPORT_SYMBOL(nd_bus_lock);
+
+void nd_bus_unlock(struct device *dev)
+{
+ struct nd_bus *nd_bus = walk_to_nd_bus(dev);
+
+ if (!nd_bus)
+ return;
+ mutex_unlock(&nd_bus->reconfig_mutex);
+}
+EXPORT_SYMBOL(nd_bus_unlock);
+
+bool is_nd_bus_locked(struct device *dev)
+{
+ struct nd_bus *nd_bus = walk_to_nd_bus(dev);
+
+ if (!nd_bus)
+ return false;
+ return mutex_is_locked(&nd_bus->reconfig_mutex);
+}
+EXPORT_SYMBOL(is_nd_bus_locked);
+
+u64 nd_fletcher64(void __iomem *addr, size_t len)
+{
+ u32 lo32 = 0;
+ u64 hi32 = 0;
+ int i;
+
+ for (i = 0; i < len; i += 4) {
+ lo32 = readl(addr + i);
+ hi32 += lo32;
+ }
+
+ return hi32 << 32 | lo32;
+}
+EXPORT_SYMBOL_GPL(nd_fletcher64);
+
static void nd_bus_release(struct device *dev)
{
struct nd_bus *nd_bus = container_of(dev, struct nd_bus, dev);
@@ -142,7 +187,9 @@ struct nd_bus *__nd_bus_register(struct device *parent,
if (!nd_bus)
return NULL;
INIT_LIST_HEAD(&nd_bus->list);
+ init_waitqueue_head(&nd_bus->probe_wait);
nd_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
+ mutex_init(&nd_bus->reconfig_mutex);
if (nd_bus->id < 0) {
kfree(nd_bus);
return NULL;
diff --git a/drivers/block/nd/dimm_devs.c b/drivers/block/nd/dimm_devs.c
index 33b6d5336096..8981adc59ba4 100644
--- a/drivers/block/nd/dimm_devs.c
+++ b/drivers/block/nd/dimm_devs.c
@@ -185,7 +185,24 @@ static ssize_t commands_show(struct device *dev,
}
static DEVICE_ATTR_RO(commands);
+static ssize_t state_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct nd_dimm *nd_dimm = to_nd_dimm(dev);
+
+ /*
+ * The state may be in the process of changing, userspace should
+ * quiesce probing if it wants a static answer
+ */
+ nd_bus_lock(dev);
+ nd_bus_unlock(dev);
+ return sprintf(buf, "%s\n", atomic_read(&nd_dimm->busy)
+ ? "active" : "idle");
+}
+static DEVICE_ATTR_RO(state);
+
static struct attribute *nd_dimm_attributes[] = {
+ &dev_attr_state.attr,
&dev_attr_commands.attr,
NULL,
};
@@ -213,7 +230,7 @@ struct nd_dimm *nd_dimm_create(struct nd_bus *nd_bus, void *provider_data,
nd_dimm->provider_data = provider_data;
nd_dimm->flags = flags;
nd_dimm->dsm_mask = dsm_mask;
-
+ atomic_set(&nd_dimm->busy, 0);
dev = &nd_dimm->dev;
dev_set_name(dev, "nmem%d", nd_dimm->id);
dev->parent = &nd_bus->dev;
diff --git a/drivers/block/nd/libnd.h b/drivers/block/nd/libnd.h
index 8c6f07696f30..deb29eff5b61 100644
--- a/drivers/block/nd/libnd.h
+++ b/drivers/block/nd/libnd.h
@@ -60,11 +60,16 @@ struct nd_cmd_desc {
int out_sizes[ND_CMD_MAX_ELEM];
};
+struct nd_interleave_set {
+ u64 cookie;
+};
+
struct nd_region_desc {
struct resource *res;
struct nd_mapping *nd_mapping;
u16 num_mappings;
const struct attribute_group **attr_groups;
+ struct nd_interleave_set *nd_set;
void *provider_data;
};
@@ -98,4 +103,5 @@ struct nd_region *nd_blk_region_create(struct nd_bus *nd_bus,
struct nd_region_desc *ndr_desc);
struct nd_region *nd_volatile_region_create(struct nd_bus *nd_bus,
struct nd_region_desc *ndr_desc);
+u64 nd_fletcher64(void __iomem *addr, size_t len);
#endif /* __LIBND_H__ */
diff --git a/drivers/block/nd/nd-private.h b/drivers/block/nd/nd-private.h
index 131fc66ce7ab..5d8249be3415 100644
--- a/drivers/block/nd/nd-private.h
+++ b/drivers/block/nd/nd-private.h
@@ -13,6 +13,8 @@
#ifndef __ND_PRIVATE_H__
#define __ND_PRIVATE_H__
#include <linux/device.h>
+#include <linux/sizes.h>
+#include <linux/mutex.h>
#include "libnd.h"
extern struct list_head nd_bus_list;
@@ -21,10 +23,12 @@ extern int nd_dimm_major;
struct nd_bus {
struct nd_bus_descriptor *nd_desc;
+ wait_queue_head_t probe_wait;
struct module *module;
struct list_head list;
struct device dev;
- int id;
+ int id, probe_active;
+ struct mutex reconfig_mutex;
};
struct nd_dimm {
@@ -32,6 +36,7 @@ struct nd_dimm {
void *provider_data;
unsigned long *dsm_mask;
struct device dev;
+ atomic_t busy;
int id;
};
@@ -45,10 +50,14 @@ int __init nd_dimm_init(void);
int __init nd_region_init(void);
void nd_dimm_exit(void);
int nd_region_exit(void);
+void nd_region_probe_start(struct nd_bus *nd_bus, struct device *dev);
+void nd_region_probe_end(struct nd_bus *nd_bus, struct device *dev, int rc);
+void nd_region_notify_remove(struct nd_bus *nd_bus, struct device *dev, int rc);
int nd_bus_create_ndctl(struct nd_bus *nd_bus);
void nd_bus_destroy_ndctl(struct nd_bus *nd_bus);
void nd_synchronize(void);
int nd_bus_register_dimms(struct nd_bus *nd_bus);
int nd_bus_register_regions(struct nd_bus *nd_bus);
+int nd_bus_init_interleave_sets(struct nd_bus *nd_bus);
int nd_match_dimm(struct device *dev, void *data);
#endif /* __ND_PRIVATE_H__ */
diff --git a/drivers/block/nd/nd.h b/drivers/block/nd/nd.h
index 23469513f4c0..c69707dbd272 100644
--- a/drivers/block/nd/nd.h
+++ b/drivers/block/nd/nd.h
@@ -35,6 +35,7 @@ struct nd_region {
u64 ndr_start;
int id;
void *provider_data;
+ struct nd_interleave_set *nd_set;
struct nd_mapping mapping[0];
};
@@ -50,4 +51,7 @@ int nd_dimm_init_config_data(struct nd_dimm_drvdata *ndd);
struct nd_region *to_nd_region(struct device *dev);
int nd_region_to_namespace_type(struct nd_region *nd_region);
int nd_region_register_namespaces(struct nd_region *nd_region, int *err);
+void nd_bus_lock(struct device *dev);
+void nd_bus_unlock(struct device *dev);
+bool is_nd_bus_locked(struct device *dev);
#endif /* __ND_H__ */
diff --git a/drivers/block/nd/region_devs.c b/drivers/block/nd/region_devs.c
index 49ebce0c97be..c1cea42d0473 100644
--- a/drivers/block/nd/region_devs.c
+++ b/drivers/block/nd/region_devs.c
@@ -10,7 +10,10 @@
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*/
+#include <linux/scatterlist.h>
+#include <linux/sched.h>
#include <linux/slab.h>
+#include <linux/sort.h>
#include <linux/io.h>
#include "nd-private.h"
#include "nd.h"
@@ -133,6 +136,21 @@ static ssize_t nstype_show(struct device *dev,
}
static DEVICE_ATTR_RO(nstype);
+static ssize_t set_cookie_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nd_region *nd_region = to_nd_region(dev);
+ struct nd_interleave_set *nd_set = nd_region->nd_set;
+
+ if (is_nd_pmem(dev) && nd_set)
+ /* pass, should be precluded by nd_region_visible */;
+ else
+ return -ENXIO;
+
+ return sprintf(buf, "%#llx\n", nd_set->cookie);
+}
+static DEVICE_ATTR_RO(set_cookie);
+
static ssize_t init_namespaces_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -149,15 +167,81 @@ static struct attribute *nd_region_attributes[] = {
&dev_attr_size.attr,
&dev_attr_nstype.attr,
&dev_attr_mappings.attr,
+ &dev_attr_set_cookie.attr,
&dev_attr_init_namespaces.attr,
NULL,
};
+static umode_t nd_region_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ struct device *dev = container_of(kobj, typeof(*dev), kobj);
+ struct nd_region *nd_region = to_nd_region(dev);
+ struct nd_interleave_set *nd_set = nd_region->nd_set;
+
+ if (a != &dev_attr_set_cookie.attr)
+ return a->mode;
+
+ if (is_nd_pmem(dev) && nd_set)
+ return a->mode;
+
+ return 0;
+}
+
struct attribute_group nd_region_attribute_group = {
.attrs = nd_region_attributes,
+ .is_visible = nd_region_visible,
};
EXPORT_SYMBOL_GPL(nd_region_attribute_group);
+/*
+ * Upon successful probe/remove, take/release a reference on the
+ * associated interleave set (if present)
+ */
+static void nd_region_notify_driver_action(struct nd_bus *nd_bus,
+ struct device *dev, int rc, bool probe)
+{
+ if (rc)
+ return;
+
+ if (is_nd_pmem(dev) || is_nd_blk(dev)) {
+ struct nd_region *nd_region = to_nd_region(dev);
+ int i;
+
+ for (i = 0; i < nd_region->ndr_mappings; i++) {
+ struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+ struct nd_dimm *nd_dimm = nd_mapping->nd_dimm;
+
+ if (probe)
+ atomic_inc(&nd_dimm->busy);
+ else
+ atomic_dec(&nd_dimm->busy);
+ }
+ }
+}
+
+void nd_region_probe_start(struct nd_bus *nd_bus, struct device *dev)
+{
+ nd_bus_lock(&nd_bus->dev);
+ nd_bus->probe_active++;
+ nd_bus_unlock(&nd_bus->dev);
+}
+
+void nd_region_probe_end(struct nd_bus *nd_bus, struct device *dev, int rc)
+{
+ nd_bus_lock(&nd_bus->dev);
+ nd_region_notify_driver_action(nd_bus, dev, rc, true);
+ if (--nd_bus->probe_active == 0)
+ wake_up(&nd_bus->probe_wait);
+ nd_bus_unlock(&nd_bus->dev);
+}
+
+void nd_region_notify_remove(struct nd_bus *nd_bus, struct device *dev, int rc)
+{
+ nd_bus_lock(dev);
+ nd_region_notify_driver_action(nd_bus, dev, rc, false);
+ nd_bus_unlock(dev);
+}
+
static ssize_t mappingN(struct device *dev, char *buf, int n)
{
struct nd_region *nd_region = to_nd_region(dev);
@@ -317,6 +401,7 @@ static noinline struct nd_region *nd_region_create(struct nd_bus *nd_bus,
}
nd_region->ndr_mappings = ndr_desc->num_mappings;
nd_region->provider_data = ndr_desc->provider_data;
+ nd_region->nd_set = ndr_desc->nd_set;
dev = &nd_region->dev;
dev_set_name(dev, "region%d", nd_region->id);
dev->parent = &nd_bus->dev;
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 18:24 [PATCH v2 00/20] libnd: non-volatile memory device support Dan Williams
` (6 preceding siblings ...)
2015-04-28 18:25 ` [PATCH v2 12/20] libnd, nd_acpi: add interleave-set state-tracking infrastructure Dan Williams
@ 2015-04-28 20:52 ` Andy Lutomirski
2015-04-28 20:59 ` Dan Williams
2015-04-28 21:24 ` [Linux-nvdimm] " Elliott, Robert (Server Storage)
2015-04-29 0:25 ` Rafael J. Wysocki
9 siblings, 1 reply; 47+ messages in thread
From: Andy Lutomirski @ 2015-04-28 20:52 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, Boaz Harrosh, Neil Brown, Dave Chinner,
H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki, Robert Moore,
Christoph Hellwig, Linux ACPI, Jeff Moyer, Nicholas Moulin,
Matthew Wilcox, Ross Zwisler, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andrew Morton, Linus Torvalds
On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>
> 1/ Ingo said [2]:
>
> "So why on earth is this whole concept and the naming itself
> ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
> revolving around a specific 'firmware' mindset and revolving
> around specific, weirdly named, overly complicated looking
> firmware interfaces that come with their own new weird
> glossary??"
>
> Indeed, we of course consulted the NFIT specification to determine
> the shape of the sub-system, but then let its terms and data
> structures permeate too deep into the implementation. That is fixed
> now with all NFIT specifics factored out into acpi.c. The NFIT is no
> longer required reading to review libnd. Only three concepts are
> needed:
>
> i/ PMEM - contiguous memory range where cpu stores are
> persistent once they are flushed through the memory
> controller.
>
> ii/ BLK - mmio apertures (sliding windows) that can be
> programmed to access an aperture's-worth of persistent
> media at a time.
>
> iii/ DPA - "dimm-physical-address", address space local to a
> dimm. A dimm may provide both PMEM-mode and BLK-mode
> access to a range of DPA. libnd manages allocation of DPA
> to either PMEM or BLK-namespaces to resolve this aliasing.
Mostly for my understanding: is there a name for "address relative to
the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of
apparent physical memory, possibly interleaved, broken up, or weirdly
remapped by the memory controller, would still have addresses between
0 and 8 GB. Some of those might be PMEM windows, some might be MMIO,
some might be BLK apertures, etc.
IIUC "DPA" refers to actual addressable storage, not this type of address?
--Andy
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 20:52 ` [PATCH v2 00/20] libnd: non-volatile memory device support Andy Lutomirski
@ 2015-04-28 20:59 ` Dan Williams
2015-04-28 21:06 ` Andy Lutomirski
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-04-28 20:59 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-nvdimm, Boaz Harrosh, Neil Brown, Dave Chinner,
H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki, Robert Moore,
Christoph Hellwig, Linux ACPI, Jeff Moyer, Nicholas Moulin,
Matthew Wilcox, Ross Zwisler, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andrew Morton, Linus Torvalds
On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>>
>> 1/ Ingo said [2]:
>>
>> "So why on earth is this whole concept and the naming itself
>> ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>> revolving around a specific 'firmware' mindset and revolving
>> around specific, weirdly named, overly complicated looking
>> firmware interfaces that come with their own new weird
>> glossary??"
>>
>> Indeed, we of course consulted the NFIT specification to determine
>> the shape of the sub-system, but then let its terms and data
>> structures permeate too deep into the implementation. That is fixed
>> now with all NFIT specifics factored out into acpi.c. The NFIT is no
>> longer required reading to review libnd. Only three concepts are
>> needed:
>>
>> i/ PMEM - contiguous memory range where cpu stores are
>> persistent once they are flushed through the memory
>> controller.
>>
>> ii/ BLK - mmio apertures (sliding windows) that can be
>> programmed to access an aperture's-worth of persistent
>> media at a time.
>>
>> iii/ DPA - "dimm-physical-address", address space local to a
>> dimm. A dimm may provide both PMEM-mode and BLK-mode
>> access to a range of DPA. libnd manages allocation of DPA
>> to either PMEM or BLK-namespaces to resolve this aliasing.
>
> Mostly for my understanding: is there a name for "address relative to
> the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of
> apparent physical memory, possibly interleaved, broken up, or weirdly
> remapped by the memory controller, would still have addresses between
> 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO,
> some might be BLK apertures, etc.
>
> IIUC "DPA" refers to actual addressable storage, not this type of address?
No, DPA is exactly as you describe above. You can't directly access
it except through a PMEM mapping (possibly interleaved with DPA from
other DIMMs) or a BLK aperture (mmio window into DPA).
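To make the PMEM case concrete, here is a minimal sketch of how an offset into an interleaved PMEM range could resolve to a (dimm, DPA) pair. It assumes a simple round-robin interleave with a fixed line size. The real layout is described by the platform firmware tables and is not shown in this thread, so `struct dpa_ref`, `pmem_offset_to_dpa()` and all of its parameters are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result: which dimm, and where in its local address space. */
struct dpa_ref {
	unsigned int dimm;	/* index of the dimm within the interleave set */
	uint64_t dpa;		/* dimm-physical-address on that dimm */
};

/*
 * Assumed round-robin interleave: consecutive lines of 'line_size' bytes
 * rotate across 'ways' dimms, each dimm contributing DPA from 'dpa_base'.
 */
static struct dpa_ref pmem_offset_to_dpa(uint64_t offset, unsigned int ways,
					 uint64_t line_size, uint64_t dpa_base)
{
	uint64_t line = offset / line_size;
	struct dpa_ref ref;

	ref.dimm = (unsigned int)(line % ways);
	ref.dpa = dpa_base + (line / ways) * line_size + offset % line_size;
	return ref;
}
```

Under these assumptions, with two dimms and 4096-byte lines, offset 4100 lands on dimm 1 at DPA 4 and offset 8200 lands on dimm 0 at DPA 4104.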
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 20:59 ` Dan Williams
@ 2015-04-28 21:06 ` Andy Lutomirski
2015-04-28 22:28 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Andy Lutomirski @ 2015-04-28 21:06 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, Boaz Harrosh, Neil Brown, Dave Chinner,
H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki, Robert Moore,
Christoph Hellwig, Linux ACPI, Jeff Moyer, Nicholas Moulin,
Matthew Wilcox, Ross Zwisler, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andrew Morton, Linus Torvalds
On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>>>
>>> 1/ Ingo said [2]:
>>>
>>> "So why on earth is this whole concept and the naming itself
>>> ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>>> revolving around a specific 'firmware' mindset and revolving
>>> around specific, weirdly named, overly complicated looking
>>> firmware interfaces that come with their own new weird
>>> glossary??"
>>>
>>> Indeed, we of course consulted the NFIT specification to determine
>>> the shape of the sub-system, but then let its terms and data
>>> structures permeate too deep into the implementation. That is fixed
>>> now with all NFIT specifics factored out into acpi.c. The NFIT is no
>>> longer required reading to review libnd. Only three concepts are
>>> needed:
>>>
>>> i/ PMEM - contiguous memory range where cpu stores are
>>> persistent once they are flushed through the memory
>>> controller.
>>>
>>> ii/ BLK - mmio apertures (sliding windows) that can be
>>> programmed to access an aperture's-worth of persistent
>>> media at a time.
>>>
>>> iii/ DPA - "dimm-physical-address", address space local to a
>>> dimm. A dimm may provide both PMEM-mode and BLK-mode
>>> access to a range of DPA. libnd manages allocation of DPA
>>> to either PMEM or BLK-namespaces to resolve this aliasing.
>>
>> Mostly for my understanding: is there a name for "address relative to
>> the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of
>> apparent physical memory, possibly interleaved, broken up, or weirdly
>> remapped by the memory controller, would still have addresses between
>> 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO,
>> some might be BLK apertures, etc.
>>
>> IIUC "DPA" refers to actual addressable storage, not this type of address?
>
> No, DPA is exactly as you describe above. You can't directly access
> it except through a PMEM mapping (possibly interleaved with DPA from
> other DIMMs) or a BLK aperture (mmio window into DPA).
So the thing I'm describing has no name, then? Oh, well.
--Andy
* RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 18:24 [PATCH v2 00/20] libnd: non-volatile memory device support Dan Williams
` (7 preceding siblings ...)
2015-04-28 20:52 ` [PATCH v2 00/20] libnd: non-volatile memory device support Andy Lutomirski
@ 2015-04-28 21:24 ` Elliott, Robert (Server Storage)
2015-04-28 22:15 ` Dan Williams
2015-04-29 0:25 ` Rafael J. Wysocki
9 siblings, 1 reply; 47+ messages in thread
From: Elliott, Robert (Server Storage) @ 2015-04-28 21:24 UTC (permalink / raw)
To: Dan Williams, linux-nvdimm@lists.01.org
Cc: Neil Brown, Dave Chinner, H. Peter Anvin, Christoph Hellwig,
Rafael J. Wysocki, Robert Moore, Ingo Molnar,
linux-acpi@vger.kernel.org, Jens Axboe, Borislav Petkov,
Thomas Gleixner, Greg KH, linux-kernel@vger.kernel.org,
Andy Lutomirski, Andrew Morton, Linus Torvalds
> -----Original Message-----
> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On Behalf Of
> Dan Williams
> Sent: Tuesday, April 28, 2015 1:24 PM
> To: linux-nvdimm@lists.01.org
> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
> Wysocki; Robert Moore; Ingo Molnar; linux-acpi@vger.kernel.org; Jens Axboe;
> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org;
> Andy Lutomirski; Andrew Morton; Linus Torvalds
> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
> support
>
> Changes since v1 [1]: Incorporates feedback received prior to April 24.
Here are some comments on the sysfs properties reported for a pmem device.
They are based on v1, but I don't think v2 changes anything.
1. This confuses lsblk (part of util-linux):
/sys/block/pmem0/device/type:4
lsblk shows:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
pmem0 251:0 0 8G 0 worm
pmem1 251:16 0 8G 0 worm
pmem2 251:32 0 8G 0 worm
pmem3 251:48 0 8G 0 worm
pmem4 251:64 0 8G 0 worm
pmem5 251:80 0 8G 0 worm
pmem6 251:96 0 8G 0 worm
pmem7 251:112 0 8G 0 worm
lsblk's blkdev_scsi_type_to_name() considers 4 to mean
SCSI_TYPE_WORM (write once read many ... used for certain optical
and tape drives).
I'm not sure what nd and pmem are doing to result in that value.
2. To avoid confusing software trying to detect fast storage vs.
slow storage devices via sysfs, this value should be 0:
/sys/block/pmem0/queue/rotational:1
That can be done by adding this shortly after the blk_alloc_queue call:
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);
3. Is there any reason to have a 512 KiB limit on the transfer
length?
/sys/block/pmem0/queue/max_hw_sectors_kb:512
That is from:
blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
4. These are read-writeable, but IOs never reach a queue, so
the queue size is irrelevant and merging never happens:
/sys/block/pmem0/queue/nomerges:0
/sys/block/pmem0/queue/nr_requests:128
Consider making them both read-only with:
* nomerges set to 2 (no merging happening)
* nr_requests as small as the block layer allows to avoid
wasting memory.
5. No scatter-gather lists are created by the driver, so these
read-only fields are meaningless:
/sys/block/pmem0/queue/max_segments:128
/sys/block/pmem0/queue/max_segment_size:65536
Is there a better way to report them as irrelevant?
6. There is no completion processing, so the read-writeable
cpu affinity is not used:
/sys/block/pmem0/queue/rq_affinity:0
Consider making it read-only and set to 2, meaning the
completions always run on the requesting CPU.
7. With mmap() allowing less than logical block sized accesses
to the device, this could be considered misleading:
/sys/block/pmem0/queue/physical_block_size:512
Perhaps that needs to be 1 byte or a cacheline size (64 bytes
on x86) to indicate that direct partial logical block accesses
are possible. The btt driver could report 512 as one indication
it is different.
I wouldn't be surprised if smaller values than the logical block
size confused some software, though.
---
Robert Elliott, HP Server Storage
* Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 21:24 ` [Linux-nvdimm] " Elliott, Robert (Server Storage)
@ 2015-04-28 22:15 ` Dan Williams
2015-05-07 7:29 ` Christoph Hellwig
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-04-28 22:15 UTC (permalink / raw)
To: Elliott, Robert (Server Storage)
Cc: linux-nvdimm@lists.01.org, Neil Brown, Dave Chinner,
H. Peter Anvin, Christoph Hellwig, Wysocki, Rafael J,
Moore, Robert, Ingo Molnar, linux-acpi@vger.kernel.org,
Jens Axboe, Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andy Lutomirski, Andrew Morton,
Linus Torvalds
On Tue, Apr 28, 2015 at 2:24 PM, Elliott, Robert (Server Storage)
<Elliott@hp.com> wrote:
>> -----Original Message-----
>> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On Behalf Of
>> Dan Williams
>> Sent: Tuesday, April 28, 2015 1:24 PM
>> To: linux-nvdimm@lists.01.org
>> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
>> Wysocki; Robert Moore; Ingo Molnar; linux-acpi@vger.kernel.org; Jens Axboe;
>> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org;
>> Andy Lutomirski; Andrew Morton; Linus Torvalds
>> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
>> support
>>
>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>
> Here are some comments on the sysfs properties reported for a pmem device.
> They are based on v1, but I don't think v2 changes anything.
>
> 1. This confuses lsblk (part of util-linux):
> /sys/block/pmem0/device/type:4
>
> lsblk shows:
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> pmem0 251:0 0 8G 0 worm
> pmem1 251:16 0 8G 0 worm
> pmem2 251:32 0 8G 0 worm
> pmem3 251:48 0 8G 0 worm
> pmem4 251:64 0 8G 0 worm
> pmem5 251:80 0 8G 0 worm
> pmem6 251:96 0 8G 0 worm
> pmem7 251:112 0 8G 0 worm
>
> lsblk's blkdev_scsi_type_to_name() considers 4 to mean
> SCSI_TYPE_WORM (write once read many ... used for certain optical
> and tape drives).
Why is lsblk assuming these are scsi devices? I'll need to go check that out.
> I'm not sure what nd and pmem are doing to result in that value.
That is their libnd specific device type number from
include/uapi/ndctl.h. 4 == ND_DEVICE_NAMESPACE_IO. lsblk has no
business interpreting this as something SCSI specific.
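
To make the collision concrete, here is a minimal user-space sketch. It is not lsblk's actual code: the lookup table is a small subset of the SCSI peripheral device type codes, and blkdev_type_name() is a hypothetical helper that only applies the SCSI table when the device is known to sit on the SCSI bus. The value 4 for ND_DEVICE_NAMESPACE_IO is the one quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Subset of the SCSI peripheral device type codes; 0x04 is write-once. */
static const char *scsi_type_name(int type)
{
	switch (type) {
	case 0x00: return "disk";
	case 0x01: return "tape";
	case 0x04: return "worm";
	case 0x05: return "rom";
	default:   return NULL;
	}
}

/* Value quoted in the thread: 4 == ND_DEVICE_NAMESPACE_IO. */
enum { ND_DEVICE_NAMESPACE_IO = 4 };

/*
 * Hypothetical helper (an assumption, not an existing util-linux function):
 * treat device/type as a SCSI code only for devices on the SCSI bus, and
 * report everything else as a plain "disk".
 */
static const char *blkdev_type_name(const char *subsystem, int type)
{
	const char *name;

	assert(subsystem != NULL);
	if (strcmp(subsystem, "scsi") != 0)
		return "disk";
	name = scsi_type_name(type);
	return name ? name : "disk";
}
```

With a blind SCSI lookup the libnd value 4 lands on "worm", which matches the lsblk output shown above; gating the lookup on the bus type avoids that.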
> 2. To avoid confusing software trying to detect fast storage vs.
> slow storage devices via sysfs, this value should be 0:
> /sys/block/pmem0/queue/rotational:1
>
> That can be done by adding this shortly after the blk_alloc_queue call:
> queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);
Yeah, good catch.
> 3. Is there any reason to have a 512 KiB limit on the transfer
> length?
> /sys/block/pmem0/queue/max_hw_sectors_kb:512
>
> That is from:
> blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
I'd only change this from the default if performance testing showed it
made a non-trivial difference.
> 4. These are read-writeable, but IOs never reach a queue, so
> the queue size is irrelevant and merging never happens:
> /sys/block/pmem0/queue/nomerges:0
> /sys/block/pmem0/queue/nr_requests:128
>
> Consider making them both read-only with:
> * nomerges set to 2 (no merging happening)
> * nr_requests as small as the block layer allows to avoid
> wasting memory.
>
> 5. No scatter-gather lists are created by the driver, so these
> read-only fields are meaningless:
> /sys/block/pmem0/queue/max_segments:128
> /sys/block/pmem0/queue/max_segment_size:65536
>
> Is there a better way to report them as irrelevant?
Again it comes back to the question of whether these default settings
are actively harmful.
>
> 6. There is no completion processing, so the read-writeable
> cpu affinity is not used:
> /sys/block/pmem0/queue/rq_affinity:0
>
> Consider making it read-only and set to 2, meaning the
> completions always run on the requesting CPU.
There are no completions with pmem, the entire I/O path is
synchronous. Ideally, this attribute would disappear for a pmem
queue, not be set to 2.
> 7. With mmap() allowing less than logical block sized accesses
> to the device, this could be considered misleading:
> /sys/block/pmem0/queue/physical_block_size:512
I don't see how it is misleading. If you access it as a block device
the block size is 512. If the application is mmap() + DAX aware it
knows that the physical_block_size is being bypassed.
>
> Perhaps that needs to be 1 byte or a cacheline size (64 bytes
> on x86) to indicate that direct partial logical block accesses
> are possible.
No, because that breaks the definition of a block device. Through the
bdev interface it's always accessed a block at a time.
> The btt driver could report 512 as one indication
> it is different.
>
> I wouldn't be surprised if smaller values than the logical block
> size confused some software, though.
Precisely why we shouldn't go there with pmem.
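
The "a block at a time" point above can be illustrated with a small user-space sketch. It is only an illustration under stated assumptions: bytes_to_blocks() and struct block_span are hypothetical names, and the 512-byte block size is the value reported in sysfs above. It shows that any byte range requested through the block interface is widened to whole blocks, however small the request.

```c
#include <assert.h>
#include <stdint.h>

/* Whole-block span covering a byte range (hypothetical type). */
struct block_span {
	uint64_t first;	/* index of the first block touched */
	uint64_t count;	/* number of whole blocks touched */
};

/*
 * Hypothetical helper: widen the byte range [offset, offset + len) to the
 * whole blocks that contain it.
 */
static struct block_span bytes_to_blocks(uint64_t offset, uint64_t len,
					 uint32_t block_size)
{
	struct block_span s;
	uint64_t end;

	assert(block_size != 0 && len != 0);
	end = offset + len;	/* one past the last byte */
	s.first = offset / block_size;
	s.count = (end + block_size - 1) / block_size - s.first;
	return s;
}
```

An 8-byte access at offset 100 still costs one full 512-byte block, and a 4-byte access that straddles offset 512 costs two. Only the mmap() + DAX path bypasses this rounding.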
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 21:06 ` Andy Lutomirski
@ 2015-04-28 22:28 ` Dan Williams
2015-04-28 23:05 ` Andy Lutomirski
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-04-28 22:28 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-nvdimm, Boaz Harrosh, Neil Brown, Dave Chinner,
H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki, Robert Moore,
Christoph Hellwig, Linux ACPI, Jeff Moyer, Nicholas Moulin,
Matthew Wilcox, Ross Zwisler, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andrew Morton, Linus Torvalds
On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>>>>
>>>> 1/ Ingo said [2]:
>>>>
>>>> "So why on earth is this whole concept and the naming itself
>>>> ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>>>> revolving around a specific 'firmware' mindset and revolving
>>>> around specific, weirdly named, overly complicated looking
>>>> firmware interfaces that come with their own new weird
>>>> glossary??"
>>>>
>>>> Indeed, we of course consulted the NFIT specification to determine
>>>> the shape of the sub-system, but then let its terms and data
>>>> structures permeate too deep into the implementation. That is fixed
>>>> now with all NFIT specifics factored out into acpi.c. The NFIT is no
>>>> longer required reading to review libnd. Only three concepts are
>>>> needed:
>>>>
>>>> i/ PMEM - contiguous memory range where cpu stores are
>>>> persistent once they are flushed through the memory
>>>> controller.
>>>>
>>>> ii/ BLK - mmio apertures (sliding windows) that can be
>>>> programmed to access an aperture's-worth of persistent
>>>> media at a time.
>>>>
>>>> iii/ DPA - "dimm-physical-address", address space local to a
>>>> dimm. A dimm may provide both PMEM-mode and BLK-mode
>>>> access to a range of DPA. libnd manages allocation of DPA
>>>> to either PMEM or BLK-namespaces to resolve this aliasing.
>>>
>>> Mostly for my understanding: is there a name for "address relative to
>>> the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of
>>> apparent physical memory, possibly interleaved, broken up, or weirdly
>>> remapped by the memory controller, would still have addresses between
>>> 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO,
>>> some might be BLK apertures, etc.
>>>
>>> IIUC "DPA" refers to actual addressable storage, not this type of address?
>>
>> No, DPA is exactly as you describe above. You can't directly access
>> it except through a PMEM mapping (possibly interleaved with DPA from
>> other DIMMs) or a BLK aperture (mmio window into DPA).
>
> So the thing I'm describing has no name, then? Oh, well.
What? The thing you are describing *is* DPA.
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 22:28 ` Dan Williams
@ 2015-04-28 23:05 ` Andy Lutomirski
2015-04-30 20:56 ` Ross Zwisler
0 siblings, 1 reply; 47+ messages in thread
From: Andy Lutomirski @ 2015-04-28 23:05 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, Boaz Harrosh, Neil Brown, Dave Chinner,
H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki, Robert Moore,
Christoph Hellwig, Linux ACPI, Jeff Moyer, Nicholas Moulin,
Matthew Wilcox, Ross Zwisler, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andrew Morton, Linus Torvalds
On Tue, Apr 28, 2015 at 3:28 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>>>>>
>>>>> 1/ Ingo said [2]:
>>>>>
>>>>> "So why on earth is this whole concept and the naming itself
>>>>> ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>>>>> revolving around a specific 'firmware' mindset and revolving
>>>>> around specific, weirdly named, overly complicated looking
>>>>> firmware interfaces that come with their own new weird
>>>>> glossary??"
>>>>>
>>>>> Indeed, we of course consulted the NFIT specification to determine
>>>>> the shape of the sub-system, but then let its terms and data
>>>>> structures permeate too deep into the implementation. That is fixed
>>>>> now with all NFIT specifics factored out into acpi.c. The NFIT is no
>>>>> longer required reading to review libnd. Only three concepts are
>>>>> needed:
>>>>>
>>>>> i/ PMEM - contiguous memory range where cpu stores are
>>>>> persistent once they are flushed through the memory
>>>>> controller.
>>>>>
>>>>> ii/ BLK - mmio apertures (sliding windows) that can be
>>>>> programmed to access an aperture's-worth of persistent
>>>>> media at a time.
>>>>>
>>>>> iii/ DPA - "dimm-physical-address", address space local to a
>>>>> dimm. A dimm may provide both PMEM-mode and BLK-mode
>>>>> access to a range of DPA. libnd manages allocation of DPA
>>>>> to either PMEM or BLK-namespaces to resolve this aliasing.
>>>>
>>>> Mostly for my understanding: is there a name for "address relative to
>>>> the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of
>>>> apparent physical memory, possibly interleaved, broken up, or weirdly
>>>> remapped by the memory controller, would still have addresses between
>>>> 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO,
>>>> some might be BLK apertures, etc.
>>>>
>>>> IIUC "DPA" refers to actual addressable storage, not this type of address?
>>>
>>> No, DPA is exactly as you describe above. You can't directly access
>>> it except through a PMEM mapping (possibly interleaved with DPA from
>>> other DIMMs) or a BLK aperture (mmio window into DPA).
>>
>> So the thing I'm describing has no name, then? Oh, well.
>
> What? The thing you are describing *is* DPA.
I'm confused. Here are the two things I have in mind:
1. An address into on-DIMM storage. If I have a DIMM that is mapped
to 8 GB of SPA but has 64 GB of usable storage (accessed through BLK
apertures, say), then this address runs from 0 to 64 GB.
2. An address into the DIMM's view of physical address space. If I
have a DIMM that is mapped to 8 GB of SPA but has 64 GB of usable
storage (accessed through BLK apertures, say), then this address runs
from 0 to 8 GB. There's a one-to-one mapping between SPA and this
type of address.
Since you said "a dimm may provide both PMEM-mode and BLK-mode access
to a range of DPA," I thought that DPA was #1.
--Andy
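
A minimal sketch of reading #1, where the address runs over the full on-DIMM storage and a BLK aperture is a sliding window into it. All names here (struct blk_window, dpa_to_window()) are hypothetical, and the 8 GiB aperture is chosen only to match the numbers in the example above. Real apertures are programmed through device registers, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Where a sliding window must point to reach one DPA (hypothetical type). */
struct blk_window {
	uint64_t base;		/* DPA at which the aperture must start */
	uint64_t offset;	/* offset of the target byte inside the aperture */
};

/*
 * Hypothetical helper: split a dimm-physical-address into the aperture
 * position that covers it and the offset within that aperture.
 */
static struct blk_window dpa_to_window(uint64_t dpa, uint64_t dimm_size,
				       uint64_t aperture_size)
{
	struct blk_window w;

	assert(aperture_size != 0 && dpa < dimm_size);
	w.offset = dpa % aperture_size;
	w.base = dpa - w.offset;
	return w;
}
```

For a 64 GiB DIMM behind an 8 GiB window, a byte at DPA 20 GiB + 5 is reached by positioning the window at 16 GiB and accessing offset 4 GiB + 5. The address in sense #2 never exceeds 8 GiB, while the address in sense #1 spans all 64 GiB.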
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 18:24 [PATCH v2 00/20] libnd: non-volatile memory device support Dan Williams
` (8 preceding siblings ...)
2015-04-28 21:24 ` [Linux-nvdimm] " Elliott, Robert (Server Storage)
@ 2015-04-29 0:25 ` Rafael J. Wysocki
2015-04-29 1:22 ` Dan Williams
9 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2015-04-29 0:25 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, Boaz Harrosh, Neil Brown, Dave Chinner,
H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki, Robert Moore,
Christoph Hellwig, linux-acpi, Jeff Moyer, Nicholas Moulin,
Matthew Wilcox, Ross Zwisler, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH, linux-kernel,
Andy Lutomirski, Andrew Morton, Linus Torvalds
On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>
> 1/ Ingo said [2]:
>
> "So why on earth is this whole concept and the naming itself
> ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
> revolving around a specific 'firmware' mindset and revolving
> around specific, weirdly named, overly complicated looking
> firmware interfaces that come with their own new weird
> glossary??"
>
> Indeed, we of course consulted the NFIT specification to determine
> the shape of the sub-system, but then let its terms and data
> structures permeate too deep into the implementation. That is fixed
> now with all NFIT specifics factored out into acpi.c. The NFIT is no
> longer required reading to review libnd. Only three concepts are
> needed:
>
> i/ PMEM - contiguous memory range where cpu stores are
> persistent once they are flushed through the memory
> controller.
>
> ii/ BLK - mmio apertures (sliding windows) that can be
> programmed to access an aperture's-worth of persistent
> media at a time.
>
> iii/ DPA - "dimm-physical-address", address space local to a
> dimm. A dimm may provide both PMEM-mode and BLK-mode
> access to a range of DPA. libnd manages allocation of DPA
> to either PMEM or BLK-namespaces to resolve this aliasing.
>
> The v1..v2 diffstat below shows the migration of nfit-specifics to
> acpi.c and the new state of libnd being nfit-free. "nd" now only
> refers to "non-volatile devices". Note, reworked documentation will
> return once the review has settled.
>
> Documentation/blockdev/nd.txt | 867 ---------------------
> MAINTAINERS | 34 +-
> arch/ia64/kernel/efi.c | 5 +-
> arch/x86/kernel/e820.c | 11 +-
> arch/x86/kernel/pmem.c | 2 +-
> drivers/block/Makefile | 2 +-
> drivers/block/nd/Kconfig | 135 ++--
> drivers/block/nd/Makefile | 32 +-
> drivers/block/nd/acpi.c | 1506 +++++++++++++++++++++++++++++++------
> drivers/block/nd/acpi_nfit.h | 321 ++++++++
> drivers/block/nd/blk.c | 27 +-
> drivers/block/nd/btt.c | 6 +-
> drivers/block/nd/btt_devs.c | 8 +-
> drivers/block/nd/bus.c | 337 +++++----
> drivers/block/nd/core.c | 574 +-------------
> drivers/block/nd/dimm.c | 11 -
> drivers/block/nd/dimm_devs.c | 292 ++-----
> drivers/block/nd/e820.c | 100 +++
> drivers/block/nd/libnd.h | 122 +++
> drivers/block/nd/namespace_devs.c | 10 +-
> drivers/block/nd/nd-private.h | 107 +--
> drivers/block/nd/nd.h | 91 +--
> drivers/block/nd/nfit.h | 238 ------
> drivers/block/nd/pmem.c | 56 +-
> drivers/block/nd/region.c | 78 +-
> drivers/block/nd/region_devs.c | 783 +++----------------
> drivers/block/nd/test/iomap.c | 86 +--
> drivers/block/nd/test/nfit.c | 1115 +++++++++++++++------------
> drivers/block/nd/test/nfit_test.h | 15 +-
> include/uapi/linux/ndctl.h | 130 ++--
> 30 files changed, 3166 insertions(+), 3935 deletions(-)
> delete mode 100644 Documentation/blockdev/nd.txt
> create mode 100644 drivers/block/nd/acpi_nfit.h
> create mode 100644 drivers/block/nd/e820.c
> create mode 100644 drivers/block/nd/libnd.h
> delete mode 100644 drivers/block/nd/nfit.h
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
> [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html
>
> 2/ Christoph asked the pmem ida conversion to be moved to its own patch
> (done), and to consider leaving the current pmem.c in drivers/block/.
> Instead, I converted the e820-type-12 enabling to be the first
> non-ACPI-NFIT based consumer of libnd. The new nd_e820 driver simply
> registers e820-type-12 ranges as libnd PMEM regions. Among other
> things this conversion enables BTT for these ranges. The alternative
> is to move drivers/block/nd/nd.h internals out to include/linux/
> which I think is worse.
>
> 3/ Toshi reported that the NFIT parsing fails to handle the case of a
> PMEM range with a single-dimm (non-aliasing) interleave description.
> Support for this case was added and is tested by default by the
> nfit_test.1 configuration.
>
> 4/ Toshi reported that we should not be treating a missing _STA property
> as a "dimm disabled by firmware" case. (fixed).
>
> 5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
> arch code. It is gone for now and we'll revisit when adding cached
> mappings back to the PMEM driver.
>
> 6/ Toshi mentioned that the presence of two different nd_bus_probe()
> functions was confusing. (cleaned up).
>
> 7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).
>
> 8/ Linda asked for nfit_test to honor dynamic cma reservations via the
> cma= command line (done). The cma requirements have also been
> reduced to 128M as only the simulated DAX regions need CMA. The rest
> can use vmalloc().
>
> ---
>
> Available here:
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm nd-v2
>
> ---
>
> Dan Williams (18):
> e820, efi: add ACPI 6.0 persistent memory types
> libnd, nd_acpi: initial libnd infrastructure and NFIT support
> nd_acpi, nfit-test: manufactured NFITs for interface development
> libnd: ndctl class device, and nd bus attributes
> libnd, nd_acpi: dimm/memory-devices
> libnd: ndctl.h, the nd ioctl abi
> libnd, nd_dimm: dimm driver and base libnd device-driver infrastructure
> libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
> libnd: support for legacy (non-aliasing) nvdimms
> pmem: use ida
> libnd, nd_pmem: add libnd support to the pmem driver
> libnd, nd_acpi: add interleave-set state-tracking infrastructure
> libnd: namespace indices: read and validate
> libnd: pmem label sets and namespace instantiation.
> libnd: blk labels and namespace instantiation
> libnd: write pmem label set
> libnd: write blk label set
> libnd: infrastructure for btt devices
>
> Ross Zwisler (1):
> libnd, nd_acpi, nd_blk: driver for BLK-mode access persistent memory
>
> Vishal Verma (1):
> nd_btt: atomic sector updates
I'm wondering what's wrong with CCing all of the series to linux-acpi?
Is there anything in it that the people on that list should not see, by any
chance?
Like patch [01/20], for the most obvious example. It even has "ACPI 6" in the
subject ...
Rafael
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-29 0:25 ` Rafael J. Wysocki
@ 2015-04-29 1:22 ` Dan Williams
2015-05-05 0:06 ` Rafael J. Wysocki
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-04-29 1:22 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-nvdimm@lists.01.org, Boaz Harrosh, Neil Brown, Dave Chinner,
H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki, Robert Moore,
Christoph Hellwig, Linux ACPI, Jeff Moyer, Nicholas Moulin,
Matthew Wilcox, Ross Zwisler, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andy Lutomirski, Andrew Morton
On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>>
>> 1/ Ingo said [2]:
>>
>> "So why on earth is this whole concept and the naming itself
>> ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>> revolving around a specific 'firmware' mindset and revolving
>> around specific, weirdly named, overly complicated looking
>> firmware interfaces that come with their own new weird
>> glossary??"
>>
>> Indeed, we of course consulted the NFIT specification to determine
>> the shape of the sub-system, but then let its terms and data
>> structures permeate too deep into the implementation. That is fixed
>> now with all NFIT specifics factored out into acpi.c. The NFIT is no
>> longer required reading to review libnd. Only three concepts are
>> needed:
>>
>> i/ PMEM - contiguous memory range where cpu stores are
>> persistent once they are flushed through the memory
>> controller.
>>
>> ii/ BLK - mmio apertures (sliding windows) that can be
>> programmed to access an aperture's-worth of persistent
>> media at a time.
>>
>> iii/ DPA - "dimm-physical-address", address space local to a
>> dimm. A dimm may provide both PMEM-mode and BLK-mode
>> access to a range of DPA. libnd manages allocation of DPA
>> to either PMEM or BLK-namespaces to resolve this aliasing.
>>
>> The v1..v2 diffstat below shows the migration of nfit-specifics to
>> acpi.c and the new state of libnd being nfit-free. "nd" now only
>> refers to "non-volatile devices". Note, reworked documentation will
>> return once the review has settled.
>>
>> Documentation/blockdev/nd.txt | 867 ---------------------
>> MAINTAINERS | 34 +-
>> arch/ia64/kernel/efi.c | 5 +-
>> arch/x86/kernel/e820.c | 11 +-
>> arch/x86/kernel/pmem.c | 2 +-
>> drivers/block/Makefile | 2 +-
>> drivers/block/nd/Kconfig | 135 ++--
>> drivers/block/nd/Makefile | 32 +-
>> drivers/block/nd/acpi.c | 1506 +++++++++++++++++++++++++++++++------
>> drivers/block/nd/acpi_nfit.h | 321 ++++++++
>> drivers/block/nd/blk.c | 27 +-
>> drivers/block/nd/btt.c | 6 +-
>> drivers/block/nd/btt_devs.c | 8 +-
>> drivers/block/nd/bus.c | 337 +++++----
>> drivers/block/nd/core.c | 574 +-------------
>> drivers/block/nd/dimm.c | 11 -
>> drivers/block/nd/dimm_devs.c | 292 ++-----
>> drivers/block/nd/e820.c | 100 +++
>> drivers/block/nd/libnd.h | 122 +++
>> drivers/block/nd/namespace_devs.c | 10 +-
>> drivers/block/nd/nd-private.h | 107 +--
>> drivers/block/nd/nd.h | 91 +--
>> drivers/block/nd/nfit.h | 238 ------
>> drivers/block/nd/pmem.c | 56 +-
>> drivers/block/nd/region.c | 78 +-
>> drivers/block/nd/region_devs.c | 783 +++----------------
>> drivers/block/nd/test/iomap.c | 86 +--
>> drivers/block/nd/test/nfit.c | 1115 +++++++++++++++------------
>> drivers/block/nd/test/nfit_test.h | 15 +-
>> include/uapi/linux/ndctl.h | 130 ++--
>> 30 files changed, 3166 insertions(+), 3935 deletions(-)
>> delete mode 100644 Documentation/blockdev/nd.txt
>> create mode 100644 drivers/block/nd/acpi_nfit.h
>> create mode 100644 drivers/block/nd/e820.c
>> create mode 100644 drivers/block/nd/libnd.h
>> delete mode 100644 drivers/block/nd/nfit.h
>>
>> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
>> [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html
>>
>> 2/ Christoph asked the pmem ida conversion to be moved to its own patch
>> (done), and to consider leaving the current pmem.c in drivers/block/.
>> Instead, I converted the e820-type-12 enabling to be the first
>> non-ACPI-NFIT based consumer of libnd. The new nd_e820 driver simply
>> registers e820-type-12 ranges as libnd PMEM regions. Among other
>> things this conversion enables BTT for these ranges. The alternative
>> is to move drivers/block/nd/nd.h internals out to include/linux/
>> which I think is worse.
>>
>> 3/ Toshi reported that the NFIT parsing fails to handle the case of a
>> PMEM range with a single-dimm (non-aliasing) interleave description.
>> Support for this case was added and is tested by default by the
>> nfit_test.1 configuration.
>>
>> 4/ Toshi reported that we should not be treating a missing _STA property
>> as a "dimm disabled by firmware" case. (fixed).
>>
>> 5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
>> arch code. It is gone for now and we'll revisit when adding cached
>> mappings back to the PMEM driver.
>>
>> 6/ Toshi mentioned that the presence of two different nd_bus_probe()
>> functions was confusing. (cleaned up).
>>
>> 7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).
>>
>> 8/ Linda asked for nfit_test to honor dynamic cma reservations via the
>> cma= command line (done). The cma requirements have also been
>> reduced to 128M as only the simulated DAX regions need CMA. The rest
>> can use vmalloc().
>>
>> ---
>>
>> Available here:
>> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm nd-v2
>>
>> ---
>>
>> Dan Williams (18):
>> e820, efi: add ACPI 6.0 persistent memory types
>> libnd, nd_acpi: initial libnd infrastructure and NFIT support
>> nd_acpi, nfit-test: manufactured NFITs for interface development
>> libnd: ndctl class device, and nd bus attributes
>> libnd, nd_acpi: dimm/memory-devices
>> libnd: ndctl.h, the nd ioctl abi
>> libnd, nd_dimm: dimm driver and base libnd device-driver infrastructure
>> libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
>> libnd: support for legacy (non-aliasing) nvdimms
>> pmem: use ida
>> libnd, nd_pmem: add libnd support to the pmem driver
>> libnd, nd_acpi: add interleave-set state-tracking infrastructure
>> libnd: namespace indices: read and validate
>> libnd: pmem label sets and namespace instantiation.
>> libnd: blk labels and namespace instantiation
>> libnd: write pmem label set
>> libnd: write blk label set
>> libnd: infrastructure for btt devices
>>
>> Ross Zwisler (1):
>> libnd, nd_acpi, nd_blk: driver for BLK-mode access persistent memory
>>
>> Vishal Verma (1):
>> nd_btt: atomic sector updates
>
> I'm wondering what's wrong with CCing all of the series to linux-acpi?
>
> Is there anything in it that the people on that list should not see, by any
> chance?
linux-acpi may not care about the dimm-metadata labeling patches that
are completely independent of ACPI, but might as well include
linux-acpi on the whole series at this point.
* RE: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-04-28 18:24 ` [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory) Dan Williams
@ 2015-04-29 15:53 ` Elliott, Robert (Server Storage)
2015-04-29 15:59 ` Dan Williams
2015-05-04 20:26 ` Toshi Kani
1 sibling, 1 reply; 47+ messages in thread
From: Elliott, Robert (Server Storage) @ 2015-04-29 15:53 UTC (permalink / raw)
To: Dan Williams, linux-nvdimm@lists.01.org
Cc: Neil Brown, Greg KH, Rafael J. Wysocki, Robert Moore,
linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org
> -----Original Message-----
> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On Behalf Of
> Dan Williams
> Sent: Tuesday, April 28, 2015 1:25 PM
> Subject: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-
> data-window, persistent memory, volatile memory)
>
> A "region" device represents the maximum capacity of a BLK range (mmio
> block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
> volatile memory), without regard for aliasing. Aliasing, in the
> dimm-local address space (DPA), is resolved by metadata on a dimm to
> designate which exclusive interface will access the aliased DPA ranges.
> Support for the per-dimm metadata/label arrives in a subsequent
> patch.
>
> The name format of "region" devices is "regionN" where, like dimms, N is
> a global ida index assigned at discovery time. This id is not reliable
> across reboots nor in the presence of hotplug. Look to attributes of
> the region or static id-data of the sub-namespace to generate a
> persistent name.
...
> +++ b/drivers/block/nd/region_devs.c
...
> +static noinline struct nd_region *nd_region_create(struct nd_bus *nd_bus,
> + struct nd_region_desc *ndr_desc, struct device_type *dev_type)
> +{
> + struct nd_region *nd_region;
> + struct device *dev;
> + u16 i;
> +
> + for (i = 0; i < ndr_desc->num_mappings; i++) {
> + struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
> + struct nd_dimm *nd_dimm = nd_mapping->nd_dimm;
> +
> + if ((nd_mapping->start | nd_mapping->size) % SZ_4K) {
> + dev_err(&nd_bus->dev, "%pf: %s mapping%d is not 4K
> aligned\n",
> + __builtin_return_address(0),
Please use "KiB" rather than the unclear "K".
Same comment for a dev_dbg print in patch 14.
> + dev_name(&nd_dimm->dev), i);
> +
> + return NULL;
> + }
> + }
> +
> + nd_region = kzalloc(sizeof(struct nd_region)
> + + sizeof(struct nd_mapping) * ndr_desc->num_mappings,
> + GFP_KERNEL);
> + if (!nd_region)
> + return NULL;
> + nd_region->id = ida_simple_get(&region_ida, 0, 0, GFP_KERNEL);
> + if (nd_region->id < 0) {
> + kfree(nd_region);
> + return NULL;
> + }
> +
> + memcpy(nd_region->mapping, ndr_desc->nd_mapping,
> + sizeof(struct nd_mapping) * ndr_desc->num_mappings);
> + for (i = 0; i < ndr_desc->num_mappings; i++) {
> + struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
> + struct nd_dimm *nd_dimm = nd_mapping->nd_dimm;
> +
> + get_device(&nd_dimm->dev);
> + }
> + nd_region->ndr_mappings = ndr_desc->num_mappings;
> + nd_region->provider_data = ndr_desc->provider_data;
> + dev = &nd_region->dev;
> + dev_set_name(dev, "region%d", nd_region->id);
Could this include "nd" in the name, like "ndregion%d"?
The other dev_set_name calls in this patch set use:
btt%d
ndbus%d
nmem%d
namespace%d.%d
which are a bit more distinctive.
---
Robert Elliott, HP Server Storage
* Re: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-04-29 15:53 ` [Linux-nvdimm] " Elliott, Robert (Server Storage)
@ 2015-04-29 15:59 ` Dan Williams
0 siblings, 0 replies; 47+ messages in thread
From: Dan Williams @ 2015-04-29 15:59 UTC (permalink / raw)
To: Elliott, Robert (Server Storage)
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Wysocki, Rafael J,
Moore, Robert, linux-kernel@vger.kernel.org,
linux-acpi@vger.kernel.org
On Wed, Apr 29, 2015 at 8:53 AM, Elliott, Robert (Server Storage)
<Elliott@hp.com> wrote:
>> -----Original Message-----
>> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On Behalf Of
>> Dan Williams
>> Sent: Tuesday, April 28, 2015 1:25 PM
>> Subject: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-
>> data-window, persistent memory, volatile memory)
>>
>> A "region" device represents the maximum capacity of a BLK range (mmio
>> block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
>> volatile memory), without regard for aliasing. Aliasing, in the
>> dimm-local address space (DPA), is resolved by metadata on a dimm to
>> designate which exclusive interface will access the aliased DPA ranges.
>> Support for the per-dimm metadata/label arrives in a subsequent
>> patch.
>>
>> The name format of "region" devices is "regionN" where, like dimms, N is
>> a global ida index assigned at discovery time. This id is not reliable
>> across reboots nor in the presence of hotplug. Look to attributes of
>> the region or static id-data of the sub-namespace to generate a
>> persistent name.
> ...
>> +++ b/drivers/block/nd/region_devs.c
> ...
>> +static noinline struct nd_region *nd_region_create(struct nd_bus *nd_bus,
>> + struct nd_region_desc *ndr_desc, struct device_type *dev_type)
>> +{
>> + struct nd_region *nd_region;
>> + struct device *dev;
>> + u16 i;
>> +
>> + for (i = 0; i < ndr_desc->num_mappings; i++) {
>> + struct nd_mapping *nd_mapping = &ndr_desc->nd_mapping[i];
>> + struct nd_dimm *nd_dimm = nd_mapping->nd_dimm;
>> +
>> + if ((nd_mapping->start | nd_mapping->size) % SZ_4K) {
>> + dev_err(&nd_bus->dev, "%pf: %s mapping%d is not 4K
>> aligned\n",
>> + __builtin_return_address(0),
>
> Please use "KiB" rather than the unclear "K".
Ok.
> Same comment for a dev_dbg print in patch 14.
It's a debug statement, but ok.
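
For reference, the alignment test under discussion can be sketched in user space as follows. This is an illustration, not the patch itself: mapping_is_4k_aligned() and format_align_error() are hypothetical helpers, the "%pf" caller prefix from the original message is omitted, and the "4 KiB" wording is one possible reading of the request above. OR-ing start and size works because a low bit set in either value survives into the result, so one modulo covers both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SZ_4K 0x1000ULL

/* True when both the start and the size of a mapping are 4 KiB multiples. */
static bool mapping_is_4k_aligned(uint64_t start, uint64_t size)
{
	return ((start | size) % SZ_4K) == 0;
}

/*
 * Hypothetical helper: format the diagnostic with the unit spelled out.
 * Returns the snprintf() result.
 */
static int format_align_error(char *buf, size_t len, const char *dimm,
			      unsigned int idx)
{
	assert(buf != NULL && dimm != NULL);
	return snprintf(buf, len, "%s mapping%u is not 4 KiB aligned",
			dimm, idx);
}
```

A mapping with start 0x1000 and size 0x2800 fails the check because of the stray 0x800 in the size. For dimm "nmem0" and index 1, the formatted text reads "nmem0 mapping1 is not 4 KiB aligned".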
[..]
>
> Could this include "nd" in the name, like "ndregion%d"?
>
> The other dev_set_name calls in this patch set use:
> btt%d
> ndbus%d
> nmem%d
> namespace%d.%d
>
> which are a bit more distinctive.
They sit on an "nd" bus and don't have global device nodes, so I don't
see a need to make them any more distinctive.
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 23:05 ` Andy Lutomirski
@ 2015-04-30 20:56 ` Ross Zwisler
0 siblings, 0 replies; 47+ messages in thread
From: Ross Zwisler @ 2015-04-30 20:56 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Dan Williams, linux-nvdimm, Boaz Harrosh, Neil Brown,
Dave Chinner, H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki,
Robert Moore, Christoph Hellwig, Linux ACPI, Jeff Moyer,
Nicholas Moulin, Matthew Wilcox, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andrew Morton, Linus Torvalds
On Tue, 2015-04-28 at 16:05 -0700, Andy Lutomirski wrote:
> On Tue, Apr 28, 2015 at 3:28 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> >>> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >>>> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> >>>> Mostly for my understanding: is there a name for "address relative to
> >>>> the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of
> >>>> apparent physical memory, possibly interleaved, broken up, or weirdly
> >>>> remapped by the memory controller, would still have addresses between
> >>>> 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO,
> >>>> some might be BLK apertures, etc.
> >>>>
> >>>> IIUC "DPA" refers to actual addressable storage, not this type of address?
> >>>
> >>> No, DPA is exactly as you describe above. You can't directly access
> >>> it except through a PMEM mapping (possibly interleaved with DPA from
> >>> other DIMMs) or a BLK aperture (mmio window into DPA).
> >>
> >> So the thing I'm describing has no name, then? Oh, well.
> >
> > What? The thing you are describing *is* DPA.
>
> I'm confused. Here are the two things I have in mind:
>
> 1. An address into on-DIMM storage. If I have a DIMM that is mapped
> to 8 GB of SPA but has 64 GB of usable storage (accessed through BLK
> apertures, say), then this address runs from 0 to 64 GB.
>
> 2. An address into the DIMM's view of physical address space. If I
> have a DIMM that is mapped to 8 GB of SPA but has 64 GB of usable
> storage (accessed through BLK apertures, say), then this address runs
> from 0 to 8 GB. There's a one-to-one mapping between SPA and this
> type of address.
>
> Since you said "a dimm may provide both PMEM-mode and BLK-mode access
> to a range of DPA.," I thought that DPA was #1.
>
> --Andy
I think that you've got the right definition, #1 above, for DPA. The DPA is
relative to the DIMM, knows nothing about interleaving or SPA or anything else
in the system, and is basically equivalent to the idea of an LBA on a disk. A
DIMM that has 64 GiB of storage could have a DPA space ranging from 0 to 64
GiB.
The second concept is a little trickier - we've been talking about this by
using the term "N-way interleave set". Say you have your 64 GiB DIMM and only
the first 8 GiB are given to the OS in an SPA, and that DIMM isn't interleaved
with any other DIMMs. This would be a 1-way interleave set, ranging from DPA
0 to 8 GiB on the DIMM.
If you have 2 DIMMs of size 64 GiB, and they each have an 8 GiB region given to
the SPA space, those two regions could be interleaved together. The OS would
then see a 16 GiB 2-way interleave set, made up of DPAs 0 to 8 GiB on each of
the two DIMMs.
You can figure out exactly how all the interleaving works by looking at the
SPA tables, the Memory Device tables and the Interleave Tables.
These are in sections 5.2.25.1 - 5.2.25.3 in ACPI 6, and are in our code as
struct acpi_nfit_spa, struct acpi_nfit_memdev and struct acpi_nfit_idt.
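To make the N-way case concrete, here is a rough sketch of how an offset into an interleave set could be turned into a (DIMM, DPA) pair. It assumes plain round-robin interleaving with one fixed line size and the same starting DPA on every DIMM; the real layout is whatever the Interleave Table describes, and the names below (struct dimm_addr, spa_offset_to_dpa) are made up for illustration and are not in the patch set.

```c
#include <stdint.h>

/* Hypothetical result type: which DIMM, and where on that DIMM. */
struct dimm_addr {
	unsigned int dimm;	/* index within the interleave set, 0 .. ways-1 */
	uint64_t dpa;		/* DIMM-local address */
};

/*
 * Assumed model: lines of line_size bytes are handed out round-robin
 * across 'ways' DIMMs, each DIMM's region starting at dpa_base.
 * Both 'ways' and 'line_size' must be non-zero.
 */
static struct dimm_addr spa_offset_to_dpa(uint64_t spa_offset,
		unsigned int ways, uint64_t line_size, uint64_t dpa_base)
{
	uint64_t line = spa_offset / line_size;
	struct dimm_addr out;

	out.dimm = (unsigned int)(line % ways);
	out.dpa = dpa_base + (line / ways) * line_size
		+ spa_offset % line_size;
	return out;
}
```

With ways = 1 the function is the identity on the offset, which matches the 1-way interleave set described above; with ways = 2 the last byte of a 16 GiB set lands on the last byte of the 8 GiB region of the second DIMM.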
- Ross
* Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
2015-04-28 18:24 ` [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support Dan Williams
@ 2015-04-30 23:23 ` Rafael J. Wysocki
2015-05-01 0:39 ` Dan Williams
2015-05-15 19:44 ` [Linux-nvdimm] " Jeff Moyer
1 sibling, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2015-04-30 23:23 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, linux-acpi, Rafael J. Wysocki, Robert Moore,
linux-kernel, David Box
On Tuesday, April 28, 2015 02:24:23 PM Dan Williams wrote:
> 1/ Autodetect an NFIT table for the ACPI namespace device with _HID of
> "ACPI0012"
>
> 2/ libnd bus registration
>
> The NFIT provided by ACPI is one possible method by which platforms will
> discover NVDIMM resources. However, the intent of the nd_bus_descriptor
> abstraction is to abstract "provider" specific details, leaving libnd
> to be independent of the specific NVDIMM resource discovery mechanism.
> This flexibility is exploited later to implement custom-defined nd
> buses.
>
> Cc: <linux-acpi@vger.kernel.org>
> Cc: Robert Moore <robert.moore@intel.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/block/Kconfig | 2
> drivers/block/Makefile | 1
> drivers/block/nd/Kconfig | 40 +++
> drivers/block/nd/Makefile | 6 +
> drivers/block/nd/acpi.c | 475 +++++++++++++++++++++++++++++++++++++++++
> drivers/block/nd/acpi_nfit.h | 254 ++++++++++++++++++++++
> drivers/block/nd/core.c | 67 ++++++
> drivers/block/nd/libnd.h | 33 +++
> drivers/block/nd/nd-private.h | 23 ++
> 9 files changed, 901 insertions(+)
> create mode 100644 drivers/block/nd/Kconfig
> create mode 100644 drivers/block/nd/Makefile
> create mode 100644 drivers/block/nd/acpi.c
> create mode 100644 drivers/block/nd/acpi_nfit.h
> create mode 100644 drivers/block/nd/core.c
> create mode 100644 drivers/block/nd/libnd.h
> create mode 100644 drivers/block/nd/nd-private.h
>
> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> index eb1fed5bd516..dfe40e5ca9bd 100644
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -321,6 +321,8 @@ config BLK_DEV_NVME
> To compile this driver as a module, choose M here: the
> module will be called nvme.
>
> +source "drivers/block/nd/Kconfig"
> +
> config BLK_DEV_SKD
> tristate "STEC S1120 Block Driver"
> depends on PCI
> diff --git a/drivers/block/Makefile b/drivers/block/Makefile
> index 9cc6c18a1c7e..07a6acecf4d8 100644
> --- a/drivers/block/Makefile
> +++ b/drivers/block/Makefile
> @@ -24,6 +24,7 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
> obj-$(CONFIG_MG_DISK) += mg_disk.o
> obj-$(CONFIG_SUNVDC) += sunvdc.o
> obj-$(CONFIG_BLK_DEV_NVME) += nvme.o
> +obj-$(CONFIG_ND_DEVICES) += nd/
> obj-$(CONFIG_BLK_DEV_SKD) += skd.o
> obj-$(CONFIG_BLK_DEV_OSD) += osdblk.o
>
> diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig
> new file mode 100644
> index 000000000000..6d5d6b732f82
> --- /dev/null
> +++ b/drivers/block/nd/Kconfig
> @@ -0,0 +1,40 @@
> +menuconfig ND_DEVICES
> + bool "NVDIMM Support"
> + depends on PHYS_ADDR_T_64BIT
> + help
> + Generic support for non-volatile memory devices including
> + ACPI-6-NFIT defined resources. On platforms that define an
> + NFIT, or otherwise can discover NVDIMM resources, a libnd
> + bus is registered to advertise PMEM (persistent memory)
> + namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
> + namespaces (/dev/ndX). A PMEM namespace refers to a memory
> + resource that may span multiple DIMMs and support DAX (see
> + CONFIG_DAX). A BLK namespace refers to an NVDIMM control
> + region which exposes an mmio register set for windowed
> + access mode to non-volatile memory.
> +
> +if ND_DEVICES
> +
> +config LIBND
> + tristate "LIBND: libnd device driver support"
> + help
> + Platform agnostic device model for a libnd bus. Publishes
> + resources for a PMEM (persistent-memory) driver and/or BLK
> + (sliding mmio window(s)) driver to attach. Exposes a device
> + topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl
> + message passing interface, and a "/dev/nmemX" dimm-ioctl
> + message interface for each memory device registered on the
> + bus instance. A userspace library "ndctl" provides an API
> + to enumerate/manage this subsystem.
> +
> +config ND_ACPI
> + tristate "ACPI: NFIT to libnd bus support"
> + select LIBND
> + depends on ACPI
> + help
> + Infrastructure to probe ACPI 6 compliant platforms for
> + NVDIMMs (NFIT) and register a libnd device tree. In
> + addition to storage devices this also enables libnd to craft
> + ACPI._DSM messages for platform/dimm configuration.
I'm wondering if the two CONFIG options above really need to be user-selectable?
For example, what reason would people (who've already selected ND_DEVICES) have
for not selecting ND_ACPI if ACPI is set?
> +
> +endif
> diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
> new file mode 100644
> index 000000000000..944b5947c0cb
> --- /dev/null
> +++ b/drivers/block/nd/Makefile
> @@ -0,0 +1,6 @@
> +obj-$(CONFIG_LIBND) += libnd.o
> +obj-$(CONFIG_ND_ACPI) += nd_acpi.o
> +
> +nd_acpi-y := acpi.o
> +
> +libnd-y := core.o
OK, so it looks like no modules, just built-in code, right?
> diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
> new file mode 100644
> index 000000000000..9f0b24390d1b
> --- /dev/null
> +++ b/drivers/block/nd/acpi.c
> @@ -0,0 +1,475 @@
> +/*
> + * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + */
> +#include <linux/list_sort.h>
> +#include <linux/module.h>
> +#include <linux/list.h>
> +#include <linux/acpi.h>
> +#include "acpi_nfit.h"
> +#include "libnd.h"
> +
> +static bool warn_checksum;
> +module_param(warn_checksum, bool, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(warn_checksum, "Turn checksum errors into warnings");
> +
> +enum {
> + NFIT_ACPI_NOTIFY_TABLE = 0x80,
> +};
> +
> +static int nd_acpi_ctl(struct nd_bus_descriptor *nd_desc,
> + struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
> + unsigned int buf_len)
> +{
> + return -ENOTTY;
> +}
Why -ENOTTY? And why not leave a NULL entry for this instead and
make the library fail the call in that case?
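Something along these lines is what I have in mind. This is only a userspace sketch with made-up names (struct bus_desc, bus_ctl, sample_ctl), not the libnd interface, and it keeps -ENOTTY purely because the patch uses it.

```c
#include <errno.h>
#include <stddef.h>

struct bus_desc;	/* hypothetical stand-in for nd_bus_descriptor */

typedef int (*ctl_fn)(struct bus_desc *desc, unsigned int cmd,
		void *buf, unsigned int buf_len);

struct bus_desc {
	const char *provider_name;
	ctl_fn ctl;	/* left NULL when the provider has no commands */
};

/* Library-side dispatch: fail in one place when no handler is set. */
static int bus_ctl(struct bus_desc *desc, unsigned int cmd,
		void *buf, unsigned int buf_len)
{
	if (!desc || !desc->ctl)
		return -ENOTTY;
	return desc->ctl(desc, cmd, buf, buf_len);
}

/* Example provider handler that accepts every command. */
static int sample_ctl(struct bus_desc *desc, unsigned int cmd,
		void *buf, unsigned int buf_len)
{
	(void)desc; (void)cmd; (void)buf; (void)buf_len;
	return 0;
}
```

That way a provider without commands needs no stub at all, and the error code is chosen once in the library.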
> +
> +static const char *spa_type_name(u16 type)
> +{
> + switch (type) {
> + case NFIT_SPA_VOLATILE: return "volatile";
> + case NFIT_SPA_PM: return "pmem";
> + case NFIT_SPA_DCR: return "dimm-control-region";
> + case NFIT_SPA_BDW: return "block-data-window";
> + default: return "unknown";
> + }
> +}
> +
> +static int nfit_spa_type(struct acpi_nfit_spa *spa)
> +{
> + if (memcmp(&nfit_spa_uuid_volatile, spa->type_uuid, 16) == 0)
> + return NFIT_SPA_VOLATILE;
> +
> + if (memcmp(&nfit_spa_uuid_pm, spa->type_uuid, 16) == 0)
> + return NFIT_SPA_PM;
> +
> + if (memcmp(&nfit_spa_uuid_dcr, spa->type_uuid, 16) == 0)
> + return NFIT_SPA_DCR;
> +
> + if (memcmp(&nfit_spa_uuid_bdw, spa->type_uuid, 16) == 0)
> + return NFIT_SPA_BDW;
> +
> + if (memcmp(&nfit_spa_uuid_vdisk, spa->type_uuid, 16) == 0)
> + return NFIT_SPA_VDISK;
> +
> + if (memcmp(&nfit_spa_uuid_vcd, spa->type_uuid, 16) == 0)
> + return NFIT_SPA_VCD;
> +
> + if (memcmp(&nfit_spa_uuid_pdisk, spa->type_uuid, 16) == 0)
> + return NFIT_SPA_PDISK;
> +
> + if (memcmp(&nfit_spa_uuid_pcd, spa->type_uuid, 16) == 0)
> + return NFIT_SPA_PCD;
> +
> + return -1;
> +}
> +
> +struct nfit_table_header {
> + __le16 type;
> + __le16 length;
> +};
That you'll be able to get from ACPICA I suppose?
> +
> +static void *add_table(struct acpi_nfit_desc *acpi_desc, void *table, const void *end)
> +{
> + struct device *dev = acpi_desc->dev;
> + struct nfit_table_header *hdr;
> + void *err = ERR_PTR(-ENOMEM);
> +
> + if (table >= end)
> + return NULL;
> +
> + hdr = (struct nfit_table_header *) table;
> + switch (hdr->type) {
> + case NFIT_TABLE_SPA: {
> + struct nfit_spa *nfit_spa = devm_kzalloc(dev, sizeof(*nfit_spa),
> + GFP_KERNEL);
> + struct acpi_nfit_spa *spa = table;
> +
> + if (!nfit_spa)
> + return err;
> + INIT_LIST_HEAD(&nfit_spa->list);
> + nfit_spa->spa = spa;
> + list_add_tail(&nfit_spa->list, &acpi_desc->spas);
> + dev_dbg(dev, "%s: spa index: %d type: %s\n", __func__,
> + spa->spa_index,
> + spa_type_name(nfit_spa_type(spa)));
> + break;
> + }
> + case NFIT_TABLE_MEM: {
> + struct nfit_memdev *nfit_memdev = devm_kzalloc(dev,
> + sizeof(*nfit_memdev), GFP_KERNEL);
> + struct acpi_nfit_memdev *memdev = table;
> +
> + if (!nfit_memdev)
> + return err;
> + INIT_LIST_HEAD(&nfit_memdev->list);
> + nfit_memdev->memdev = memdev;
> + list_add_tail(&nfit_memdev->list, &acpi_desc->memdevs);
> + dev_dbg(dev, "%s: memdev handle: %#x spa: %d dcr: %d\n",
> + __func__, memdev->nfit_handle, memdev->spa_index,
> + memdev->dcr_index);
> + break;
> + }
> + case NFIT_TABLE_DCR: {
> + struct nfit_dcr *nfit_dcr = devm_kzalloc(dev, sizeof(*nfit_dcr),
> + GFP_KERNEL);
> + struct acpi_nfit_dcr *dcr = table;
> +
> + if (!nfit_dcr)
> + return err;
> + INIT_LIST_HEAD(&nfit_dcr->list);
> + nfit_dcr->dcr = dcr;
> + list_add_tail(&nfit_dcr->list, &acpi_desc->dcrs);
> + dev_dbg(dev, "%s: dcr index: %d num_bcw: %d\n", __func__,
> + dcr->dcr_index, dcr->num_bcw);
> + break;
> + }
> + case NFIT_TABLE_BDW: {
> + struct nfit_bdw *nfit_bdw = devm_kzalloc(dev, sizeof(*nfit_bdw),
> + GFP_KERNEL);
> + struct acpi_nfit_bdw *bdw = table;
> +
> + if (!nfit_bdw)
> + return err;
> + INIT_LIST_HEAD(&nfit_bdw->list);
> + nfit_bdw->bdw = bdw;
> + list_add_tail(&nfit_bdw->list, &acpi_desc->bdws);
> + dev_dbg(dev, "%s: bdw dcr: %d num_bdw: %d\n", __func__,
> + bdw->dcr_index, bdw->num_bdw);
> + break;
> + }
> + /* TODO */
> + case NFIT_TABLE_IDT:
> + dev_dbg(dev, "%s: idt\n", __func__);
> + break;
> + case NFIT_TABLE_FLUSH:
> + dev_dbg(dev, "%s: flush\n", __func__);
> + break;
> + case NFIT_TABLE_SMBIOS:
> + dev_dbg(dev, "%s: smbios\n", __func__);
> + break;
> + default:
> + dev_err(dev, "unknown table '%d' parsing nfit\n", hdr->type);
> + return ERR_PTR(-ENXIO);
> + }
> +
> + return table + hdr->length;
> +}
> +
> +static void nfit_mem_find_spa_bdw(struct acpi_nfit_desc *acpi_desc,
> + struct nfit_mem *nfit_mem)
> +{
> + u32 nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
> + u16 dcr_index = nfit_mem->dcr->dcr_index;
> + struct nfit_spa *nfit_spa;
> +
> + list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
> + u16 spa_index = nfit_spa->spa->spa_index;
> + int type = nfit_spa_type(nfit_spa->spa);
> + struct nfit_memdev *nfit_memdev;
> +
> + if (type != NFIT_SPA_BDW)
> + continue;
> +
> + list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
> + if (nfit_memdev->memdev->spa_index != spa_index)
> + continue;
> + if (nfit_memdev->memdev->nfit_handle != nfit_handle)
> + continue;
> + if (nfit_memdev->memdev->dcr_index != dcr_index)
> + continue;
> +
> + nfit_mem->spa_bdw = nfit_spa->spa;
> + return;
> + }
> + }
> +
> + dev_dbg(acpi_desc->dev, "SPA-BDW not found for SPA-DCR %d\n",
> + nfit_mem->spa_dcr->spa_index);
> + nfit_mem->bdw = NULL;
> +}
> +
> +static int nfit_mem_add(struct acpi_nfit_desc *acpi_desc,
> + struct nfit_mem *nfit_mem, struct acpi_nfit_spa *spa)
> +{
> + u16 dcr_index = __to_nfit_memdev(nfit_mem)->dcr_index;
> + struct nfit_dcr *nfit_dcr;
> + struct nfit_bdw *nfit_bdw;
> +
> + list_for_each_entry(nfit_dcr, &acpi_desc->dcrs, list) {
> + if (nfit_dcr->dcr->dcr_index != dcr_index)
> + continue;
> + nfit_mem->dcr = nfit_dcr->dcr;
> + break;
> + }
> +
> + if (!nfit_mem->dcr) {
> + dev_dbg(acpi_desc->dev, "SPA %d missing:%s%s\n", spa->spa_index,
> + __to_nfit_memdev(nfit_mem) ? "" : " MEMDEV",
> + nfit_mem->dcr ? "" : " DCR");
> + return -ENODEV;
> + }
> +
> + /*
> + * We've found enough to create an nd_dimm, optionally
> + * find an associated BDW
> + */
> + list_add(&nfit_mem->list, &acpi_desc->dimms);
> +
> + list_for_each_entry(nfit_bdw, &acpi_desc->bdws, list) {
> + if (nfit_bdw->bdw->dcr_index != dcr_index)
> + continue;
> + nfit_mem->bdw = nfit_bdw->bdw;
> + break;
> + }
> +
> + if (!nfit_mem->bdw)
> + return 0;
> +
> + nfit_mem_find_spa_bdw(acpi_desc, nfit_mem);
> + return 0;
> +}
> +
> +static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
> + struct acpi_nfit_spa *spa)
> +{
> + struct nfit_mem *nfit_mem, *found;
> + struct nfit_memdev *nfit_memdev;
> + int type = nfit_spa_type(spa);
> + u16 dcr_index;
> +
> + switch (type) {
> + case NFIT_SPA_DCR:
> + case NFIT_SPA_PM:
> + break;
> + default:
> + return 0;
> + }
> +
> + list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
> + int rc;
> +
> + if (nfit_memdev->memdev->spa_index != spa->spa_index)
> + continue;
> + found = NULL;
> + dcr_index = nfit_memdev->memdev->dcr_index;
> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list)
> + if (__to_nfit_memdev(nfit_mem)->dcr_index == dcr_index) {
> + found = nfit_mem;
> + break;
> + }
> +
> + if (found)
> + nfit_mem = found;
> + else {
> + nfit_mem = devm_kzalloc(acpi_desc->dev,
> + sizeof(*nfit_mem), GFP_KERNEL);
> + if (!nfit_mem)
> + return -ENOMEM;
> + INIT_LIST_HEAD(&nfit_mem->list);
> + }
> +
> + if (type == NFIT_SPA_DCR) {
> + /* multiple dimms may share a SPA when interleaved */
> + nfit_mem->spa_dcr = spa;
> + nfit_mem->memdev_dcr = nfit_memdev->memdev;
> + } else {
> + /*
> + * A single dimm may belong to multiple SPA-PM
> + * ranges, record at least one in addition to
> + * any SPA-DCR range.
> + */
> + nfit_mem->memdev_pmem = nfit_memdev->memdev;
> + }
> +
> + if (found)
> + continue;
> +
> + rc = nfit_mem_add(acpi_desc, nfit_mem, spa);
> + if (rc)
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +static int nfit_mem_cmp(void *priv, struct list_head *__a, struct list_head *__b)
> +{
> + struct nfit_mem *a = container_of(__a, typeof(*a), list);
> + struct nfit_mem *b = container_of(__b, typeof(*b), list);
> + u32 handleA, handleB;
> +
> + handleA = __to_nfit_memdev(a)->nfit_handle;
> + handleB = __to_nfit_memdev(b)->nfit_handle;
> + if (handleA < handleB)
> + return -1;
> + else if (handleA > handleB)
> + return 1;
> + return 0;
> +}
> +
> +static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
> +{
> + struct nfit_spa *nfit_spa;
> +
> + /*
> + * For each SPA-DCR or SPA-PMEM address range find its
> + * corresponding MEMDEV(s). From each MEMDEV find the
> + * corresponding DCR. Then, if we're operating on a SPA-DCR,
> + * try to find a SPA-BDW and a corresponding BDW that references
> + * the DCR. Throw it all into an nfit_mem object. Note, that
> + * BDWs are optional.
> + */
> + list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
> + int rc;
> +
> + rc = nfit_mem_dcr_init(acpi_desc, nfit_spa->spa);
> + if (rc)
> + return rc;
> + }
> +
> + list_sort(NULL, &acpi_desc->dimms, nfit_mem_cmp);
> +
> + return 0;
> +}
> +
> +static int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
> +{
> + struct device *dev = acpi_desc->dev;
> + const void *end;
> + u8 *data, sum;
> + acpi_size i;
> +
> + INIT_LIST_HEAD(&acpi_desc->spas);
> + INIT_LIST_HEAD(&acpi_desc->dcrs);
> + INIT_LIST_HEAD(&acpi_desc->bdws);
> + INIT_LIST_HEAD(&acpi_desc->memdevs);
> + INIT_LIST_HEAD(&acpi_desc->dimms);
> +
> + data = (u8 *) acpi_desc->nfit;
> + for (i = 0, sum = 0; i < sz; i++)
> + sum += readb(data + i);
> + if (sum != 0 && !warn_checksum) {
> + dev_dbg(dev, "%s: nfit checksum failure\n", __func__);
> + return -ENXIO;
> + }
> + WARN_TAINT_ONCE(sum != 0, TAINT_FIRMWARE_WORKAROUND,
> + "nfit checksum failure, continuing...\n");
> +
> + end = data + sz;
> + data += sizeof(struct acpi_nfit);
> + while (!IS_ERR_OR_NULL(data))
> + data = add_table(acpi_desc, data, end);
This looks like we are expecting a series of tables here and we're going to
fail the whole discovery if just one of them is invalid.
I wonder if it would be practical to skip just the invalid ones instead?
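Roughly what I mean, as a userspace sketch rather than a proposed replacement. Since every sub-table carries its own length, an unknown type can simply be stepped over; only a bad length has to stop the walk, because the next header can no longer be located. The names (struct sub_hdr, walk_subtables, MAX_KNOWN_TYPE) are made up, and the header is read in host byte order, so this assumes a little-endian host for real NFIT data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sub_hdr {	/* same shape as nfit_table_header in the patch */
	uint16_t type;
	uint16_t length;
};

#define MAX_KNOWN_TYPE 6	/* assumption: types 0..6 are understood */

/*
 * Walk the sub-tables in [buf, buf + size). Returns the number of
 * sub-tables accepted; *skipped counts those with an unknown type.
 * A length smaller than the header, or one that runs past the end,
 * ends the walk early.
 */
static int walk_subtables(const uint8_t *buf, size_t size, int *skipped)
{
	size_t off = 0;
	int accepted = 0;

	*skipped = 0;
	while (size - off >= sizeof(struct sub_hdr)) {
		struct sub_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr)); /* no unaligned loads */
		if (hdr.length < sizeof(hdr) || hdr.length > size - off)
			break;
		if (hdr.type > MAX_KNOWN_TYPE)
			(*skipped)++;
		else
			accepted++;
		off += hdr.length;
	}
	return accepted;
}
```

The length check also covers a zero-length sub-table, which would otherwise keep the loop from making progress.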
> +
> + if (IS_ERR(data)) {
> + dev_dbg(dev, "%s: nfit table parsing error: %ld\n", __func__,
> + PTR_ERR(data));
> + return PTR_ERR(data);
> + }
> +
> + if (nfit_mem_init(acpi_desc) != 0)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static int nd_acpi_add(struct acpi_device *adev)
> +{
> + struct nd_bus_descriptor *nd_desc;
> + struct acpi_nfit_desc *acpi_desc;
> + struct device *dev = &adev->dev;
> + struct acpi_table_header *tbl;
> + acpi_status status = AE_OK;
> + acpi_size sz;
> + int rc;
> +
> + status = acpi_get_table_with_size("NFIT", 0, &tbl, &sz);
> + if (ACPI_FAILURE(status)) {
> + dev_err(dev, "failed to find NFIT\n");
> + return -ENXIO;
> + }
> +
> + acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
> + if (!acpi_desc)
> + return -ENOMEM;
> +
> + dev_set_drvdata(dev, acpi_desc);
> + acpi_desc->dev = dev;
> + acpi_desc->nfit = (struct acpi_nfit *) tbl;
> + nd_desc = &acpi_desc->nd_desc;
> + nd_desc->provider_name = "ACPI.NFIT";
> + nd_desc->ndctl = nd_acpi_ctl;
> +
> + acpi_desc->nd_bus = nd_bus_register(dev, nd_desc);
> + if (!acpi_desc->nd_bus)
> + return -ENXIO;
> +
> + rc = nd_acpi_nfit_init(acpi_desc, sz);
> + if (rc) {
> + nd_bus_unregister(acpi_desc->nd_bus);
> + return rc;
> + }
> + return 0;
> +}
> +
> +static int nd_acpi_remove(struct acpi_device *adev)
> +{
> + struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(&adev->dev);
> +
> + nd_bus_unregister(acpi_desc->nd_bus);
> + return 0;
> +}
> +
> +static void nd_acpi_notify(struct acpi_device *adev, u32 event)
> +{
> + /* TODO: handle ACPI_NOTIFY_BUS_CHECK notification */
> + dev_dbg(&adev->dev, "%s: event: %d\n", __func__, event);
> +}
> +
> +static const struct acpi_device_id nd_acpi_ids[] = {
> + { "ACPI0012", 0 },
> + { "", 0 },
> +};
> +MODULE_DEVICE_TABLE(acpi, nd_acpi_ids);
> +
> +static struct acpi_driver nd_acpi_driver = {
> + .name = KBUILD_MODNAME,
> + .ids = nd_acpi_ids,
> + .flags = ACPI_DRIVER_ALL_NOTIFY_EVENTS,
> + .ops = {
> + .add = nd_acpi_add,
> + .remove = nd_acpi_remove,
> + .notify = nd_acpi_notify
> + },
> +};
Since this is going to be non-modular built-in code, please use an ACPI
scan handler instead of using a driver here. acpi_memhotplug.c does that,
you can use it as an example, but I guess you don't need to enable hotplug
for it to start with.
> +
> +static __init int nd_acpi_init(void)
> +{
> + BUILD_BUG_ON(sizeof(struct acpi_nfit) != 40);
> + BUILD_BUG_ON(sizeof(struct acpi_nfit_spa) != 56);
> + BUILD_BUG_ON(sizeof(struct acpi_nfit_memdev) != 48);
> + BUILD_BUG_ON(sizeof(struct acpi_nfit_idt) != 16);
> + BUILD_BUG_ON(sizeof(struct acpi_nfit_smbios) != 8);
> + BUILD_BUG_ON(sizeof(struct acpi_nfit_dcr) != 80);
> + BUILD_BUG_ON(sizeof(struct acpi_nfit_bdw) != 40);
> +
> + return acpi_bus_register_driver(&nd_acpi_driver);
> +}
> +
> +static __exit void nd_acpi_exit(void)
> +{
> + acpi_bus_unregister_driver(&nd_acpi_driver);
> +}
> +
> +module_init(nd_acpi_init);
> +module_exit(nd_acpi_exit);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Intel Corporation");
> diff --git a/drivers/block/nd/acpi_nfit.h b/drivers/block/nd/acpi_nfit.h
> new file mode 100644
> index 000000000000..e0b0f12736bf
> --- /dev/null
> +++ b/drivers/block/nd/acpi_nfit.h
I'm assuming that the below is coordinated with Bob and David and will be
changed to use ACPICA-provided definitions going forward.
Is that correct?
> @@ -0,0 +1,254 @@
> +/*
> + * NVDIMM Firmware Interface Table - NFIT
> + *
> + * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + */
> +#ifndef __NFIT_H__
> +#define __NFIT_H__
> +#include <linux/types.h>
> +#include <linux/uuid.h>
> +#include <linux/acpi.h>
> +#include "libnd.h"
> +
> +static const uuid_le nfit_spa_uuid_volatile __maybe_unused = UUID_LE(0x7305944f,
> + 0xfdda, 0x44e3, 0xb1, 0x6c, 0x3f, 0x22, 0xd2, 0x52, 0xe5, 0xd0);
> +
> +static const uuid_le nfit_spa_uuid_pm __maybe_unused = UUID_LE(0x66f0d379,
> + 0xb4f3, 0x4074, 0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb);
> +
> +static const uuid_le nfit_spa_uuid_dcr __maybe_unused = UUID_LE(0x92f701f6,
> + 0x13b4, 0x405d, 0x91, 0x0b, 0x29, 0x93, 0x67, 0xe8, 0x23, 0x4c);
> +
> +static const uuid_le nfit_spa_uuid_bdw __maybe_unused = UUID_LE(0x91af0530,
> + 0x5d86, 0x470e, 0xa6, 0xb0, 0x0a, 0x2d, 0xb9, 0x40, 0x82, 0x49);
> +
> +static const uuid_le nfit_spa_uuid_vdisk __maybe_unused = UUID_LE(0x77ab535a,
> + 0x45fc, 0x624b, 0x55, 0x60, 0xf7, 0xb2, 0x81, 0xd1, 0xf9, 0x6e);
> +
> +static const uuid_le nfit_spa_uuid_vcd __maybe_unused = UUID_LE(0x3d5abd30,
> + 0x4175, 0x87ce, 0x6d, 0x64, 0xd2, 0xad, 0xe5, 0x23, 0xc4, 0xbb);
> +
> +static const uuid_le nfit_spa_uuid_pdisk __maybe_unused = UUID_LE(0x5cea02c9,
> + 0x4d07, 0x69d3, 0x26, 0x9f, 0x44, 0x96, 0xfb, 0xe0, 0x96, 0xf9);
> +
> +static const uuid_le nfit_spa_uuid_pcd __maybe_unused = UUID_LE(0x08018188,
> + 0x42cd, 0xbb48, 0x10, 0x0f, 0x53, 0x87, 0xd5, 0x3d, 0xed, 0x3d);
> +
> +enum {
> + NFIT_TABLE_SPA = 0,
> + NFIT_TABLE_MEM = 1,
> + NFIT_TABLE_IDT = 2,
> + NFIT_TABLE_SMBIOS = 3,
> + NFIT_TABLE_DCR = 4,
> + NFIT_TABLE_BDW = 5,
> + NFIT_TABLE_FLUSH = 6,
> + NFIT_SPA_VOLATILE = 0,
> + NFIT_SPA_PM = 1,
> + NFIT_SPA_DCR = 2,
> + NFIT_SPA_BDW = 3,
> + NFIT_SPA_VDISK = 4,
> + NFIT_SPA_VCD = 5,
> + NFIT_SPA_PDISK = 6,
> + NFIT_SPA_PCD = 7,
> + NFIT_SPAF_DCR_HOT_ADD = 1 << 0,
> + NFIT_SPAF_PDVALID = 1 << 1,
> + NFIT_MEMF_SAVE_FAIL = 1 << 0,
> + NFIT_MEMF_RESTORE_FAIL = 1 << 1,
> + NFIT_MEMF_FLUSH_FAIL = 1 << 2,
> + NFIT_MEMF_UNARMED = 1 << 3,
> + NFIT_MEMF_NOTIFY_SMART = 1 << 4,
> + NFIT_MEMF_SMART_READY = 1 << 5,
> + NFIT_DCRF_BUFFERED = 1 << 0,
> +};
> +
> +/**
> + * struct acpi_nfit - Nvdimm Firmware Interface Table
> + * @signature: "NFIT"
> + * @length: sum of size of this table plus all appended subtables
> + */
> +struct acpi_nfit {
> + u8 signature[4];
> + u32 length;
> + u8 revision;
> + u8 checksum;
> + u8 oemid[6];
> + u64 oem_tbl_id;
> + u32 oem_revision;
> + u32 creator_id;
> + u32 creator_revision;
> + u32 reserved;
> +};
> +
> +/**
> + * struct acpi_nfit_spa - System Physical Address Range Descriptor Table
> + */
> +struct acpi_nfit_spa {
> + u16 type;
> + u16 length;
> + u16 spa_index;
> + u16 flags;
> + u32 reserved;
> + u32 proximity_domain;
> + u8 type_uuid[16];
> + u64 spa_base;
> + u64 spa_length;
> + u64 mem_attr;
> +};
> +
> +/**
> + * struct acpi_nfit_memdev - Memory Device to SPA Mapping Table
> + */
> +struct acpi_nfit_memdev {
> + u16 type;
> + u16 length;
> + u32 nfit_handle;
> + u16 phys_id;
> + u16 region_id;
> + u16 spa_index;
> + u16 dcr_index;
> + u64 region_len;
> + u64 region_spa_offset;
> + u64 region_dpa;
> + u16 idt_index;
> + u16 interleave_ways;
> + u16 flags;
> + u16 reserved;
> +};
> +
> +/**
> + * struct acpi_nfit_idt - Interleave description Table
> + */
> +struct acpi_nfit_idt {
> + u16 type;
> + u16 length;
> + u16 idt_index;
> + u16 reserved;
> + u32 num_lines;
> + u32 line_size;
> + u32 line_offset[0];
> +};
> +
> +/**
> + * struct acpi_nfit_smbios - SMBIOS Management Information Table
> + */
> +struct acpi_nfit_smbios {
> + u16 type;
> + u16 length;
> + u32 reserved;
> + u8 data[0];
> +};
> +
> +/**
> + * struct acpi_nfit_dcr - NVDIMM Control Region Table
> + * @fic: Format Interface Code
> + * @cmd_offset: command registers relative to block control window
> + * @status_offset: status registers relative to block control window
> + */
> +struct acpi_nfit_dcr {
> + u16 type;
> + u16 length;
> + u16 dcr_index;
> + u16 vendor_id;
> + u16 device_id;
> + u16 revision_id;
> + u16 sub_vendor_id;
> + u16 sub_device_id;
> + u16 sub_revision_id;
> + u8 reserved[6];
> + u32 serial_number;
> + u16 fic;
> + u16 num_bcw;
> + u64 bcw_size;
> + u64 cmd_offset;
> + u64 cmd_size;
> + u64 status_offset;
> + u64 status_size;
> + u16 flags;
> + u8 reserved2[6];
> +};
> +
> +/**
> + * struct acpi_nfit_bdw - NVDIMM Block Data Window Region Table
> + */
> +struct acpi_nfit_bdw {
> + u16 type;
> + u16 length;
> + u16 dcr_index;
> + u16 num_bdw;
> + u64 bdw_offset;
> + u64 bdw_size;
> + u64 blk_capacity;
> + u64 blk_offset;
> +};
> +
> +/**
> + * struct acpi_nfit_flush - Flush Hint Address Structure
> + */
> +struct acpi_nfit_flush {
> + u16 type;
> + u16 length;
> + u32 nfit_handle;
> + u16 num_hints;
> + u8 reserved[6];
> + u64 hint_addr[0];
> +};
> +
> +struct nfit_spa {
> + struct acpi_nfit_spa *spa;
> + struct list_head list;
> +};
> +
> +struct nfit_dcr {
> + struct acpi_nfit_dcr *dcr;
> + struct list_head list;
> +};
> +
> +struct nfit_bdw {
> + struct acpi_nfit_bdw *bdw;
> + struct list_head list;
> +};
> +
> +struct nfit_memdev {
> + struct acpi_nfit_memdev *memdev;
> + struct list_head list;
> +};
> +
> +/* assembled tables for a given dimm/memory-device */
> +struct nfit_mem {
> + struct acpi_nfit_memdev *memdev_dcr;
> + struct acpi_nfit_memdev *memdev_pmem;
> + struct acpi_nfit_dcr *dcr;
> + struct acpi_nfit_bdw *bdw;
> + struct acpi_nfit_spa *spa_dcr;
> + struct acpi_nfit_spa *spa_bdw;
> + struct list_head list;
> +};
> +
> +struct acpi_nfit_desc {
> + struct nd_bus_descriptor nd_desc;
> + struct acpi_nfit *nfit;
> + struct list_head memdevs;
> + struct list_head dimms;
> + struct list_head spas;
> + struct list_head dcrs;
> + struct list_head bdws;
> + struct nd_bus *nd_bus;
> + struct device *dev;
> +};
> +
> +static inline struct acpi_nfit_memdev *__to_nfit_memdev(struct nfit_mem *nfit_mem)
> +{
> + if (nfit_mem->memdev_dcr)
> + return nfit_mem->memdev_dcr;
> + return nfit_mem->memdev_pmem;
> +}
> +#endif /* __NFIT_H__ */
> diff --git a/drivers/block/nd/core.c b/drivers/block/nd/core.c
> new file mode 100644
> index 000000000000..3cccdbc0f3b7
> --- /dev/null
> +++ b/drivers/block/nd/core.c
> @@ -0,0 +1,67 @@
> +/*
> + * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + */
> +#include <linux/export.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include "nd-private.h"
> +#include "libnd.h"
> +
> +static DEFINE_IDA(nd_ida);
> +
> +static void nd_bus_release(struct device *dev)
> +{
> + struct nd_bus *nd_bus = container_of(dev, struct nd_bus, dev);
> +
> + ida_simple_remove(&nd_ida, nd_bus->id);
> + kfree(nd_bus);
> +}
> +
> +struct nd_bus *nd_bus_register(struct device *parent,
> + struct nd_bus_descriptor *nd_desc)
> +{
> + struct nd_bus *nd_bus = kzalloc(sizeof(*nd_bus), GFP_KERNEL);
> + int rc;
> +
> + if (!nd_bus)
> + return NULL;
> + nd_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
> + if (nd_bus->id < 0) {
> + kfree(nd_bus);
> + return NULL;
> + }
> + nd_bus->nd_desc = nd_desc;
> + nd_bus->dev.parent = parent;
> + nd_bus->dev.release = nd_bus_release;
> + dev_set_name(&nd_bus->dev, "ndbus%d", nd_bus->id);
> + rc = device_register(&nd_bus->dev);
> + if (rc) {
> + dev_dbg(&nd_bus->dev, "device registration failed: %d\n", rc);
> + put_device(&nd_bus->dev);
> + return NULL;
> + }
> +
> + return nd_bus;
> +}
> +EXPORT_SYMBOL_GPL(nd_bus_register);
> +
> +void nd_bus_unregister(struct nd_bus *nd_bus)
> +{
> + if (!nd_bus)
> + return;
> + device_unregister(&nd_bus->dev);
> +}
> +EXPORT_SYMBOL_GPL(nd_bus_unregister);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Intel Corporation");
> diff --git a/drivers/block/nd/libnd.h b/drivers/block/nd/libnd.h
> new file mode 100644
> index 000000000000..163832937e9c
> --- /dev/null
> +++ b/drivers/block/nd/libnd.h
> @@ -0,0 +1,33 @@
> +/*
> + * libnd - Non-volatile-memory Devices Subsystem
> + *
> + * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + */
> +#ifndef __LIBND_H__
> +#define __LIBND_H__
> +struct nd_dimm;
> +struct nd_bus_descriptor;
> +typedef int (*ndctl_fn)(struct nd_bus_descriptor *nd_desc,
> + struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
> + unsigned int buf_len);
> +
> +struct nd_bus_descriptor {
> + unsigned long dsm_mask;
> + char *provider_name;
> + ndctl_fn ndctl;
> +};
> +
> +struct nd_bus;
> +struct nd_bus *nd_bus_register(struct device *parent,
> + struct nd_bus_descriptor *nfit_desc);
> +void nd_bus_unregister(struct nd_bus *nd_bus);
> +#endif /* __LIBND_H__ */
> diff --git a/drivers/block/nd/nd-private.h b/drivers/block/nd/nd-private.h
> new file mode 100644
> index 000000000000..3dbab29fa0f9
> --- /dev/null
> +++ b/drivers/block/nd/nd-private.h
> @@ -0,0 +1,23 @@
> +/*
> + * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + */
> +#ifndef __ND_PRIVATE_H__
> +#define __ND_PRIVATE_H__
> +#include <linux/device.h>
> +#include "libnd.h"
> +
> +struct nd_bus {
> + struct nd_bus_descriptor *nd_desc;
> + struct device dev;
> + int id;
> +};
> +#endif /* __ND_PRIVATE_H__ */
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
2015-04-30 23:23 ` Rafael J. Wysocki
@ 2015-05-01 0:39 ` Dan Williams
2015-05-01 1:21 ` Rafael J. Wysocki
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-05-01 0:39 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-nvdimm@lists.01.org, Linux ACPI, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, David Box
On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, April 28, 2015 02:24:23 PM Dan Williams wrote:
>> 1/ Autodetect an NFIT table for the ACPI namespace device with _HID of
>> "ACPI0012"
>>
>> 2/ libnd bus registration
>>
>> The NFIT provided by ACPI is one possible method by which platforms will
>> discover NVDIMM resources. However, the intent of the nd_bus_descriptor
>> abstraction is to abstract "provider" specific details, leaving libnd
>> to be independent of the specific NVDIMM resource discovery mechanism.
>> This flexibility is exploited later to implement custom-defined nd
>> buses.
>>
>> Cc: <linux-acpi@vger.kernel.org>
>> Cc: Robert Moore <robert.moore@intel.com>
>> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>> drivers/block/Kconfig | 2
>> drivers/block/Makefile | 1
>> drivers/block/nd/Kconfig | 40 +++
>> drivers/block/nd/Makefile | 6 +
>> drivers/block/nd/acpi.c | 475 +++++++++++++++++++++++++++++++++++++++++
>> drivers/block/nd/acpi_nfit.h | 254 ++++++++++++++++++++++
>> drivers/block/nd/core.c | 67 ++++++
>> drivers/block/nd/libnd.h | 33 +++
>> drivers/block/nd/nd-private.h | 23 ++
>> 9 files changed, 901 insertions(+)
>> create mode 100644 drivers/block/nd/Kconfig
>> create mode 100644 drivers/block/nd/Makefile
>> create mode 100644 drivers/block/nd/acpi.c
>> create mode 100644 drivers/block/nd/acpi_nfit.h
>> create mode 100644 drivers/block/nd/core.c
>> create mode 100644 drivers/block/nd/libnd.h
>> create mode 100644 drivers/block/nd/nd-private.h
>>
>> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
>> index eb1fed5bd516..dfe40e5ca9bd 100644
>> --- a/drivers/block/Kconfig
>> +++ b/drivers/block/Kconfig
>> @@ -321,6 +321,8 @@ config BLK_DEV_NVME
>> To compile this driver as a module, choose M here: the
>> module will be called nvme.
>>
>> +source "drivers/block/nd/Kconfig"
>> +
>> config BLK_DEV_SKD
>> tristate "STEC S1120 Block Driver"
>> depends on PCI
>> diff --git a/drivers/block/Makefile b/drivers/block/Makefile
>> index 9cc6c18a1c7e..07a6acecf4d8 100644
>> --- a/drivers/block/Makefile
>> +++ b/drivers/block/Makefile
>> @@ -24,6 +24,7 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
>> obj-$(CONFIG_MG_DISK) += mg_disk.o
>> obj-$(CONFIG_SUNVDC) += sunvdc.o
>> obj-$(CONFIG_BLK_DEV_NVME) += nvme.o
>> +obj-$(CONFIG_ND_DEVICES) += nd/
>> obj-$(CONFIG_BLK_DEV_SKD) += skd.o
>> obj-$(CONFIG_BLK_DEV_OSD) += osdblk.o
>>
>> diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig
>> new file mode 100644
>> index 000000000000..6d5d6b732f82
>> --- /dev/null
>> +++ b/drivers/block/nd/Kconfig
>> @@ -0,0 +1,40 @@
>> +menuconfig ND_DEVICES
>> + bool "NVDIMM Support"
>> + depends on PHYS_ADDR_T_64BIT
>> + help
>> + Generic support for non-volatile memory devices including
>> + ACPI-6-NFIT defined resources. On platforms that define an
>> + NFIT, or otherwise can discover NVDIMM resources, a libnd
>> + bus is registered to advertise PMEM (persistent memory)
>> + namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
>> + namespaces (/dev/ndX). A PMEM namespace refers to a memory
>> + resource that may span multiple DIMMs and support DAX (see
>> + CONFIG_DAX). A BLK namespace refers to an NVDIMM control
>> + region which exposes an mmio register set for windowed
>> + access mode to non-volatile memory.
>> +
>> +if ND_DEVICES
>> +
>> +config LIBND
>> + tristate "LIBND: libnd device driver support"
>> + help
>> + Platform agnostic device model for a libnd bus. Publishes
>> + resources for a PMEM (persistent-memory) driver and/or BLK
>> + (sliding mmio window(s)) driver to attach. Exposes a device
>> + topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl
>> + message passing interface, and a "/dev/nmemX" dimm-ioctl
>> + message interface for each memory device registered on the
>> + bus instance. A userspace library "ndctl" provides an API
>> + to enumerate/manage this subsystem.
>> +
>> +config ND_ACPI
>> + tristate "ACPI: NFIT to libnd bus support"
>> + select LIBND
>> + depends on ACPI
>> + help
>> + Infrastructure to probe ACPI 6 compliant platforms for
>> + NVDIMMs (NFIT) and register a libnd device tree. In
>> + addition to storage devices this also enables libnd to craft
>> + ACPI._DSM messages for platform/dimm configuration.
>
> I'm wondering if the two CONFIG options above really need to be user-selectable?
>
> For example, what reason people (who've already selected ND_DEVICES) may have
> for not selecting ND_ACPI if ACPI is set?
Later on in the series we introduce ND_E820 which supports creating a
libnd-bus from e820-type-12 memory ranges on pre-NFIT systems. I'm
also considering a configfs defined libnd-bus because e820 types are
not nearly enough information to safely define nvdimm resources
outside of NFIT.
>> +
>> +endif
>> diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
>> new file mode 100644
>> index 000000000000..944b5947c0cb
>> --- /dev/null
>> +++ b/drivers/block/nd/Makefile
>> @@ -0,0 +1,6 @@
>> +obj-$(CONFIG_LIBND) += libnd.o
>> +obj-$(CONFIG_ND_ACPI) += nd_acpi.o
>> +
>> +nd_acpi-y := acpi.o
>> +
>> +libnd-y := core.o
>
> OK, so it looks like no modules, just built-in code, right?
>
Um, no, both CONFIG_ND_ACPI and CONFIG_LIBND can be =m.
>> diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
>> new file mode 100644
>> index 000000000000..9f0b24390d1b
>> --- /dev/null
>> +++ b/drivers/block/nd/acpi.c
>> @@ -0,0 +1,475 @@
>> +/*
>> + * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of version 2 of the GNU General Public License as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * General Public License for more details.
>> + */
>> +#include <linux/list_sort.h>
>> +#include <linux/module.h>
>> +#include <linux/list.h>
>> +#include <linux/acpi.h>
>> +#include "acpi_nfit.h"
>> +#include "libnd.h"
>> +
>> +static bool warn_checksum;
>> +module_param(warn_checksum, bool, S_IRUGO|S_IWUSR);
>> +MODULE_PARM_DESC(warn_checksum, "Turn checksum errors into warnings");
>> +
>> +enum {
>> + NFIT_ACPI_NOTIFY_TABLE = 0x80,
>> +};
>> +
>> +static int nd_acpi_ctl(struct nd_bus_descriptor *nd_desc,
>> + struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
>> + unsigned int buf_len)
>> +{
>> + return -ENOTTY;
>> +}
>
> Why -ENOTTY? And why not leave a NULL entry for this instead and
> make the library fail it in that case?
Yes, I should have deferred this to the patch that fills in
nd_acpi_ctl later in the series. Having this stub here in this patch
just trips up reviewers, sorry.
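For readers following the thread, here is a minimal userspace sketch of the NULL-entry alternative raised above: the provider leaves `ndctl` unset and a library-side wrapper fails the call. The typedef and descriptor mirror libnd.h from the patch. `nd_bus_ctl()` and `example_ctl()` are hypothetical names that do not appear in the posted series, and -ENOTTY is kept only because it is the conventional errno for an unsupported ioctl.

```c
#include <errno.h>
#include <stddef.h>

struct nd_dimm;
struct nd_bus_descriptor;

/* Same shape as the ndctl_fn typedef in the patch's libnd.h. */
typedef int (*ndctl_fn)(struct nd_bus_descriptor *nd_desc,
		struct nd_dimm *nd_dimm, unsigned int cmd, void *buf,
		unsigned int buf_len);

struct nd_bus_descriptor {
	unsigned long dsm_mask;
	char *provider_name;
	ndctl_fn ndctl;	/* may be NULL when the provider has no handler */
};

/*
 * Hypothetical library-side dispatcher (assumption, not in the patch):
 * a missing handler is reported here, so providers need no stub.
 */
int nd_bus_ctl(struct nd_bus_descriptor *nd_desc, struct nd_dimm *nd_dimm,
		unsigned int cmd, void *buf, unsigned int buf_len)
{
	if (!nd_desc || !nd_desc->ndctl)
		return -ENOTTY;
	return nd_desc->ndctl(nd_desc, nd_dimm, cmd, buf, buf_len);
}

/* Hypothetical provider handler: echoes buf_len to show delegation. */
int example_ctl(struct nd_bus_descriptor *nd_desc, struct nd_dimm *nd_dimm,
		unsigned int cmd, void *buf, unsigned int buf_len)
{
	(void)nd_desc;
	(void)nd_dimm;
	(void)cmd;
	(void)buf;
	return (int)buf_len;
}
```

With this arrangement a provider that has nothing to offer simply leaves the field zeroed, and the error policy lives in one place.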
>> +static const char *spa_type_name(u16 type)
>> +{
>> + switch (type) {
>> + case NFIT_SPA_VOLATILE: return "volatile";
>> + case NFIT_SPA_PM: return "pmem";
>> + case NFIT_SPA_DCR: return "dimm-control-region";
>> + case NFIT_SPA_BDW: return "block-data-window";
>> + default: return "unknown";
>> + }
>> +}
>> +
>> +static int nfit_spa_type(struct acpi_nfit_spa *spa)
>> +{
>> + if (memcmp(&nfit_spa_uuid_volatile, spa->type_uuid, 16) == 0)
>> + return NFIT_SPA_VOLATILE;
>> +
>> + if (memcmp(&nfit_spa_uuid_pm, spa->type_uuid, 16) == 0)
>> + return NFIT_SPA_PM;
>> +
>> + if (memcmp(&nfit_spa_uuid_dcr, spa->type_uuid, 16) == 0)
>> + return NFIT_SPA_DCR;
>> +
>> + if (memcmp(&nfit_spa_uuid_bdw, spa->type_uuid, 16) == 0)
>> + return NFIT_SPA_BDW;
>> +
>> + if (memcmp(&nfit_spa_uuid_vdisk, spa->type_uuid, 16) == 0)
>> + return NFIT_SPA_VDISK;
>> +
>> + if (memcmp(&nfit_spa_uuid_vcd, spa->type_uuid, 16) == 0)
>> + return NFIT_SPA_VCD;
>> +
>> + if (memcmp(&nfit_spa_uuid_pdisk, spa->type_uuid, 16) == 0)
>> + return NFIT_SPA_PDISK;
>> +
>> + if (memcmp(&nfit_spa_uuid_pcd, spa->type_uuid, 16) == 0)
>> + return NFIT_SPA_PCD;
>> +
>> + return -1;
>> +}
>> +
>> +struct nfit_table_header {
>> + __le16 type;
>> + __le16 length;
>> +};
>
> That you'll be able to get from ACPICA I suppose?
Yes.
>> +
>> +static void *add_table(struct acpi_nfit_desc *acpi_desc, void *table, const void *end)
>> +{
>> + struct device *dev = acpi_desc->dev;
>> + struct nfit_table_header *hdr;
>> + void *err = ERR_PTR(-ENOMEM);
>> +
>> + if (table >= end)
>> + return NULL;
>> +
>> + hdr = (struct nfit_table_header *) table;
>> + switch (hdr->type) {
>> + case NFIT_TABLE_SPA: {
>> + struct nfit_spa *nfit_spa = devm_kzalloc(dev, sizeof(*nfit_spa),
>> + GFP_KERNEL);
>> + struct acpi_nfit_spa *spa = table;
>> +
>> + if (!nfit_spa)
>> + return err;
>> + INIT_LIST_HEAD(&nfit_spa->list);
>> + nfit_spa->spa = spa;
>> + list_add_tail(&nfit_spa->list, &acpi_desc->spas);
>> + dev_dbg(dev, "%s: spa index: %d type: %s\n", __func__,
>> + spa->spa_index,
>> + spa_type_name(nfit_spa_type(spa)));
>> + break;
>> + }
>> + case NFIT_TABLE_MEM: {
>> + struct nfit_memdev *nfit_memdev = devm_kzalloc(dev,
>> + sizeof(*nfit_memdev), GFP_KERNEL);
>> + struct acpi_nfit_memdev *memdev = table;
>> +
>> + if (!nfit_memdev)
>> + return err;
>> + INIT_LIST_HEAD(&nfit_memdev->list);
>> + nfit_memdev->memdev = memdev;
>> + list_add_tail(&nfit_memdev->list, &acpi_desc->memdevs);
>> + dev_dbg(dev, "%s: memdev handle: %#x spa: %d dcr: %d\n",
>> + __func__, memdev->nfit_handle, memdev->spa_index,
>> + memdev->dcr_index);
>> + break;
>> + }
>> + case NFIT_TABLE_DCR: {
>> + struct nfit_dcr *nfit_dcr = devm_kzalloc(dev, sizeof(*nfit_dcr),
>> + GFP_KERNEL);
>> + struct acpi_nfit_dcr *dcr = table;
>> +
>> + if (!nfit_dcr)
>> + return err;
>> + INIT_LIST_HEAD(&nfit_dcr->list);
>> + nfit_dcr->dcr = dcr;
>> + list_add_tail(&nfit_dcr->list, &acpi_desc->dcrs);
>> + dev_dbg(dev, "%s: dcr index: %d num_bcw: %d\n", __func__,
>> + dcr->dcr_index, dcr->num_bcw);
>> + break;
>> + }
>> + case NFIT_TABLE_BDW: {
>> + struct nfit_bdw *nfit_bdw = devm_kzalloc(dev, sizeof(*nfit_bdw),
>> + GFP_KERNEL);
>> + struct acpi_nfit_bdw *bdw = table;
>> +
>> + if (!nfit_bdw)
>> + return err;
>> + INIT_LIST_HEAD(&nfit_bdw->list);
>> + nfit_bdw->bdw = bdw;
>> + list_add_tail(&nfit_bdw->list, &acpi_desc->bdws);
>> + dev_dbg(dev, "%s: bdw dcr: %d num_bdw: %d\n", __func__,
>> + bdw->dcr_index, bdw->num_bdw);
>> + break;
>> + }
>> + /* TODO */
>> + case NFIT_TABLE_IDT:
>> + dev_dbg(dev, "%s: idt\n", __func__);
>> + break;
>> + case NFIT_TABLE_FLUSH:
>> + dev_dbg(dev, "%s: flush\n", __func__);
>> + break;
>> + case NFIT_TABLE_SMBIOS:
>> + dev_dbg(dev, "%s: smbios\n", __func__);
>> + break;
>> + default:
>> + dev_err(dev, "unknown table '%d' parsing nfit\n", hdr->type);
>> + return ERR_PTR(-ENXIO);
>> + }
>> +
>> + return table + hdr->length;
>> +}
>> +
>> +static void nfit_mem_find_spa_bdw(struct acpi_nfit_desc *acpi_desc,
>> + struct nfit_mem *nfit_mem)
>> +{
>> + u32 nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
>> + u16 dcr_index = nfit_mem->dcr->dcr_index;
>> + struct nfit_spa *nfit_spa;
>> +
>> + list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
>> + u16 spa_index = nfit_spa->spa->spa_index;
>> + int type = nfit_spa_type(nfit_spa->spa);
>> + struct nfit_memdev *nfit_memdev;
>> +
>> + if (type != NFIT_SPA_BDW)
>> + continue;
>> +
>> + list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
>> + if (nfit_memdev->memdev->spa_index != spa_index)
>> + continue;
>> + if (nfit_memdev->memdev->nfit_handle != nfit_handle)
>> + continue;
>> + if (nfit_memdev->memdev->dcr_index != dcr_index)
>> + continue;
>> +
>> + nfit_mem->spa_bdw = nfit_spa->spa;
>> + return;
>> + }
>> + }
>> +
>> + dev_dbg(acpi_desc->dev, "SPA-BDW not found for SPA-DCR %d\n",
>> + nfit_mem->spa_dcr->spa_index);
>> + nfit_mem->bdw = NULL;
>> +}
>> +
>> +static int nfit_mem_add(struct acpi_nfit_desc *acpi_desc,
>> + struct nfit_mem *nfit_mem, struct acpi_nfit_spa *spa)
>> +{
>> + u16 dcr_index = __to_nfit_memdev(nfit_mem)->dcr_index;
>> + struct nfit_dcr *nfit_dcr;
>> + struct nfit_bdw *nfit_bdw;
>> +
>> + list_for_each_entry(nfit_dcr, &acpi_desc->dcrs, list) {
>> + if (nfit_dcr->dcr->dcr_index != dcr_index)
>> + continue;
>> + nfit_mem->dcr = nfit_dcr->dcr;
>> + break;
>> + }
>> +
>> + if (!nfit_mem->dcr) {
>> + dev_dbg(acpi_desc->dev, "SPA %d missing:%s%s\n", spa->spa_index,
>> + __to_nfit_memdev(nfit_mem) ? "" : " MEMDEV",
>> + nfit_mem->dcr ? "" : " DCR");
>> + return -ENODEV;
>> + }
>> +
>> + /*
>> + * We've found enough to create an nd_dimm, optionally
>> + * find an associated BDW
>> + */
>> + list_add(&nfit_mem->list, &acpi_desc->dimms);
>> +
>> + list_for_each_entry(nfit_bdw, &acpi_desc->bdws, list) {
>> + if (nfit_bdw->bdw->dcr_index != dcr_index)
>> + continue;
>> + nfit_mem->bdw = nfit_bdw->bdw;
>> + break;
>> + }
>> +
>> + if (!nfit_mem->bdw)
>> + return 0;
>> +
>> + nfit_mem_find_spa_bdw(acpi_desc, nfit_mem);
>> + return 0;
>> +}
>> +
>> +static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
>> + struct acpi_nfit_spa *spa)
>> +{
>> + struct nfit_mem *nfit_mem, *found;
>> + struct nfit_memdev *nfit_memdev;
>> + int type = nfit_spa_type(spa);
>> + u16 dcr_index;
>> +
>> + switch (type) {
>> + case NFIT_SPA_DCR:
>> + case NFIT_SPA_PM:
>> + break;
>> + default:
>> + return 0;
>> + }
>> +
>> + list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
>> + int rc;
>> +
>> + if (nfit_memdev->memdev->spa_index != spa->spa_index)
>> + continue;
>> + found = NULL;
>> + dcr_index = nfit_memdev->memdev->dcr_index;
>> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list)
>> + if (__to_nfit_memdev(nfit_mem)->dcr_index == dcr_index) {
>> + found = nfit_mem;
>> + break;
>> + }
>> +
>> + if (found)
>> + nfit_mem = found;
>> + else {
>> + nfit_mem = devm_kzalloc(acpi_desc->dev,
>> + sizeof(*nfit_mem), GFP_KERNEL);
>> + if (!nfit_mem)
>> + return -ENOMEM;
>> + INIT_LIST_HEAD(&nfit_mem->list);
>> + }
>> +
>> + if (type == NFIT_SPA_DCR) {
>> + /* multiple dimms may share a SPA when interleaved */
>> + nfit_mem->spa_dcr = spa;
>> + nfit_mem->memdev_dcr = nfit_memdev->memdev;
>> + } else {
>> + /*
>> + * A single dimm may belong to multiple SPA-PM
>> + * ranges, record at least one in addition to
>> + * any SPA-DCR range.
>> + */
>> + nfit_mem->memdev_pmem = nfit_memdev->memdev;
>> + }
>> +
>> + if (found)
>> + continue;
>> +
>> + rc = nfit_mem_add(acpi_desc, nfit_mem, spa);
>> + if (rc)
>> + return rc;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int nfit_mem_cmp(void *priv, struct list_head *__a, struct list_head *__b)
>> +{
>> + struct nfit_mem *a = container_of(__a, typeof(*a), list);
>> + struct nfit_mem *b = container_of(__b, typeof(*b), list);
>> + u32 handleA, handleB;
>> +
>> + handleA = __to_nfit_memdev(a)->nfit_handle;
>> + handleB = __to_nfit_memdev(b)->nfit_handle;
>> + if (handleA < handleB)
>> + return -1;
>> + else if (handleA > handleB)
>> + return 1;
>> + return 0;
>> +}
>> +
>> +static int nfit_mem_init(struct acpi_nfit_desc *acpi_desc)
>> +{
>> + struct nfit_spa *nfit_spa;
>> +
>> + /*
>> + * For each SPA-DCR or SPA-PMEM address range find its
>> + * corresponding MEMDEV(s). From each MEMDEV find the
>> + * corresponding DCR. Then, if we're operating on a SPA-DCR,
>> + * try to find a SPA-BDW and a corresponding BDW that references
>> + * the DCR. Throw it all into an nfit_mem object. Note, that
>> + * BDWs are optional.
>> + */
>> + list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
>> + int rc;
>> +
>> + rc = nfit_mem_dcr_init(acpi_desc, nfit_spa->spa);
>> + if (rc)
>> + return rc;
>> + }
>> +
>> + list_sort(NULL, &acpi_desc->dimms, nfit_mem_cmp);
>> +
>> + return 0;
>> +}
>> +
>> +static int nd_acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
>> +{
>> + struct device *dev = acpi_desc->dev;
>> + const void *end;
>> + u8 *data, sum;
>> + acpi_size i;
>> +
>> + INIT_LIST_HEAD(&acpi_desc->spas);
>> + INIT_LIST_HEAD(&acpi_desc->dcrs);
>> + INIT_LIST_HEAD(&acpi_desc->bdws);
>> + INIT_LIST_HEAD(&acpi_desc->memdevs);
>> + INIT_LIST_HEAD(&acpi_desc->dimms);
>> +
>> + data = (u8 *) acpi_desc->nfit;
>> + for (i = 0, sum = 0; i < sz; i++)
>> + sum += readb(data + i);
>> + if (sum != 0 && !warn_checksum) {
>> + dev_dbg(dev, "%s: nfit checksum failure\n", __func__);
>> + return -ENXIO;
>> + }
>> + WARN_TAINT_ONCE(sum != 0, TAINT_FIRMWARE_WORKAROUND,
>> + "nfit checksum failure, continuing...\n");
>> +
>> + end = data + sz;
>> + data += sizeof(struct acpi_nfit);
>> + while (!IS_ERR_OR_NULL(data))
>> + data = add_table(acpi_desc, data, end);
>
> This looks like we are expecting a series of tables here and we're going to
> fail the whole discovery if just one of them is invalid.
>
> I wonder if it would be practical to skip just the invalid ones instead?
Yes, for future-proofing we should just ignore tables that we don't
understand. Will fix.
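A minimal userspace sketch of the agreed direction, assuming a walker that steps over sub-tables whose type it does not recognize rather than aborting. `walk_tables()` and its `max_known` parameter are hypothetical and are not taken from the series. The header layout follows `struct nfit_table_header` from the patch, but in host byte order for simplicity (the real fields are little-endian). The sketch also rejects a length smaller than the header, since a zero length would otherwise make the loop spin forever.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-endian stand-in for the patch's struct nfit_table_header. */
struct nfit_table_header {
	uint16_t type;
	uint16_t length;	/* size of the whole sub-table, header included */
};

/*
 * Hypothetical walker (assumption, not in the patch): visits every
 * sub-table in buf[0..len). Types above max_known are counted in
 * *skipped and stepped over instead of failing discovery. Returns the
 * number of recognized sub-tables, or -1 if a length field is malformed.
 */
int walk_tables(const uint8_t *buf, size_t len, uint16_t max_known,
		unsigned int *skipped)
{
	size_t off = 0;
	int known = 0;

	*skipped = 0;
	while (len - off >= sizeof(struct nfit_table_header)) {
		struct nfit_table_header hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		/* a too-short or overrunning length cannot be skipped safely */
		if (hdr.length < sizeof(hdr) || hdr.length > len - off)
			return -1;
		if (hdr.type <= max_known)
			known++;	/* a real parser would dispatch on type here */
		else
			(*skipped)++;	/* unknown type: ignore and move on */
		off += hdr.length;
	}
	return known;
}
```

Only a corrupt length remains fatal, because without a trustworthy length there is no way to find the next sub-table.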
>
>> +
>> + if (IS_ERR(data)) {
>> + dev_dbg(dev, "%s: nfit table parsing error: %ld\n", __func__,
>> + PTR_ERR(data));
>> + return PTR_ERR(data);
>> + }
>> +
>> + if (nfit_mem_init(acpi_desc) != 0)
>> + return -ENOMEM;
>> +
>> + return 0;
>> +}
>> +
>> +static int nd_acpi_add(struct acpi_device *adev)
>> +{
>> + struct nd_bus_descriptor *nd_desc;
>> + struct acpi_nfit_desc *acpi_desc;
>> + struct device *dev = &adev->dev;
>> + struct acpi_table_header *tbl;
>> + acpi_status status = AE_OK;
>> + acpi_size sz;
>> + int rc;
>> +
>> + status = acpi_get_table_with_size("NFIT", 0, &tbl, &sz);
>> + if (ACPI_FAILURE(status)) {
>> + dev_err(dev, "failed to find NFIT\n");
>> + return -ENXIO;
>> + }
>> +
>> + acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
>> + if (!acpi_desc)
>> + return -ENOMEM;
>> +
>> + dev_set_drvdata(dev, acpi_desc);
>> + acpi_desc->dev = dev;
>> + acpi_desc->nfit = (struct acpi_nfit *) tbl;
>> + nd_desc = &acpi_desc->nd_desc;
>> + nd_desc->provider_name = "ACPI.NFIT";
>> + nd_desc->ndctl = nd_acpi_ctl;
>> +
>> + acpi_desc->nd_bus = nd_bus_register(dev, nd_desc);
>> + if (!acpi_desc->nd_bus)
>> + return -ENXIO;
>> +
>> + rc = nd_acpi_nfit_init(acpi_desc, sz);
>> + if (rc) {
>> + nd_bus_unregister(acpi_desc->nd_bus);
>> + return rc;
>> + }
>> + return 0;
>> +}
>> +
>> +static int nd_acpi_remove(struct acpi_device *adev)
>> +{
>> + struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(&adev->dev);
>> +
>> + nd_bus_unregister(acpi_desc->nd_bus);
>> + return 0;
>> +}
>> +
>> +static void nd_acpi_notify(struct acpi_device *adev, u32 event)
>> +{
>> + /* TODO: handle ACPI_NOTIFY_BUS_CHECK notification */
>> + dev_dbg(&adev->dev, "%s: event: %d\n", __func__, event);
>> +}
>> +
>> +static const struct acpi_device_id nd_acpi_ids[] = {
>> + { "ACPI0012", 0 },
>> + { "", 0 },
>> +};
>> +MODULE_DEVICE_TABLE(acpi, nd_acpi_ids);
>> +
>> +static struct acpi_driver nd_acpi_driver = {
>> + .name = KBUILD_MODNAME,
>> + .ids = nd_acpi_ids,
>> + .flags = ACPI_DRIVER_ALL_NOTIFY_EVENTS,
>> + .ops = {
>> + .add = nd_acpi_add,
>> + .remove = nd_acpi_remove,
>> + .notify = nd_acpi_notify
>> + },
>> +};
>
> Since this is going to be non-modular built-in code, please use an ACPI
> scan handler instead of using a driver here. acpi_memhotplug.c does that,
> you can use it as an example, but I guess you don't need to enable hotplug
> for it to start with.
No, you misunderstood, this will certainly be modular and loaded on-demand.
>
>> +
>> +static __init int nd_acpi_init(void)
>> +{
>> + BUILD_BUG_ON(sizeof(struct acpi_nfit) != 40);
>> + BUILD_BUG_ON(sizeof(struct acpi_nfit_spa) != 56);
>> + BUILD_BUG_ON(sizeof(struct acpi_nfit_memdev) != 48);
>> + BUILD_BUG_ON(sizeof(struct acpi_nfit_idt) != 16);
>> + BUILD_BUG_ON(sizeof(struct acpi_nfit_smbios) != 8);
>> + BUILD_BUG_ON(sizeof(struct acpi_nfit_dcr) != 80);
>> + BUILD_BUG_ON(sizeof(struct acpi_nfit_bdw) != 40);
>> +
>> + return acpi_bus_register_driver(&nd_acpi_driver);
>> +}
>> +
>> +static __exit void nd_acpi_exit(void)
>> +{
>> + acpi_bus_unregister_driver(&nd_acpi_driver);
>> +}
>> +
>> +module_init(nd_acpi_init);
>> +module_exit(nd_acpi_exit);
>> +MODULE_LICENSE("GPL v2");
>> +MODULE_AUTHOR("Intel Corporation");
>> diff --git a/drivers/block/nd/acpi_nfit.h b/drivers/block/nd/acpi_nfit.h
>> new file mode 100644
>> index 000000000000..e0b0f12736bf
>> --- /dev/null
>> +++ b/drivers/block/nd/acpi_nfit.h
>
> I'm assuming that the below is coordinated with Bob and David and will be
> changed to use ACPICA-provided definitions going forward.
>
> Is that correct?
Yes, as soon as those definitions are available we will drop this
header and rebase on the ACPICA implementation.
[..]
* Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
2015-05-01 0:39 ` Dan Williams
@ 2015-05-01 1:21 ` Rafael J. Wysocki
2015-05-01 16:23 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2015-05-01 1:21 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, Linux ACPI, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, David Box
On Thursday, April 30, 2015 05:39:06 PM Dan Williams wrote:
> On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, April 28, 2015 02:24:23 PM Dan Williams wrote:
> >> 1/ Autodetect an NFIT table for the ACPI namespace device with _HID of
> >> "ACPI0012"
> >>
> >> 2/ libnd bus registration
> >>
> >> The NFIT provided by ACPI is one possible method by which platforms will
> >> discover NVDIMM resources. However, the intent of the nd_bus_descriptor
> >> abstraction is to abstract "provider" specific details, leaving libnd
> >> to be independent of the specific NVDIMM resource discovery mechanism.
> >> This flexibility is exploited later to implement custom-defined nd
> >> buses.
> >>
> >> Cc: <linux-acpi@vger.kernel.org>
> >> Cc: Robert Moore <robert.moore@intel.com>
> >> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> ---
> >> drivers/block/Kconfig | 2
> >> drivers/block/Makefile | 1
> >> drivers/block/nd/Kconfig | 40 +++
> >> drivers/block/nd/Makefile | 6 +
> >> drivers/block/nd/acpi.c | 475 +++++++++++++++++++++++++++++++++++++++++
> >> drivers/block/nd/acpi_nfit.h | 254 ++++++++++++++++++++++
> >> drivers/block/nd/core.c | 67 ++++++
> >> drivers/block/nd/libnd.h | 33 +++
> >> drivers/block/nd/nd-private.h | 23 ++
> >> 9 files changed, 901 insertions(+)
> >> create mode 100644 drivers/block/nd/Kconfig
> >> create mode 100644 drivers/block/nd/Makefile
> >> create mode 100644 drivers/block/nd/acpi.c
> >> create mode 100644 drivers/block/nd/acpi_nfit.h
> >> create mode 100644 drivers/block/nd/core.c
> >> create mode 100644 drivers/block/nd/libnd.h
> >> create mode 100644 drivers/block/nd/nd-private.h
> >>
> >> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> >> index eb1fed5bd516..dfe40e5ca9bd 100644
> >> --- a/drivers/block/Kconfig
> >> +++ b/drivers/block/Kconfig
> >> @@ -321,6 +321,8 @@ config BLK_DEV_NVME
> >> To compile this driver as a module, choose M here: the
> >> module will be called nvme.
> >>
> >> +source "drivers/block/nd/Kconfig"
> >> +
> >> config BLK_DEV_SKD
> >> tristate "STEC S1120 Block Driver"
> >> depends on PCI
> >> diff --git a/drivers/block/Makefile b/drivers/block/Makefile
> >> index 9cc6c18a1c7e..07a6acecf4d8 100644
> >> --- a/drivers/block/Makefile
> >> +++ b/drivers/block/Makefile
> >> @@ -24,6 +24,7 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
> >> obj-$(CONFIG_MG_DISK) += mg_disk.o
> >> obj-$(CONFIG_SUNVDC) += sunvdc.o
> >> obj-$(CONFIG_BLK_DEV_NVME) += nvme.o
> >> +obj-$(CONFIG_ND_DEVICES) += nd/
> >> obj-$(CONFIG_BLK_DEV_SKD) += skd.o
> >> obj-$(CONFIG_BLK_DEV_OSD) += osdblk.o
> >>
> >> diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig
> >> new file mode 100644
> >> index 000000000000..6d5d6b732f82
> >> --- /dev/null
> >> +++ b/drivers/block/nd/Kconfig
> >> @@ -0,0 +1,40 @@
> >> +menuconfig ND_DEVICES
> >> + bool "NVDIMM Support"
> >> + depends on PHYS_ADDR_T_64BIT
> >> + help
> >> + Generic support for non-volatile memory devices including
> >> + ACPI-6-NFIT defined resources. On platforms that define an
> >> + NFIT, or otherwise can discover NVDIMM resources, a libnd
> >> + bus is registered to advertise PMEM (persistent memory)
> >> + namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
> >> + namespaces (/dev/ndX). A PMEM namespace refers to a memory
> >> + resource that may span multiple DIMMs and support DAX (see
> >> + CONFIG_DAX). A BLK namespace refers to an NVDIMM control
> >> + region which exposes an mmio register set for windowed
> >> + access mode to non-volatile memory.
> >> +
> >> +if ND_DEVICES
> >> +
> >> +config LIBND
> >> + tristate "LIBND: libnd device driver support"
> >> + help
> >> + Platform agnostic device model for a libnd bus. Publishes
> >> + resources for a PMEM (persistent-memory) driver and/or BLK
> >> + (sliding mmio window(s)) driver to attach. Exposes a device
> >> + topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl
> >> + message passing interface, and a "/dev/nmemX" dimm-ioctl
> >> + message interface for each memory device registered on the
> >> + bus instance. A userspace library "ndctl" provides an API
> >> + to enumerate/manage this subsystem.
> >> +
> >> +config ND_ACPI
> >> + tristate "ACPI: NFIT to libnd bus support"
> >> + select LIBND
> >> + depends on ACPI
> >> + help
> >> + Infrastructure to probe ACPI 6 compliant platforms for
> >> + NVDIMMs (NFIT) and register a libnd device tree. In
> >> + addition to storage devices this also enables libnd to craft
> >> + ACPI._DSM messages for platform/dimm configuration.
> >
> > I'm wondering if the two CONFIG options above really need to be user-selectable?
> >
> > For example, what reason people (who've already selected ND_DEVICES) may have
> > for not selecting ND_ACPI if ACPI is set?
>
>
> Later on in the series we introduce ND_E820 which supports creating a
> libnd-bus from e820-type-12 memory ranges on pre-NFIT systems. I'm
> also considering a configfs defined libnd-bus because e820 types are
> not nearly enough information to safely define nvdimm resources
> outside of NFIT.
I hope these are not mutually exclusive with ND_ACPI? Otherwise distros
will have problems with supporting them in one kernel.
If ND_E820 and ND_ACPI aren't mutually exclusive, I still don't see a good
enough reason for asking users about ND_ACPI. Why would I ever say "No"
here if I said "Yes" or "Module" to ND_DEVICES?
> >> +
> >> +endif
> >> diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
> >> new file mode 100644
> >> index 000000000000..944b5947c0cb
> >> --- /dev/null
> >> +++ b/drivers/block/nd/Makefile
> >> @@ -0,0 +1,6 @@
> >> +obj-$(CONFIG_LIBND) += libnd.o
> >> +obj-$(CONFIG_ND_ACPI) += nd_acpi.o
> >> +
> >> +nd_acpi-y := acpi.o
> >> +
> >> +libnd-y := core.o
> >
> > OK, so it looks like no modules, just built-in code, right?
> >
>
> Um, no, both CONFIG_ND_ACPI and CONFIG_LIBND can be =m.
OK
[cut]
> >> +static int nd_acpi_remove(struct acpi_device *adev)
> >> +{
> >> + struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(&adev->dev);
> >> +
> >> + nd_bus_unregister(acpi_desc->nd_bus);
> >> + return 0;
> >> +}
> >> +
> >> +static void nd_acpi_notify(struct acpi_device *adev, u32 event)
> >> +{
> >> + /* TODO: handle ACPI_NOTIFY_BUS_CHECK notification */
> >> + dev_dbg(&adev->dev, "%s: event: %d\n", __func__, event);
> >> +}
> >> +
> >> +static const struct acpi_device_id nd_acpi_ids[] = {
> >> + { "ACPI0012", 0 },
> >> + { "", 0 },
> >> +};
> >> +MODULE_DEVICE_TABLE(acpi, nd_acpi_ids);
> >> +
> >> +static struct acpi_driver nd_acpi_driver = {
> >> + .name = KBUILD_MODNAME,
> >> + .ids = nd_acpi_ids,
> >> + .flags = ACPI_DRIVER_ALL_NOTIFY_EVENTS,
> >> + .ops = {
> >> + .add = nd_acpi_add,
> >> + .remove = nd_acpi_remove,
> >> + .notify = nd_acpi_notify
> >> + },
> >> +};
> >
> > Since this is going to be non-modular built-in code, please use an ACPI
> > scan handler instead of using a driver here. acpi_memhotplug.c does that,
> > you can use it as an example, but I guess you don't need to enable hotplug
> > for it to start with.
>
>
> No, you misunderstood, this will certainly be modular and loaded on-demand.
OK
So please drop the .notify thing at least for now. It most likely doesn't do
what you need anyway.
> >
> >> +
> >> +static __init int nd_acpi_init(void)
> >> +{
> >> + BUILD_BUG_ON(sizeof(struct acpi_nfit) != 40);
> >> + BUILD_BUG_ON(sizeof(struct acpi_nfit_spa) != 56);
> >> + BUILD_BUG_ON(sizeof(struct acpi_nfit_memdev) != 48);
> >> + BUILD_BUG_ON(sizeof(struct acpi_nfit_idt) != 16);
> >> + BUILD_BUG_ON(sizeof(struct acpi_nfit_smbios) != 8);
> >> + BUILD_BUG_ON(sizeof(struct acpi_nfit_dcr) != 80);
> >> + BUILD_BUG_ON(sizeof(struct acpi_nfit_bdw) != 40);
> >> +
> >> + return acpi_bus_register_driver(&nd_acpi_driver);
> >> +}
> >> +
> >> +static __exit void nd_acpi_exit(void)
> >> +{
> >> + acpi_bus_unregister_driver(&nd_acpi_driver);
> >> +}
> >> +
> >> +module_init(nd_acpi_init);
> >> +module_exit(nd_acpi_exit);
> >> +MODULE_LICENSE("GPL v2");
> >> +MODULE_AUTHOR("Intel Corporation");
> >> diff --git a/drivers/block/nd/acpi_nfit.h b/drivers/block/nd/acpi_nfit.h
> >> new file mode 100644
> >> index 000000000000..e0b0f12736bf
> >> --- /dev/null
> >> +++ b/drivers/block/nd/acpi_nfit.h
> >
> > I'm assuming that the below is coordinated with Bob and David and will be
> > changed to use ACPICA-provided definitions going forward.
> >
> > Is that correct?
>
> Yes, as soon as those definitions are available we will drop this
> header and rebase on the ACPICA implementation.
>
> [..]
OK
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
2015-05-01 1:21 ` Rafael J. Wysocki
@ 2015-05-01 16:23 ` Dan Williams
2015-05-04 23:58 ` Rafael J. Wysocki
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-05-01 16:23 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-nvdimm@lists.01.org, Linux ACPI, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, David Box
On Thu, Apr 30, 2015 at 6:21 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, April 30, 2015 05:39:06 PM Dan Williams wrote:
>> On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
[..]
>> >> +if ND_DEVICES
>> >> +
>> >> +config LIBND
>> >> + tristate "LIBND: libnd device driver support"
>> >> + help
>> >> + Platform agnostic device model for a libnd bus. Publishes
>> >> + resources for a PMEM (persistent-memory) driver and/or BLK
>> >> + (sliding mmio window(s)) driver to attach. Exposes a device
>> >> + topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl
>> >> + message passing interface, and a "/dev/nmemX" dimm-ioctl
>> >> + message interface for each memory device registered on the
>> >> + bus instance. A userspace library "ndctl" provides an API
>> >> + to enumerate/manage this subsystem.
>> >> +
>> >> +config ND_ACPI
>> >> + tristate "ACPI: NFIT to libnd bus support"
>> >> + select LIBND
>> >> + depends on ACPI
>> >> + help
>> >> + Infrastructure to probe ACPI 6 compliant platforms for
>> >> + NVDIMMs (NFIT) and register a libnd device tree. In
>> >> + addition to storage devices this also enables libnd to craft
>> >> + ACPI._DSM messages for platform/dimm configuration.
>> >
>> > I'm wondering if the two CONFIG options above really need to be user-selectable?
>> >
>> > For example, what reason people (who've already selected ND_DEVICES) may have
>> > for not selecting ND_ACPI if ACPI is set?
>>
>>
>> Later on in the series we introduce ND_E820 which supports creating a
>> libnd-bus from e820-type-12 memory ranges on pre-NFIT systems. I'm
>> also considering a configfs defined libnd-bus because e820 types are
>> not nearly enough information to safely define nvdimm resources
>> outside of NFIT.
>
> I hope these are not mutually exclusive with ND_ACPI? Otherwise distros
> will have problems with supporting them in one kernel.
You can have ND_E820 support and ND_ACPI support in the same system.
Likely an NFIT enabled system will never have e820-type-12 ranges, but
if a user messes up and uses the new memmap=ss!nn command line to
overlap NFIT-defined memory then the request_mem_region() calls in the
driver will collide. First to load wins in that scenario.
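The "first to load wins" outcome can be sketched in userspace C. This is a
minimal model, not kernel code: it assumes each driver claims an inclusive
address range and that a later overlapping claim is simply refused, which is
the behaviour described above for colliding request_mem_region() calls. The
names claim_table and claim_region() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of exclusive physical-range claims. */
struct mem_claim {
	uint64_t start;
	uint64_t end;		/* inclusive end address */
	const char *owner;
};

#define MAX_CLAIMS 8

struct claim_table {
	struct mem_claim claims[MAX_CLAIMS];
	size_t count;
};

/* Two inclusive ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 <= e2 && s2 <= e1;
}

/*
 * Record the range and return true if it is free; return false if it
 * collides with an earlier claim (the earlier claimant keeps the range).
 */
static bool claim_region(struct claim_table *t, uint64_t start, uint64_t len,
			 const char *owner)
{
	uint64_t end;
	size_t i;

	if (len == 0 || t->count == MAX_CLAIMS)
		return false;
	end = start + len - 1;
	for (i = 0; i < t->count; i++)
		if (ranges_overlap(start, end, t->claims[i].start,
				   t->claims[i].end))
			return false;
	t->claims[t->count++] = (struct mem_claim){ start, end, owner };
	return true;
}
```

In this model, if an NFIT-backed driver claims a range first, a later
memmap-defined claim that overlaps it fails, and the reverse holds if the
load order is swapped.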
> If ND_E820 and ND_ACPI aren't mutually exclusive, I still don't see a good
> enough reason for asking users about ND_ACPI. Why would I ever say "No"
> here if I said "Yes" or "Module" to ND_DEVICES?
I agree that if the user selects ND_DEVICES then ND_ACPI should
probably default on, but otherwise turning it off is a useful option.
If you know your system is pre-ACPI-6 then why bother including
support?
>> >> +
>> >> +endif
>> >> diff --git a/drivers/block/nd/Makefile b/drivers/block/nd/Makefile
>> >> new file mode 100644
>> >> index 000000000000..944b5947c0cb
>> >> --- /dev/null
>> >> +++ b/drivers/block/nd/Makefile
>> >> @@ -0,0 +1,6 @@
>> >> +obj-$(CONFIG_LIBND) += libnd.o
>> >> +obj-$(CONFIG_ND_ACPI) += nd_acpi.o
>> >> +
>> >> +nd_acpi-y := acpi.o
>> >> +
>> >> +libnd-y := core.o
>> >
>> > OK, so it looks like no modules, just built-in code, right?
>> >
>>
>> Um, no, both CONFIG_ND_ACPI and CONFIG_LIBND can be =m.
>
> OK
>
> [cut]
>
>> >> +static int nd_acpi_remove(struct acpi_device *adev)
>> >> +{
>> >> + struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(&adev->dev);
>> >> +
>> >> + nd_bus_unregister(acpi_desc->nd_bus);
>> >> + return 0;
>> >> +}
>> >> +
>> >> +static void nd_acpi_notify(struct acpi_device *adev, u32 event)
>> >> +{
>> >> + /* TODO: handle ACPI_NOTIFY_BUS_CHECK notification */
>> >> + dev_dbg(&adev->dev, "%s: event: %d\n", __func__, event);
>> >> +}
>> >> +
>> >> +static const struct acpi_device_id nd_acpi_ids[] = {
>> >> + { "ACPI0012", 0 },
>> >> + { "", 0 },
>> >> +};
>> >> +MODULE_DEVICE_TABLE(acpi, nd_acpi_ids);
>> >> +
>> >> +static struct acpi_driver nd_acpi_driver = {
>> >> + .name = KBUILD_MODNAME,
>> >> + .ids = nd_acpi_ids,
>> >> + .flags = ACPI_DRIVER_ALL_NOTIFY_EVENTS,
>> >> + .ops = {
>> >> + .add = nd_acpi_add,
>> >> + .remove = nd_acpi_remove,
>> >> + .notify = nd_acpi_notify
>> >> + },
>> >> +};
>> >
>> > Since this is going to be non-modular built-in code, please use an ACPI
>> > scan handler instead of using a driver here. acpi_memhotplug.c does that,
>> > you can use it as an example, but I guess you don't need to enable hotplug
>> > for it to start with.
>>
>>
>> No, you misunderstood, this will certainly be modular and loaded on-demand.
>
> OK
>
> So please drop the .notify thing at least for now. It most likely doesn't do
> what you need anyway.
The .notify handler will eventually be filled in to handle hot-add of
NFIT structures, but yes I'll drop it for now.
* Re: [Linux-nvdimm] [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices
2015-04-28 18:24 ` [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices Dan Williams
@ 2015-05-01 17:48 ` Toshi Kani
2015-05-01 18:22 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Toshi Kani @ 2015-05-01 17:48 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel, linux-acpi
On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
> Register the memory devices described in the nfit as libnd 'dimm'
> devices on an nd bus. The kernel assigned device id for dimms is
> dynamic. If userspace needs a more static identifier it should consult
> a provider-specific attribute. In the case where NFIT is the provider,
> the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used
> for this purpose.
:
> +
> +static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
> +{
> + struct nfit_mem *nfit_mem;
> +
> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
> + struct nd_dimm *nd_dimm;
> + unsigned long flags = 0;
> + u32 nfit_handle;
> +
> + nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
> + nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
> + if (nd_dimm) {
> + /*
> + * If for some reason we find multiple DCRs the
> + * first one wins
> + */
> + dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
> + nd_dimm_name(nd_dimm));
> + continue;
> + }
> +
> + if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> + flags |= NDD_ALIASING;
Does this check work for an NVDIMM card which has multiple pmem regions
with label info, but does not have any bdw region configured?
The code assumes that namespace_pmem (NDD_ALIASING) and namespace_blk
have label info. There may be an NVDIMM card with a single blk region
without label info.
Instead of using the namespace types to assume the label info, how about
adding a flag to indicate the presence of the label info? This avoids
the separation of namespace_io and namespace_pmem for the same pmem
driver.
Thanks,
-Toshi
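The flag-based alternative suggested above might look roughly like the
following userspace sketch. It is only an illustration under assumed names
(ns_desc, has_label, ns_needs_label_update() are all hypothetical and do not
appear in the patch set): the access mode picks the driver, and a separate
flag records whether label info exists.

```c
#include <stdbool.h>

/* Hypothetical names; the patch set itself uses distinct namespace types. */
enum ns_mode { NS_MODE_PMEM, NS_MODE_BLK };

struct ns_desc {
	enum ns_mode mode;	/* which driver attaches */
	bool has_label;		/* is there label info to maintain? */
};

/* A label update is needed only when label info actually exists. */
static bool ns_needs_label_update(const struct ns_desc *ns)
{
	return ns->has_label;
}

/* Driver selection depends on the access mode alone, not on labels. */
static const char *ns_driver_name(const struct ns_desc *ns)
{
	return ns->mode == NS_MODE_PMEM ? "nd_pmem" : "nd_blk";
}
```

With this shape a pmem namespace with and without a label would share one
type, at the cost of every label-aware path checking the flag.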
* Re: [Linux-nvdimm] [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices
2015-05-01 18:22 ` Dan Williams
@ 2015-05-01 18:19 ` Toshi Kani
2015-05-01 18:43 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Toshi Kani @ 2015-05-01 18:19 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI
On Fri, 2015-05-01 at 11:22 -0700, Dan Williams wrote:
> On Fri, May 1, 2015 at 10:48 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
> >> Register the memory devices described in the nfit as libnd 'dimm'
> >> devices on an nd bus. The kernel assigned device id for dimms is
> >> dynamic. If userspace needs a more static identifier it should consult
> >> a provider-specific attribute. In the case where NFIT is the provider,
> >> the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used
> >> for this purpose.
> > :
> >> +
> >> +static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
> >> +{
> >> + struct nfit_mem *nfit_mem;
> >> +
> >> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
> >> + struct nd_dimm *nd_dimm;
> >> + unsigned long flags = 0;
> >> + u32 nfit_handle;
> >> +
> >> + nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
> >> + nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
> >> + if (nd_dimm) {
> >> + /*
> >> + * If for some reason we find multiple DCRs the
> >> + * first one wins
> >> + */
> >> + dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
> >> + nd_dimm_name(nd_dimm));
> >> + continue;
> >> + }
> >> +
> >> + if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> >> + flags |= NDD_ALIASING;
> >
> > Does this check work for an NVDIMM card which has multiple pmem regions
> > with label info, but does not have any bdw region configured?
>
> If you have multiple pmem regions then you don't have aliasing and
> don't need a label. You'll get an nd_namespace_io per region.
>
> > The code assumes that namespace_pmem (NDD_ALIASING) and namespace_blk
> > have label info. There may be an NVDIMM card with a single blk region
> > without label info.
>
> I'd really like to suggest that labels are only for resolving aliasing
> and that if you have a BLK-only NVDIMM you'll get an automatic
> namespace created the same as a PMEM-only. Partitioning is always
> there to provide sub-divisions of a namespace. The only reason to
> support multiple BLK-namespaces per-region is to give each a different
> sector size. I may eventually need to relent on this position, but
> I'd really like to understand the use case for requiring labels when
> aliasing is not present as it seems like a waste to me.
By looking at the callers of is_namespace_pmem() and is_namespace_blk(),
such as nd_namespace_label_update(), I am concerned that the namespace
types are also used for indicating the presence of a label. Is it OK for
nd_namespace_label_update() to do nothing when there is no aliasing?
> > Instead of using the namespace types to assume the label info, how about
> > adding a flag to indicate the presence of the label info? This avoids
> > the separation of namespace_io and namespace_pmem for the same pmem
> > driver.
>
> To what benefit?
Why do they need to be separated? Having alias or not should not make
the pmem namespace different.
Thanks,
-Toshi
* Re: [Linux-nvdimm] [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices
2015-05-01 17:48 ` [Linux-nvdimm] " Toshi Kani
@ 2015-05-01 18:22 ` Dan Williams
2015-05-01 18:19 ` Toshi Kani
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-05-01 18:22 UTC (permalink / raw)
To: Toshi Kani
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI
On Fri, May 1, 2015 at 10:48 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
>> Register the memory devices described in the nfit as libnd 'dimm'
>> devices on an nd bus. The kernel assigned device id for dimms is
>> dynamic. If userspace needs a more static identifier it should consult
>> a provider-specific attribute. In the case where NFIT is the provider,
>> the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used
>> for this purpose.
> :
>> +
>> +static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
>> +{
>> + struct nfit_mem *nfit_mem;
>> +
>> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
>> + struct nd_dimm *nd_dimm;
>> + unsigned long flags = 0;
>> + u32 nfit_handle;
>> +
>> + nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
>> + nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
>> + if (nd_dimm) {
>> + /*
>> + * If for some reason we find multiple DCRs the
>> + * first one wins
>> + */
>> + dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
>> + nd_dimm_name(nd_dimm));
>> + continue;
>> + }
>> +
>> + if (nfit_mem->bdw && nfit_mem->memdev_pmem)
>> + flags |= NDD_ALIASING;
>
> Does this check work for an NVDIMM card which has multiple pmem regions
> with label info, but does not have any bdw region configured?
If you have multiple pmem regions then you don't have aliasing and
don't need a label. You'll get an nd_namespace_io per region.
> The code assumes that namespace_pmem (NDD_ALIASING) and namespace_blk
> have label info. There may be an NVDIMM card with a single blk region
> without label info.
I'd really like to suggest that labels are only for resolving aliasing
and that if you have a BLK-only NVDIMM you'll get an automatic
namespace created the same as a PMEM-only. Partitioning is always
there to provide sub-divisions of a namespace. The only reason to
support multiple BLK-namespaces per-region is to give each a different
sector size. I may eventually need to relent on this position, but
I'd really like to understand the use case for requiring labels when
aliasing is not present as it seems like a waste to me.
> Instead of using the namespace types to assume the label info, how about
> adding a flag to indicate the presence of the label info? This avoids
> the separation of namespace_io and namespace_pmem for the same pmem
> driver.
To what benefit?
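The rule described in this reply can be modelled in a few lines of userspace
C. This is a sketch under assumptions, not the libnd implementation:
dimm_caps and classify_pmem_namespace() are hypothetical names, and the only
logic taken from the patch is the quoted test that aliasing requires both a
block data window and a PMEM range on the same dimm.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the per-dimm facts the patch inspects. */
struct dimm_caps {
	bool has_bdw;	/* a block data window (BLK access) is described */
	bool has_pmem;	/* the dimm also backs a PMEM range */
};

enum pmem_ns_type {
	NS_NONE,		/* no PMEM capacity on this dimm */
	NS_IO,			/* boundaries come straight from the region */
	NS_PMEM_LABELLED,	/* aliased capacity, needs a label */
};

/* Mirrors the quoted test: aliasing needs both access modes on one dimm. */
static bool dimm_is_aliasing(const struct dimm_caps *caps)
{
	return caps->has_bdw && caps->has_pmem;
}

/* Without aliasing, PMEM gets a label-less namespace per region. */
static enum pmem_ns_type classify_pmem_namespace(const struct dimm_caps *caps)
{
	if (!caps->has_pmem)
		return NS_NONE;
	return dimm_is_aliasing(caps) ? NS_PMEM_LABELLED : NS_IO;
}
```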
* Re: [Linux-nvdimm] [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices
2015-05-01 18:19 ` Toshi Kani
@ 2015-05-01 18:43 ` Dan Williams
2015-05-01 19:15 ` Toshi Kani
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-05-01 18:43 UTC (permalink / raw)
To: Toshi Kani
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI
On Fri, May 1, 2015 at 11:19 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Fri, 2015-05-01 at 11:22 -0700, Dan Williams wrote:
>> On Fri, May 1, 2015 at 10:48 AM, Toshi Kani <toshi.kani@hp.com> wrote:
>> > On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
>> >> Register the memory devices described in the nfit as libnd 'dimm'
>> >> devices on an nd bus. The kernel assigned device id for dimms is
>> >> dynamic. If userspace needs a more static identifier it should consult
>> >> a provider-specific attribute. In the case where NFIT is the provider,
>> >> the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used
>> >> for this purpose.
>> > :
>> >> +
>> >> +static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
>> >> +{
>> >> + struct nfit_mem *nfit_mem;
>> >> +
>> >> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
>> >> + struct nd_dimm *nd_dimm;
>> >> + unsigned long flags = 0;
>> >> + u32 nfit_handle;
>> >> +
>> >> + nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
>> >> + nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
>> >> + if (nd_dimm) {
>> >> + /*
>> >> + * If for some reason we find multiple DCRs the
>> >> + * first one wins
>> >> + */
>> >> + dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
>> >> + nd_dimm_name(nd_dimm));
>> >> + continue;
>> >> + }
>> >> +
>> >> + if (nfit_mem->bdw && nfit_mem->memdev_pmem)
>> >> + flags |= NDD_ALIASING;
>> >
>> > Does this check work for an NVDIMM card which has multiple pmem regions
>> > with label info, but does not have any bdw region configured?
>>
>> If you have multiple pmem regions then you don't have aliasing and
>> don't need a label. You'll get an nd_namespace_io per region.
>>
>> > The code assumes that namespace_pmem (NDD_ALIASING) and namespace_blk
>> > have label info. There may be an NVDIMM card with a single blk region
>> > without label info.
>>
>> I'd really like to suggest that labels are only for resolving aliasing
>> and that if you have a BLK-only NVDIMM you'll get an automatic
>> namespace created the same as a PMEM-only. Partitioning is always
>> there to provide sub-divisions of a namespace. The only reason to
>> support multiple BLK-namespaces per-region is to give each a different
>> sector size. I may eventually need to relent on this position, but
>> I'd really like to understand the use case for requiring labels when
>> aliasing is not present as it seems like a waste to me.
>
> By looking at the callers of is_namespace_pmem() and is_namespace_blk(),
> such as nd_namespace_label_update(), I am concerned that the namespace
> types are also used for indicating the presence of a label. Is it OK for
> nd_namespace_label_update() to do nothing when there is no aliasing?
>
>> > Instead of using the namespace types to assume the label info, how about
>> > adding a flag to indicate the presence of the label info? This avoids
>> > the separation of namespace_io and namespace_pmem for the same pmem
>> > driver.
>>
>> To what benefit?
>
> Why do they need to be separated? Having alias or not should not make
> the pmem namespace different.
The intent is to maximize the number of devices that can be
immediately attached to nd_pmem and nd_blk without user intervention.
nd_namespace_io is a pmem namespace where the boundaries are 100%
described by the NFIT / parent-region.
* Re: [Linux-nvdimm] [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices
2015-05-01 18:43 ` Dan Williams
@ 2015-05-01 19:15 ` Toshi Kani
2015-05-01 19:38 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Toshi Kani @ 2015-05-01 19:15 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI
On Fri, 2015-05-01 at 11:43 -0700, Dan Williams wrote:
> On Fri, May 1, 2015 at 11:19 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Fri, 2015-05-01 at 11:22 -0700, Dan Williams wrote:
> >> On Fri, May 1, 2015 at 10:48 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> >> > On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
> >> >> Register the memory devices described in the nfit as libnd 'dimm'
> >> >> devices on an nd bus. The kernel assigned device id for dimms is
> >> >> dynamic. If userspace needs a more static identifier it should consult
> >> >> a provider-specific attribute. In the case where NFIT is the provider,
> >> >> the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used
> >> >> for this purpose.
> >> > :
> >> >> +
> >> >> +static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
> >> >> +{
> >> >> + struct nfit_mem *nfit_mem;
> >> >> +
> >> >> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
> >> >> + struct nd_dimm *nd_dimm;
> >> >> + unsigned long flags = 0;
> >> >> + u32 nfit_handle;
> >> >> +
> >> >> + nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
> >> >> + nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
> >> >> + if (nd_dimm) {
> >> >> + /*
> >> >> + * If for some reason we find multiple DCRs the
> >> >> + * first one wins
> >> >> + */
> >> >> + dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
> >> >> + nd_dimm_name(nd_dimm));
> >> >> + continue;
> >> >> + }
> >> >> +
> >> >> + if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> >> >> + flags |= NDD_ALIASING;
> >> >
> >> > Does this check work for an NVDIMM card which has multiple pmem regions
> >> > with label info, but does not have any bdw region configured?
> >>
> >> If you have multiple pmem regions then you don't have aliasing and
> >> don't need a label. You'll get an nd_namespace_io per region.
> >>
> >> > The code assumes that namespace_pmem (NDD_ALIASING) and namespace_blk
> >> > have label info. There may be an NVDIMM card with a single blk region
> >> > without label info.
> >>
> >> I'd really like to suggest that labels are only for resolving aliasing
> >> and that if you have a BLK-only NVDIMM you'll get an automatic
> >> namespace created the same as a PMEM-only. Partitioning is always
> >> there to provide sub-divisions of a namespace. The only reason to
> >> support multiple BLK-namespaces per-region is to give each a different
> >> sector size. I may eventually need to relent on this position, but
> >> I'd really like to understand the use case for requiring labels when
> >> aliasing is not present as it seems like a waste to me.
> >
> > By looking at the callers of is_namespace_pmem() and is_namespace_blk(),
> > such as nd_namespace_label_update(), I am concerned that the namespace
> > types are also used for indicating the presence of a label. Is it OK for
> > nd_namespace_label_update() to do nothing when there is no aliasing?
Did you forget to answer this question? I am not asking to have a
label. I am asking if the namespace types can handle it correctly.
Restating the nd_namespace_label_update() example:
- namespace_io case: Skip, but a label may still exist. Correct?
- namespace_blk case: Proceed, but blk does not require a label.
> >> > Instead of using the namespace types to assume the label info, how about
> >> > adding a flag to indicate the presence of the label info? This avoids
> >> > the separation of namespace_io and namespace_pmem for the same pmem
> >> > driver.
> >>
> >> To what benefit?
> >
> > Why do they need to be separated? Having alias or not should not make
> > the pmem namespace different.
>
> The intent is to maximize the number of devices that can be
> immediately attached to nd_pmem and nd_blk without user intervention.
I agree with your intention. Again, I am not asking to have a label.
> nd_namespace_io is a pmem namespace where the boundaries are 100%
> described by the NFIT / parent-region.
Thanks,
-Toshi
* Re: [Linux-nvdimm] [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices
2015-05-01 19:15 ` Toshi Kani
@ 2015-05-01 19:38 ` Dan Williams
2015-05-01 20:08 ` Toshi Kani
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-05-01 19:38 UTC (permalink / raw)
To: Toshi Kani
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI
On Fri, May 1, 2015 at 12:15 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Fri, 2015-05-01 at 11:43 -0700, Dan Williams wrote:
>> On Fri, May 1, 2015 at 11:19 AM, Toshi Kani <toshi.kani@hp.com> wrote:
>> > On Fri, 2015-05-01 at 11:22 -0700, Dan Williams wrote:
>> >> On Fri, May 1, 2015 at 10:48 AM, Toshi Kani <toshi.kani@hp.com> wrote:
>> >> > On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
>> >> >> Register the memory devices described in the nfit as libnd 'dimm'
>> >> >> devices on an nd bus. The kernel assigned device id for dimms is
>> >> >> dynamic. If userspace needs a more static identifier it should consult
>> >> >> a provider-specific attribute. In the case where NFIT is the provider,
>> >> >> the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used
>> >> >> for this purpose.
>> >> > :
>> >> >> +
>> >> >> +static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
>> >> >> +{
>> >> >> + struct nfit_mem *nfit_mem;
>> >> >> +
>> >> >> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
>> >> >> + struct nd_dimm *nd_dimm;
>> >> >> + unsigned long flags = 0;
>> >> >> + u32 nfit_handle;
>> >> >> +
>> >> >> + nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
>> >> >> + nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
>> >> >> + if (nd_dimm) {
>> >> >> + /*
>> >> >> + * If for some reason we find multiple DCRs the
>> >> >> + * first one wins
>> >> >> + */
>> >> >> + dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
>> >> >> + nd_dimm_name(nd_dimm));
>> >> >> + continue;
>> >> >> + }
>> >> >> +
>> >> >> + if (nfit_mem->bdw && nfit_mem->memdev_pmem)
>> >> >> + flags |= NDD_ALIASING;
>> >> >
>> >> > Does this check work for an NVDIMM card which has multiple pmem regions
>> >> > with label info, but does not have any bdw region configured?
>> >>
>> >> If you have multiple pmem regions then you don't have aliasing and
>> >> don't need a label. You'll get an nd_namespace_io per region.
>> >>
>> >> > The code assumes that namespace_pmem (NDD_ALIASING) and namespace_blk
>> >> > have label info. There may be an NVDIMM card with a single blk region
>> >> > without label info.
>> >>
>> >> I'd really like to suggest that labels are only for resolving aliasing
>> >> and that if you have a BLK-only NVDIMM you'll get an automatic
>> >> namespace created the same as a PMEM-only. Partitioning is always
>> >> there to provide sub-divisions of a namespace. The only reason to
>> >> support multiple BLK-namespaces per-region is to give each a different
>> >> sector size. I may eventually need to relent on this position, but
>> >> I'd really like to understand the use case for requiring labels when
>> >> aliasing is not present as it seems like a waste to me.
>> >
>> > By looking at the callers of is_namespace_pmem() and is_namespace_blk(),
>> > such as nd_namespace_label_update(), I am concerned that the namespace
>> > types are also used for indicating the presence of a label. Is it OK for
>> > nd_namespace_label_update() to do nothing when there is no aliasing?
>
> Did you forget to answer this question? I am not asking to have a
> label. I am asking if the namespace types can handle it correctly.
> Restating the nd_namespace_label_update() example:
> - namespace_io case: Skip, but a label may still exist. Correct?
> - namespace_blk case: Proceed, but blk does not require a label.
Ah, ok. This is handled by nd_namespace_attr_visible(): only labelled
namespaces have writable sysfs attributes. This would need to be
extended for a label-less BLK namespace type.
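As a rough userspace illustration of that visibility rule, the sketch below
returns a sysfs-style permission mode per namespace type. It is not the
nd_namespace_attr_visible() code from the series. The enum values and helper
names are hypothetical, and NS_TYPE_BLK_NOLABEL in particular is an assumed
future type standing in for the extension mentioned above.

```c
#include <stdbool.h>

/* Hypothetical namespace types, for illustration only. */
enum ns_type {
	NS_TYPE_IO,		/* boundaries fully described by the region */
	NS_TYPE_PMEM,		/* labelled PMEM namespace */
	NS_TYPE_BLK,		/* labelled BLK namespace */
	NS_TYPE_BLK_NOLABEL,	/* assumed future label-less BLK namespace */
};

static bool ns_type_has_label(enum ns_type type)
{
	return type == NS_TYPE_PMEM || type == NS_TYPE_BLK;
}

/*
 * Only labelled namespaces get writable attributes; everything else is
 * read-only, so no label update can be triggered for it from sysfs.
 */
static unsigned int ns_attr_mode(enum ns_type type)
{
	return ns_type_has_label(type) ? 0644u : 0444u;
}
```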
* Re: [Linux-nvdimm] [PATCH v2 05/20] libnd, nd_acpi: dimm/memory-devices
2015-05-01 19:38 ` Dan Williams
@ 2015-05-01 20:08 ` Toshi Kani
0 siblings, 0 replies; 47+ messages in thread
From: Toshi Kani @ 2015-05-01 20:08 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI
On Fri, 2015-05-01 at 12:38 -0700, Dan Williams wrote:
> On Fri, May 1, 2015 at 12:15 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Fri, 2015-05-01 at 11:43 -0700, Dan Williams wrote:
> >> On Fri, May 1, 2015 at 11:19 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> >> > On Fri, 2015-05-01 at 11:22 -0700, Dan Williams wrote:
> >> >> On Fri, May 1, 2015 at 10:48 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> >> >> > On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
> >> >> >> Register the memory devices described in the nfit as libnd 'dimm'
> >> >> >> devices on an nd bus. The kernel assigned device id for dimms is
> >> >> >> dynamic. If userspace needs a more static identifier it should consult
> >> >> >> a provider-specific attribute. In the case where NFIT is the provider,
> >> >> >> the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used
> >> >> >> for this purpose.
> >> >> > :
> >> >> >> +
> >> >> >> +static int nd_acpi_register_dimms(struct acpi_nfit_desc *acpi_desc)
> >> >> >> +{
> >> >> >> + struct nfit_mem *nfit_mem;
> >> >> >> +
> >> >> >> + list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
> >> >> >> + struct nd_dimm *nd_dimm;
> >> >> >> + unsigned long flags = 0;
> >> >> >> + u32 nfit_handle;
> >> >> >> +
> >> >> >> + nfit_handle = __to_nfit_memdev(nfit_mem)->nfit_handle;
> >> >> >> + nd_dimm = nd_acpi_dimm_by_handle(acpi_desc, nfit_handle);
> >> >> >> + if (nd_dimm) {
> >> >> >> + /*
> >> >> >> + * If for some reason we find multiple DCRs the
> >> >> >> + * first one wins
> >> >> >> + */
> >> >> >> + dev_err(acpi_desc->dev, "duplicate DCR detected: %s\n",
> >> >> >> + nd_dimm_name(nd_dimm));
> >> >> >> + continue;
> >> >> >> + }
> >> >> >> +
> >> >> >> + if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> >> >> >> + flags |= NDD_ALIASING;
> >> >> >
> >> >> > Does this check work for an NVDIMM card which has multiple pmem regions
> >> >> > with label info, but does not have any bdw region configured?
> >> >>
> >> >> If you have multiple pmem regions then you don't have aliasing and
> >> >> don't need a label. You'll get an nd_namespace_io per region.
> >> >>
> >> >> > The code assumes that namespace_pmem (NDD_ALIASING) and namespace_blk
> >> >> > have label info. There may be an NVDIMM card with a single blk region
> >> >> > without label info.
> >> >>
> >> >> I'd really like to suggest that labels are only for resolving aliasing
> >> >> and that if you have a BLK-only NVDIMM you'll get an automatic
> >> >> namespace created the same as a PMEM-only. Partitioning is always
> >> >> there to provide sub-divisions of a namespace. The only reason to
> >> >> support multiple BLK-namespaces per-region is to give each a different
> >> >> sector size. I may eventually need to relent on this position, but
> >> >> I'd really like to understand the use case for requiring labels when
> >> >> aliasing is not present as it seems like a waste to me.
> >> >
> >> > By looking at the callers of is_namespace_pmem() and is_namespace_blk(),
> >> > such as nd_namespace_label_update(), I am concerned that the namespace
> >> > types are also used for indicating the presence of a label. Is it OK for
> >> > nd_namespace_label_update() to do nothing when there is no aliasing?
> >
> > Did you forget to answer this question? I am not asking to have a
> > label. I am asking if the namespace types can handle it correctly.
> > Restating the nd_namespace_label_update() example:
> > - namespace_io case: Skip, but a label may still exist. Correct?
> > - namespace_blk case: Proceed, but blk does not require a label.
>
> Ah, ok. This is handled by nd_namespace_attr_visible(): only labelled
> namespaces have writable sysfs attributes. This would need to be
> extended for a label-less BLK namespace type.
I prefer not to duplicate each namespace type with and without a label,
but I am OK as long as the presence of labels is handled properly.
Thanks,
-Toshi
* Re: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-04-28 18:24 ` [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory) Dan Williams
2015-04-29 15:53 ` [Linux-nvdimm] " Elliott, Robert (Server Storage)
@ 2015-05-04 20:26 ` Toshi Kani
2015-05-09 23:55 ` Dan Williams
1 sibling, 1 reply; 47+ messages in thread
From: Toshi Kani @ 2015-05-04 20:26 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel, linux-acpi
On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
:
> +
> +static int nd_acpi_register_region(struct acpi_nfit_desc *acpi_desc,
> + struct nfit_spa *nfit_spa)
> +{
> + static struct nd_mapping nd_mappings[ND_MAX_MAPPINGS];
> + struct acpi_nfit_spa *spa = nfit_spa->spa;
> + struct nfit_memdev *nfit_memdev;
> + struct nd_region_desc ndr_desc;
> + int spa_type, count = 0;
> + struct resource res;
> + u16 spa_index;
> +
> + spa_type = nfit_spa_type(spa);
> + spa_index = spa->spa_index;
> + if (spa_index == 0) {
> + dev_dbg(acpi_desc->dev, "%s: detected invalid spa index\n",
> + __func__);
> + return 0;
> + }
> +
> + memset(&res, 0, sizeof(res));
> + memset(&nd_mappings, 0, sizeof(nd_mappings));
> + memset(&ndr_desc, 0, sizeof(ndr_desc));
> + res.start = spa->spa_base;
> + res.end = res.start + spa->spa_length - 1;
> + ndr_desc.res = &res;
> + ndr_desc.provider_data = nfit_spa;
> + ndr_desc.attr_groups = nd_acpi_region_attribute_groups;
> + list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
> + struct acpi_nfit_memdev *memdev = nfit_memdev->memdev;
> + struct nd_mapping *nd_mapping;
> + struct nd_dimm *nd_dimm;
> +
> + if (memdev->spa_index != spa_index)
> + continue;
The libnd does not support memdev->flags, which contains "Memory Device
State Flags" defined in Table 5-129 of ACPI 6.0. In case of major
errors, we should only allow a failed NVDIMM to be accessed read-only
for possible data recovery (or not allow any access when the data is
completely lost), and should not let users operate normally over the
corrupted data until the error is dealt with properly.
Can you set memdev->flags to nd_region(_desc) so that the pmem driver
can check the status in nd_pmem_probe()? nd_pmem_probe() can then set
the disk read-only or fail probing, and log errors accordingly.
Thanks,
-Toshi
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
2015-05-04 23:58 ` Rafael J. Wysocki
@ 2015-05-04 23:46 ` Dan Williams
0 siblings, 0 replies; 47+ messages in thread
From: Dan Williams @ 2015-05-04 23:46 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-nvdimm@lists.01.org, Linux ACPI, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, David Box
On Mon, May 4, 2015 at 4:58 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Friday, May 01, 2015 09:23:38 AM Dan Williams wrote:
>> On Thu, Apr 30, 2015 at 6:21 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > On Thursday, April 30, 2015 05:39:06 PM Dan Williams wrote:
>> >> On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
[..]
>> > If ND_E820 and ND_ACPI aren't mutually exclusive, I still don't see a good
>> > enough reason for asking users about ND_ACPI. Why would I ever say "No"
>> > here if I said "Yes" or "Module" to ND_DEVICES?
>>
>> I agree that if the user selects ND_DEVICES then ND_ACPI should
>> probably default on, but otherwise turning it off is a useful option.
>> If you know your system is pre-ACPI-6 then why bother including
>> support?
>
> If you're a distro, you don't care. You have to support it regardless.
>
> You might care if you're an end user building a kernel for yourself and just
> for this particular specific machine. Honestly, how many *server* users do
> that?
>
> And fewer user-selectable options means fewer combination of options to test
> during development/validation.
>
> Also unrelated, but applies to this patch.
>
> Since your new driver will handle device ID ACPI0012 which is defined by the
> spec proper, it should go into drivers/acpi/, because there's where such things
> go as a rule.
Ok, I think the move to drivers/acpi/ will kill two birds with one
stone as selecting ACPI_NFIT from there will select the libnd support
without prompting.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
2015-05-01 16:23 ` Dan Williams
@ 2015-05-04 23:58 ` Rafael J. Wysocki
2015-05-04 23:46 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2015-05-04 23:58 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, Linux ACPI, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, David Box
On Friday, May 01, 2015 09:23:38 AM Dan Williams wrote:
> On Thu, Apr 30, 2015 at 6:21 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Thursday, April 30, 2015 05:39:06 PM Dan Williams wrote:
> >> On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> [..]
> >> >> +if ND_DEVICES
> >> >> +
> >> >> +config LIBND
> >> >> + tristate "LIBND: libnd device driver support"
> >> >> + help
> >> >> + Platform agnostic device model for a libnd bus. Publishes
> >> >> + resources for a PMEM (persistent-memory) driver and/or BLK
> >> >> + (sliding mmio window(s)) driver to attach. Exposes a device
> >> >> + topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl
> >> >> + message passing interface, and a "/dev/nmemX" dimm-ioctl
> >> >> + message interface for each memory device registered on the
> >> >> + bus. instance. A userspace library "ndctl" provides an API
> >> >> + to enumerate/manage this subsystem.
> >> >> +
> >> >> +config ND_ACPI
> >> >> + tristate "ACPI: NFIT to libnd bus support"
> >> >> + select LIBND
> >> >> + depends on ACPI
> >> >> + help
> >> >> + Infrastructure to probe ACPI 6 compliant platforms for
> >> >> + NVDIMMs (NFIT) and register a libnd device tree. In
> >> >> + addition to storage devices this also enables libnd craft
> >> >> + ACPI._DSM messages for platform/dimm configuration.
> >> >
> >> > I'm wondering if the two CONFIG options above really need to be user-selectable?
> >> >
> >> > For example, what reason people (who've already selected ND_DEVICES) may have
> >> > for not selecting ND_ACPI if ACPI is set?
> >>
> >>
> >> Later on in the series we introduce ND_E820 which supports creating a
> >> libnd-bus from e820-type-12 memory ranges on pre-NFIT systems. I'm
> >> also considering a configfs defined libnd-bus because e820 types are
> >> not nearly enough information to safely define nvdimm resources
> >> outside of NFIT.
> >
> > I hope these are not mutually exclusive with ND_ACPI? Otherwise distros
> > will have problems with supporting them in one kernel.
>
> You can have ND_E820 support and ND_ACPI support in the same system.
> Likely an NFIT enabled system will never have e820-type-12 ranges, but
> if a user messes up and uses the new memmap=ss!nn command line to
> overlap NFIT-defined memory then the request_mem_region() calls in the
> driver will collide. First to load wins in that scenario.
>
> > If ND_E820 and ND_ACPI aren't mutually exclusive, I still don't see a good
> > enough reason for asking users about ND_ACPI. Why would I ever say "No"
> > here if I said "Yes" or "Module" to ND_DEVICES?
>
> I agree that if the user selects ND_DEVICES then ND_ACPI should
> probably default on, but otherwise turning it off is a useful option.
> If you know your system is pre-ACPI-6 then why bother including
> support?
If you're a distro, you don't care. You have to support it regardless.
You might care if you're an end user building a kernel for yourself and just
for this particular specific machine. Honestly, how many *server* users do
that?
And fewer user-selectable options means fewer combinations of options to test
during development/validation.
Also unrelated, but applies to this patch.
Since your new driver will handle device ID ACPI0012 which is defined by the
spec proper, it should go into drivers/acpi/, because there's where such things
go as a rule.
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-29 1:22 ` Dan Williams
@ 2015-05-05 0:06 ` Rafael J. Wysocki
2015-05-08 6:31 ` Williams, Dan J
0 siblings, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2015-05-05 0:06 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, Boaz Harrosh, Neil Brown, Dave Chinner,
H. Peter Anvin, Ingo Molnar, Rafael J. Wysocki, Robert Moore,
Christoph Hellwig, Linux ACPI, Jeff Moyer, Nicholas Moulin,
Matthew Wilcox, Ross Zwisler, Vishal Verma, Jens Axboe,
Borislav Petkov, Thomas Gleixner, Greg KH,
linux-kernel@vger.kernel.org, Andy Lutomirski, Andrew Morton
On Tuesday, April 28, 2015 06:22:05 PM Dan Williams wrote:
> On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> >> Changes since v1 [1]: Incorporates feedback received prior to April 24.
> >>
[cut]
> >
> > I'm wondering what's wrong with CCing all of the series to linux-acpi?
> >
> > Is there anything in it that the people on that list should not see, by any
> > chance?
>
> linux-acpi may not care about the dimm-metadata labeling patches that
> are completely independent of ACPI, but might as well include
> linux-acpi on the whole series at this point.
I've gone through the ACPI-related patches in this series (other than [2/20]
that I've commented directly) and while I haven't found anything horrible in
them, I don't quite feel confident enough to ACK them.
What I'm really missing in this series is a design document describing all that
from a high-level perspective and making it clear where all of the pieces go
and what their respective roles are. Also reordering the series to introduce
the nd subsystem to start with and then its users might help here.
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
2015-04-28 22:15 ` Dan Williams
@ 2015-05-07 7:29 ` Christoph Hellwig
0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2015-05-07 7:29 UTC (permalink / raw)
To: Dan Williams
Cc: Elliott, Robert (Server Storage), linux-nvdimm@lists.01.org,
Neil Brown, Dave Chinner, H. Peter Anvin, Christoph Hellwig,
Wysocki, Rafael J, Moore, Robert, Ingo Molnar,
linux-acpi@vger.kernel.org, Jens Axboe, Borislav Petkov,
Thomas Gleixner, Greg KH, linux-kernel@vger.kernel.org,
Andy Lutomirski, Andrew Morton, Linus Torvalds
On Tue, Apr 28, 2015 at 03:15:54PM -0700, Dan Williams wrote:
> > lsblk's blkdev_scsi_type_to_name() considers 4 to mean
> > SCSI_TYPE_WORM (write once read many ... used for certain optical
> > and tape drives).
>
> Why is lsblk assuming these are scsi devices? I'll need to go check that out.
It's a very common assumption unfortunately. I remember fixing it in
various in-house tools at customers and stumbled over it in targetcli
recently.
Please use a prefix for your type attribute to avoid this problem.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v2 00/20] libnd: non-volatile memory device support
2015-05-05 0:06 ` Rafael J. Wysocki
@ 2015-05-08 6:31 ` Williams, Dan J
0 siblings, 0 replies; 47+ messages in thread
From: Williams, Dan J @ 2015-05-08 6:31 UTC (permalink / raw)
To: rjw@rjwysocki.net
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org,
nicholas.w.moulin@linux.intel.com, neilb@suse.de,
jmoyer@redhat.com, tglx@linutronix.de,
torvalds@linux-foundation.org, hch@lst.de, Moore, Robert,
Wysocki, Rafael J, hpa@zytor.com, linux-nvdimm@lists.01.org,
axboe@fb.com, vishal.l.verma@linux.intel.com,
willy@linux.intel.com, bp@alien8.de, ross.zwisler@linux.intel.com,
luto@amacapital.net, gregkh
On Tue, 2015-05-05 at 02:06 +0200, Rafael J. Wysocki wrote:
> On Tuesday, April 28, 2015 06:22:05 PM Dan Williams wrote:
> > On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> > >> Changes since v1 [1]: Incorporates feedback received prior to April 24.
> > >>
>
> [cut]
>
> > >
> > > I'm wondering what's wrong with CCing all of the series to linux-acpi?
> > >
> > > Is there anything in it that the people on that list should not see, by any
> > > chance?
> >
> > linux-acpi may not care about the dimm-metadata labeling patches that
> > are completely independent of ACPI, but might as well include
> > linux-acpi on the whole series at this point.
>
> I've gone through the ACPI-related patches in this series (other than [2/20]
> that I've commented directly) and while I haven't found anything horrible in
> them, I don't quite feel confident enough to ACK them.
>
> What I'm really missing in this series is a design document describing all that
> from a high-level perspective and making it clear where all of the pieces go
> and what their respective roles are. Also reordering the series to introduce
> the nd subsystem to start with and then its users might help here.
Here you go, and also see the "Supporting Documents" section if you need
more details, or just ask. This is the reworked document after pushing
NFIT specifics out of the core implementation. The core apis are
nd_bus_register(), nd_dimm_create(), nd_pmem_region_create(), and
nd_blk_region_create().
---
LIBND: Non-volatile Devices
libnd - kernel / libndctl - userspace helper library
linux-nvdimm@lists.01.org
v10
Glossary
Overview
Supporting Documents
Git Trees
LIBND PMEM and BLK
Why BLK?
PMEM vs BLK
BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
Example NVDIMM Platform
LIBND Kernel Device Model and LIBNDCTL Userspace API
LIBNDCTL: Context
libndctl: instantiate a new library context example
LIBND/LIBNDCTL: Bus
libnd: control class device in /sys/class
libnd: bus
libndctl: bus enumeration example
LIBND/LIBNDCTL: DIMM (NMEM)
libnd: DIMM (NMEM)
libndctl: DIMM enumeration example
LIBND/LIBNDCTL: Region
libnd: region
libndctl: region enumeration example
Why Not Encode the Region Type into the Region Name?
How Do I Determine the Major Type of a Region?
LIBND/LIBNDCTL: Namespace
libnd: namespace
libndctl: namespace enumeration example
libndctl: namespace creation example
Why the Term "namespace"?
LIBND/LIBNDCTL: Block Translation Table "btt"
libnd: btt layout
libndctl: btt creation example
Summary LIBNDCTL Diagram
Glossary
--------
PMEM: A system physical address range where writes are persistent. A
block device composed of PMEM is capable of DAX. A PMEM address range
may span/interleave several DIMMs.
BLK: A set of one or more programmable memory mapped apertures provided
by a DIMM to access its media. This indirection precludes the
performance benefit of interleaving, but enables DIMM-bounded failure
modes.
DPA: DIMM Physical Address, is a DIMM-relative offset. With one DIMM in
the system there would be a 1:1 system-physical-address:DPA association.
Once more DIMMs are added a memory controller interleave must be
decoded to determine the DPA associated with a given
system-physical-address. BLK capacity always has a 1:1 relationship
with a single-dimm's DPA range.
DAX: File system extensions to bypass the page cache and block layer to
mmap persistent memory, from a PMEM block device, directly into a
process address space.
BTT: Block Translation Table: Persistent memory is byte addressable.
Existing software may have an expectation that the power-fail-atomicity
of writes is at least one sector, 512 bytes. The BTT is an indirection
table with atomic update semantics to front a PMEM/BLK block device
driver and present arbitrary atomic sector sizes.
LABEL: Metadata stored on a DIMM device that partitions and identifies
(persistently names) storage between PMEM and BLK. It also partitions
BLK storage to host BTTs with different parameters per BLK-partition.
Note that traditional partition tables, GPT/MBR, are layered on top of a
BLK or PMEM device.
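To make the DPA entry in the glossary concrete, here is a minimal sketch
of decoding a system-physical-address into a (DIMM, DPA) pair. It assumes
a plain round-robin interleave with a fixed granularity; real memory
controllers may stripe in other, hardware specific ways. The names
"struct interleave_set", "struct dimm_addr" and "spa_to_dpa()" are
hypothetical and are not part of libnd.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of an interleave set; not a libnd structure. */
struct interleave_set {
	uint64_t spa_base;	/* first system-physical-address of the set */
	uint64_t dpa_base;	/* DPA where the set starts on every DIMM */
	uint64_t granularity;	/* bytes placed on one DIMM before moving on */
	unsigned int ways;	/* number of DIMMs in the set */
};

struct dimm_addr {
	unsigned int dimm;	/* index of the DIMM within the set */
	uint64_t dpa;		/* DIMM-relative offset */
};

/* Decode an SPA, assuming round-robin striping across 'ways' DIMMs. */
static struct dimm_addr spa_to_dpa(const struct interleave_set *set,
		uint64_t spa)
{
	struct dimm_addr out;
	uint64_t offset, stripe;

	assert(set->ways > 0 && set->granularity > 0);
	assert(spa >= set->spa_base);

	offset = spa - set->spa_base;
	stripe = offset / set->granularity;	/* granule index in the set */

	out.dimm = (unsigned int)(stripe % set->ways);
	out.dpa = set->dpa_base + (stripe / set->ways) * set->granularity
		+ offset % set->granularity;
	return out;
}
```

With "ways" set to 1 the function reduces to the 1:1 association
described for a single DIMM: the DPA is just the offset from the start
of the range (plus dpa_base).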
Overview
--------
The libnd subsystem provides support for three types of NVDIMMs: PMEM,
BLK, and NVDIMM platforms that can simultaneously support PMEM and BLK
mode access capabilities on a given set of DIMMs. These three modes of
operation are described by the "NVDIMM Firmware Interface Table" (NFIT)
in ACPI 6. While the libnd implementation is generic and supports
pre-NFIT platforms, it was guided by the superset of capabilities needed
to support this ACPI 6 definition for NVDIMM resources. The bulk of the
kernel implementation is in place to handle the case where DPA
accessible via PMEM is aliased with DPA accessible via BLK. When that
occurs a LABEL is needed to reserve DPA for exclusive access via one
mode at a time.
Supporting Documents
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
Git Trees
LIBND: https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/log/?h=nd
LIBNDCTL: https://github.com/pmem/ndctl.git
PMEM: https://github.com/01org/prd
LIBND PMEM and BLK
------------------
Prior to the arrival of the NFIT, non-volatile memory was described to a
system in various ad-hoc ways. Usually only the bare minimum was
provided, namely, a single system-physical-address range where writes
are expected to be durable after a system power loss. Now, the NFIT
specification standardizes not only the description of PMEM, but also
BLK and platform message-passing entry points for control and
configuration.
For each NVDIMM access method (PMEM, BLK), LIBND provides a block device driver:
1. PMEM (nd_pmem.ko): Drives a system-physical-address range. This
range is contiguous in system memory and may be interleaved (hardware
memory controller striped) across multiple DIMMs. When interleaved the
platform may optionally provide details of which DIMMs are participating
in the interleave.
Note, LIBND describes system-physical-address ranges that may alias with
BLK access as ND_NAMESPACE_PMEM ranges and those without alias as
ND_NAMESPACE_IO ranges; to the nd_pmem driver there is no distinction.
The different device-types are an implementation detail that userspace
can exploit to implement policies like "only interface with address
ranges from certain DIMMs". It is worth noting that when aliasing is
present and a DIMM lacks a label, then no block device can be created by
default as userspace needs to do at least one allocation of DPA to the
PMEM range. In contrast ND_NAMESPACE_IO ranges, once registered, can be
immediately attached to nd_pmem.
2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
defined apertures. A set of apertures will all access just one DIMM.
Multiple windows allow multiple concurrent accesses, much like
tagged-command-queuing, and would likely be used by different threads or
different CPUs.
The NFIT specification defines a standard format for a BLK-aperture, but
the spec also allows for vendor specific layouts, and non-NFIT BLK
implementations may have other designs for BLK I/O. For this reason
"nd_blk" calls back into platform-specific code to perform the I/O. One
such implementation is defined in the "Driver Writer's Guide" and "DSM
Interface Example".
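As a rough illustration of the sliding-window idea behind BLK access,
the sketch below simulates an aperture over an in-memory "media" buffer.
Everything in it ("struct blk_aperture", "aperture_program()",
"aperture_read()", the 8-byte window) is hypothetical; it does not
reflect the NFIT register layout or the nd_blk implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define APERTURE_SIZE 8	/* tiny window so the sliding is easy to follow */

/* Hypothetical model of one aperture over a DIMM's media. */
struct blk_aperture {
	const unsigned char *media;	/* simulated persistent media */
	size_t media_size;
	size_t window_base;		/* DPA currently exposed by the window */
};

/* "Program" the window to expose the aligned chunk containing 'dpa'. */
static void aperture_program(struct blk_aperture *ap, size_t dpa)
{
	ap->window_base = dpa - dpa % APERTURE_SIZE;
}

/* Read 'len' bytes at 'dpa', re-programming the window as it slides. */
static int aperture_read(struct blk_aperture *ap, size_t dpa,
		void *buf, size_t len)
{
	unsigned char *dst = buf;

	assert(ap->media != NULL);
	if (dpa > ap->media_size || len > ap->media_size - dpa)
		return -1;	/* request falls outside the media */

	while (len) {
		size_t off, chunk;

		aperture_program(ap, dpa);
		off = dpa - ap->window_base;
		chunk = APERTURE_SIZE - off;
		if (chunk > len)
			chunk = len;
		/* only bytes inside the current window are reachable */
		memcpy(dst, ap->media + ap->window_base + off, chunk);
		dst += chunk;
		dpa += chunk;
		len -= chunk;
	}
	return 0;
}
```

A driver with several such apertures could give each thread or CPU its
own window, which is the concurrency benefit mentioned above.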
Why BLK?
--------
While PMEM provides direct byte-addressable CPU-load/store access to
NVDIMM storage, it does not provide the best system RAS (reliability,
availability, and serviceability) model. An access to a corrupted
system-physical-address causes a cpu exception while an access to a
corrupted address through a BLK-aperture causes that block window to
raise an error status in a register. The latter is more aligned with
the standard error model that host-bus-adapter attached disks present.
Also, if an administrator ever wants to replace a memory module it is
easier to service a system at DIMM module boundaries. Compare this to
PMEM where data could be interleaved in an opaque hardware specific
manner across several DIMMs.
PMEM vs BLK
BLK-apertures solve this RAS problem, but their presence is also the
major contributing factor to the complexity of the ND subsystem. They
complicate the implementation because PMEM and BLK alias in DPA space.
Any given DIMM's DPA-range may contribute to one or more
system-physical-address sets of interleaved DIMMs, *and* may also be
accessed in its entirety through its BLK-aperture. Accessing a DPA
through a system-physical-address while simultaneously accessing the
same DPA through a BLK-aperture has undefined results. For this reason,
DIMMs with this dual interface configuration include a DSM function to
store/retrieve a LABEL. The LABEL effectively partitions the DPA-space
into exclusive system-physical-address and BLK-aperture accessible
regions. For simplicity a DIMM is allowed a PMEM "region" per each
interleave set in which it is a member. The remaining DPA space can be
carved into an arbitrary number of BLK devices with discontiguous
extents.
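The exclusivity rule above can be sketched as a simple overlap check over
DPA extents. This is only an illustration under stated assumptions:
"struct dpa_extent" is a hypothetical in-memory form of a label entry,
not the on-media LABEL format, and the check treats any doubly claimed
DPA as invalid regardless of mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum dpa_mode { DPA_PMEM, DPA_BLK };

/* Hypothetical in-memory form of one label entry. */
struct dpa_extent {
	enum dpa_mode mode;
	uint64_t start;		/* first DPA in the extent */
	uint64_t length;	/* bytes, must be non-zero */
};

/* Two half-open ranges [start, start + length) intersect. */
static bool extents_overlap(const struct dpa_extent *a,
		const struct dpa_extent *b)
{
	return a->start < b->start + b->length
		&& b->start < a->start + a->length;
}

/*
 * Return true when no DPA is claimed twice, i.e. every byte is reachable
 * through at most one extent and therefore through at most one mode.
 */
static bool label_is_exclusive(const struct dpa_extent *ext, size_t count)
{
	size_t i, j;

	for (i = 0; i < count; i++) {
		assert(ext[i].length > 0);
		for (j = i + 1; j < count; j++)
			if (extents_overlap(&ext[i], &ext[j]))
				return false;
	}
	return true;
}
```

Note that the BLK extents in a valid layout need not be adjacent, which
matches the "discontiguous extents" wording above.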
BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
--------------------------------------------------
One of the few reasons to allow multiple BLK namespaces per REGION is so
that each BLK-namespace can be configured with a BTT with unique atomic
sector sizes. While a PMEM device can host a BTT, the LABEL specification
does not provide for a sector size to be specified for a PMEM namespace.
This is due to the expectation that the primary usage model for PMEM is
via DAX, and the BTT is incompatible with DAX. However, for the cases
where an application or filesystem still needs atomic sector update
guarantees it can register a BTT on a PMEM device or partition. See
LIBND/LIBNDCTL: Block Translation Table "btt".
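The idea of fronting byte-addressable media with an indirection table to
get atomic sector updates can be sketched with a toy, in-memory model:
new data is written out of place to a spare block, and only then is a
single map entry switched over. This is an assumption-laden
simplification ("struct toy_btt" and its helpers are hypothetical, there
is no persistence or flushing, and it is not the BTT on-media layout).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE	512
#define NLBA		4		/* logical sectors exposed */
#define NBLOCKS		(NLBA + 1)	/* one spare physical block */

/* Hypothetical toy translation table. */
struct toy_btt {
	unsigned char data[NBLOCKS][SECTOR_SIZE];
	uint32_t map[NLBA];	/* logical sector -> physical block */
	uint32_t free_block;	/* the single spare block */
};

static void toy_btt_init(struct toy_btt *btt)
{
	uint32_t i;

	memset(btt, 0, sizeof(*btt));
	for (i = 0; i < NLBA; i++)
		btt->map[i] = i;
	btt->free_block = NLBA;
}

/*
 * Write a full sector out of place, then publish it by updating one map
 * entry. An interruption before the map update leaves the old sector
 * contents fully intact.
 */
static void toy_btt_write(struct toy_btt *btt, uint32_t lba, const void *buf)
{
	uint32_t old;

	assert(lba < NLBA);
	memcpy(btt->data[btt->free_block], buf, SECTOR_SIZE);
	old = btt->map[lba];
	btt->map[lba] = btt->free_block;	/* the single "commit" step */
	btt->free_block = old;			/* old block becomes the spare */
}

static const void *toy_btt_read(const struct toy_btt *btt, uint32_t lba)
{
	assert(lba < NLBA);
	return btt->data[btt->map[lba]];
}
```

Because readers always go through the map, they see either the old
sector or the new one, never a mix of both.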
Example NVDIMM Platform
-----------------------
For the remainder of this document the following diagram will be
referenced for any example sysfs layouts.
                             (a)               (b)          DIMM   BLK-REGION
          +-------------------+--------+--------+--------+
+------+  |       pm0.0       | blk2.0 | pm1.0  | blk2.1 |    0      region2
| imc0 +--+- - - region0- - - +--------+        +--------+
+--+---+  |       pm0.0       | blk3.0 | pm1.0  | blk3.1 |    1      region3
   |      +-------------------+--------v        v--------+
+--+---+                               |                 |
| cpu0 |                                     region1
+--+---+                               |                 |
   |      +----------------------------^        ^--------+
+--+---+  |           blk4.0           | pm1.0  | blk4.0 |    2      region4
| imc1 +--+----------------------------|        +--------+
+------+  |           blk5.0           | pm1.0  | blk5.0 |    3      region5
          +----------------------------+--------+--------+
In this platform we have four DIMMs and two memory controllers in one
socket. Each unique interface (BLK or PMEM) to DPA space is identified
by a region device with a dynamically assigned id (REGION0 - REGION5).
1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
single PMEM namespace is created in the REGION0-SPA-range that spans
DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
interleaved system-physical-address range is reclaimed as BLK-aperture
accessed space starting at DPA-offset (a) into each DIMM. In that
reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
REGION3 where "blk2.0" and "blk3.0" are just human readable names that
could be set to any user-desired name in the LABEL.
2. In the last portion of DIMM0 and DIMM1 we have an interleaved
system-physical-address range, REGION1, that spans those two DIMMs as
well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM
namespace named "pm1.0"; the rest is reclaimed in 4 BLK-aperture
namespaces (one for each DIMM in the interleave set), "blk2.1",
"blk3.1", "blk4.0", and "blk5.0".
3. The portions of DIMM2 and DIMM3 that do not participate in the REGION1
interleaved system-physical-address range (i.e. the DPA addresses below
offset (b)) are also included in the "blk4.0" and "blk5.0" namespaces.
Note that this example shows that BLK-aperture namespaces don't need to
be contiguous in DPA-space.
This bus is provided by the kernel under the device
/sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
the nfit_test.ko module is loaded. This tests not only LIBND but the
acpi_nfit.ko driver as well.
LIBND Kernel Device Model and LIBNDCTL Userspace API
----------------------------------------------------
What follows is a description of the LIBND sysfs layout and a
corresponding object hierarchy diagram as viewed through the LIBNDCTL
api. The example sysfs paths and diagrams are relative to the Example
NVDIMM Platform which is also the libnd bus used in the libndctl unit
test.
LIBNDCTL: Context
Every api call in the LIBNDCTL library requires a context that holds the
logging parameters and other library instance state. The library is
based on the libabc template:
https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git/
libndctl: instantiate a new library context example
struct ndctl_ctx *ctx;
if (ndctl_new(&ctx) == 0)
	return ctx;
else
	return NULL;
LIBND/LIBNDCTL: Bus
-------------------
A bus has a 1:1 relationship with an NFIT. The current expectation for
ACPI based systems is that there is only ever one platform-global NFIT.
That said, it is trivial to register multiple NFITs, and the
specification does not preclude it. The infrastructure supports multiple
busses and we use this capability to test multiple NFIT configurations in
the unit test.
libnd: control class device in /sys/class
This character device accepts DSM messages to be passed to a DIMM
identified by its NFIT handle.
/sys/class/nd/ndctl0
|-- dev
|-- device -> ../../../ndbus0
|-- subsystem -> ../../../../../../../class/nd
libnd: bus
struct nd_bus *nd_bus_register(struct device *parent,
struct nd_bus_descriptor *nfit_desc);
/sys/devices/platform/nfit_test.0/ndbus0
|-- btt0
|-- btt_seed
|-- commands
|-- nd
|-- nfit
|-- nmem0
|-- nmem1
|-- nmem2
|-- nmem3
|-- power
|-- provider
|-- region0
|-- region1
|-- region2
|-- region3
|-- region4
|-- region5
|-- uevent
`-- wait_probe
libndctl: bus enumeration example
Find the bus handle that describes the bus from Example NVDIMM Platform
static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
		const char *provider)
{
	struct ndctl_bus *bus;
	ndctl_bus_foreach(ctx, bus)
		if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
			return bus;
	return NULL;
}
bus = get_bus_by_provider(ctx, "nfit_test.0");
LIBND/LIBNDCTL: DIMM (NMEM)
---------------------------
The DIMM device provides a character device for sending commands to
hardware, and it is a container for LABELs. If the DIMM is defined by
NFIT then an optional 'nfit' attribute sub-directory is available to add
NFIT-specifics.
Note that the kernel device name for "DIMMs" is "nmemX". The NFIT
describes these devices via "Memory Device to System Physical Address
Range Mapping Structure", and there is no requirement that they actually
be physical DIMMs, so we use a more generic name.
libnd: DIMM (NMEM)
struct nd_dimm *nd_dimm_create(struct nd_bus *nd_bus, void *provider_data,
const struct attribute_group **groups, unsigned long flags,
unsigned long *dsm_mask);
/sys/devices/platform/nfit_test.0/ndbus0
|-- nmem0
| |-- available_slots
| |-- commands
| |-- dev
| |-- devtype
| |-- driver -> ../../../../../bus/nd/drivers/nd_dimm
| |-- modalias
| |-- nfit
| | |-- device
| | |-- format
| | |-- handle
| | |-- phys_id
| | |-- rev_id
| | |-- serial
| | `-- vendor
| |-- state
| |-- subsystem -> ../../../../../bus/nd
| `-- uevent
|-- nmem1
[..]
libndctl: DIMM enumeration example
Note, in this example we are assuming NFIT-defined DIMMs which are
identified by an "nfit_handle", a 32-bit value where:
Bit 3:0 DIMM number within the memory channel
Bit 7:4 memory channel number
Bit 11:8 memory controller ID
Bit 15:12 socket ID (within scope of a Node controller if node controller is present)
Bit 27:16 Node Controller ID
Bit 31:28 Reserved
static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
		unsigned int handle)
{
	struct ndctl_dimm *dimm;
	ndctl_dimm_foreach(bus, dimm)
		if (ndctl_dimm_get_handle(dimm) == handle)
			return dimm;
	return NULL;
}
/* arguments are parenthesized so expressions can be passed safely */
#define DIMM_HANDLE(n, s, i, c, d) \
	((((n) & 0xfff) << 16) | (((s) & 0xf) << 12) | (((i) & 0xf) << 8) \
	 | (((c) & 0xf) << 4) | ((d) & 0xf))
dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));
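Going the other way, a handle can be unpacked into its fields using the
bit layout listed above. The sketch below is illustrative only:
"struct nfit_handle_fields", "nfit_handle_encode()" and
"nfit_handle_decode()" are hypothetical helpers, not libndctl API.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an nfit_handle, following the bit layout listed above. */
struct nfit_handle_fields {
	unsigned int node;	/* bits 27:16, node controller ID */
	unsigned int socket;	/* bits 15:12 */
	unsigned int imc;	/* bits 11:8, memory controller ID */
	unsigned int channel;	/* bits 7:4 */
	unsigned int dimm;	/* bits 3:0 */
};

/* Pack the fields; bits 31:28 are reserved and left as zero. */
static uint32_t nfit_handle_encode(const struct nfit_handle_fields *f)
{
	assert(f->node <= 0xfff && f->socket <= 0xf && f->imc <= 0xf
			&& f->channel <= 0xf && f->dimm <= 0xf);
	return ((uint32_t)f->node << 16) | ((uint32_t)f->socket << 12)
		| ((uint32_t)f->imc << 8) | ((uint32_t)f->channel << 4)
		| (uint32_t)f->dimm;
}

/* Unpack a handle; the reserved bits 31:28 are ignored. */
static struct nfit_handle_fields nfit_handle_decode(uint32_t handle)
{
	struct nfit_handle_fields f;

	f.node = (handle >> 16) & 0xfff;
	f.socket = (handle >> 12) & 0xf;
	f.imc = (handle >> 8) & 0xf;
	f.channel = (handle >> 4) & 0xf;
	f.dimm = handle & 0xf;
	return f;
}
```

For example, node 0x123, socket 1, memory controller 2, channel 3 and
DIMM 4 pack to the handle 0x01231234.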
LIBND/LIBNDCTL: Region
----------------------
A generic REGION device is registered for each PMEM range or BLK-aperture
set. Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
sets on the "nfit_test.0" bus. The primary role of regions is to be a
container of "mappings". A mapping is a tuple of <DIMM,
DPA-start-offset, length>.
LIBND provides a built-in driver for these REGION devices. This driver
is responsible for reconciling the aliased DPA mappings across all
regions, parsing the LABEL, if present, and then emitting NAMESPACE
devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
nd_blk device driver to consume.
In addition to the generic attributes of "mapping"s, "interleave_ways",
and "size", the REGION device also exports some convenience attributes.
"nstype" indicates the integer type of namespace-device this region
emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
'add' event, "modalias" duplicates the MODALIAS variable stored by udev
at the 'add' event, and finally, the optional "spa_index" is provided in
the case where the region is defined by a SPA.
libnd: region
struct nd_region *nd_pmem_region_create(struct nd_bus *nd_bus,
struct nd_region_desc *ndr_desc);
struct nd_region *nd_blk_region_create(struct nd_bus *nd_bus,
struct nd_region_desc *ndr_desc);
/sys/devices/platform/nfit_test.0/ndbus0
|-- region0
| |-- available_size
| |-- devtype
| |-- driver -> ../../../../../bus/nd/drivers/nd_region
| |-- init_namespaces
| |-- mapping0
| |-- mapping1
| |-- mappings
| |-- modalias
| |-- namespace0.0
| |-- namespace_seed
| |-- nfit
| | `-- spa_index
| |-- nstype
| |-- set_cookie
| |-- size
| |-- subsystem -> ../../../../../bus/nd
| `-- uevent
|-- region1
[..]
libndctl: region enumeration example
Sample region retrieval routines based on NFIT-unique data like
"spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
BLK.
static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
		unsigned int spa_index)
{
	struct ndctl_region *region;
	ndctl_region_foreach(bus, region) {
		if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
			continue;
		if (ndctl_region_get_spa_index(region) == spa_index)
			return region;
	}
	return NULL;
}
static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
		unsigned int handle)
{
	struct ndctl_region *region;
	ndctl_region_foreach(bus, region) {
		struct ndctl_mapping *map;
		if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLOCK)
			continue;
		ndctl_mapping_foreach(region, map) {
			struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
			if (ndctl_dimm_get_handle(dimm) == handle)
				return region;
		}
	}
	return NULL;
}
Why Not Encode the Region Type into the Region Name?
----------------------------------------------------
At first glance it seems that, since NFIT defines just PMEM and BLK
interface types, we should simply name REGION devices with something
derived from those type names. However, the ND subsystem explicitly
keeps the REGION name generic and expects userspace to always consider
the region-attributes for 4 reasons:
1. There are already more than two REGION and "namespace" types. For
PMEM there are two subtypes. As mentioned previously we have PMEM where
the constituent DIMM devices are known and anonymous PMEM. For BLK
regions the NFIT specification already anticipates vendor specific
implementations. The exact distinction of what a region contains is in
the region-attributes not the region-name or the region-devtype.
2. A region with zero child-namespaces is a possible configuration. For
example, the NFIT allows for a DCR to be published without a
corresponding BLK-aperture. This equates to a DIMM that can only accept
control/configuration messages, but no i/o through a descendant block
device. Again, this "type" is advertised in the attributes ('mappings'
== 0) and the name does not tell you much.
3. What if a third major interface type arises in the future? Outside
of vendor specific implementations, it's not difficult to envision a
third class of interface type beyond BLK and PMEM. With a generic name
for the REGION level of the device-hierarchy old userspace
implementations can still make sense of new kernel advertised
region-types. Userspace can always rely on the generic region
attributes like "mappings", "size", etc and the expected child devices
named "namespace". This generic format of the device-model hierarchy
allows the LIBND and LIBNDCTL implementations to be more uniform and
future-proof.
4. There are more robust mechanisms for determining the major type of a
region than a device name. See the next section, How Do I Determine the
Major Type of a Region?
How Do I Determine the Major Type of a Region?
----------------------------------------------
Outside of the blanket recommendation of "use libndctl", or simply
looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
"nstype" integer attribute, here are some other options.
1. module alias lookup:
The whole point of region/namespace device type differentiation is to
decide which block-device driver will attach to a given LIBND namespace.
One can simply use the modalias to look up the resulting module. It's
important to note that this method is robust in the presence of a
vendor-specific driver down the road. If a vendor-specific
implementation wants to supplant the standard nd_blk driver it can with
minimal impact to the rest of LIBND.
In fact, a vendor may also want to have a vendor-specific region-driver
(outside of nd_region). For example, if a vendor defined its own LABEL
format it would need its own region driver to parse that LABEL and emit
the resulting namespaces. The output from module resolution is more
accurate than a region-name or region-devtype.
2. udev:
The kernel "devtype" is registered in the udev database
# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
P: /devices/platform/nfit_test.0/ndbus0/region0
E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
E: DEVTYPE=nd_pmem
E: MODALIAS=nd:t2
E: SUBSYSTEM=nd
# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
P: /devices/platform/nfit_test.0/ndbus0/region4
E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
E: DEVTYPE=nd_blk
E: MODALIAS=nd:t3
E: SUBSYSTEM=nd
...and is available as a region attribute, but keep in mind that the
"devtype" does not indicate sub-type variations and scripts should
really examine the other attributes as well.
3. type specific attributes:
As it currently stands a BLK-aperture region will never have a
"nfit/spa_index" attribute, but neither will a non-NFIT PMEM region. A
BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
that does not allow I/O. A PMEM region with a "mappings" value of zero
is a simple system-physical-address range.
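The options above can be combined in a small userspace helper. What follows is only a sketch under stated assumptions: nd_modalias_type() and describe_region() are hypothetical names, not part of libndctl, and the only inputs they rely on are the "nd:tN" modalias shape, the "nd_pmem"/"nd_blk" devtype strings shown in the udev output, and the "mappings" rules described in option 3.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract N from a modalias of the form "nd:tN".
 * A single trailing newline (as read from sysfs) is tolerated.
 * Returns -1 if the string does not have that shape.
 */
static int nd_modalias_type(const char *modalias)
{
	static const char prefix[] = "nd:t";
	size_t plen = sizeof(prefix) - 1;
	char *end;
	long type;

	if (strncmp(modalias, prefix, plen) != 0)
		return -1;
	modalias += plen;
	if (*modalias < '0' || *modalias > '9')
		return -1;
	type = strtol(modalias, &end, 10);
	if ((*end != '\0' && *end != '\n') || type > INT_MAX)
		return -1;
	return (int)type;
}

/*
 * Hypothetical helper: describe a region from its "devtype" and
 * "mappings" attributes, following the rules in option 3 above.
 * The returned strings are illustrative, not a kernel ABI.
 */
static const char *describe_region(const char *devtype, int mappings)
{
	if (strcmp(devtype, "nd_blk") == 0)
		return mappings == 0 ? "BLK: control-only DIMM, no I/O"
				     : "BLK: aperture-backed region";
	if (strcmp(devtype, "nd_pmem") == 0)
		return mappings == 0 ? "PMEM: plain system-physical-address range"
				     : "PMEM: DIMM-backed region";
	return "unknown: consult the other region attributes";
}
```

With the udev output shown earlier, nd_modalias_type("nd:t2") yields 2 for region0 and nd_modalias_type("nd:t3") yields 3 for region4. The fallback string in describe_region() reflects the point made above: an unrecognized devtype should send the caller to the generic attributes rather than cause a failure.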
LIBND/LIBNDCTL: Namespace
-------------------------
A REGION, after resolving DPA aliasing and LABEL specified boundaries,
surfaces one or more "namespace" devices. The arrival of a "namespace"
device currently triggers either the nd_blk or nd_pmem driver to load
and register a disk/block device.
libnd: namespace
Here is a sample layout from the three major types of NAMESPACE where
namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
attribute), namespace2.0 represents a BLK namespace (note that it has a
'sector_size' attribute), and namespace6.0 represents an anonymous
PMEM namespace (note that it has no 'uuid' attribute because it does not
support a LABEL).
/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
|-- alt_name
|-- devtype
|-- dpa_extents
|-- modalias
|-- resource
|-- size
|-- subsystem -> ../../../../../../bus/nd
|-- type
|-- uevent
`-- uuid
/sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
|-- alt_name
|-- devtype
|-- dpa_extents
|-- modalias
|-- sector_size
|-- size
|-- subsystem -> ../../../../../../bus/nd
|-- type
|-- uevent
`-- uuid
/sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
|-- block
| `-- pmem0
|-- devtype
|-- driver -> ../../../../../../bus/nd/drivers/pmem
|-- modalias
|-- resource
|-- size
|-- subsystem -> ../../../../../../bus/nd
|-- type
`-- uevent
libndctl: namespace enumeration example
Namespaces are indexed relative to their parent region, as in the example below. These indexes are mostly static from boot to boot, but the subsystem makes no guarantees in this regard. For a static namespace identifier use its 'uuid' attribute.
static struct ndctl_namespace *get_namespace_by_id(struct ndctl_region *region,
		unsigned int id)
{
	struct ndctl_namespace *ndns;

	ndctl_namespace_foreach(region, ndns)
		if (ndctl_namespace_get_id(ndns) == id)
			return ndns;

	return NULL;
}
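The device names in the sample layout (namespace0.0, namespace2.0, namespace6.0) pair a region index with a namespace index. As a minimal sketch, assuming that "namespace<region>.<id>" shape, a script-side helper could split a name into its two indexes. parse_namespace_name() is a hypothetical name, not a libndctl call.

```c
#include <stdio.h>

/*
 * Hypothetical helper: split "namespace<region>.<id>" into its two
 * indexes. Returns 0 on success, -1 if the name has a different shape
 * or carries trailing characters.
 */
static int parse_namespace_name(const char *name, unsigned int *region_id,
		unsigned int *ns_id)
{
	char trailing;

	/* exactly two conversions must succeed; a third means trailing junk */
	if (sscanf(name, "namespace%u.%u%c", region_id, ns_id, &trailing) != 2)
		return -1;
	return 0;
}
```

Since the indexes are not guaranteed to be stable across boots, a helper like this is only suitable for walking the current topology; persistent references should use the 'uuid' attribute as noted above.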
libndctl: namespace creation example
Idle namespaces are automatically created by the kernel if a given region has enough available capacity to create a new namespace. Namespace instantiation involves finding an idle namespace and configuring it. For the most part the namespace attributes can be set in any order; the only constraint is that 'uuid' must be set before 'size'. This enables the kernel to track DPA allocations internally with a static identifier.
static int configure_namespace(struct ndctl_region *region,
		struct ndctl_namespace *ndns,
		struct namespace_parameters *parameters)
{
	char devname[50];

	snprintf(devname, sizeof(devname), "namespace%d.%d",
			ndctl_region_get_id(region), parameters->id);

	ndctl_namespace_set_alt_name(ndns, devname);
	/* 'uuid' must be set prior to setting size! */
	ndctl_namespace_set_uuid(ndns, parameters->uuid);
	ndctl_namespace_set_size(ndns, parameters->size);
	/* unlike pmem namespaces, blk namespaces have a sector size */
	if (parameters->lbasize)
		ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
	/* propagate the result of enabling the namespace to the caller */
	return ndctl_namespace_enable(ndns);
}
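The one ordering rule in the creation example ('uuid' before 'size') is easy to get wrong when attribute writes are generated from a list. The following is a sketch of a pre-flight check, assuming the planned writes are held as an array of attribute names; uuid_precedes_size() is a hypothetical helper and says nothing about how the kernel itself reports a violation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical pre-flight check: returns 1 if every "size" entry in the
 * planned write order is preceded by a "uuid" entry (or if "size" is
 * never written), 0 otherwise.
 */
static int uuid_precedes_size(const char *const *attrs, size_t count)
{
	int seen_uuid = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (strcmp(attrs[i], "uuid") == 0)
			seen_uuid = 1;
		else if (strcmp(attrs[i], "size") == 0 && !seen_uuid)
			return 0;
	}
	return 1;
}
```

An order such as alt_name, uuid, size, sector_size passes, while size followed by uuid is rejected before anything is written.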
Why the Term "namespace"?
1. Why not "volume" for instance? "volume" ran the risk of confusing ND
as a volume manager like device-mapper.
2. The term originated to describe the sub-devices that can be created
within a NVME controller (see the nvme specification:
http://www.nvmexpress.org/specifications/), and NFIT namespaces are
meant to parallel the capabilities and configurability of
NVME-namespaces.
LIBND/LIBNDCTL: Block Translation Table "btt"
---------------------------------------------
A BTT (design document: http://pmem.io/2014/09/23/btt.html) is a stacked
block device driver that fronts either the whole block device or a
partition of a block device emitted by either a PMEM or BLK NAMESPACE.
libnd: btt layout
Every bus will start out with at least one BTT device which is the seed
device. To activate it set the "backing_dev", "uuid", and "sector_size"
attributes and then bind the device to the nd_btt driver.
/sys/devices/platform/nfit_test.1/ndbus0/btt0/
|-- backing_dev
|-- delete
|-- devtype
|-- modalias
|-- sector_size
|-- subsystem -> ../../../../../bus/nd
|-- uevent
`-- uuid
libndctl: btt creation example
Similar to namespaces, an idle BTT device is automatically created per
bus. Each time this "seed" btt device is configured and enabled a new
seed is created. Creating a BTT configuration involves two steps:
finding an idle BTT and assigning it to front a PMEM or BLK namespace.
static struct ndctl_btt *get_idle_btt(struct ndctl_bus *bus)
{
	struct ndctl_btt *btt;

	ndctl_btt_foreach(bus, btt)
		if (!ndctl_btt_is_enabled(btt) && !ndctl_btt_is_configured(btt))
			return btt;

	return NULL;
}
static int configure_btt(struct ndctl_bus *bus, struct btt_parameters *parameters)
{
	struct ndctl_btt *btt = get_idle_btt(bus);
	char bdevpath[50];

	/* no idle seed device available on this bus */
	if (!btt)
		return -1;

	snprintf(bdevpath, sizeof(bdevpath), "/dev/%s",
			ndctl_namespace_get_block_device(parameters->ndns));

	ndctl_btt_set_uuid(btt, parameters->uuid);
	ndctl_btt_set_sector_size(btt, parameters->sector_size);
	ndctl_btt_set_backing_dev(btt, bdevpath);
	return ndctl_btt_enable(btt);
}
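The "backing_dev" path in the example is formatted into a fixed-size buffer, so a long block device name could be cut short without notice. As a minimal sketch, a hypothetical helper (build_backing_dev_path() is not part of libndctl) can report truncation instead of handing a clipped path to the BTT:

```c
#include <stdio.h>

/*
 * Hypothetical helper: format "/dev/<bdev>" into buf.
 * Returns 0 on success, -1 if the result did not fit in len bytes
 * (including the terminating NUL) or if formatting failed.
 */
static int build_backing_dev_path(char *buf, size_t len, const char *bdev)
{
	int n = snprintf(buf, len, "/dev/%s", bdev);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

For a block device named "pmem0" and a sufficiently large buffer, this produces "/dev/pmem0", matching the pmem0 device used in the sysfs listing that follows.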
Once instantiated, an "nd_btt" link will be created under the
"backing_dev" (pmem0) block device:
/sys/block/pmem0/
|-- alignment_offset
|-- bdi -> ../../../../../../../virtual/bdi/259:0
|-- capability
|-- dev
|-- device -> ../../../namespace0.0
|-- discard_alignment
|-- ext_range
|-- holders
|-- inflight
|-- nd_btt -> ../../../../btt0
...and a new inactive seed device will appear on the bus.
Once a "backing_dev" is disabled its associated BTT will be
automatically deleted. This deletion is only at the device model level.
In order to destroy a BTT the "info block" needs to be destroyed.
Summary LIBNDCTL Diagram
------------------------
For the given example above, here is the view of the objects as seen by the LIBNDCTL api:
+---+
|CTX| +---------+ +--------------+ +---------------+
+-+-+ +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
| | +---------+ +--------------+ +---------------+
+-------+ | | +---------+ +--------------+ +---------------+
| DIMM0 <-+ | +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
+-------+ | | | +---------+ +--------------+ +---------------+
| DIMM1 <-+ +-v--+ | +---------+ +--------------+ +---------------+
+-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6 "blk2.0" |
| DIMM2 <-+ +----+ | +---------+ | +--------------+ +----------------------+
+-------+ | | +-> NAMESPACE2.1 +--> ND5 "blk2.1" | BTT2 |
| DIMM3 <-+ | +--------------+ +----------------------+
+-------+ | +---------+ +--------------+ +---------------+
+-> REGION3 +-+-> NAMESPACE3.0 +--> ND4 "blk3.0" |
| +---------+ | +--------------+ +----------------------+
| +-> NAMESPACE3.1 +--> ND3 "blk3.1" | BTT1 |
| +--------------+ +----------------------+
| +---------+ +--------------+ +---------------+
+-> REGION4 +---> NAMESPACE4.0 +--> ND2 "blk4.0" |
| +---------+ +--------------+ +---------------+
| +---------+ +--------------+ +----------------------+
+-> REGION5 +---> NAMESPACE5.0 +--> ND1 "blk5.0" | BTT0 |
+---------+ +--------------+ +---------------+------+
* Re: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-05-04 20:26 ` Toshi Kani
@ 2015-05-09 23:55 ` Dan Williams
2015-05-28 18:36 ` Toshi Kani
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-05-09 23:55 UTC (permalink / raw)
To: Toshi Kani
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI
On Mon, May 4, 2015 at 1:26 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
> :
>> +
>> +static int nd_acpi_register_region(struct acpi_nfit_desc *acpi_desc,
>> + struct nfit_spa *nfit_spa)
>> +{
>> + static struct nd_mapping nd_mappings[ND_MAX_MAPPINGS];
>> + struct acpi_nfit_spa *spa = nfit_spa->spa;
>> + struct nfit_memdev *nfit_memdev;
>> + struct nd_region_desc ndr_desc;
>> + int spa_type, count = 0;
>> + struct resource res;
>> + u16 spa_index;
>> +
>> + spa_type = nfit_spa_type(spa);
>> + spa_index = spa->spa_index;
>> + if (spa_index == 0) {
>> + dev_dbg(acpi_desc->dev, "%s: detected invalid spa index\n",
>> + __func__);
>> + return 0;
>> + }
>> +
>> + memset(&res, 0, sizeof(res));
>> + memset(&nd_mappings, 0, sizeof(nd_mappings));
>> + memset(&ndr_desc, 0, sizeof(ndr_desc));
>> + res.start = spa->spa_base;
>> + res.end = res.start + spa->spa_length - 1;
>> + ndr_desc.res = &res;
>> + ndr_desc.provider_data = nfit_spa;
>> + ndr_desc.attr_groups = nd_acpi_region_attribute_groups;
>> + list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
>> + struct acpi_nfit_memdev *memdev = nfit_memdev->memdev;
>> + struct nd_mapping *nd_mapping;
>> + struct nd_dimm *nd_dimm;
>> +
>> + if (memdev->spa_index != spa_index)
>> + continue;
>
> The libnd does not support memdev->flags, which contains "Memory Device
> State Flags" defined in Table 5-129 of ACPI 6.0. In case of major
> errors, we should only allow a failed NVDIMM be accessed with read-only
> for possible data recovery (or not allow any access when the data is
> completely lost), and should not let users operate normally over the
> corrupted data until the error is dealt properly.
I agree with setting read-only access when these flags show that the
battery is not ready to persist new writes, but I don't think we
should block access in the case where the restore from flash failed.
If the data is potentially corrupted we should log that fact, but
otherwise enable access. I.e. potentially corrupt data is better than
unavailable data. It's up to filesystem or application to maintain
its own checksums to catch data corruption.
> Can you set memdev->flags to nd_region(_desc) so that the pmem driver
> can check the status in nd_pmem_probe()? nd_pmem_probe() can then set
> the disk read-only or fail probing, and log errors accordingly.
Will do.
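To make the policy discussed above concrete, here is a sketch only. The EX_* flag names and their bit positions are assumptions for illustration and should be checked against the "Memory Device State Flags" table in the ACPI specification; ex_region_access() is a hypothetical function, not what nd_pmem_probe() does today. It encodes the position stated in this reply: a device that is not ready to persist new writes becomes read-only, while a failed restore is logged but does not block access.

```c
/* Illustrative flag bits; the real layout must be taken from the ACPI spec. */
enum {
	EX_MEM_RESTORE_FAILED = 1 << 1,	/* assumed: last restore from flash failed */
	EX_MEM_NOT_ARMED      = 1 << 3,	/* assumed: not armed to persist new writes */
};

enum ex_access {
	EX_ACCESS_RW,
	EX_ACCESS_RO,
};

/*
 * Hypothetical policy helper: decide the access mode for a region from
 * its memdev flags, and tell the caller whether to log that the data is
 * potentially corrupt.
 */
static enum ex_access ex_region_access(unsigned int flags, int *log_corruption)
{
	/* potentially corrupt data is logged, but still made available */
	*log_corruption = (flags & EX_MEM_RESTORE_FAILED) ? 1 : 0;

	/* new writes cannot be persisted: expose the media read-only */
	if (flags & EX_MEM_NOT_ARMED)
		return EX_ACCESS_RO;
	return EX_ACCESS_RW;
}
```

Under this sketch a caller that gets EX_ACCESS_RO would mark the disk read-only, and one that sees log_corruption set would emit a warning and continue. Failing the probe outright, as the original request allows, would be a separate policy choice.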
* Re: [Linux-nvdimm] [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
2015-04-28 18:24 ` [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support Dan Williams
2015-04-30 23:23 ` Rafael J. Wysocki
@ 2015-05-15 19:44 ` Jeff Moyer
2015-05-15 20:41 ` Dan Williams
1 sibling, 1 reply; 47+ messages in thread
From: Jeff Moyer @ 2015-05-15 19:44 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, linux-acpi, Rafael J. Wysocki, Robert Moore,
linux-kernel
Dan Williams <dan.j.williams@intel.com> writes:
Looks like the Kconfig stuff has been worked out between you and Rafael,
so I won't comment on that.
> diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
> new file mode 100644
> index 000000000000..9f0b24390d1b
> --- /dev/null
> +++ b/drivers/block/nd/acpi.c
> @@ -0,0 +1,475 @@
> +/*
> + * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + */
> +#include <linux/list_sort.h>
> +#include <linux/module.h>
> +#include <linux/list.h>
> +#include <linux/acpi.h>
> +#include "acpi_nfit.h"
> +#include "libnd.h"
> +
> +static bool warn_checksum;
> +module_param(warn_checksum, bool, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(warn_checksum, "Turn checksum errors into warnings");
Is this just a debugging option?
> +
> +enum {
> + NFIT_ACPI_NOTIFY_TABLE = 0x80,
> +};
This is unused by this patch.
The rest looks ok to me.
-Jeff
* Re: [Linux-nvdimm] [PATCH v2 03/20] nd_acpi, nfit-test: manufactured NFITs for interface development
2015-04-28 18:24 ` [PATCH v2 03/20] nd_acpi, nfit-test: manufactured NFITs for interface development Dan Williams
@ 2015-05-15 20:25 ` Jeff Moyer
2015-05-15 20:50 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Jeff Moyer @ 2015-05-15 20:25 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, linux-acpi, Rafael J. Wysocki, Robert Moore,
linux-kernel
Dan Williams <dan.j.williams@intel.com> writes:
> +config NFIT_TEST
> + tristate "NFIT TEST: Manufactured NFIT for interface testing"
> + depends on DMA_CMA
> + depends on LIBND=m
> + depends on ND_ACPI
> + depends on m
> + help
> + For development purposes register a manufactured
> + NFIT table to verify the resulting device model topology.
> + Note, this module arranges for ioremap_cache() to be
> + overridden locally to allow simulation of system-memory as an
> + io-memory-resource.
> +
> + Note, this test expects to be able to find at least 256MB of
> + CMA space (CONFIG_CMA_SIZE_MBYTES, cma=) or it will fail to
> + load.
> +
> + Say N unless you are doing development of the 'nd' subsystem.
> +
Too many TLAs. I'm guessing CMA means Conventional Memory Area to you.
To me it means contiguous memory allocator. Anyway, please define
acronyms when you use them, especially in help text. The help text also
doesn't really explain where it will find this memory. Would it be
possible to provide more direction there?
I don't have any useful commentary on the patch itself. I do wonder if
you shouldn't move this to the end, as it's hardly an integral part of
the patch set.
Cheers,
Jeff
* Re: [Linux-nvdimm] [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
2015-05-15 19:44 ` [Linux-nvdimm] " Jeff Moyer
@ 2015-05-15 20:41 ` Dan Williams
0 siblings, 0 replies; 47+ messages in thread
From: Dan Williams @ 2015-05-15 20:41 UTC (permalink / raw)
To: Jeff Moyer
Cc: linux-nvdimm, Linux ACPI, Rafael J. Wysocki, Robert Moore,
linux-kernel@vger.kernel.org
On Fri, May 15, 2015 at 12:44 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
>
> Looks like the Kconfig stuff has been worked out between you and Rafael,
> so I won't comment on that.
>
>> diff --git a/drivers/block/nd/acpi.c b/drivers/block/nd/acpi.c
>> new file mode 100644
>> index 000000000000..9f0b24390d1b
>> --- /dev/null
>> +++ b/drivers/block/nd/acpi.c
>> @@ -0,0 +1,475 @@
>> +/*
>> + * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of version 2 of the GNU General Public License as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * General Public License for more details.
>> + */
>> +#include <linux/list_sort.h>
>> +#include <linux/module.h>
>> +#include <linux/list.h>
>> +#include <linux/acpi.h>
>> +#include "acpi_nfit.h"
>> +#include "libnd.h"
>> +
>> +static bool warn_checksum;
>> +module_param(warn_checksum, bool, S_IRUGO|S_IWUSR);
>> +MODULE_PARM_DESC(warn_checksum, "Turn checksum errors into warnings");
>
> Is this just a debugging option?
Yes, but I've deleted it in the next rev of the code since the ACPI
core will have already done the checksum. The driver need not
implement its own checksum.
>> +
>> +enum {
>> + NFIT_ACPI_NOTIFY_TABLE = 0x80,
>> +};
>
> This is unused by this patch.
Yes. I went ahead and deleted all the ACPI notification
infrastructure until we're ready to implement hot-add.
> The rest looks ok to me.
Thanks Jeff.
* Re: [Linux-nvdimm] [PATCH v2 03/20] nd_acpi, nfit-test: manufactured NFITs for interface development
2015-05-15 20:25 ` [Linux-nvdimm] " Jeff Moyer
@ 2015-05-15 20:50 ` Dan Williams
0 siblings, 0 replies; 47+ messages in thread
From: Dan Williams @ 2015-05-15 20:50 UTC (permalink / raw)
To: Jeff Moyer
Cc: linux-nvdimm, Linux ACPI, Rafael J. Wysocki, Robert Moore,
linux-kernel@vger.kernel.org
On Fri, May 15, 2015 at 1:25 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
>
>> +config NFIT_TEST
>> + tristate "NFIT TEST: Manufactured NFIT for interface testing"
>> + depends on DMA_CMA
>> + depends on LIBND=m
>> + depends on ND_ACPI
>> + depends on m
>> + help
>> + For development purposes register a manufactured
>> + NFIT table to verify the resulting device model topology.
>> + Note, this module arranges for ioremap_cache() to be
>> + overridden locally to allow simulation of system-memory as an
>> + io-memory-resource.
>> +
>> + Note, this test expects to be able to find at least 256MB of
>> + CMA space (CONFIG_CMA_SIZE_MBYTES, cma=) or it will fail to
>> + load.
>> +
>> + Say N unless you are doing development of the 'nd' subsystem.
>> +
>
> Too many TLAs. I'm guessing CMA means Conventional Memory Area to you.
> To me it means contiguous memory allocator.
It means Contiguous Memory Allocator to me too.
> Anyway, please define
> acronyms when you use them, especially in help text. The help text also
> doesn't really explain where it will find this memory. Would it be
> possible to provide more direction there?
I didn't think I needed to define CMA in the context of Kconfig, but
I'll replace it with CONFIG_DMA_CMA to be more clear.
> I don't have any useful commentary on the patch itself. I do wonder if
> you shouldn't move this to the end, as it's hardly an integral part of
> the patch set.
True. The patches are currently in "development order" in that I
created the test infrastructure before the rest of the implementation.
But, I agree it makes sense to move this to the end for the next
posting.
* Re: [Linux-nvdimm] [PATCH v2 04/20] libnd: ndctl class device, and nd bus attributes
2015-04-28 18:24 ` [PATCH v2 04/20] libnd: ndctl class device, and nd bus attributes Dan Williams
@ 2015-05-15 21:00 ` Jeff Moyer
0 siblings, 0 replies; 47+ messages in thread
From: Jeff Moyer @ 2015-05-15 21:00 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel, linux-acpi
Dan Williams <dan.j.williams@intel.com> writes:
> This is the position (device topology) independent method to find all
> the libnd buses in the system. The expectation is that there will only
> ever be one "nd" bus discovered via /sys/class/nd/ndctl0. However, we
> allow for the possibility of multiple buses and they will be listed in
> discovery order as ndctl0...ndctlN. This character device hosts the
> ioctl for passing control messages (inspired by the ACPI-NFIT DSM
> interface commands).
>
> Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
> a subsequent patch.
Acked-by: Jeff Moyer <jmoyer@redhat.com>
* Re: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-05-09 23:55 ` Dan Williams
@ 2015-05-28 18:36 ` Toshi Kani
2015-05-28 19:59 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Toshi Kani @ 2015-05-28 18:36 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI
On Sat, 2015-05-09 at 16:55 -0700, Dan Williams wrote:
> On Mon, May 4, 2015 at 1:26 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
:
> >
> > The libnd does not support memdev->flags, which contains "Memory Device
> > State Flags" defined in Table 5-129 of ACPI 6.0. In case of major
> > errors, we should only allow a failed NVDIMM be accessed with read-only
> > for possible data recovery (or not allow any access when the data is
> > completely lost), and should not let users operate normally over the
> > corrupted data until the error is dealt properly.
>
> I agree with setting read-only access when these flags show that the
> battery is not ready to persist new writes, but I don't think we
> should block access in the case where the restore from flash failed.
> If the data is potentially corrupted we should log that fact, but
> otherwise enable access. I.e. potentially corrupt data is better than
> unavailable data. It's up to filesystem or application to maintain
> its own checksums to catch data corruption.
>
> > Can you set memdev->flags to nd_region(_desc) so that the pmem driver
> > can check the status in nd_pmem_probe()? nd_pmem_probe() can then set
> > the disk read-only or fail probing, and log errors accordingly.
>
> Will do.
I do not see this change in v4. Is this part of the pending changes
behind this release?
Thanks,
-Toshi
* Re: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-05-28 18:36 ` Toshi Kani
@ 2015-05-28 19:59 ` Dan Williams
2015-05-28 20:51 ` Linda Knippers
0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2015-05-28 19:59 UTC (permalink / raw)
To: Toshi Kani
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI,
Christoph Hellwig
On Thu, May 28, 2015 at 11:36 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Sat, 2015-05-09 at 16:55 -0700, Dan Williams wrote:
>> On Mon, May 4, 2015 at 1:26 PM, Toshi Kani <toshi.kani@hp.com> wrote:
>> > On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
> :
>> >
>> > The libnd does not support memdev->flags, which contains "Memory Device
>> > State Flags" defined in Table 5-129 of ACPI 6.0. In case of major
>> > errors, we should only allow a failed NVDIMM be accessed with read-only
>> > for possible data recovery (or not allow any access when the data is
>> > completely lost), and should not let users operate normally over the
>> > corrupted data until the error is dealt properly.
>>
>> I agree with setting read-only access when these flags show that the
>> battery is not ready to persist new writes, but I don't think we
>> should block access in the case where the restore from flash failed.
>> If the data is potentially corrupted we should log that fact, but
>> otherwise enable access. I.e. potentially corrupt data is better than
>> unavailable data. It's up to filesystem or application to maintain
>> its own checksums to catch data corruption.
>>
>> > Can you set memdev->flags to nd_region(_desc) so that the pmem driver
>> > can check the status in nd_pmem_probe()? nd_pmem_probe() can then set
>> > the disk read-only or fail probing, and log errors accordingly.
>>
>> Will do.
>
> I do not see this change in v4. Is this part of the pending changes
> behind this release?
Yes, I was holding it off until we had an upstream acceptance baseline
set. That is on hold pending Christoph's review. He's looking to
come back next Wednesday with deeper review comments. The runway to
land this in v4.2 is running short...
* Re: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-05-28 19:59 ` Dan Williams
@ 2015-05-28 20:51 ` Linda Knippers
2015-05-28 20:58 ` Dan Williams
0 siblings, 1 reply; 47+ messages in thread
From: Linda Knippers @ 2015-05-28 20:51 UTC (permalink / raw)
To: Dan Williams, Toshi Kani
Cc: linux-nvdimm@lists.01.org, Neil Brown, Greg KH, Rafael J. Wysocki,
Robert Moore, linux-kernel@vger.kernel.org, Linux ACPI,
Christoph Hellwig
On 5/28/2015 3:59 PM, Dan Williams wrote:
> On Thu, May 28, 2015 at 11:36 AM, Toshi Kani <toshi.kani@hp.com> wrote:
>> On Sat, 2015-05-09 at 16:55 -0700, Dan Williams wrote:
>>> On Mon, May 4, 2015 at 1:26 PM, Toshi Kani <toshi.kani@hp.com> wrote:
>>>> On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
>> :
>>>>
>>>> The libnd does not support memdev->flags, which contains "Memory Device
>>>> State Flags" defined in Table 5-129 of ACPI 6.0. In case of major
>>>> errors, we should only allow a failed NVDIMM be accessed with read-only
>>>> for possible data recovery (or not allow any access when the data is
>>>> completely lost), and should not let users operate normally over the
>>>> corrupted data until the error is dealt properly.
>>>
>>> I agree with setting read-only access when these flags show that the
>>> battery is not ready to persist new writes, but I don't think we
>>> should block access in the case where the restore from flash failed.
>>> If the data is potentially corrupted we should log that fact, but
>>> otherwise enable access. I.e. potentially corrupt data is better than
>>> unavailable data. It's up to filesystem or application to maintain
>>> its own checksums to catch data corruption.
>>>
>>>> Can you set memdev->flags to nd_region(_desc) so that the pmem driver
>>>> can check the status in nd_pmem_probe()? nd_pmem_probe() can then set
>>>> the disk read-only or fail probing, and log errors accordingly.
>>>
>>> Will do.
>>
>> I do not see this change in v4. Is this part of the pending changes
>> behind this release?
>
> Yes, I was holding it off until we had an upstream acceptance baseline
> set. That is on hold pending Christoph's review. He's looking to
> come back next Wednesday with deeper review comments. The runway to
> land this in v4.2 is running short...
Hi Dan,
Do you have a short list of pending changes, especially if some weren't
discussed on the list? That might help reviewers.
I know we're still looking at and trying a number of things, like using
the BTT on today's NVDIMMs and adding another example DSM, so we will
have more feedback and patches and may need to adapt some of the
structure to do that. This can happen after the initial patches are
pulled in but I just wanted to let you know where we are. Not sure
about others.
-- ljk
* Re: [Linux-nvdimm] [PATCH v2 08/20] libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
2015-05-28 20:51 ` Linda Knippers
@ 2015-05-28 20:58 ` Dan Williams
0 siblings, 0 replies; 47+ messages in thread
From: Dan Williams @ 2015-05-28 20:58 UTC (permalink / raw)
To: Linda Knippers
Cc: Toshi Kani, linux-nvdimm@lists.01.org, Neil Brown, Greg KH,
Rafael J. Wysocki, Robert Moore, linux-kernel@vger.kernel.org,
Linux ACPI, Christoph Hellwig
On Thu, May 28, 2015 at 1:51 PM, Linda Knippers <linda.knippers@hp.com> wrote:
> On 5/28/2015 3:59 PM, Dan Williams wrote:
>> On Thu, May 28, 2015 at 11:36 AM, Toshi Kani <toshi.kani@hp.com> wrote:
>>> On Sat, 2015-05-09 at 16:55 -0700, Dan Williams wrote:
>>>> On Mon, May 4, 2015 at 1:26 PM, Toshi Kani <toshi.kani@hp.com> wrote:
>>>>> On Tue, 2015-04-28 at 14:24 -0400, Dan Williams wrote:
>>> :
>>>>>
>>>>> The libnd does not support memdev->flags, which contains "Memory Device
>>>>> State Flags" defined in Table 5-129 of ACPI 6.0. In case of major
>>>>> errors, we should only allow a failed NVDIMM be accessed with read-only
>>>>> for possible data recovery (or not allow any access when the data is
>>>>> completely lost), and should not let users operate normally over the
>>>>> corrupted data until the error is dealt properly.
>>>>
>>>> I agree with setting read-only access when these flags show that the
>>>> battery is not ready to persist new writes, but I don't think we
>>>> should block access in the case where the restore from flash failed.
>>>> If the data is potentially corrupted we should log that fact, but
>>>> otherwise enable access. I.e. potentially corrupt data is better than
>>>> unavailable data. It's up to filesystem or application to maintain
>>>> its own checksums to catch data corruption.
>>>>
>>>>> Can you set memdev->flags to nd_region(_desc) so that the pmem driver
>>>>> can check the status in nd_pmem_probe()? nd_pmem_probe() can then set
>>>>> the disk read-only or fail probing, and log errors accordingly.
>>>>
>>>> Will do.
>>>
>>> I do not see this change in v4. Is this part of the pending changes
>>> behind this release?
>>
>> Yes, I was holding it off until we had an upstream acceptance baseline
>> set. That is on hold pending Christoph's review. He's looking to
>> come back next Wednesday with deeper review comments. The runway to
>> land this in v4.2 is running short...
>
> Hi Dan,
>
> Do you have a short list of pending changes, especially if some weren't
> discussed on the list? That might help reviewers.
>
> I know we're still looking at and trying a number of things, like using
> the BTT on today's NVDIMMs and adding another example DSM, so we will
> have more feedback and patches and may need to adapt some of the
> structure to do that. This can happen after the initial patches are
> pulled in but I just wanted to let you know where we are. Not sure
> about others.
>
It seems it's just Christoph that has asserted there are things he'd
like changed, so I don't see much potential for confusion in letting
out the pending backlog. I'll see to it.