* [PATCH v6 0/3] ndctl: add error clearing support for dev dax
@ 2017-05-04 21:48 Dave Jiang
2017-05-04 21:49 ` [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax Dave Jiang
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Dave Jiang @ 2017-05-04 21:48 UTC (permalink / raw)
To: dan.j.williams; +Cc: linux-nvdimm
The following series implements support for error clearing and error
listing for device dax through ndctl.
---
v2: Addressed comments from Vishal
- added bounds checking for the badblocks region.
- updated verbiage to use badblocks instead of poison.
- set default len to 1.
- fixed error out for stat
- fixed error out that was copy/paste error
- remove duplicate check_min_kver() in shell script
- fixed logic of checking empty badblocks
v3: Addressed comments from Toshi
- Fixed bad region path for badblocks
v4: Address comments from Toshi
- Added error output for length that exceeds badblock coverage.
v5: Address comments from Toshi
- Made -l 0 to error out and no length means 1 block cleared.
v6: Address comments from Dan
- Fixed clear-error to be based off of device instead of region.
- Added list-errors command to ndctl
Dave Jiang (3):
ndctl: add clear-error to ndctl for device dax
ndctl: add list-errors support
ndctl: add test for clear-error
Documentation/ndctl-clear-error.txt | 38 ++++
Documentation/ndctl-list-errors.txt | 26 +++
builtin.h | 2
ndctl/Makefile.am | 1
ndctl/clear-error.c | 352 +++++++++++++++++++++++++++++++++++
ndctl/lib/libndctl.c | 72 +++++++
ndctl/lib/libndctl.sym | 2
ndctl/libndctl.h.in | 10 +
ndctl/ndctl.c | 2 +
test/Makefile.am | 1 +
test/ndctl-clear-error-dax.sh | 66 +++++++
11 files changed, 572 insertions(+)
create mode 100644 Documentation/ndctl-clear-error.txt
create mode 100644 Documentation/ndctl-list-errors.txt
create mode 100644 ndctl/clear-error.c
create mode 100755 test/ndctl-clear-error-dax.sh
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax
2017-05-04 21:48 [PATCH v6 0/3] ndctl: add error clearing support for dev dax Dave Jiang
@ 2017-05-04 21:49 ` Dave Jiang
2017-05-05 19:14 ` Dan Williams
2017-05-04 21:49 ` [PATCH v6 2/3] ndctl: add list-errors support Dave Jiang
2017-05-04 21:49 ` [PATCH v6 3/3] ndctl: add test for clear-error Dave Jiang
2 siblings, 1 reply; 15+ messages in thread
From: Dave Jiang @ 2017-05-04 21:49 UTC (permalink / raw)
To: dan.j.williams; +Cc: linux-nvdimm
Adding ndctl support that will allow clearing of bad blocks for a device.
Initial implementation will only support device dax devices. The ndctl
takes a device path and parameters of the starting bad block, and the number
of bad blocks to clear.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
Documentation/ndctl-clear-error.txt | 38 ++++++
builtin.h | 1
ndctl/Makefile.am | 1
ndctl/clear-error.c | 239 +++++++++++++++++++++++++++++++++++
ndctl/lib/libndctl.c | 72 +++++++++++
ndctl/lib/libndctl.sym | 2
ndctl/libndctl.h.in | 10 +
ndctl/ndctl.c | 1
8 files changed, 364 insertions(+)
create mode 100644 Documentation/ndctl-clear-error.txt
create mode 100644 ndctl/clear-error.c
diff --git a/Documentation/ndctl-clear-error.txt b/Documentation/ndctl-clear-error.txt
new file mode 100644
index 0000000..b14521a
--- /dev/null
+++ b/Documentation/ndctl-clear-error.txt
@@ -0,0 +1,38 @@
+ndctl-clear-error(1)
+====================
+
+NAME
+----
+ndctl-clear-error - clear badblocks for a device
+
+SYNOPSIS
+--------
+[verse]
+'ndctl clear-error' [<options>]
+
+EXAMPLES
+--------
+
+Clear poison (bad blocks) for the provided device
+[verse]
+ndctl clear-error -f /dev/dax0.0 -s 0 -l 8
+
+Clear poison (bad blocks) at block offset 0 for 8 blocks on device /dev/dax0.0
+
+OPTIONS
+-------
+-f::
+--file::
+ The device/file to be cleared of poison (bad blocks).
+
+-s::
+--start::
+ The offset where the poison (bad block) starts for this device.
+ Typically this is acquired from the sysfs badblocks file.
+
+-l::
+--len::
+ The number of badblocks to clear in size of 512 bytes increments. The
+ length must fit within the badblocks range. If the length exceeds the
+ badblock range or is 0, the command will fail. Not providing a len
+ will result in a single block being cleared.
diff --git a/builtin.h b/builtin.h
index a8bc848..f522d00 100644
--- a/builtin.h
+++ b/builtin.h
@@ -30,4 +30,5 @@ int cmd_test(int argc, const char **argv, void *ctx);
#ifdef ENABLE_DESTRUCTIVE
int cmd_bat(int argc, const char **argv, void *ctx);
#endif
+int cmd_clear_error(int argc, const char **argv, void *ctx);
#endif /* _NDCTL_BUILTIN_H_ */
diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
index d346c04..8123169 100644
--- a/ndctl/Makefile.am
+++ b/ndctl/Makefile.am
@@ -11,6 +11,7 @@ ndctl_SOURCES = ndctl.c \
../util/log.c \
list.c \
test.c \
+ clear-error.c \
../util/json.c
if ENABLE_SMART
diff --git a/ndctl/clear-error.c b/ndctl/clear-error.c
new file mode 100644
index 0000000..29cd471
--- /dev/null
+++ b/ndctl/clear-error.c
@@ -0,0 +1,239 @@
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <libgen.h>
+#include <string.h>
+#include <limits.h>
+#include <ccan/short_types/short_types.h>
+#include <ccan/array_size/array_size.h>
+#include <util/parse-options.h>
+#include <util/log.h>
+#include <ndctl/libndctl.h>
+#include <ndctl.h>
+
+struct clear_err {
+ const char *dev_name;
+ u64 bb_start;
+ unsigned int bb_len;
+ struct ndctl_cmd *ars_cap;
+ struct ndctl_cmd *clear_err;
+ struct ndctl_bus *bus;
+ struct ndctl_region *region;
+ struct ndctl_dax *dax;
+ struct ndctl_ctx *ctx;
+} clear_err = {
+ .bb_len = 1,
+};
+
+static int send_clear_error(struct ndctl_bus *bus, u64 start, u64 size)
+{
+ u64 cleared;
+ int rc;
+
+ clear_err.clear_err = ndctl_bus_cmd_new_clear_error(
+ start, size, clear_err.ars_cap);
+ if (!clear_err.clear_err) {
+ error("%s: bus: %s failed to create cmd\n",
+ __func__, ndctl_bus_get_provider(bus));
+ return -ENXIO;
+ }
+
+ rc = ndctl_cmd_submit(clear_err.clear_err);
+ if (rc) {
+ error("%s: bus: %s failed to submit cmd: %d\n",
+ __func__, ndctl_bus_get_provider(bus), rc);
+ ndctl_cmd_unref(clear_err.clear_err);
+ return rc;
+ }
+
+ cleared = ndctl_cmd_clear_error_get_cleared(clear_err.clear_err);
+ if (cleared != size) {
+ error("%s: bus: %s expected to clear: %ld actual: %ld\n",
+ __func__, ndctl_bus_get_provider(bus),
+ size, cleared);
+ return -ENXIO;
+ }
+
+ return 0;
+}
+
+static int get_ars_cap(struct ndctl_bus *bus, u64 start, u64 size)
+{
+ int rc;
+
+ clear_err.ars_cap = ndctl_bus_cmd_new_ars_cap(bus, start, size);
+ if (!clear_err.ars_cap) {
+ error("%s: bus: %s failed to create cmd\n",
+ __func__, ndctl_bus_get_provider(bus));
+ return -ENOTTY;
+ }
+
+ rc = ndctl_cmd_submit(clear_err.ars_cap);
+ if (rc) {
+ error("%s: bus: %s failed to submit cmd: %d\n",
+ __func__, ndctl_bus_get_provider(bus), rc);
+ ndctl_cmd_unref(clear_err.ars_cap);
+ return rc;
+ }
+
+ if (ndctl_cmd_ars_cap_get_size(clear_err.ars_cap) <
+ sizeof(struct nd_cmd_ars_status)){
+ error("%s: bus: %s expected size >= %zd got: %d\n",
+ __func__, ndctl_bus_get_provider(bus),
+ sizeof(struct nd_cmd_ars_status),
+ ndctl_cmd_ars_cap_get_size(clear_err.ars_cap));
+ ndctl_cmd_unref(clear_err.ars_cap);
+ return -ENXIO;
+ }
+
+ return 0;
+}
+
+static int match_dev(struct clear_err *ce, char *dev_name)
+{
+ ndctl_bus_foreach(ce->ctx, ce->bus)
+ ndctl_region_foreach(ce->bus, ce->region)
+ ndctl_dax_foreach(ce->region, ce->dax)
+ if (strncmp(basename(dev_name),
+ ndctl_dax_get_devname(ce->dax), 256)
+ == 0)
+ return 0;
+
+ return -ENODEV;
+}
+
+static int check_user_input_range(struct ndctl_dax *dax,
+ unsigned long long start, unsigned int len)
+{
+ struct ndctl_region *region = ndctl_dax_get_region(dax);
+ struct badblock *bb;
+ uint64_t rbegin;
+ uint64_t dbegin;
+ uint64_t bb_start;
+
+ rbegin = ndctl_region_get_resource(region);
+ /* physical addr of DAX begin */
+ dbegin = ndctl_dax_get_resource(dax);
+ /* badblocks start relative to region from DAX device */
+ bb_start = ((dbegin - rbegin) >> 9) + start;
+
+ /* check region */
+ ndctl_region_badblock_foreach(region, bb)
+ if (bb_start >= bb->offset &&
+ bb_start + len <= bb->offset + bb->len)
+ return 1;
+
+ return 0;
+}
+
+static int clear_error(struct clear_err *ce)
+{
+ struct stat stats;
+ int rc;
+ char dev_name[256];
+ uint64_t dev_base;
+ unsigned long long start;
+ unsigned int len;
+
+ strncpy(dev_name, ce->dev_name, 256);
+
+ rc = stat(dev_name, &stats);
+ if (rc < 0) {
+ perror("stat failed");
+ error("Unable to stat %s\n", dev_name);
+ return -1;
+ }
+
+ if (!S_ISCHR(stats.st_mode)) {
+ error("%s not DAX device\n", dev_name);
+ return -1;
+ }
+
+ rc = ndctl_new(&ce->ctx);
+ if (rc)
+ return rc;
+
+ if ((rc = match_dev(ce, dev_name)) < 0)
+ goto cleanup;
+
+ dev_base = ndctl_dax_get_resource(ce->dax);
+ if (dev_base == ULLONG_MAX) {
+ rc = -ERANGE;
+ goto cleanup;
+ }
+
+ if (check_user_input_range(ce->dax, clear_err.bb_start,
+ clear_err.bb_len) == 0) {
+ rc = -EINVAL;
+ error("Badblocks region input out of range.\n");
+ goto cleanup;
+ }
+
+ start = dev_base + clear_err.bb_start * 512;
+ len = clear_err.bb_len * 512;
+
+ rc = get_ars_cap(ce->bus, start, len);
+ if (rc) {
+ fprintf(stderr, "get_ars_cap failed\n");
+ goto cleanup;
+ }
+
+ rc = send_clear_error(ce->bus, start, len);
+ if (rc) {
+ fprintf(stderr, "send_clear_error failed\n");
+ goto cleanup;
+ }
+
+ rc = 0;
+
+cleanup:
+ ndctl_unref(ce->ctx);
+ return rc;
+}
+
+int cmd_clear_error(int argc, const char **argv, void *ctx)
+{
+ int i, rc;
+ const char * const u[] = {
+ "ndctl clear-error [<options>]",
+ NULL
+ };
+ const struct option options[] = {
+ OPT_STRING('f', "file", &clear_err.dev_name, "device-name",
+ "device/file name to be operated on"),
+ OPT_U64('s', "start", &clear_err.bb_start,
+ "badblock start"),
+ OPT_UINTEGER('l', "len", &clear_err.bb_len, "badblock length"),
+ OPT_END(),
+ };
+
+ argc = parse_options(argc, argv, options, u, 0);
+
+ for (i = 0; i < argc; i++)
+ error("Unknown parameter \"%s\"\n", argv[i]);
+
+ if (argc)
+ usage_with_options(u, options);
+
+ if (!clear_err.dev_name) {
+ error("Missing device/file name passed in\n");
+ usage_with_options(u, options);
+ return -EINVAL;
+ }
+
+ if (clear_err.bb_len == 0) {
+ error("Clearing with 0 length, failed.\n");
+ return -EINVAL;
+ }
+
+ rc = clear_error(&clear_err);
+ if (rc)
+ return rc;
+
+ return 0;
+}
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index ac1fc63..9bf9d96 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -229,6 +229,8 @@ struct ndctl_region {
int state;
unsigned long long cookie;
} iset;
+ FILE *badblocks;
+ struct badblock bb;
};
/**
@@ -1867,6 +1869,76 @@ NDCTL_EXPORT struct ndctl_dimm *ndctl_region_get_next_dimm(struct ndctl_region *
return NULL;
}
+static int regions_badblocks_init(struct ndctl_region *region)
+{
+ struct ndctl_ctx *ctx = ndctl_region_get_ctx(region);
+ char *bb_path;
+ int rc = 0;
+
+ /* if the file is already open */
+ if (region->badblocks) {
+ fclose(region->badblocks);
+ region->badblocks = NULL;
+ }
+
+ if (asprintf(&bb_path, "%s/badblocks",
+ region->region_path) < 0) {
+ rc = -errno;
+ err(ctx, "region badblocks path allocation failure\n");
+ return rc;
+ }
+
+ region->badblocks = fopen(bb_path, "r");
+ if (!region->badblocks) {
+ rc = -errno;
+ err(ctx, "region badblocks fopen failed\n");
+ return -rc;
+ }
+
+ free(bb_path);
+ return rc;
+}
+
+NDCTL_EXPORT struct badblock *ndctl_region_get_next_badblock(struct ndctl_region *region)
+{
+ int rc;
+ char *buf = NULL;
+ size_t rlen = 0;
+
+ if (!region->badblocks)
+ return NULL;
+
+ rc = getline(&buf, &rlen, region->badblocks);
+ if (rc == -1) {
+ free(buf);
+ return NULL;
+ }
+
+ rc = sscanf(buf, "%llu %u", ®ion->bb.offset, ®ion->bb.len);
+ free(buf);
+ if (rc != 2) {
+ /* end of the road, clean up */
+ fclose(region->badblocks);
+ region->badblocks = NULL;
+ region->bb.offset = 0;
+ region->bb.len = 0;
+ return NULL;
+ }
+
+ return ®ion->bb;
+}
+
+NDCTL_EXPORT struct badblock *ndctl_region_get_first_badblock(struct ndctl_region *region)
+{
+ int rc;
+
+ rc = regions_badblocks_init(region);
+ if (rc < 0)
+ return NULL;
+
+ return ndctl_region_get_next_badblock(region);
+}
+
static struct nd_cmd_vendor_tail *to_vendor_tail(struct ndctl_cmd *cmd)
{
struct nd_cmd_vendor_tail *tail = (struct nd_cmd_vendor_tail *)
diff --git a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym
index b5a085c..9bc36a3 100644
--- a/ndctl/lib/libndctl.sym
+++ b/ndctl/lib/libndctl.sym
@@ -140,6 +140,8 @@ global:
ndctl_region_get_ro;
ndctl_region_set_ro;
ndctl_region_get_resource;
+ ndctl_region_get_first_badblock;
+ ndctl_region_get_next_badblock;
ndctl_interleave_set_get_first;
ndctl_interleave_set_get_next;
ndctl_interleave_set_is_active;
diff --git a/ndctl/libndctl.h.in b/ndctl/libndctl.h.in
index 6ee8a35..2c45d2d 100644
--- a/ndctl/libndctl.h.in
+++ b/ndctl/libndctl.h.in
@@ -372,6 +372,10 @@ int ndctl_cmd_get_status(struct ndctl_cmd *cmd);
unsigned int ndctl_cmd_get_firmware_status(struct ndctl_cmd *cmd);
int ndctl_cmd_submit(struct ndctl_cmd *cmd);
+struct badblock {
+ unsigned long long offset;
+ unsigned int len;
+};
struct ndctl_region;
struct ndctl_region *ndctl_region_get_first(struct ndctl_bus *bus);
struct ndctl_region *ndctl_region_get_next(struct ndctl_region *region);
@@ -379,6 +383,12 @@ struct ndctl_region *ndctl_region_get_next(struct ndctl_region *region);
for (region = ndctl_region_get_first(bus); \
region != NULL; \
region = ndctl_region_get_next(region))
+struct badblock *ndctl_region_get_first_badblock(struct ndctl_region *region);
+struct badblock *ndctl_region_get_next_badblock(struct ndctl_region *region);
+#define ndctl_region_badblock_foreach(region, badblock) \
+ for (badblock = ndctl_region_get_first_badblock(region); \
+ badblock != NULL; \
+ badblock = ndctl_region_get_next_badblock(region))
unsigned int ndctl_region_get_id(struct ndctl_region *region);
const char *ndctl_region_get_devname(struct ndctl_region *region);
unsigned int ndctl_region_get_interleave_ways(struct ndctl_region *region);
diff --git a/ndctl/ndctl.c b/ndctl/ndctl.c
index 4b08c9b..144dc23 100644
--- a/ndctl/ndctl.c
+++ b/ndctl/ndctl.c
@@ -67,6 +67,7 @@ static struct cmd_struct commands[] = {
{ "write-labels", cmd_write_labels },
{ "init-labels", cmd_init_labels },
{ "check-labels", cmd_check_labels },
+ { "clear-error", cmd_clear_error },
{ "list", cmd_list },
{ "help", cmd_help },
#ifdef ENABLE_TEST
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 2/3] ndctl: add list-errors support
2017-05-04 21:48 [PATCH v6 0/3] ndctl: add error clearing support for dev dax Dave Jiang
2017-05-04 21:49 ` [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax Dave Jiang
@ 2017-05-04 21:49 ` Dave Jiang
2017-05-05 17:07 ` Kani, Toshimitsu
2017-05-04 21:49 ` [PATCH v6 3/3] ndctl: add test for clear-error Dave Jiang
2 siblings, 1 reply; 15+ messages in thread
From: Dave Jiang @ 2017-05-04 21:49 UTC (permalink / raw)
To: dan.j.williams; +Cc: linux-nvdimm
Adding list-errors command to support displaying of all badblocks in
relation to the device rather than the region. This allows the user to
know what badblocks to pass in when doing clear-error calls.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
Documentation/ndctl-list-errors.txt | 26 +++++++
builtin.h | 1
ndctl/clear-error.c | 133 ++++++++++++++++++++++++++++++++---
ndctl/ndctl.c | 1
4 files changed, 151 insertions(+), 10 deletions(-)
create mode 100644 Documentation/ndctl-list-errors.txt
diff --git a/Documentation/ndctl-list-errors.txt b/Documentation/ndctl-list-errors.txt
new file mode 100644
index 0000000..f831ba0
--- /dev/null
+++ b/Documentation/ndctl-list-errors.txt
@@ -0,0 +1,26 @@
+ndctl-list-errors(1)
+====================
+
+NAME
+----
+ndctl-list-errors - list badblocks specifically in relation to a device
+
+SYNOPSIS
+--------
+[verse]
+'ndctl list-errors' [<options>]
+
+EXAMPLES
+--------
+
+List bad blocks for the provided device
+[verse]
+ndctl list-errors -f /dev/dax0.0
+
+List all badblocks for device /dev/dax0.0.
+
+OPTIONS
+-------
+-f::
+--file::
+ The device/file to show the badblocks.
diff --git a/builtin.h b/builtin.h
index f522d00..655904f 100644
--- a/builtin.h
+++ b/builtin.h
@@ -31,4 +31,5 @@ int cmd_test(int argc, const char **argv, void *ctx);
int cmd_bat(int argc, const char **argv, void *ctx);
#endif
int cmd_clear_error(int argc, const char **argv, void *ctx);
+int cmd_list_errors(int argc, const char **argv, void *ctx);
#endif /* _NDCTL_BUILTIN_H_ */
diff --git a/ndctl/clear-error.c b/ndctl/clear-error.c
index 29cd471..e15e821 100644
--- a/ndctl/clear-error.c
+++ b/ndctl/clear-error.c
@@ -131,29 +131,37 @@ static int check_user_input_range(struct ndctl_dax *dax,
return 0;
}
-static int clear_error(struct clear_err *ce)
+static int check_dax_dev(const char *dev_name)
{
- struct stat stats;
int rc;
- char dev_name[256];
- uint64_t dev_base;
- unsigned long long start;
- unsigned int len;
-
- strncpy(dev_name, ce->dev_name, 256);
+ struct stat stats;
rc = stat(dev_name, &stats);
if (rc < 0) {
+ rc = -errno;
perror("stat failed");
error("Unable to stat %s\n", dev_name);
- return -1;
+ return rc;
}
if (!S_ISCHR(stats.st_mode)) {
error("%s not DAX device\n", dev_name);
- return -1;
+ return -ENODEV;
}
+ return 0;
+}
+
+static int clear_error(struct clear_err *ce)
+{
+ int rc;
+ char dev_name[256];
+ uint64_t dev_base;
+ unsigned long long start;
+ unsigned int len;
+
+ strncpy(dev_name, ce->dev_name, 256);
+
rc = ndctl_new(&ce->ctx);
if (rc)
return rc;
@@ -231,9 +239,114 @@ int cmd_clear_error(int argc, const char **argv, void *ctx)
return -EINVAL;
}
+ if ((rc = check_dax_dev(clear_err.dev_name)) < 0)
+ return rc;
+
rc = clear_error(&clear_err);
if (rc)
return rc;
return 0;
}
+
+static int print_dax_badblocks(struct ndctl_dax *dax)
+{
+ struct ndctl_region *region = ndctl_dax_get_region(dax);
+ struct badblock *bb;
+ uint64_t region_begin;
+ uint64_t dax_begin, dax_end;
+
+ region_begin = ndctl_region_get_resource(region);
+ dax_begin = ndctl_dax_get_resource(dax);
+ dax_end = dax_begin + ndctl_dax_get_size(dax) - 1;
+
+ /* check region */
+ ndctl_region_badblock_foreach(region, bb) {
+ uint64_t bb_begin, bb_end;
+ uint64_t begin, end;
+
+ bb_begin = region_begin + (bb->offset << 9);
+ bb_end = bb_begin + (bb->len << 9) - 1;
+
+ if (bb_begin <= dax_begin)
+ begin = dax_begin;
+ else if (bb_begin < dax_end)
+ begin = bb_begin;
+ else
+ begin = 0;
+
+ if (begin) {
+ if (bb_end <= dax_end)
+ end = bb_end;
+ else
+ end = dax_end;
+ } else
+ continue;
+
+ printf("%llu %u\n",
+ (unsigned long long)(begin - dax_begin) >> 9,
+ (unsigned int)(end - begin + 1) >> 9);
+ }
+
+ return 0;
+}
+
+static int list_errors(struct clear_err *ce)
+{
+ int rc;
+ char dev_name[256];
+
+ strncpy(dev_name, ce->dev_name, 256);
+
+ rc = ndctl_new(&ce->ctx);
+ if (rc)
+ return rc;
+
+ if ((rc = match_dev(ce, dev_name)) < 0)
+ goto cleanup;
+
+ if ((rc = print_dax_badblocks(ce->dax)) < 0)
+ goto cleanup;
+
+ rc = 0;
+
+cleanup:
+ ndctl_unref(ce->ctx);
+ return rc;
+}
+
+int cmd_list_errors(int argc, const char **argv, void *ctx)
+{
+ int i, rc;
+ const char * const u[] = {
+ "ndctl clear-error [<options>]",
+ NULL
+ };
+ const struct option options[] = {
+ OPT_STRING('f', "file", &clear_err.dev_name, "device-name",
+ "device/file name to be operated on"),
+ OPT_END(),
+ };
+
+ argc = parse_options(argc, argv, options, u, 0);
+
+ for (i = 0; i < argc; i++)
+ error("Unknown parameter \"%s\"\n", argv[i]);
+
+ if (argc)
+ usage_with_options(u, options);
+
+ if (!clear_err.dev_name) {
+ error("Missing device/file name passed in\n");
+ usage_with_options(u, options);
+ return -EINVAL;
+ }
+
+ if ((rc = check_dax_dev(clear_err.dev_name)) < 0)
+ return rc;
+
+ if ((rc = list_errors(&clear_err)) < 0)
+ return rc;
+
+ return 0;
+}
diff --git a/ndctl/ndctl.c b/ndctl/ndctl.c
index 144dc23..a698096 100644
--- a/ndctl/ndctl.c
+++ b/ndctl/ndctl.c
@@ -68,6 +68,7 @@ static struct cmd_struct commands[] = {
{ "init-labels", cmd_init_labels },
{ "check-labels", cmd_check_labels },
{ "clear-error", cmd_clear_error },
+ { "list-errors", cmd_list_errors },
{ "list", cmd_list },
{ "help", cmd_help },
#ifdef ENABLE_TEST
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 3/3] ndctl: add test for clear-error
2017-05-04 21:48 [PATCH v6 0/3] ndctl: add error clearing support for dev dax Dave Jiang
2017-05-04 21:49 ` [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax Dave Jiang
2017-05-04 21:49 ` [PATCH v6 2/3] ndctl: add list-errors support Dave Jiang
@ 2017-05-04 21:49 ` Dave Jiang
2 siblings, 0 replies; 15+ messages in thread
From: Dave Jiang @ 2017-05-04 21:49 UTC (permalink / raw)
To: dan.j.williams; +Cc: linux-nvdimm
Adding unit test for ndctl clear-error (and list-errors)
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
test/Makefile.am | 1 +
test/ndctl-clear-error-dax.sh | 66 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 67 insertions(+)
create mode 100755 test/ndctl-clear-error-dax.sh
diff --git a/test/Makefile.am b/test/Makefile.am
index 9353a34..3cd159e 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -10,6 +10,7 @@ TESTS =\
clear.sh \
dax-errors.sh \
daxdev-errors.sh \
+ ndctl-clear-error-dax.sh \
btt-check.sh \
label-compat.sh \
blk-exhaust.sh
diff --git a/test/ndctl-clear-error-dax.sh b/test/ndctl-clear-error-dax.sh
new file mode 100755
index 0000000..1608db9
--- /dev/null
+++ b/test/ndctl-clear-error-dax.sh
@@ -0,0 +1,66 @@
+#!/bin/bash -x
+DEV=""
+NDCTL="../ndctl/ndctl"
+BUS="-b nfit_test.0"
+BUS1="-b nfit_test.1"
+json2var="s/[{}\",]//g; s/:/=/g"
+rc=77
+
+check_min_kver()
+{
+ local ver="$1"
+ : "${KVER:=$(uname -r)}"
+
+ [ -n "$ver" ] || return 1
+ [[ "$ver" == "$(echo -e "$ver\n$KVER" | sort -V | head -1)" ]]
+}
+
+check_min_kver "4.12" || { echo "kernel $KVER lacks dax dev error handling"; exit $rc; }
+
+set -e
+
+err() {
+ echo "test/clear: failed at line $1"
+ exit $rc
+}
+
+set -e
+trap 'err $LINENO' ERR
+
+# setup (reset nfit_test dimms)
+modprobe nfit_test
+$NDCTL disable-region $BUS all
+$NDCTL zero-labels $BUS all
+$NDCTL enable-region $BUS all
+
+rc=1
+
+query=". | sort_by(.available_size) | reverse | .[0].dev"
+region=$($NDCTL list $BUS -t pmem -Ri | jq -r "$query")
+
+# create dax
+chardev="x"
+json=$($NDCTL create-namespace $BUS -r $region -t pmem -m dax -a 4096)
+chardev=$(echo $json | jq -r ". | select(.mode == \"dax\") | .daxregion.devices[0].chardev")
+[ $chardev = "x" ] && echo "fail: $LINENO" && exit 1
+
+json1=$($NDCTL list $BUS)
+eval $(sed -e "$json2var" <<< "$json1")
+
+read sector len <<< `$NDCTL list-errors -f /dev/$chardev`
+echo "sector: $sector len: $len"
+
+# clearing using ndctl
+$NDCTL clear-error -f /dev/$chardev -s $sector -l $len
+
+# check badblocks, should be empty
+read sector len <<< `$NDCTL list-errors -f /dev/$chardev`
+[ -n "$sector" ] && echo "fail: $LINENO" && exit 1
+
+echo "badblocks empty, expected"
+
+$NDCTL disable-region $BUS all
+$NDCTL disable-region $BUS1 all
+modprobe -r nfit_test
+
+exit 0
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] ndctl: add list-errors support
2017-05-04 21:49 ` [PATCH v6 2/3] ndctl: add list-errors support Dave Jiang
@ 2017-05-05 17:07 ` Kani, Toshimitsu
2017-05-05 17:14 ` Dave Jiang
0 siblings, 1 reply; 15+ messages in thread
From: Kani, Toshimitsu @ 2017-05-05 17:07 UTC (permalink / raw)
To: dan.j.williams@intel.com, dave.jiang@intel.com; +Cc: linux-nvdimm@lists.01.org
On Thu, 2017-05-04 at 14:49 -0700, Dave Jiang wrote:
> Adding list-errors command to support displaying of all badblocks in
> relation to the device rather than the region. This allows the user
> to know what badblocks to pass in when doing clear-error calls.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> Documentation/ndctl-list-errors.txt | 26 +++++++
> builtin.h | 1
> ndctl/clear-error.c | 133
> ++++++++++++++++++++++++++++++++---
> ndctl/ndctl.c | 1
> 4 files changed, 151 insertions(+), 10 deletions(-)
> create mode 100644 Documentation/ndctl-list-errors.txt
>
> diff --git a/Documentation/ndctl-list-errors.txt
> b/Documentation/ndctl-list-errors.txt
> new file mode 100644
> index 0000000..f831ba0
> --- /dev/null
> +++ b/Documentation/ndctl-list-errors.txt
> @@ -0,0 +1,26 @@
> +ndctl-list-errors(1)
> +====================
> +
> +NAME
> +----
> +ndctl-list-errors - list badblocks specifically in relation to a
> device
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'ndctl list-errors' [<options>]
> +
> +EXAMPLES
> +--------
> +
> +List bad blocks for the provided device
> +[verse]
> +ndctl list-errors -f /dev/dax0.0
> +
> +List all badblocks for device /dev/dax0.0.
Hi Dave,
I am not getting sensible values from list-errors. Also, please
describe the output format of this command in the document.
# cat /sys/bus/nd/devices/region0/size
17179869184
# cat /sys/class/dax/dax0.0/size
16909336576
# cat /sys/bus/nd/devices/region0/badblocks
1048576 1
1572864 1
# ndctl list-errors -f /dev/dax0.0
0 3145729
0 3670017
Thanks,
-Toshi
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] ndctl: add list-errors support
2017-05-05 17:07 ` Kani, Toshimitsu
@ 2017-05-05 17:14 ` Dave Jiang
2017-05-05 17:21 ` Kani, Toshimitsu
0 siblings, 1 reply; 15+ messages in thread
From: Dave Jiang @ 2017-05-05 17:14 UTC (permalink / raw)
To: Kani, Toshimitsu, Williams, Dan J; +Cc: linux-nvdimm@lists.01.org
On 05/05/2017 10:07 AM, Kani, Toshimitsu wrote:
> On Thu, 2017-05-04 at 14:49 -0700, Dave Jiang wrote:
>> Adding list-errors command to support displaying of all badblocks in
>> relation to the device rather than the region. This allows the user
>> to know what badblocks to pass in when doing clear-error calls.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> Documentation/ndctl-list-errors.txt | 26 +++++++
>> builtin.h | 1
>> ndctl/clear-error.c | 133
>> ++++++++++++++++++++++++++++++++---
>> ndctl/ndctl.c | 1
>> 4 files changed, 151 insertions(+), 10 deletions(-)
>> create mode 100644 Documentation/ndctl-list-errors.txt
>>
>> diff --git a/Documentation/ndctl-list-errors.txt
>> b/Documentation/ndctl-list-errors.txt
>> new file mode 100644
>> index 0000000..f831ba0
>> --- /dev/null
>> +++ b/Documentation/ndctl-list-errors.txt
>> @@ -0,0 +1,26 @@
>> +ndctl-list-errors(1)
>> +====================
>> +
>> +NAME
>> +----
>> +ndctl-list-errors - list badblocks specifically in relation to a
>> device
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'ndctl list-errors' [<options>]
>> +
>> +EXAMPLES
>> +--------
>> +
>> +List bad blocks for the provided device
>> +[verse]
>> +ndctl list-errors -f /dev/dax0.0
>> +
>> +List all badblocks for device /dev/dax0.0.
>
> Hi Dave,
>
> I am not getting sensible values from list-errors. Also, please
> describe the output format of this command in the document.
>
> # cat /sys/bus/nd/devices/region0/size
> 17179869184
>
> # cat /sys/class/dax/dax0.0/size
> 16909336576
>
> # cat /sys/bus/nd/devices/region0/badblocks
> 1048576 1
> 1572864 1
>
> # ndctl list-errors -f /dev/dax0.0
> 0 3145729
> 0 3670017
That is very strange. It looks correct with nfit_test for me. How do I
reproduce your case or do I actually need actual hardware?
>
> Thanks,
> -Toshi
>
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] ndctl: add list-errors support
2017-05-05 17:14 ` Dave Jiang
@ 2017-05-05 17:21 ` Kani, Toshimitsu
2017-05-05 17:27 ` Dave Jiang
0 siblings, 1 reply; 15+ messages in thread
From: Kani, Toshimitsu @ 2017-05-05 17:21 UTC (permalink / raw)
To: dan.j.williams@intel.com, dave.jiang@intel.com; +Cc: linux-nvdimm@lists.01.org
On Fri, 2017-05-05 at 10:14 -0700, Dave Jiang wrote:
>
> On 05/05/2017 10:07 AM, Kani, Toshimitsu wrote:
> > On Thu, 2017-05-04 at 14:49 -0700, Dave Jiang wrote:
> > > Adding list-errors command to support displaying of all badblocks
> > > in relation to the device rather than the region. This allows the
> > > user to know what badblocks to pass in when doing clear-error
> > > calls.
> > >
> > > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > > ---
> > > Documentation/ndctl-list-errors.txt | 26 +++++++
> > > builtin.h | 1
> > > ndctl/clear-error.c | 133
> > > ++++++++++++++++++++++++++++++++---
> > > ndctl/ndctl.c | 1
> > > 4 files changed, 151 insertions(+), 10 deletions(-)
> > > create mode 100644 Documentation/ndctl-list-errors.txt
> > >
> > > diff --git a/Documentation/ndctl-list-errors.txt
> > > b/Documentation/ndctl-list-errors.txt
> > > new file mode 100644
> > > index 0000000..f831ba0
> > > --- /dev/null
> > > +++ b/Documentation/ndctl-list-errors.txt
> > > @@ -0,0 +1,26 @@
> > > +ndctl-list-errors(1)
> > > +====================
> > > +
> > > +NAME
> > > +----
> > > +ndctl-list-errors - list badblocks specifically in relation to a
> > > device
> > > +
> > > +SYNOPSIS
> > > +--------
> > > +[verse]
> > > +'ndctl list-errors' [<options>]
> > > +
> > > +EXAMPLES
> > > +--------
> > > +
> > > +List bad blocks for the provided device
> > > +[verse]
> > > +ndctl list-errors -f /dev/dax0.0
> > > +
> > > +List all badblocks for device /dev/dax0.0.
> >
> > Hi Dave,
> >
> > I am not getting sensible values from list-errors. Also, please
> > describe the output format of this command in the document.
> >
> > # cat /sys/bus/nd/devices/region0/size
> > 17179869184
> >
> > # cat /sys/class/dax/dax0.0/size
> > 16909336576
> >
> > # cat /sys/bus/nd/devices/region0/badblocks
> > 1048576 1
> > 1572864 1
> >
> > # ndctl list-errors -f /dev/dax0.0
> > 0 3145729
> > 0 3670017
>
> That is very strange. It looks correct with nfit_test for me. How do
> I reproduce your case or do I actually need actual hardware?
Yes, my steps need a hardware, but this badblocks handling itself
should be platform-independent. I will check to see why I got these
values.
Thanks,
-Toshi
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] ndctl: add list-errors support
2017-05-05 17:21 ` Kani, Toshimitsu
@ 2017-05-05 17:27 ` Dave Jiang
2017-05-05 17:35 ` Kani, Toshimitsu
0 siblings, 1 reply; 15+ messages in thread
From: Dave Jiang @ 2017-05-05 17:27 UTC (permalink / raw)
To: Kani, Toshimitsu, Williams, Dan J; +Cc: linux-nvdimm@lists.01.org
On 05/05/2017 10:21 AM, Kani, Toshimitsu wrote:
> On Fri, 2017-05-05 at 10:14 -0700, Dave Jiang wrote:
>>
>> On 05/05/2017 10:07 AM, Kani, Toshimitsu wrote:
>>> On Thu, 2017-05-04 at 14:49 -0700, Dave Jiang wrote:
>>>> Adding list-errors command to support displaying of all badblocks
>>>> in relation to the device rather than the region. This allows the
>>>> user to know what badblocks to pass in when doing clear-error
>>>> calls.
>>>>
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>> ---
>>>> Documentation/ndctl-list-errors.txt | 26 +++++++
>>>> builtin.h | 1
>>>> ndctl/clear-error.c | 133
>>>> ++++++++++++++++++++++++++++++++---
>>>> ndctl/ndctl.c | 1
>>>> 4 files changed, 151 insertions(+), 10 deletions(-)
>>>> create mode 100644 Documentation/ndctl-list-errors.txt
>>>>
>>>> diff --git a/Documentation/ndctl-list-errors.txt
>>>> b/Documentation/ndctl-list-errors.txt
>>>> new file mode 100644
>>>> index 0000000..f831ba0
>>>> --- /dev/null
>>>> +++ b/Documentation/ndctl-list-errors.txt
>>>> @@ -0,0 +1,26 @@
>>>> +ndctl-list-errors(1)
>>>> +====================
>>>> +
>>>> +NAME
>>>> +----
>>>> +ndctl-list-errors - list badblocks specifically in relation to a
>>>> device
>>>> +
>>>> +SYNOPSIS
>>>> +--------
>>>> +[verse]
>>>> +'ndctl list-errors' [<options>]
>>>> +
>>>> +EXAMPLES
>>>> +--------
>>>> +
>>>> +List bad blocks for the provided device
>>>> +[verse]
>>>> +ndctl list-errors -f /dev/dax0.0
>>>> +
>>>> +List all badblocks for device /dev/dax0.0.
>>>
>>> Hi Dave,
>>>
>>> I am not getting sensible values from list-errors. Also, please
>>> describe the output format of this command in the document.
>>>
>>> # cat /sys/bus/nd/devices/region0/size
>>> 17179869184
>>>
>>> # cat /sys/class/dax/dax0.0/size
>>> 16909336576
>>>
>>> # cat /sys/bus/nd/devices/region0/badblocks
>>> 1048576 1
>>> 1572864 1
>>>
>>> # ndctl list-errors -f /dev/dax0.0
>>> 0 3145729
>>> 0 3670017
>>
>> That is very strange. It looks correct with nfit_test for me. How do
>> I reproduce your case or do I actually need actual hardware?
>
> Yes, my steps need a hardware, but this badblocks handling itself
> should be platform-independent. I will check to see why I got these
> values.
Thanks! what is your region/resource and dax/resource?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] ndctl: add list-errors support
2017-05-05 17:27 ` Dave Jiang
@ 2017-05-05 17:35 ` Kani, Toshimitsu
2017-05-05 17:49 ` Kani, Toshimitsu
0 siblings, 1 reply; 15+ messages in thread
From: Kani, Toshimitsu @ 2017-05-05 17:35 UTC (permalink / raw)
To: dan.j.williams@intel.com, dave.jiang@intel.com; +Cc: linux-nvdimm@lists.01.org
On Fri, 2017-05-05 at 10:27 -0700, Dave Jiang wrote:
>
> On 05/05/2017 10:21 AM, Kani, Toshimitsu wrote:
> > On Fri, 2017-05-05 at 10:14 -0700, Dave Jiang wrote:
> > >
> > > On 05/05/2017 10:07 AM, Kani, Toshimitsu wrote:
> > > > On Thu, 2017-05-04 at 14:49 -0700, Dave Jiang wrote:
> > > > > Adding list-errors command to support displaying of all
> > > > > badblocks in relation to the device rather than the region.
> > > > > This allows the user to know what badblocks to pass in when
> > > > > doing clear-error calls.
> > > > >
> > > > > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > > > > ---
> > > > > Documentation/ndctl-list-errors.txt | 26 +++++++
> > > > > builtin.h | 1
> > > > > ndctl/clear-error.c | 133
> > > > > ++++++++++++++++++++++++++++++++---
> > > > > ndctl/ndctl.c | 1
> > > > > 4 files changed, 151 insertions(+), 10 deletions(-)
> > > > > create mode 100644 Documentation/ndctl-list-errors.txt
> > > > >
> > > > > diff --git a/Documentation/ndctl-list-errors.txt
> > > > > b/Documentation/ndctl-list-errors.txt
> > > > > new file mode 100644
> > > > > index 0000000..f831ba0
> > > > > --- /dev/null
> > > > > +++ b/Documentation/ndctl-list-errors.txt
> > > > > @@ -0,0 +1,26 @@
> > > > > +ndctl-list-errors(1)
> > > > > +====================
> > > > > +
> > > > > +NAME
> > > > > +----
> > > > > +ndctl-list-errors - list badblocks specifically in relation
> > > > > to a
> > > > > device
> > > > > +
> > > > > +SYNOPSIS
> > > > > +--------
> > > > > +[verse]
> > > > > +'ndctl list-errors' [<options>]
> > > > > +
> > > > > +EXAMPLES
> > > > > +--------
> > > > > +
> > > > > +List bad blocks for the provided device
> > > > > +[verse]
> > > > > +ndctl list-errors -f /dev/dax0.0
> > > > > +
> > > > > +List all badblocks for device /dev/dax0.0.
> > > >
> > > > Hi Dave,
> > > >
> > > > I am not getting sensible values from list-errors. Also,
> > > > please describe the output format of this command in the
> > > > document.
> > > >
> > > > # cat /sys/bus/nd/devices/region0/size
> > > > 17179869184
> > > >
> > > > # cat /sys/class/dax/dax0.0/size
> > > > 16909336576
> > > >
> > > > # cat /sys/bus/nd/devices/region0/badblocks
> > > > 1048576 1
> > > > 1572864 1
> > > >
> > > > # ndctl list-errors -f /dev/dax0.0
> > > > 0 3145729
> > > > 0 3670017
> > >
> > > That is very strange. It looks correct with nfit_test for me. How
> > > do I reproduce your case or do I actually need actual hardware?
> >
> > Yes, my steps need a hardware, but this badblocks handling itself
> > should be platform-independent. I will check to see why I got
> > these values.
>
> Thanks! what is your region/resource and dax/resource?
See attached. Let me know if you need more info.
Thanks,
-Toshi
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] ndctl: add list-errors support
2017-05-05 17:35 ` Kani, Toshimitsu
@ 2017-05-05 17:49 ` Kani, Toshimitsu
0 siblings, 0 replies; 15+ messages in thread
From: Kani, Toshimitsu @ 2017-05-05 17:49 UTC (permalink / raw)
To: dan.j.williams@intel.com, dave.jiang@intel.com; +Cc: linux-nvdimm@lists.01.org
On Fri, 2017-05-05 at 11:35 -0600, Toshi Kani wrote:
> On Fri, 2017-05-05 at 10:27 -0700, Dave Jiang wrote:
> >
> > On 05/05/2017 10:21 AM, Kani, Toshimitsu wrote:
> > > On Fri, 2017-05-05 at 10:14 -0700, Dave Jiang wrote:
> > > >
> > > > On 05/05/2017 10:07 AM, Kani, Toshimitsu wrote:
> > > > > On Thu, 2017-05-04 at 14:49 -0700, Dave Jiang wrote:
> > > > > > Adding list-errors command to support displaying of all
> > > > > > badblocks in relation to the device rather than the region.
> > > > > > This allows the user to know what badblocks to pass in when
> > > > > > doing clear-error calls.
> > > > > >
> > > > > > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > > > > > ---
> > > > > > Documentation/ndctl-list-errors.txt | 26 +++++++
> > > > > > builtin.h | 1
> > > > > > ndctl/clear-error.c | 133
> > > > > > ++++++++++++++++++++++++++++++++---
> > > > > > ndctl/ndctl.c | 1
> > > > > > 4 files changed, 151 insertions(+), 10 deletions(-)
> > > > > > create mode 100644 Documentation/ndctl-list-errors.txt
> > > > > >
> > > > > > diff --git a/Documentation/ndctl-list-errors.txt
> > > > > > b/Documentation/ndctl-list-errors.txt
> > > > > > new file mode 100644
> > > > > > index 0000000..f831ba0
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/ndctl-list-errors.txt
> > > > > > @@ -0,0 +1,26 @@
> > > > > > +ndctl-list-errors(1)
> > > > > > +====================
> > > > > > +
> > > > > > +NAME
> > > > > > +----
> > > > > > +ndctl-list-errors - list badblocks specifically in
> > > > > > relation
> > > > > > to a
> > > > > > device
> > > > > > +
> > > > > > +SYNOPSIS
> > > > > > +--------
> > > > > > +[verse]
> > > > > > +'ndctl list-errors' [<options>]
> > > > > > +
> > > > > > +EXAMPLES
> > > > > > +--------
> > > > > > +
> > > > > > +List bad blocks for the provided device
> > > > > > +[verse]
> > > > > > +ndctl list-errors -f /dev/dax0.0
> > > > > > +
> > > > > > +List all badblocks for device /dev/dax0.0.
> > > > >
> > > > > Hi Dave,
> > > > >
> > > > > I am not getting sensible values from list-errors. Also,
> > > > > please describe the output format of this command in the
> > > > > document.
> > > > >
> > > > > # cat /sys/bus/nd/devices/region0/size
> > > > > 17179869184
> > > > >
> > > > > # cat /sys/class/dax/dax0.0/size
> > > > > 16909336576
> > > > >
> > > > > # cat /sys/bus/nd/devices/region0/badblocks
> > > > > 1048576 1
> > > > > 1572864 1
> > > > >
> > > > > # ndctl list-errors -f /dev/dax0.0
> > > > > 0 3145729
> > > > > 0 3670017
> > > >
> > > > That is very strange. It looks correct with nfit_test for me.
> > > > How do I reproduce your case or do I actually need actual
> > > > hardware?
> > >
> > > Yes, my steps need a hardware, but this badblocks handling itself
> > > should be platform-independent. I will check to see why I got
> > > these values.
> >
> > Thanks! what is your region/resource and dax/resource?
>
> See attached. Let me know if you need more info.
dax_begin and dax_end get -1 in print_dax_badblocks() because
ndctl_dax_get_resource() and ndctl_dax_get_size() failed. I think this
is due to the numbering of sysfs dax files, which Dan an I talked about
yesterday.
Thanks,
-Toshi
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax
2017-05-04 21:49 ` [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax Dave Jiang
@ 2017-05-05 19:14 ` Dan Williams
2017-05-05 19:33 ` Dan Williams
2017-05-05 19:41 ` Linda Knippers
0 siblings, 2 replies; 15+ messages in thread
From: Dan Williams @ 2017-05-05 19:14 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-nvdimm@lists.01.org
On Thu, May 4, 2017 at 2:49 PM, Dave Jiang <dave.jiang@intel.com> wrote:
> Adding ndctl support that will allow clearing of bad blocks for a device.
> Initial implementation will only support device dax devices. The ndctl
> takes a device path and parameters of the starting bad block, and the number
> of bad blocks to clear.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> Documentation/ndctl-clear-error.txt | 38 ++++++
> builtin.h | 1
> ndctl/Makefile.am | 1
> ndctl/clear-error.c | 239 +++++++++++++++++++++++++++++++++++
> ndctl/lib/libndctl.c | 72 +++++++++++
> ndctl/lib/libndctl.sym | 2
> ndctl/libndctl.h.in | 10 +
> ndctl/ndctl.c | 1
> 8 files changed, 364 insertions(+)
> create mode 100644 Documentation/ndctl-clear-error.txt
> create mode 100644 ndctl/clear-error.c
>
> diff --git a/Documentation/ndctl-clear-error.txt b/Documentation/ndctl-clear-error.txt
> new file mode 100644
> index 0000000..b14521a
> --- /dev/null
> +++ b/Documentation/ndctl-clear-error.txt
> @@ -0,0 +1,38 @@
> +ndctl-clear-error(1)
> +====================
> +
> +NAME
> +----
> +ndctl-clear-error - clear badblocks for a device
I think "clear-error" is too ambiguous of a name, lets call the
commands repair-media-errors and list-media-errors.
I'd also like to see the list-media-errors patch first in the series
since repairing is just an incremental step on top of listing. I don't
think the word "badblocks" should appear anywhere in this document.
That's a kernel implementation detail.
The other benefit of "repair" vs "clear" is communicating that the
data may be indeterminate after repair. Hopefully in the future we'll
get a standard mechanism to determine if the platform supports atomic
error clearing with a determinate value.
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'ndctl clear-error' [<options>]
> +
> +EXAMPLES
> +--------
> +
> +Clear poison (bad blocks) for the provided device
> +[verse]
> +ndctl clear-error -f /dev/dax0.0 -s 0 -l 8
> +
> +Clear poison (bad blocks) at block offset 0 for 8 blocks on device /dev/dax0.0
The "poison" term is x86 specific. Lets just use "media error" everywhere.
> +
> +OPTIONS
> +-------
> +-f::
> +--file::
> + The device/file to be cleared of poison (bad blocks).
Let's make this --d/--device and drop the "file" option. For files on
filesystems there's other ways to clear errors and I wouldn't
necessarily want them to use this tool. Should also make it clear that
the device argument must refer to a device in an nvdimm bus hierarchy.
> +
> +-s::
> +--start::
> + The offset where the poison (bad block) starts for this device.
> + Typically this is acquired from the sysfs badblocks file.
I assume this is in terms of 512-byte blocks, but we should clarify
the units. The second sentence can be reworded to recommend using the
list-media-errors command. Lets call the option -o / --offset and also
allow a "-o all" to repair all errors reported by list-media-errors.
> +
> +-l::
> +--len::
> + The number of badblocks to clear in size of 512 bytes increments. The
> + length must fit within the badblocks range. If the length exceeds the
> + badblock range or is 0, the command will fail. Not providing a len
> + will result in a single block being cleared.
s/badblocks/media error/
s/--len/--length/
> diff --git a/builtin.h b/builtin.h
> index a8bc848..f522d00 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -30,4 +30,5 @@ int cmd_test(int argc, const char **argv, void *ctx);
> #ifdef ENABLE_DESTRUCTIVE
> int cmd_bat(int argc, const char **argv, void *ctx);
> #endif
> +int cmd_clear_error(int argc, const char **argv, void *ctx);
> #endif /* _NDCTL_BUILTIN_H_ */
> diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
> index d346c04..8123169 100644
> --- a/ndctl/Makefile.am
> +++ b/ndctl/Makefile.am
> @@ -11,6 +11,7 @@ ndctl_SOURCES = ndctl.c \
> ../util/log.c \
> list.c \
> test.c \
> + clear-error.c \
> ../util/json.c
>
> if ENABLE_SMART
> diff --git a/ndctl/clear-error.c b/ndctl/clear-error.c
> new file mode 100644
> index 0000000..29cd471
> --- /dev/null
> +++ b/ndctl/clear-error.c
> @@ -0,0 +1,239 @@
> +#include <stdio.h>
> +#include <errno.h>
> +#include <stdlib.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +#include <libgen.h>
> +#include <string.h>
> +#include <limits.h>
> +#include <ccan/short_types/short_types.h>
> +#include <ccan/array_size/array_size.h>
> +#include <util/parse-options.h>
> +#include <util/log.h>
> +#include <ndctl/libndctl.h>
> +#include <ndctl.h>
> +
> +struct clear_err {
> + const char *dev_name;
> + u64 bb_start;
> + unsigned int bb_len;
> + struct ndctl_cmd *ars_cap;
> + struct ndctl_cmd *clear_err;
> + struct ndctl_bus *bus;
> + struct ndctl_region *region;
> + struct ndctl_dax *dax;
> + struct ndctl_ctx *ctx;
> +} clear_err = {
> + .bb_len = 1,
> +};
> +
> +static int send_clear_error(struct ndctl_bus *bus, u64 start, u64 size)
> +{
> + u64 cleared;
> + int rc;
> +
> + clear_err.clear_err = ndctl_bus_cmd_new_clear_error(
> + start, size, clear_err.ars_cap);
> + if (!clear_err.clear_err) {
> + error("%s: bus: %s failed to create cmd\n",
> + __func__, ndctl_bus_get_provider(bus));
> + return -ENXIO;
> + }
> +
> + rc = ndctl_cmd_submit(clear_err.clear_err);
> + if (rc) {
> + error("%s: bus: %s failed to submit cmd: %d\n",
> + __func__, ndctl_bus_get_provider(bus), rc);
> + ndctl_cmd_unref(clear_err.clear_err);
> + return rc;
> + }
> +
> + cleared = ndctl_cmd_clear_error_get_cleared(clear_err.clear_err);
> + if (cleared != size) {
> + error("%s: bus: %s expected to clear: %ld actual: %ld\n",
> + __func__, ndctl_bus_get_provider(bus),
> + size, cleared);
> + return -ENXIO;
> + }
> +
> + return 0;
> +}
> +
> +static int get_ars_cap(struct ndctl_bus *bus, u64 start, u64 size)
> +{
> + int rc;
> +
> + clear_err.ars_cap = ndctl_bus_cmd_new_ars_cap(bus, start, size);
> + if (!clear_err.ars_cap) {
> + error("%s: bus: %s failed to create cmd\n",
> + __func__, ndctl_bus_get_provider(bus));
> + return -ENOTTY;
> + }
> +
> + rc = ndctl_cmd_submit(clear_err.ars_cap);
> + if (rc) {
> + error("%s: bus: %s failed to submit cmd: %d\n",
> + __func__, ndctl_bus_get_provider(bus), rc);
> + ndctl_cmd_unref(clear_err.ars_cap);
> + return rc;
> + }
> +
> + if (ndctl_cmd_ars_cap_get_size(clear_err.ars_cap) <
> + sizeof(struct nd_cmd_ars_status)){
> + error("%s: bus: %s expected size >= %zd got: %d\n",
> + __func__, ndctl_bus_get_provider(bus),
> + sizeof(struct nd_cmd_ars_status),
> + ndctl_cmd_ars_cap_get_size(clear_err.ars_cap));
> + ndctl_cmd_unref(clear_err.ars_cap);
> + return -ENXIO;
> + }
> +
> + return 0;
> +}
> +
> +static int match_dev(struct clear_err *ce, char *dev_name)
> +{
> + ndctl_bus_foreach(ce->ctx, ce->bus)
> + ndctl_region_foreach(ce->bus, ce->region)
> + ndctl_dax_foreach(ce->region, ce->dax)
> + if (strncmp(basename(dev_name),
> + ndctl_dax_get_devname(ce->dax), 256)
This device is an nvdimm nd_dax device, not a device-dax character
device instance. See calls to util_daxctl_region_to_json() and the
implementation of that routine for how to lookup the device-dax
character device from the ndctl_dax instance returned by
ndctl_dax_foreach().
I don't like that this routine is silently trampling on ce->dax. If it
finds the proper dax instance it should return it directly or NULL
otherwise.
Lastly you'll need to match by actual major:minor number otherwise how
do you know that an argument of /dev/my-special-device-dax-symlink
refers to the device-dax instance you're attempting to match?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax
2017-05-05 19:14 ` Dan Williams
@ 2017-05-05 19:33 ` Dan Williams
2017-05-05 19:41 ` Linda Knippers
1 sibling, 0 replies; 15+ messages in thread
From: Dan Williams @ 2017-05-05 19:33 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-nvdimm@lists.01.org
On Fri, May 5, 2017 at 12:14 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, May 4, 2017 at 2:49 PM, Dave Jiang <dave.jiang@intel.com> wrote:
>> Adding ndctl support that will allow clearing of bad blocks for a device.
>> Initial implementation will only support device dax devices. The ndctl
>> takes a device path and parameters of the starting bad block, and the number
>> of bad blocks to clear.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> Documentation/ndctl-clear-error.txt | 38 ++++++
>> builtin.h | 1
>> ndctl/Makefile.am | 1
>> ndctl/clear-error.c | 239 +++++++++++++++++++++++++++++++++++
>> ndctl/lib/libndctl.c | 72 +++++++++++
>> ndctl/lib/libndctl.sym | 2
>> ndctl/libndctl.h.in | 10 +
>> ndctl/ndctl.c | 1
>> 8 files changed, 364 insertions(+)
>> create mode 100644 Documentation/ndctl-clear-error.txt
>> create mode 100644 ndctl/clear-error.c
>>
>> diff --git a/Documentation/ndctl-clear-error.txt b/Documentation/ndctl-clear-error.txt
>> new file mode 100644
>> index 0000000..b14521a
>> --- /dev/null
>> +++ b/Documentation/ndctl-clear-error.txt
>> @@ -0,0 +1,38 @@
>> +ndctl-clear-error(1)
>> +====================
>> +
>> +NAME
>> +----
>> +ndctl-clear-error - clear badblocks for a device
>
> I think "clear-error" is too ambiguous of a name, lets call the
> commands repair-media-errors and list-media-errors.
Actually let's kill the separate 'list-media-errors' command and just
add it as optional output to our existing 'list' command. With that in
place the repair command can be shortened to "repair-media"
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax
2017-05-05 19:14 ` Dan Williams
2017-05-05 19:33 ` Dan Williams
@ 2017-05-05 19:41 ` Linda Knippers
2017-05-05 20:01 ` Dan Williams
1 sibling, 1 reply; 15+ messages in thread
From: Linda Knippers @ 2017-05-05 19:41 UTC (permalink / raw)
To: Dan Williams, Dave Jiang; +Cc: linux-nvdimm@lists.01.org
On 05/05/2017 03:14 PM, Dan Williams wrote:
> On Thu, May 4, 2017 at 2:49 PM, Dave Jiang <dave.jiang@intel.com> wrote:
>> Adding ndctl support that will allow clearing of bad blocks for a device.
>> Initial implementation will only support device dax devices. The ndctl
>> takes a device path and parameters of the starting bad block, and the number
>> of bad blocks to clear.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> Documentation/ndctl-clear-error.txt | 38 ++++++
>> builtin.h | 1
>> ndctl/Makefile.am | 1
>> ndctl/clear-error.c | 239 +++++++++++++++++++++++++++++++++++
>> ndctl/lib/libndctl.c | 72 +++++++++++
>> ndctl/lib/libndctl.sym | 2
>> ndctl/libndctl.h.in | 10 +
>> ndctl/ndctl.c | 1
>> 8 files changed, 364 insertions(+)
>> create mode 100644 Documentation/ndctl-clear-error.txt
>> create mode 100644 ndctl/clear-error.c
>>
>> diff --git a/Documentation/ndctl-clear-error.txt b/Documentation/ndctl-clear-error.txt
>> new file mode 100644
>> index 0000000..b14521a
>> --- /dev/null
>> +++ b/Documentation/ndctl-clear-error.txt
>> @@ -0,0 +1,38 @@
>> +ndctl-clear-error(1)
>> +====================
>> +
>> +NAME
>> +----
>> +ndctl-clear-error - clear badblocks for a device
>
> I think "clear-error" is too ambiguous of a name, lets call the
> commands repair-media-errors and list-media-errors.
>
> I'd also like to see the list-media-errors patch first in the series
> since repairing is just an incremental step on top of listing. I don't
> think the word "badblocks" should appear anywhere in this document.
> That's a kernel implementation detail.
Is this just for device dax? If so, that should be more clear in the text,
rather than just used in the example. If it's for other types of pmem devices,
then badblocks would make sense to keep, assuming it relates to the
badblocks information exposed in /sys.
> The other benefit of "repair" vs "clear" is communicating that the
> data may be indeterminate after repair.
For me, repair means you fixed it. If you want to communicate that the
data is indeterminate, I think you should say that in the doc rather
than assuming people interpret the word the same way you do.
> Hopefully in the future we'll
> get a standard mechanism to determine if the platform supports atomic
> error clearing with a determinate value.
>
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'ndctl clear-error' [<options>]
>> +
>> +EXAMPLES
>> +--------
>> +
>> +Clear poison (bad blocks) for the provided device
>> +[verse]
>> +ndctl clear-error -f /dev/dax0.0 -s 0 -l 8
>> +
>> +Clear poison (bad blocks) at block offset 0 for 8 blocks on device /dev/dax0.0
>
> The "poison" term is x86 specific. Lets just use "media error" everywhere.
If you really mean uncorrectable error reported by an Address Range Scrub, you
could say that too. Or are there other types of media errors that this could
work with?
>
>> +
>> +OPTIONS
>> +-------
>> +-f::
>> +--file::
>> + The device/file to be cleared of poison (bad blocks).
>
> Let's make this --d/--device and drop the "file" option. For files on
> filesystems there's other ways to clear errors and I wouldn't
> necessarily want them to use this tool. Should also make it clear that
> the device argument must refer to a device in an nvdimm bus hierarchy.
>
>> +
>> +-s::
>> +--start::
>> + The offset where the poison (bad block) starts for this device.
>> + Typically this is acquired from the sysfs badblocks file.
>
> I assume this is in terms of 512-byte blocks, but we should clarify
> the units. The second sentence can be reworded to recommend using the
> list-media-errors command. Lets call the option -o / --offset and also
> allow a "-o all" to repair all errors reported by list-media-errors.
>
>> +
>> +-l::
>> +--len::
>> + The number of badblocks to clear in size of 512 bytes increments. The
>> + length must fit within the badblocks range. If the length exceeds the
>> + badblock range or is 0, the command will fail. Not providing a len
>> + will result in a single block being cleared.
>
> s/badblocks/media error/
If it's really not a badblock, does the 512 byte increment still make sense?
There's a DSM to get the unit size for what can be cleared which may not
be 512. If the clearable unit is smaller, we're clearing too much. If the unit
is larger, then we can't clear less than that.
-- ljk
>
> s/--len/--length/
>
>> diff --git a/builtin.h b/builtin.h
>> index a8bc848..f522d00 100644
>> --- a/builtin.h
>> +++ b/builtin.h
>> @@ -30,4 +30,5 @@ int cmd_test(int argc, const char **argv, void *ctx);
>> #ifdef ENABLE_DESTRUCTIVE
>> int cmd_bat(int argc, const char **argv, void *ctx);
>> #endif
>> +int cmd_clear_error(int argc, const char **argv, void *ctx);
>> #endif /* _NDCTL_BUILTIN_H_ */
>> diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
>> index d346c04..8123169 100644
>> --- a/ndctl/Makefile.am
>> +++ b/ndctl/Makefile.am
>> @@ -11,6 +11,7 @@ ndctl_SOURCES = ndctl.c \
>> ../util/log.c \
>> list.c \
>> test.c \
>> + clear-error.c \
>> ../util/json.c
>>
>> if ENABLE_SMART
>> diff --git a/ndctl/clear-error.c b/ndctl/clear-error.c
>> new file mode 100644
>> index 0000000..29cd471
>> --- /dev/null
>> +++ b/ndctl/clear-error.c
>> @@ -0,0 +1,239 @@
>> +#include <stdio.h>
>> +#include <errno.h>
>> +#include <stdlib.h>
>> +#include <stdint.h>
>> +#include <stdbool.h>
>> +#include <sys/types.h>
>> +#include <sys/stat.h>
>> +#include <unistd.h>
>> +#include <libgen.h>
>> +#include <string.h>
>> +#include <limits.h>
>> +#include <ccan/short_types/short_types.h>
>> +#include <ccan/array_size/array_size.h>
>> +#include <util/parse-options.h>
>> +#include <util/log.h>
>> +#include <ndctl/libndctl.h>
>> +#include <ndctl.h>
>> +
>> +struct clear_err {
>> + const char *dev_name;
>> + u64 bb_start;
>> + unsigned int bb_len;
>> + struct ndctl_cmd *ars_cap;
>> + struct ndctl_cmd *clear_err;
>> + struct ndctl_bus *bus;
>> + struct ndctl_region *region;
>> + struct ndctl_dax *dax;
>> + struct ndctl_ctx *ctx;
>> +} clear_err = {
>> + .bb_len = 1,
>> +};
>> +
>> +static int send_clear_error(struct ndctl_bus *bus, u64 start, u64 size)
>> +{
>> + u64 cleared;
>> + int rc;
>> +
>> + clear_err.clear_err = ndctl_bus_cmd_new_clear_error(
>> + start, size, clear_err.ars_cap);
>> + if (!clear_err.clear_err) {
>> + error("%s: bus: %s failed to create cmd\n",
>> + __func__, ndctl_bus_get_provider(bus));
>> + return -ENXIO;
>> + }
>> +
>> + rc = ndctl_cmd_submit(clear_err.clear_err);
>> + if (rc) {
>> + error("%s: bus: %s failed to submit cmd: %d\n",
>> + __func__, ndctl_bus_get_provider(bus), rc);
>> + ndctl_cmd_unref(clear_err.clear_err);
>> + return rc;
>> + }
>> +
>> + cleared = ndctl_cmd_clear_error_get_cleared(clear_err.clear_err);
>> + if (cleared != size) {
>> + error("%s: bus: %s expected to clear: %ld actual: %ld\n",
>> + __func__, ndctl_bus_get_provider(bus),
>> + size, cleared);
>> + return -ENXIO;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int get_ars_cap(struct ndctl_bus *bus, u64 start, u64 size)
>> +{
>> + int rc;
>> +
>> + clear_err.ars_cap = ndctl_bus_cmd_new_ars_cap(bus, start, size);
>> + if (!clear_err.ars_cap) {
>> + error("%s: bus: %s failed to create cmd\n",
>> + __func__, ndctl_bus_get_provider(bus));
>> + return -ENOTTY;
>> + }
>> +
>> + rc = ndctl_cmd_submit(clear_err.ars_cap);
>> + if (rc) {
>> + error("%s: bus: %s failed to submit cmd: %d\n",
>> + __func__, ndctl_bus_get_provider(bus), rc);
>> + ndctl_cmd_unref(clear_err.ars_cap);
>> + return rc;
>> + }
>> +
>> + if (ndctl_cmd_ars_cap_get_size(clear_err.ars_cap) <
>> + sizeof(struct nd_cmd_ars_status)){
>> + error("%s: bus: %s expected size >= %zd got: %d\n",
>> + __func__, ndctl_bus_get_provider(bus),
>> + sizeof(struct nd_cmd_ars_status),
>> + ndctl_cmd_ars_cap_get_size(clear_err.ars_cap));
>> + ndctl_cmd_unref(clear_err.ars_cap);
>> + return -ENXIO;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int match_dev(struct clear_err *ce, char *dev_name)
>> +{
>> + ndctl_bus_foreach(ce->ctx, ce->bus)
>> + ndctl_region_foreach(ce->bus, ce->region)
>> + ndctl_dax_foreach(ce->region, ce->dax)
>> + if (strncmp(basename(dev_name),
>> + ndctl_dax_get_devname(ce->dax), 256)
>
> This device is an nvdimm nd_dax device, not a device-dax character
> device instance. See calls to util_daxctl_region_to_json() and the
> implementation of that routine for how to lookup the device-dax
> character device from the ndctl_dax instance returned by
> ndctl_dax_foreach().
>
> I don't like that this routine is silently trampling on ce->dax. If it
> finds the proper dax instance it should return it directly or NULL
> otherwise.
>
> Lastly you'll need to match by actual major:minor number otherwise how
> do you know that an argument of /dev/my-special-device-dax-symlink
> refers to the device-dax instance you're attempting to match?
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
>
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax
2017-05-05 19:41 ` Linda Knippers
@ 2017-05-05 20:01 ` Dan Williams
2017-05-05 20:23 ` Linda Knippers
0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2017-05-05 20:01 UTC (permalink / raw)
To: Linda Knippers; +Cc: linux-nvdimm@lists.01.org
On Fri, May 5, 2017 at 12:41 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> On 05/05/2017 03:14 PM, Dan Williams wrote:
>> On Thu, May 4, 2017 at 2:49 PM, Dave Jiang <dave.jiang@intel.com> wrote:
>>> Adding ndctl support that will allow clearing of bad blocks for a device.
>>> Initial implementation will only support device dax devices. The ndctl
>>> takes a device path and parameters of the starting bad block, and the number
>>> of bad blocks to clear.
>>>
>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>> ---
>>> Documentation/ndctl-clear-error.txt | 38 ++++++
>>> builtin.h | 1
>>> ndctl/Makefile.am | 1
>>> ndctl/clear-error.c | 239 +++++++++++++++++++++++++++++++++++
>>> ndctl/lib/libndctl.c | 72 +++++++++++
>>> ndctl/lib/libndctl.sym | 2
>>> ndctl/libndctl.h.in | 10 +
>>> ndctl/ndctl.c | 1
>>> 8 files changed, 364 insertions(+)
>>> create mode 100644 Documentation/ndctl-clear-error.txt
>>> create mode 100644 ndctl/clear-error.c
>>>
>>> diff --git a/Documentation/ndctl-clear-error.txt b/Documentation/ndctl-clear-error.txt
>>> new file mode 100644
>>> index 0000000..b14521a
>>> --- /dev/null
>>> +++ b/Documentation/ndctl-clear-error.txt
>>> @@ -0,0 +1,38 @@
>>> +ndctl-clear-error(1)
>>> +====================
>>> +
>>> +NAME
>>> +----
>>> +ndctl-clear-error - clear badblocks for a device
>>
>> I think "clear-error" is too ambiguous of a name, lets call the
>> commands repair-media-errors and list-media-errors.
>>
>> I'd also like to see the list-media-errors patch first in the series
>> since repairing is just an incremental step on top of listing. I don't
>> think the word "badblocks" should appear anywhere in this document.
>> That's a kernel implementation detail.
>
> Is this just for device dax? If so, that should be more clear in the text,
> rather than just used in the example. If it's for other types of pmem devices,
> then badblocks would make sense to keep, assuming it relates to the
> badblocks information exposed in /sys.
It's not necessarily just for device-dax, that's just a starting
point. The tool could gather errors some other way, so I don't think
we want to leak the "badblocks" name to this interface.
>> The other benefit of "repair" vs "clear" is communicating that the
>> data may be indeterminate after repair.
>
> For me, repair means you fixed it. If you want to communicate that the
> data is indeterminate, I think you should say that in the doc rather
> than assuming people interpret the word the same way you do.
True, and I think this aligns with another discomfort that we're
trying to define an explicit error clearing interface when it really
should an interface that writes data with error clearing as a side
effect. Just like the block device error clearing case. I think we
should abandon this command and jsut go create a command that just
knows how to clear errors while writing, then a lot of these
interpretation problems go away.
>> Hopefully in the future we'll
>> get a standard mechanism to determine if the platform supports atomic
>> error clearing with a determinate value.
>>
>>> +
>>> +SYNOPSIS
>>> +--------
>>> +[verse]
>>> +'ndctl clear-error' [<options>]
>>> +
>>> +EXAMPLES
>>> +--------
>>> +
>>> +Clear poison (bad blocks) for the provided device
>>> +[verse]
>>> +ndctl clear-error -f /dev/dax0.0 -s 0 -l 8
>>> +
>>> +Clear poison (bad blocks) at block offset 0 for 8 blocks on device /dev/dax0.0
>>
>> The "poison" term is x86 specific. Lets just use "media error" everywhere.
>
> If you really mean uncorrectable error reported by an Address Range Scrub, you
> could say that too. Or are there other types of media errors that this could
> work with?
I'm thinking about other architectures outside of x86 + ACPI that may
have different terminology. So, the intent was to have something
generic, but this concern also goes away if we just move to a model
where the tool writes new data to clear errors.
>>> +
>>> +-l::
>>> +--len::
>>> + The number of badblocks to clear in size of 512 bytes increments. The
>>> + length must fit within the badblocks range. If the length exceeds the
>>> + badblock range or is 0, the command will fail. Not providing a len
>>> + will result in a single block being cleared.
>>
>> s/badblocks/media error/
>
> If it's really not a badblock, does the 512 byte increment still make sense?
> There's a DSM to get the unit size for what can be cleared which may not
> be 512. If the clearable unit is smaller, we're clearing too much. If the unit
> is larger, then we can't clear less than that.
Hopefully the units never get larger than 512. Since we can't address
smaller than 512 bytes for block devices, I'd rather not expose the
smaller than 512-byte error clearing support with this interface.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax
2017-05-05 20:01 ` Dan Williams
@ 2017-05-05 20:23 ` Linda Knippers
0 siblings, 0 replies; 15+ messages in thread
From: Linda Knippers @ 2017-05-05 20:23 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-nvdimm@lists.01.org
On 05/05/2017 04:01 PM, Dan Williams wrote:
> On Fri, May 5, 2017 at 12:41 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>> On 05/05/2017 03:14 PM, Dan Williams wrote:
>>> On Thu, May 4, 2017 at 2:49 PM, Dave Jiang <dave.jiang@intel.com> wrote:
>>>> Adding ndctl support that will allow clearing of bad blocks for a device.
>>>> Initial implementation will only support device dax devices. The ndctl
>>>> takes a device path and parameters of the starting bad block, and the number
>>>> of bad blocks to clear.
>>>>
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>> ---
>>>> Documentation/ndctl-clear-error.txt | 38 ++++++
>>>> builtin.h | 1
>>>> ndctl/Makefile.am | 1
>>>> ndctl/clear-error.c | 239 +++++++++++++++++++++++++++++++++++
>>>> ndctl/lib/libndctl.c | 72 +++++++++++
>>>> ndctl/lib/libndctl.sym | 2
>>>> ndctl/libndctl.h.in | 10 +
>>>> ndctl/ndctl.c | 1
>>>> 8 files changed, 364 insertions(+)
>>>> create mode 100644 Documentation/ndctl-clear-error.txt
>>>> create mode 100644 ndctl/clear-error.c
>>>>
>>>> diff --git a/Documentation/ndctl-clear-error.txt b/Documentation/ndctl-clear-error.txt
>>>> new file mode 100644
>>>> index 0000000..b14521a
>>>> --- /dev/null
>>>> +++ b/Documentation/ndctl-clear-error.txt
>>>> @@ -0,0 +1,38 @@
>>>> +ndctl-clear-error(1)
>>>> +====================
>>>> +
>>>> +NAME
>>>> +----
>>>> +ndctl-clear-error - clear badblocks for a device
>>>
>>> I think "clear-error" is too ambiguous of a name, lets call the
>>> commands repair-media-errors and list-media-errors.
>>>
>>> I'd also like to see the list-media-errors patch first in the series
>>> since repairing is just an incremental step on top of listing. I don't
>>> think the word "badblocks" should appear anywhere in this document.
>>> That's a kernel implementation detail.
>>
>> Is this just for device dax? If so, that should be more clear in the text,
>> rather than just used in the example. If it's for other types of pmem devices,
>> then badblocks would make sense to keep, assuming it relates to the
>> badblocks information exposed in /sys.
>
> It's not necessarily just for device-dax, that's just a starting
> point. The tool could gather errors some other way, so I don't think
> we want to leak the "badblocks" name to this interface.
If the current implementation is for device-dax, then the doc should say that.
I can change later (assuming the comment option survives).
>
>>> The other benefit of "repair" vs "clear" is communicating that the
>>> data may be indeterminate after repair.
>>
>> For me, repair means you fixed it. If you want to communicate that the
>> data is indeterminate, I think you should say that in the doc rather
>> than assuming people interpret the word the same way you do.
>
> True, and I think this aligns with another discomfort that we're
> trying to define an explicit error clearing interface when it really
> should an interface that writes data with error clearing as a side
> effect. Just like the block device error clearing case. I think we
> should abandon this command and jsut go create a command that just
> knows how to clear errors while writing, then a lot of these
> interpretation problems go away.
I'd be happy if the documentation just said what it actually does
or doesn't do.
>
>
>>> Hopefully in the future we'll
>>> get a standard mechanism to determine if the platform supports atomic
>>> error clearing with a determinate value.
>>>
>>>> +
>>>> +SYNOPSIS
>>>> +--------
>>>> +[verse]
>>>> +'ndctl clear-error' [<options>]
>>>> +
>>>> +EXAMPLES
>>>> +--------
>>>> +
>>>> +Clear poison (bad blocks) for the provided device
>>>> +[verse]
>>>> +ndctl clear-error -f /dev/dax0.0 -s 0 -l 8
>>>> +
>>>> +Clear poison (bad blocks) at block offset 0 for 8 blocks on device /dev/dax0.0
>>>
>>> The "poison" term is x86 specific. Lets just use "media error" everywhere.
>>
>> If you really mean uncorrectable error reported by an Address Range Scrub, you
>> could say that too. Or are there other types of media errors that this could
>> work with?
>
> I'm thinking about other architectures outside of x86 + ACPI that may
> have different terminology. So, the intent was to have something
> generic, but this concern also goes away if we just move to a model
> where the tool writes new data to clear errors.
I can definitely see going outside of the x86 part but can you really
get away without ACPI for this?
>
>>>> +
>>>> +-l::
>>>> +--len::
>>>> + The number of badblocks to clear in size of 512 bytes increments. The
>>>> + length must fit within the badblocks range. If the length exceeds the
>>>> + badblock range or is 0, the command will fail. Not providing a len
>>>> + will result in a single block being cleared.
>>>
>>> s/badblocks/media error/
>>
>> If it's really not a badblock, does the 512 byte increment still make sense?
>> There's a DSM to get the unit size for what can be cleared which may not
>> be 512. If the clearable unit is smaller, we're clearing too much. If the unit
>> is larger, then we can't clear less than that.
>
> Hopefully the units never get larger than 512. Since we can't address
> smaller than 512 bytes for block devices, I'd rather not expose the
> smaller than 512-byte error clearing support with this interface.
In any case, some argument checking would be helpful for better error messages
in whatever tool ends up doing this.
-- ljk
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-05-05 20:23 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-04 21:48 [PATCH v6 0/3] ndctl: add error clearing support for dev dax Dave Jiang
2017-05-04 21:49 ` [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax Dave Jiang
2017-05-05 19:14 ` Dan Williams
2017-05-05 19:33 ` Dan Williams
2017-05-05 19:41 ` Linda Knippers
2017-05-05 20:01 ` Dan Williams
2017-05-05 20:23 ` Linda Knippers
2017-05-04 21:49 ` [PATCH v6 2/3] ndctl: add list-errors support Dave Jiang
2017-05-05 17:07 ` Kani, Toshimitsu
2017-05-05 17:14 ` Dave Jiang
2017-05-05 17:21 ` Kani, Toshimitsu
2017-05-05 17:27 ` Dave Jiang
2017-05-05 17:35 ` Kani, Toshimitsu
2017-05-05 17:49 ` Kani, Toshimitsu
2017-05-04 21:49 ` [PATCH v6 3/3] ndctl: add test for clear-error Dave Jiang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.