From: Linda Knippers <linda.knippers@hpe.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax
Date: Fri, 5 May 2017 16:23:23 -0400 [thread overview]
Message-ID: <590CDF3B.8020303@hpe.com> (raw)
In-Reply-To: <CAPcyv4h9NfpF_n376u69FK5ddfJELBNhSaGFDrD5ku0LcMaxvw@mail.gmail.com>
On 05/05/2017 04:01 PM, Dan Williams wrote:
> On Fri, May 5, 2017 at 12:41 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>> On 05/05/2017 03:14 PM, Dan Williams wrote:
>>> On Thu, May 4, 2017 at 2:49 PM, Dave Jiang <dave.jiang@intel.com> wrote:
>>>> Adding ndctl support that will allow clearing of bad blocks for a device.
>>>> Initial implementation will only support device dax devices. The ndctl
>>>> takes a device path and parameters of the starting bad block, and the number
>>>> of bad blocks to clear.
>>>>
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>> ---
>>>> Documentation/ndctl-clear-error.txt | 38 ++++++
>>>> builtin.h | 1
>>>> ndctl/Makefile.am | 1
>>>> ndctl/clear-error.c | 239 +++++++++++++++++++++++++++++++++++
>>>> ndctl/lib/libndctl.c | 72 +++++++++++
>>>> ndctl/lib/libndctl.sym | 2
>>>> ndctl/libndctl.h.in | 10 +
>>>> ndctl/ndctl.c | 1
>>>> 8 files changed, 364 insertions(+)
>>>> create mode 100644 Documentation/ndctl-clear-error.txt
>>>> create mode 100644 ndctl/clear-error.c
>>>>
>>>> diff --git a/Documentation/ndctl-clear-error.txt b/Documentation/ndctl-clear-error.txt
>>>> new file mode 100644
>>>> index 0000000..b14521a
>>>> --- /dev/null
>>>> +++ b/Documentation/ndctl-clear-error.txt
>>>> @@ -0,0 +1,38 @@
>>>> +ndctl-clear-error(1)
>>>> +====================
>>>> +
>>>> +NAME
>>>> +----
>>>> +ndctl-clear-error - clear badblocks for a device
>>>
>>> I think "clear-error" is too ambiguous of a name, lets call the
>>> commands repair-media-errors and list-media-errors.
>>>
>>> I'd also like to see the list-media-errors patch first in the series
>>> since repairing is just an incremental step on top of listing. I don't
>>> think the word "badblocks" should appear anywhere in this document.
>>> That's a kernel implementation detail.
>>
>> Is this just for device dax? If so, that should be more clear in the text,
>> rather than just used in the example. If it's for other types of pmem devices,
>> then badblocks would make sense to keep, assuming it relates to the
>> badblocks information exposed in /sys.
>
> It's not necessarily just for device-dax, that's just a starting
> point. The tool could gather errors some other way, so I don't think
> we want to leak the "badblocks" name to this interface.
If the current implementation is for device-dax, then the doc should say that.
I can change later (assuming the comment option survives).
>
>>> The other benefit of "repair" vs "clear" is communicating that the
>>> data may be indeterminate after repair.
>>
>> For me, repair means you fixed it. If you want to communicate that the
>> data is indeterminate, I think you should say that in the doc rather
>> than assuming people interpret the word the same way you do.
>
> True, and I think this aligns with another discomfort that we're
> trying to define an explicit error clearing interface when it really
> should an interface that writes data with error clearing as a side
> effect. Just like the block device error clearing case. I think we
> should abandon this command and jsut go create a command that just
> knows how to clear errors while writing, then a lot of these
> interpretation problems go away.
I'd be happy if the documentation just said what it actually does
or doesn't do.
>
>
>>> Hopefully in the future we'll
>>> get a standard mechanism to determine if the platform supports atomic
>>> error clearing with a determinate value.
>>>
>>>> +
>>>> +SYNOPSIS
>>>> +--------
>>>> +[verse]
>>>> +'ndctl clear-error' [<options>]
>>>> +
>>>> +EXAMPLES
>>>> +--------
>>>> +
>>>> +Clear poison (bad blocks) for the provided device
>>>> +[verse]
>>>> +ndctl clear-error -f /dev/dax0.0 -s 0 -l 8
>>>> +
>>>> +Clear poison (bad blocks) at block offset 0 for 8 blocks on device /dev/dax0.0
>>>
>>> The "poison" term is x86 specific. Lets just use "media error" everywhere.
>>
>> If you really mean uncorrectable error reported by an Address Range Scrub, you
>> could say that too. Or are there other types of media errors that this could
>> work with?
>
> I'm thinking about other architectures outside of x86 + ACPI that may
> have different terminology. So, the intent was to have something
> generic, but this concern also goes away if we just move to a model
> where the tool writes new data to clear errors.
I can definitely see going outside of the x86 part but can you really
get away without ACPI for this?
>
>>>> +
>>>> +-l::
>>>> +--len::
>>>> + The number of badblocks to clear in size of 512 bytes increments. The
>>>> + length must fit within the badblocks range. If the length exceeds the
>>>> + badblock range or is 0, the command will fail. Not providing a len
>>>> + will result in a single block being cleared.
>>>
>>> s/badblocks/media error/
>>
>> If it's really not a badblock, does the 512 byte increment still make sense?
>> There's a DSM to get the unit size for what can be cleared which may not
>> be 512. If the clearable unit is smaller, we're clearing too much. If the unit
>> is larger, then we can't clear less than that.
>
> Hopefully the units never get larger than 512. Since we can't address
> smaller than 512 bytes for block devices, I'd rather not expose the
> smaller than 512-byte error clearing support with this interface.
In any case, some argument checking would be helpful for better error messages
in whatever tool ends up doing this.
-- ljk
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
next prev parent reply other threads:[~2017-05-05 20:23 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-04 21:48 [PATCH v6 0/3] ndctl: add error clearing support for dev dax Dave Jiang
2017-05-04 21:49 ` [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax Dave Jiang
2017-05-05 19:14 ` Dan Williams
2017-05-05 19:33 ` Dan Williams
2017-05-05 19:41 ` Linda Knippers
2017-05-05 20:01 ` Dan Williams
2017-05-05 20:23 ` Linda Knippers [this message]
2017-05-04 21:49 ` [PATCH v6 2/3] ndctl: add list-errors support Dave Jiang
2017-05-05 17:07 ` Kani, Toshimitsu
2017-05-05 17:14 ` Dave Jiang
2017-05-05 17:21 ` Kani, Toshimitsu
2017-05-05 17:27 ` Dave Jiang
2017-05-05 17:35 ` Kani, Toshimitsu
2017-05-05 17:49 ` Kani, Toshimitsu
2017-05-04 21:49 ` [PATCH v6 3/3] ndctl: add test for clear-error Dave Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=590CDF3B.8020303@hpe.com \
--to=linda.knippers@hpe.com \
--cc=dan.j.williams@intel.com \
--cc=linux-nvdimm@lists.01.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox