Linux-NVDIMM Archive on lore.kernel.org
From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "Williams, Dan J" <dan.j.williams@intel.com>,
	"Rudoff, Andy" <andy.rudoff@intel.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: [ndctl PATCH] ndctl, check: Add a sigbus handler to detect metadata corruption
Date: Fri, 14 Apr 2017 19:40:22 +0000
Message-ID: <1492198820.1657.3.camel@intel.com>
In-Reply-To: <D56299A3-A5A4-4B49-AFC6-5F03118F1F85@intel.com>

On Fri, 2017-04-14 at 19:26 +0000, Rudoff, Andy wrote:
>     On Fri, Apr 14, 2017 at 12:00 PM, Verma, Vishal L
[...]
>     
> Yeah, I thought about this a little more too.  Here’s what I think
> should happen:
> 
> Without --repair, I agree with this patch, although the error message
> might tell the poor user that running with --repair is the next step
> if they want to try to fix it (see below).
> 
> With --repair, the repair code should try to replace the poison with
> repaired metadata, clearing the appropriate error bits when done.
> 
> For example, if the poison is in the map, the repair code should
> figure out which entries have no map entry, clear the poison, and
> update the map, printing a strong warning that the blocks were
> repaired but may not be in the right place any more.

If we hit a known badblock, that is 512B worth of map entries (128).
Should we really (almost certainly) scramble 64 blocks? :)
If it is a latent error, it will still be at least a cache line worth
of map entries, i.e. 16.
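
To put numbers on the two cases above, here is a minimal sketch of the arithmetic. It assumes 4-byte map entries (implied by 512B holding 128 entries) and a 64-byte cache line (implied by 16 entries per line). The names entries_lost and affected_range are made up for illustration and are not part of ndctl or the kernel.

```python
# Illustrative constants. Only the 512 B -> 128 entries and
# cache line -> 16 entries figures come from the message itself;
# the sizes below are the assumptions those figures imply.
BADBLOCK_SIZE = 512    # bytes covered by one known badblock
CACHE_LINE_SIZE = 64   # bytes, assumed cache line size
MAP_ENTRY_SIZE = 4     # bytes per map entry, assumed


def entries_lost(poison_bytes: int, entry_size: int = MAP_ENTRY_SIZE) -> int:
    """Number of map entries covered by a poisoned span of poison_bytes."""
    if poison_bytes % entry_size:
        raise ValueError("poisoned span must be a whole number of entries")
    return poison_bytes // entry_size


def affected_range(poison_offset: int, poison_bytes: int,
                   entry_size: int = MAP_ENTRY_SIZE) -> range:
    """Map entry indices overlapped by poison at a byte offset in the map."""
    first = poison_offset // entry_size
    last = (poison_offset + poison_bytes - 1) // entry_size
    return range(first, last + 1)


if __name__ == "__main__":
    print(entries_lost(BADBLOCK_SIZE))     # known badblock case
    print(entries_lost(CACHE_LINE_SIZE))   # latent error case
    print(affected_range(1024, CACHE_LINE_SIZE))
```

Either way, a repair pass would have to guess at every entry in the affected range, since none of them can be read back.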

If an error is in the log, then in the badblock case, we lose both log
and log' for four lanes, which means we can't tell if one of those 4
entries needed a map update, leaving a potential corruption window
open.

> If the poison is in something like
> the arena info block, a replacement can be constructed from the
> backup info.  The point is that the tool shouldn’t die on poison, it
> should be the way you fix it and get rid of the poison.

This might be the only case where we can clear the poison
successfully.

> 
> Finally, the kernel should be setting and honoring the error bits,
> both in map entries and arena info blocks.  When poison is discovered
> in metadata, the arena info block error state should be set and that
> should not disallow getting to any available data as you stated, but
> should instead make that arena read-only.

Agreed about the error bit and read-only, I will look at that.

> The only way to get out
> of read-only mode for the arena should be to run the check tool
> with --repair and all the stuff I said above then happens.
>  
> 
