All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vishal Verma <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: "Kani, Toshimitsu" <toshi.kani-ZPxbGqLxI0U@public.gmane.org>
Cc: "linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"Wysocki,
	Rafael J"
	<rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org"
	<linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org>
Subject: Re: [PATCH v4 0/6] BTT error clearing rework
Date: Tue, 1 Aug 2017 13:56:31 -0600	[thread overview]
Message-ID: <20170801195630.GC18006@omniknight.lm.intel.com> (raw)
In-Reply-To: <1501614143.2042.101.camel-ZPxbGqLxI0U@public.gmane.org>

On 08/01, Kani, Toshimitsu wrote:
> On Tue, 2017-08-01 at 09:19 -0600, Toshi Kani wrote:
> > On Mon, 2017-07-31 at 23:35 +0000, Verma, Vishal L wrote:
> > > On Mon, 2017-07-31 at 23:15 +0000, Kani, Toshimitsu wrote:
> > > > On Wed, 2017-07-26 at 17:35 -0600, Vishal Verma wrote:
>  :
> > > Thanks for the test Toshi, I will try and reproduce it.
> > > My first guess is - are the injected errors potentially in the BTT
> > > metadata area towards the end?
> > >
> > > ->rw_bytes can only clear errors on properly aligned writes, and
> > > the btt metadata writes will be too small to clear metadata
> > > errors..
> >
> > I picked an injected offset without careful thoughts, so it is
> > possible that I might have stepped into such area.  I just tested
> > with a block device interface with multiple different offsets, and
> > they failed in clearing as well...  I will look into further as well
> > as my test setup.
> 
> The change/hack below takes care of the issue in my setup.  There is a
> discrepancy in offset between the pre-check in btt_write_pg() and the
> actual check in nsio_rw_bytes().
> 
> Thanks,
> -Toshi
> 
> ---
>  drivers/nvdimm/btt.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 8a959f8..83ad4c6 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1137,7 +1137,7 @@ static bool btt_is_badblock(struct btt *btt,
> struct arena_info *arena,
>                 u32 postmap)
>  {
>         u64 nsoff = to_namespace_offset(arena, postmap);
> -       sector_t phys_sector = nsoff >> 9;
> +       sector_t phys_sector = (nsoff + arena->nd_btt->initial_offset) >> 9;
> 
>         return is_bad_pmem(btt->phys_bb, phys_sector, arena->internal_lbasize);
>  }

Hey Toshi, that is a great catch (I'm going to look at enhancing the
unit tests too to catch this).

Your patch is correct. Normally whenever we calculate a 'nsoff', we use
that to do arena_{read,write}_bytes, and that takes care of adding the
initial_offset. This was the first instance where we calculated nsoff
and did something other than rw_bytes with it (i.e. is_bad_pmem), and
therefore this nsoff needs the initial_offset adjustment manually..

However just adding that here is inviting future bugs like this, so I'm
going to refactor the initial_offset adjustment into some helpers and
use those across the board. That should prevent this sort of a bug
again.

Thanks for the report and the root cause!

	-Vishal

WARNING: multiple messages have this Message-ID (diff)
From: Vishal Verma <vishal.l.verma@intel.com>
To: "Kani, Toshimitsu" <toshi.kani@hpe.com>
Cc: "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"Wysocki, Rafael J  <rafael.j.wysocki@intel.com>,
	linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH v4 0/6] BTT error clearing rework
Date: Tue, 1 Aug 2017 13:56:31 -0600	[thread overview]
Message-ID: <20170801195630.GC18006@omniknight.lm.intel.com> (raw)
In-Reply-To: <1501614143.2042.101.camel@hpe.com>

On 08/01, Kani, Toshimitsu wrote:
> On Tue, 2017-08-01 at 09:19 -0600, Toshi Kani wrote:
> > On Mon, 2017-07-31 at 23:35 +0000, Verma, Vishal L wrote:
> > > On Mon, 2017-07-31 at 23:15 +0000, Kani, Toshimitsu wrote:
> > > > On Wed, 2017-07-26 at 17:35 -0600, Vishal Verma wrote:
>  :
> > > Thanks for the test Toshi, I will try and reproduce it.
> > > My first guess is - are the injected errors potentially in the BTT
> > > metadata area towards the end?
> > >
> > > ->rw_bytes can only clear errors on properly aligned writes, and
> > > the btt metadata writes will be too small to clear metadata
> > > errors..
> >
> > I picked an injected offset without careful thoughts, so it is
> > possible that I might have stepped into such area.  I just tested
> > with a block device interface with multiple different offsets, and
> > they failed in clearing as well...  I will look into further as well
> > as my test setup.
> 
> The change/hack below takes care of the issue in my setup.  There is a
> discrepancy in offset between the pre-check in btt_write_pg() and the
> actual check in nsio_rw_bytes().
> 
> Thanks,
> -Toshi
> 
> ---
>  drivers/nvdimm/btt.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 8a959f8..83ad4c6 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1137,7 +1137,7 @@ static bool btt_is_badblock(struct btt *btt,
> struct arena_info *arena,
>                 u32 postmap)
>  {
>         u64 nsoff = to_namespace_offset(arena, postmap);
> -       sector_t phys_sector = nsoff >> 9;
> +       sector_t phys_sector = (nsoff + arena->nd_btt->initial_offset) >> 9;
> 
>         return is_bad_pmem(btt->phys_bb, phys_sector, arena->internal_lbasize);
>  }

Hey Toshi, that is a great catch (I'm going to look at enhancing the
unit tests too to catch this).

Your patch is correct. Normally whenever we calculate a 'nsoff', we use
that to do arena_{read,write}_bytes, and that takes care of adding the
initial_offset. This was the first instance where we calculated nsoff
and did something other than rw_bytes with it (i.e. is_bad_pmem), and
therefore this nsoff needs the initial_offset adjustment manually..

However just adding that here is inviting future bugs like this, so I'm
going to refactor the initial_offset adjustment into some helpers and
use those across the board. That should prevent this sort of a bug
again.

Thanks for the report and the root cause!

	-Vishal
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

  parent reply	other threads:[~2017-08-01 19:56 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-26 23:35 [PATCH v4 0/6] BTT error clearing rework Vishal Verma
2017-07-26 23:35 ` Vishal Verma
2017-07-26 23:35 ` [PATCH v4 1/6] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path Vishal Verma
2017-07-26 23:35   ` Vishal Verma
2017-07-26 23:35 ` [PATCH v4 2/6] btt: refactor map entry operations with macros Vishal Verma
2017-07-26 23:35   ` Vishal Verma
2017-07-26 23:35 ` [PATCH v4 3/6] btt: ensure that flags were also unchanged during a map_read Vishal Verma
2017-07-26 23:35   ` Vishal Verma
     [not found] ` <20170726233546.29052-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-07-26 23:35   ` [PATCH v4 4/6] btt: cache sector_size in arena_info Vishal Verma
2017-07-26 23:35     ` Vishal Verma
2017-07-26 23:35 ` [PATCH v4 5/6] libnvdimm: fix potential deadlock while clearing errors Vishal Verma
2017-07-26 23:35   ` Vishal Verma
2017-07-26 23:35 ` [PATCH v4 6/6] libnvdimm, btt: rework error clearing Vishal Verma
2017-07-26 23:35   ` Vishal Verma
2017-07-31 23:15 ` [PATCH v4 0/6] BTT error clearing rework Kani, Toshimitsu
2017-07-31 23:15   ` Kani, Toshimitsu
2017-07-31 23:35   ` Verma, Vishal L
2017-07-31 23:35     ` Verma, Vishal L
2017-08-01 15:28     ` Kani, Toshimitsu
2017-08-01 15:28       ` Kani, Toshimitsu
2017-08-01 19:11       ` Kani, Toshimitsu
2017-08-01 19:11         ` Kani, Toshimitsu
     [not found]         ` <1501614143.2042.101.camel-ZPxbGqLxI0U@public.gmane.org>
2017-08-01 19:56           ` Vishal Verma [this message]
2017-08-01 19:56             ` Vishal Verma
2017-08-01 20:06             ` Kani, Toshimitsu
2017-08-01 20:06               ` Kani, Toshimitsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170801195630.GC18006@omniknight.lm.intel.com \
    --to=vishal.l.verma-ral2jqcrhueavxtiumwx3w@public.gmane.org \
    --cc=linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org \
    --cc=rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=toshi.kani-ZPxbGqLxI0U@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.