* [PATCH] pmem: report error on clear poison failure
@ 2016-10-13 15:54 Toshi Kani
2016-10-13 16:01 ` Dan Williams
0 siblings, 1 reply; 7+ messages in thread
From: Toshi Kani @ 2016-10-13 15:54 UTC (permalink / raw)
To: dan.j.williams; +Cc: vishal.l.verma, linux-nvdimm, linux-kernel, Toshi Kani
ACPI Clear Uncorrectable Error DSM function may fail or may be
unsupported on a platform. pmem_clear_poison() returns without
clearing badblocks in such cases, which leads to a silent data
corruption.
Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
so that filesystem can log an error message.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
---
drivers/nvdimm/pmem.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 42b3a82..2461843 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -47,7 +47,7 @@ static struct nd_region *to_region(struct pmem_device *pmem)
 	return to_nd_region(to_dev(pmem)->parent);
 }
 
-static void pmem_clear_poison(struct pmem_device *pmem, phys_addr_t offset,
+static int pmem_clear_poison(struct pmem_device *pmem, phys_addr_t offset,
 		unsigned int len)
 {
 	struct device *dev = to_dev(pmem);
@@ -62,8 +62,12 @@ static void pmem_clear_poison(struct pmem_device *pmem, phys_addr_t offset,
 				__func__, (unsigned long long) sector,
 				cleared / 512, cleared / 512 > 1 ? "s" : "");
 		badblocks_clear(&pmem->bb, sector, cleared / 512);
+	} else {
+		return -EIO;
 	}
+
 	invalidate_pmem(pmem->virt_addr + offset, len);
+	return 0;
 }
 
 static void write_pmem(void *pmem_addr, struct page *page,
@@ -123,7 +127,7 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 		flush_dcache_page(page);
 		write_pmem(pmem_addr, page, off, len);
 		if (unlikely(bad_pmem)) {
-			pmem_clear_poison(pmem, pmem_off, len);
+			rc = pmem_clear_poison(pmem, pmem_off, len);
 			write_pmem(pmem_addr, page, off, len);
 		}
 	}
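
Editorial sketch (not part of the patch): the error propagation introduced by the diff above can be modeled in userspace roughly as follows. Every name here (mock_clear_poison, clear_poison, do_write, platform_supports_clear) is a hypothetical stand-in rather than a kernel API, and the success condition of at least one full 512-byte sector cleared is an assumption about the surrounding code that the diff context does not show.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the platform clear-error call: returns the
 * number of bytes cleared, or 0 when the DSM fails or is unsupported.
 */
static long mock_clear_poison(bool platform_supports_clear, unsigned int len)
{
	return platform_supports_clear ? (long)len : 0;
}

/*
 * Models the patched pmem_clear_poison(): 0 when at least one 512-byte
 * sector was cleared (assumed condition), -EIO otherwise.
 */
static int clear_poison(bool platform_supports_clear, unsigned int len)
{
	long cleared = mock_clear_poison(platform_supports_clear, len);

	if (cleared > 0 && cleared / 512)
		return 0;	/* the bad-block record would be cleared here */
	return -EIO;
}

/*
 * Models the write branch of the patched pmem_do_bvec(): the result of
 * the clear attempt is kept in rc instead of being discarded.
 */
static int do_write(bool bad_pmem, bool platform_supports_clear,
		unsigned int len)
{
	int rc = 0;

	if (bad_pmem)
		rc = clear_poison(platform_supports_clear, len);
	return rc;
}
```

In this model, a write to a known-bad range on a platform that cannot clear the error yields -EIO, while writes to good ranges, or to bad ranges that are successfully cleared, yield 0.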
* Re: [PATCH] pmem: report error on clear poison failure
2016-10-13 15:54 [PATCH] pmem: report error on clear poison failure Toshi Kani
@ 2016-10-13 16:01 ` Dan Williams
2016-10-13 16:08 ` Kani, Toshimitsu
0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2016-10-13 16:01 UTC (permalink / raw)
To: Toshi Kani
Cc: Vishal L Verma, linux-nvdimm@lists.01.org,
linux-kernel@vger.kernel.org
On Thu, Oct 13, 2016 at 8:54 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> ACPI Clear Uncorrectable Error DSM function may fail or may be
> unsupported on a platform. pmem_clear_poison() returns without
> clearing badblocks in such cases, which leads to a silent data
> corruption.
>
> Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
> so that filesystem can log an error message.
What's the silent data corruption scenario? If the clear poison fails
I'm assuming that the poison will still be notified on the next read.
* Re: [PATCH] pmem: report error on clear poison failure
2016-10-13 16:01 ` Dan Williams
@ 2016-10-13 16:08 ` Kani, Toshimitsu
2016-10-13 17:22 ` Dan Williams
0 siblings, 1 reply; 7+ messages in thread
From: Kani, Toshimitsu @ 2016-10-13 16:08 UTC (permalink / raw)
To: dan.j.williams@intel.com
Cc: linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
vishal.l.verma@intel.com
On Thu, 2016-10-13 at 09:01 -0700, Dan Williams wrote:
> On Thu, Oct 13, 2016 at 8:54 AM, Toshi Kani <toshi.kani@hpe.com>
> wrote:
> >
> > ACPI Clear Uncorrectable Error DSM function may fail or may be
> > unsupported on a platform. pmem_clear_poison() returns without
> > clearing badblocks in such cases, which leads to a silent data
> > corruption.
> >
> > Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
> > so that filesystem can log an error message.
>
> What's the silent data corruption scenario? If the clear poison
> fails I'm assuming that the poison will still be notified on the next
> read.
I agree that the data is eventually read, but there is no guarantee
that it is read soon enough, i.e. the user might not access the
data for a long time.
Thanks,
-Toshi
* Re: [PATCH] pmem: report error on clear poison failure
2016-10-13 16:08 ` Kani, Toshimitsu
@ 2016-10-13 17:22 ` Dan Williams
2016-10-13 18:16 ` Kani, Toshimitsu
0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2016-10-13 17:22 UTC (permalink / raw)
To: Kani, Toshimitsu
Cc: linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
Verma, Vishal L
On Thu, Oct 13, 2016 at 9:08 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Thu, 2016-10-13 at 09:01 -0700, Dan Williams wrote:
>> On Thu, Oct 13, 2016 at 8:54 AM, Toshi Kani <toshi.kani@hpe.com>
>> wrote:
>> >
>> > ACPI Clear Uncorrectable Error DSM function may fail or may be
>> > unsupported on a platform. pmem_clear_poison() returns without
>> > clearing badblocks in such cases, which leads to a silent data
>> > corruption.
>> >
>> > Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
>> > so that filesystem can log an error message.
>>
>> What's the silent data corruption scenario? If the clear poison
>> fails I'm assuming that the poison will still be notified on the next
>> read.
>
> I agree that the data is eventually read, but there is no guarantee
> that it is read soon enough, i.e. the user might not access the
> data for a long time.
...but that's the same behavior for errors that we don't yet know
about. That said, we indeed know that the write failed. I'd feel
better about this patch if the justification / impact was clearer in
the changelog, because "silent data corruption" is not the impact.
* Re: [PATCH] pmem: report error on clear poison failure
2016-10-13 17:22 ` Dan Williams
@ 2016-10-13 18:16 ` Kani, Toshimitsu
2016-10-13 19:09 ` Ross Zwisler
2016-10-13 19:24 ` Dan Williams
0 siblings, 2 replies; 7+ messages in thread
From: Kani, Toshimitsu @ 2016-10-13 18:16 UTC (permalink / raw)
To: dan.j.williams@intel.com
Cc: linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
vishal.l.verma@intel.com
On Thu, 2016-10-13 at 10:22 -0700, Dan Williams wrote:
> On Thu, Oct 13, 2016 at 9:08 AM, Kani, Toshimitsu <toshi.kani@hpe.com
> > wrote:
> >
> > On Thu, 2016-10-13 at 09:01 -0700, Dan Williams wrote:
> > >
> > > On Thu, Oct 13, 2016 at 8:54 AM, Toshi Kani <toshi.kani@hpe.com>
> > > wrote:
> > > >
> > > >
> > > > ACPI Clear Uncorrectable Error DSM function may fail or may be
> > > > unsupported on a platform. pmem_clear_poison() returns without
> > > > clearing badblocks in such cases, which leads to a silent data
> > > > corruption.
> > > >
> > > > Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
> > > > so that filesystem can log an error message.
> > >
> > > What's the silent data corruption scenario? If the clear poison
> > > fails I'm assuming that the poison will still be notified on the
> > > next
> > > read.
> >
> > I agree that the data is eventually read, but there is no guarantee
> > that it is read soon enough, i.e. the user might not access the
> > data for a long time.
>
> ...but that's the same behavior for errors that we don't yet know
> about. That said, we indeed know that the write failed. I'd feel
> better about this patch if the justification / impact was clearer in
> the changelog, because "silent data corruption" is not the impact.
Agreed. How about the following description?
===
ACPI Clear Uncorrectable Error DSM function may fail or may be
unsupported on a platform. pmem_clear_poison() returns without
clearing badblocks in such cases. This failure is detected at
the next read (-EIO).
This behavior can lead to an issue when user keeps writing but
does not read immedicately. For instance, flight recorder file
may be only read when it is necessary for troubleshooting.
Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
so that filesystem can log an error message on a write error.
===
Thanks,
-Toshi
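
Editorial sketch (not part of the thread): the flight-recorder scenario above concerns a writer that never reads its data back, so a write-time error is its only chance to notice a bad range. Below is a minimal userspace example of such a writer checking every return value. The helper name append_record is hypothetical, and whether a driver-level -EIO reaches write() or only fsync() depends on the filesystem and I/O path, which this sketch does not model.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one record and report any failure right
 * away. Returns 0 on success or a negative errno value.
 */
static int append_record(const char *path, const char *rec)
{
	size_t len = strlen(rec);
	ssize_t n;
	int rc = 0;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd < 0)
		return -errno;

	n = write(fd, rec, len);
	if (n < 0)
		rc = -errno;
	else if ((size_t)n != len)
		rc = -EIO;	/* short write treated as an error in this sketch */
	else if (fsync(fd) < 0)
		rc = -errno;	/* buffered I/O errors often surface here */

	if (close(fd) < 0 && rc == 0)
		rc = -errno;
	return rc;
}
```

A caller would log or raise an alarm on any nonzero return instead of waiting for a later read to fail.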
* Re: [PATCH] pmem: report error on clear poison failure
2016-10-13 18:16 ` Kani, Toshimitsu
@ 2016-10-13 19:09 ` Ross Zwisler
2016-10-13 19:24 ` Dan Williams
1 sibling, 0 replies; 7+ messages in thread
From: Ross Zwisler @ 2016-10-13 19:09 UTC (permalink / raw)
To: Kani, Toshimitsu
Cc: dan.j.williams@intel.com, linux-kernel@vger.kernel.org,
linux-nvdimm@lists.01.org
On Thu, Oct 13, 2016 at 06:16:29PM +0000, Kani, Toshimitsu wrote:
> On Thu, 2016-10-13 at 10:22 -0700, Dan Williams wrote:
> > On Thu, Oct 13, 2016 at 9:08 AM, Kani, Toshimitsu <toshi.kani@hpe.com
> > > wrote:
> > >
> > > On Thu, 2016-10-13 at 09:01 -0700, Dan Williams wrote:
> > > >
> > > > On Thu, Oct 13, 2016 at 8:54 AM, Toshi Kani <toshi.kani@hpe.com>
> > > > wrote:
> > > > >
> > > > >
> > > > > ACPI Clear Uncorrectable Error DSM function may fail or may be
> > > > > unsupported on a platform. pmem_clear_poison() returns without
> > > > > clearing badblocks in such cases, which leads to a silent data
> > > > > corruption.
> > > > >
> > > > > Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
> > > > > so that filesystem can log an error message.
> > > >
> > > > What's the silent data corruption scenario? If the clear poison
> > > > fails I'm assuming that the poison will still be notified on the
> > > > next
> > > > read.
> > >
> > > I agree that the data is eventually read, but there is no guarantee
> > > that it is read soon enough, i.e. the user might not access the
> > > data for a long time.
> >
> > ...but that's the same behavior for errors that we don't yet know
> > about. That said, we indeed know that the write failed. I'd feel
> > better about this patch if the justification / impact was clearer in
> > the changelog, because "silent data corruption" is not the impact.
>
> Agreed. How about the following description?
>
> ===
> ACPI Clear Uncorrectable Error DSM function may fail or may be
> unsupported on a platform. pmem_clear_poison() returns without
> clearing badblocks in such cases. This failure is detected at
> the next read (-EIO).
>
> This behavior can lead to an issue when user keeps writing but
> does not read immedicately. For instance, flight recorder file
immediately
> may be only read when it is necessary for troubleshooting.
>
> Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
> so that filesystem can log an error message on a write error.
> ===
>
> Thanks,
> -Toshi
* Re: [PATCH] pmem: report error on clear poison failure
2016-10-13 18:16 ` Kani, Toshimitsu
2016-10-13 19:09 ` Ross Zwisler
@ 2016-10-13 19:24 ` Dan Williams
1 sibling, 0 replies; 7+ messages in thread
From: Dan Williams @ 2016-10-13 19:24 UTC (permalink / raw)
To: Kani, Toshimitsu
Cc: linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
Verma, Vishal L
On Thu, Oct 13, 2016 at 11:16 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Thu, 2016-10-13 at 10:22 -0700, Dan Williams wrote:
>> On Thu, Oct 13, 2016 at 9:08 AM, Kani, Toshimitsu <toshi.kani@hpe.com
>> > wrote:
>> >
>> > On Thu, 2016-10-13 at 09:01 -0700, Dan Williams wrote:
>> > >
>> > > On Thu, Oct 13, 2016 at 8:54 AM, Toshi Kani <toshi.kani@hpe.com>
>> > > wrote:
>> > > >
>> > > >
>> > > > ACPI Clear Uncorrectable Error DSM function may fail or may be
>> > > > unsupported on a platform. pmem_clear_poison() returns without
>> > > > clearing badblocks in such cases, which leads to a silent data
>> > > > corruption.
>> > > >
>> > > > Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
>> > > > so that filesystem can log an error message.
>> > >
>> > > What's the silent data corruption scenario? If the clear poison
>> > > fails I'm assuming that the poison will still be notified on the
>> > > next
>> > > read.
>> >
>> > I agree that the data is eventually read, but there is no guarantee
>> > that it is read soon enough, i.e. the user might not access the
>> > data for a long time.
>>
>> ...but that's the same behavior for errors that we don't yet know
>> about. That said, we indeed know that the write failed. I'd feel
>> better about this patch if the justification / impact was clearer in
>> the changelog, because "silent data corruption" is not the impact.
>
> Agreed. How about the following description?
>
> ===
> ACPI Clear Uncorrectable Error DSM function may fail or may be
> unsupported on a platform. pmem_clear_poison() returns without
> clearing badblocks in such cases. This failure is detected at
> the next read (-EIO).
>
> This behavior can lead to an issue when user keeps writing but
> does not read immedicately. For instance, flight recorder file
> may be only read when it is necessary for troubleshooting.
>
> Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
> so that filesystem can log an error message on a write error.
> ===
Looks good, thanks Toshi. I'll update the nvdimm.git branches after
-rc1 is out.