linux-fsdevel.vger.kernel.org archive mirror
From: jane.chu@oracle.com
To: Jeff Moyer <jmoyer@redhat.com>, Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"JANE.CHU" <jane.chu@oracle.com>
Subject: Re: [RFC][PATCH] dax: Do not try to clear poison for partial pages
Date: Tue, 18 Feb 2020 14:50:43 -0800
Message-ID: <17c0d27e-c23f-b686-1d47-a0ccace03211@oracle.com>
In-Reply-To: <583b5fc2-0358-ea9d-20eb-1323c8cedce2@oracle.com>

On 2/18/20 12:45 PM, jane.chu@oracle.com wrote:
> On 2/18/20 11:50 AM, Jeff Moyer wrote:
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> Right now the kernel does not install a pte on faults that land on a
>>> page with known poison, but only because the error clearing path is so
>>> convoluted and could only claim that fallocate(PUNCH_HOLE) cleared
>>> errors because that was guaranteed to send 512-byte aligned zeroes
>>> down the block-I/O path when the fs-blocks got reallocated. In a world
>>> where native cpu instructions can clear errors the dax write() syscall
>>> case could be covered (modulo 64-byte alignment), and the kernel could
>>> just let the page be mapped so that the application could attempt its
>>> own fine-grained clearing without calling back into the kernel.
>>
>> I'm not sure we'd want to allow mapping the PTEs even if there was
>> support for clearing errors via CPU instructions.  Any load from a
>> poisoned page will result in an MCE, and there exists the possibility
>> that you will hit an unrecoverable error (Processor Context Corrupt).
>> It's just safer to catch these cases by not mapping the page, and
>> forcing recovery through the driver.
>>
>> -Jeff
>>
> 
> I'm still in the process of trying a number of things before making an
> attempt to respond to Dan's response. But I'm too slow, so I'd like
> to share some concerns I have here.
> 
> Suppose a poison in a file is consumed, and the signal handler does the
> repair and recovery as follows: punch a hole of at least 4K, then
> pwrite the correct data into the 'hole', then resume the operation.
> However, the newly allocated pmem block (due to the pwrite to the
> 'hole') is a different, clean physical pmem block, while the poisoned
> block remains unfixed. So we have a provisioning problem, because
>   1. DCPMM is expensive, hence users are likely to provision little
> spare capacity;
>   2. there is no API between the dax filesystem and the pmem driver for
> clearing poison at each legitimate point, such as when the filesystem
> tries to allocate a pmem block or zeroes out a range.
>
> As DCPMM is used for its performance and capacity in cloud applications,
> the performance code paths include the error handling and recovery
> code paths...
> 
> With respect to the new cpu instruction, my concern is about the API 
> including the error blast radius as reported in the signal payload.
> Is there a venue where we could discuss more in detail ?
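
For reference on the "blast radius" question quoted above: as far as
sigaction(2) documents it, a SIGBUS raised for a memory error on Linux
carries si_code BUS_MCEERR_AR (or BUS_MCEERR_AO), the faulting address
in si_addr, and the least significant valid bit of that address in
si_addr_lsb. A minimal sketch of turning that payload into an affected
range follows. The names poison_span, sigbus_handler, last_start and
last_len are hypothetical, and nothing here reflects any new API tied to
a cpu instruction.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the reported address and si_addr_lsb,
 * compute the naturally aligned span the error report covers.
 * Returns 0 on success, -1 if lsb is too large to shift safely.
 */
int poison_span(uintptr_t addr, unsigned lsb, uintptr_t *start, size_t *len)
{
    if (lsb >= 8 * sizeof(uintptr_t))
        return -1;
    uintptr_t size = (uintptr_t)1 << lsb;  /* span size in bytes */
    *start = addr & ~(size - 1);           /* round down to span start */
    *len = (size_t)size;
    return 0;
}

/* Last span seen by the handler (illustration only). */
volatile uintptr_t last_start;
volatile size_t last_len;

/* Sketch of a SA_SIGINFO handler that records the reported span. */
void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    if (info->si_code == BUS_MCEERR_AR || info->si_code == BUS_MCEERR_AO) {
        uintptr_t s;
        size_t l;
        if (poison_span((uintptr_t)info->si_addr,
                        (unsigned)info->si_addr_lsb, &s, &l) == 0) {
            last_start = s;
            last_len = l;
        }
    }
}
```

With si_addr_lsb = 12 the span is a whole 4K page; with 6 it would be a
64-byte cache line.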

For all the quarantined poison blocks, it's not practical to clear the
poisons via ndctl/libndctl at per-namespace granularity, for fear of
poisons that have occurred in valid pmem blocks while the data was at rest.

How can poisons ultimately be cleared in a dax-fs in the current framework?
It seems to me poisons need to be cleared on the go, automatically.
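
A minimal sketch of the repair sequence described in the quoted text
(punch a hole of at least 4K, then pwrite the correct data into it),
using fallocate(2) with FALLOC_FL_PUNCH_HOLE. The names repair_range and
REPAIR_ALIGN are hypothetical, the 4K alignment requirement is an
assumption taken from the "at least 4K" wording, and whether these calls
are safe to make directly from a signal handler is not addressed here.
Note that this only gives the file a fresh block; the poisoned block
itself stays unfixed, which is the provisioning problem raised above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define REPAIR_ALIGN 4096 /* assumed filesystem block size */

/*
 * Hypothetical helper: replace the range [off, off + len) of fd with
 * known-good data. off and len must be multiples of REPAIR_ALIGN so the
 * hole never deallocates data the caller cannot rewrite.
 * Returns 0 on success, -1 with errno set on failure.
 */
int repair_range(int fd, off_t off, const void *good, size_t len)
{
    if (off < 0 || off % REPAIR_ALIGN != 0 ||
        len == 0 || len % REPAIR_ALIGN != 0) {
        errno = EINVAL;
        return -1;
    }

    /* Step 1: drop the blocks backing the range, keeping the file size. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, (off_t)len) != 0)
        return -1;

    /* Step 2: write the good data; the fs allocates new blocks for it. */
    const char *p = good;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted, retry */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

A caller would pass the 4K-aligned file offset covering the poisoned
range together with a buffer holding the correct contents for that
whole range.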

Regards,
-jane

> 
> Regards,
> -jane
> 
> 
> 


Thread overview: 7+ messages
2020-01-29 21:03 [RFC][PATCH] dax: Do not try to clear poison for partial pages Vivek Goyal
2020-01-31  5:42 ` Christoph Hellwig
2020-02-05 20:26 ` jane.chu
2020-02-06  0:37   ` Dan Williams
2020-02-18 19:50     ` Jeff Moyer
2020-02-18 20:45       ` jane.chu
2020-02-18 22:50         ` jane.chu [this message]
