linux-fsdevel.vger.kernel.org archive mirror
From: jane.chu@oracle.com
To: Jeff Moyer <jmoyer@redhat.com>, Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"JANE.CHU" <jane.chu@oracle.com>
Subject: Re: [RFC][PATCH] dax: Do not try to clear poison for partial pages
Date: Tue, 18 Feb 2020 12:45:32 -0800
Message-ID: <583b5fc2-0358-ea9d-20eb-1323c8cedce2@oracle.com>
In-Reply-To: <x49v9o3brom.fsf@segfault.boston.devel.redhat.com>

On 2/18/20 11:50 AM, Jeff Moyer wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
> 
>> Right now the kernel does not install a pte on faults that land on a
>> page with known poison, but only because the error clearing path is so
>> convoluted and could only claim that fallocate(PUNCH_HOLE) cleared
>> errors because that was guaranteed to send 512-byte aligned zeroes
>> down the block-I/O path when the fs-blocks got reallocated. In a world
>> where native cpu instructions can clear errors the dax write() syscall
>> case could be covered (modulo 64-byte alignment), and the kernel could
>> just let the page be mapped so that the application could attempt its
>> own fine-grained clearing without calling back into the kernel.
> 
> I'm not sure we'd want to allow mapping the PTEs even if there was
> support for clearing errors via CPU instructions.  Any load from a
> poisoned page will result in an MCE, and there exists the possibility
> that you will hit an unrecoverable error (Processor Context Corrupt).
> It's just safer to catch these cases by not mapping the page, and
> forcing recovery through the driver.
> 
> -Jeff
> 

I'm still in the process of trying a number of things before making an
attempt to respond to Dan's response. But I'm too slow, so I'd like
to share some concerns I have here.

Suppose a poison in a file is consumed, and the signal handler repairs
and recovers as follows: punch a hole of at least 4K, pwrite the
correct data into the 'hole', then resume the operation. The pmem
block newly allocated by the pwrite to the 'hole' is a different,
clean physical pmem block, while the poisoned block remains unfixed.
So we have a provisioning problem, because
  1. DCPMM is expensive, hence users are likely to provision little
spare capacity;
  2. there is a lack of API between the dax filesystem and the pmem
driver for clearing poison at each legitimate point, such as when the
filesystem allocates a pmem block or zeroes out a range.
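
To make the recovery sequence concrete, here is a minimal sketch of
the punch-hole-then-pwrite step. The helper names (repair_span,
repair_range) and the 4K REPAIR_ALIGN are assumptions for illustration
only, not an existing API. The sketch takes a known-good copy of the
data from the caller and does not address async-signal-safety.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define REPAIR_ALIGN 4096 /* assumed filesystem block size */

/* Round the damaged byte range [off, off + len) out to block boundaries. */
static void repair_span(off_t off, size_t len, off_t *start, off_t *span)
{
    off_t end = off + (off_t)len;

    *start = off - (off % REPAIR_ALIGN);
    end = (end + REPAIR_ALIGN - 1) / REPAIR_ALIGN * REPAIR_ALIGN;
    *span = end - *start;
}

/*
 * Hypothetical helper: deallocate the blocks backing [start, start + span)
 * and rewrite them from `good`, which must hold `span` bytes of correct
 * data. Returns 0 on success or a negative errno value.
 */
static int repair_range(int fd, off_t start, off_t span, const void *good)
{
    const char *p = good;
    off_t done = 0;

    assert(start % REPAIR_ALIGN == 0 && span > 0 && span % REPAIR_ALIGN == 0);

    /* Step 1: punch a hole; the file size is left unchanged. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  start, span) != 0)
        return -errno;

    /* Step 2: pwrite the good data; the fs allocates fresh blocks. */
    while (done < span) {
        ssize_t n = pwrite(fd, p + done, (size_t)(span - done), start + done);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        done += n;
    }
    return 0;
}
```

Note that nothing in this sequence touches the original poisoned
block; it is only released back to the filesystem, which is the
provisioning concern described above.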

DCPMM is used for its performance and capacity in cloud applications,
which means that the performance-critical code paths include the error
handling and recovery code paths...

With respect to the new cpu instruction, my concern is about the API,
including the error blast radius as reported in the signal payload.
Is there a venue where we could discuss this in more detail?
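
For reference, a minimal sketch of how an application can derive the
blast radius from today's SIGBUS payload, assuming the kernel reports
memory errors with si_code BUS_MCEERR_AR or BUS_MCEERR_AO and fills in
si_addr and si_addr_lsb as described in sigaction(2). The struct and
function names here are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the address range reported as affected. */
struct blast_range {
    uintptr_t start;
    size_t len;
};

/* si_addr_lsb is the least significant valid bit of si_addr, so the
 * affected range is 2^lsb bytes, aligned to that size. */
static struct blast_range blast_radius(uintptr_t addr, unsigned lsb)
{
    struct blast_range r;

    assert(lsb < sizeof(uintptr_t) * 8);
    r.len = (size_t)1 << lsb;
    r.start = addr & ~((uintptr_t)r.len - 1);
    return r;
}

/* Returns 0 and fills *out for a memory-error SIGBUS, -1 otherwise. */
static int blast_from_siginfo(const siginfo_t *si, struct blast_range *out)
{
    if (si->si_signo != SIGBUS ||
        (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO))
        return -1;
    *out = blast_radius((uintptr_t)si->si_addr, (unsigned)si->si_addr_lsb);
    return 0;
}
```

With lsb = 12 the reported range is a whole 4K page, while lsb = 6
would correspond to a single 64-byte cache line.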

Regards,
-jane




Thread overview: 7+ messages
2020-01-29 21:03 [RFC][PATCH] dax: Do not try to clear poison for partial pages Vivek Goyal
2020-01-31  5:42 ` Christoph Hellwig
2020-02-05 20:26 ` jane.chu
2020-02-06  0:37   ` Dan Williams
2020-02-18 19:50     ` Jeff Moyer
2020-02-18 20:45       ` jane.chu [this message]
2020-02-18 22:50         ` jane.chu
