All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Jane Chu <jane.chu@oracle.com>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 0/3] dax: clear poison on the fly along pwrite
Date: Fri, 17 Sep 2021 17:07:00 -0700	[thread overview]
Message-ID: <20210918000700.GA10182@magnolia> (raw)
In-Reply-To: <CAPcyv4iAr_Vwwgqw+4wz0RQUXhUUJGGz7_T+p+W6tC4T+k+zNw@mail.gmail.com>

On Fri, Sep 17, 2021 at 01:21:25PM -0700, Dan Williams wrote:
> On Fri, Sep 17, 2021 at 8:27 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Fri, Sep 17, 2021 at 01:53:33PM +0100, Christoph Hellwig wrote:
> > > On Thu, Sep 16, 2021 at 11:40:28AM -0700, Dan Williams wrote:
> > > > > That was my gut feeling.  If everyone feels 100% comfortable with
> > > > > zeroingas the mechanism to clear poisoning I'll cave in.  The most
> > > > > important bit is that we do that through a dedicated DAX path instead
> > > > > of abusing the block layer even more.
> > > >
> > > > ...or just rename dax_zero_page_range() to dax_reset_page_range()?
> > > > Where reset == "zero + clear-poison"?
> > >
> > > I'd say that naming is more confusing than overloading zero.
> >
> > How about dax_zeroinit_range() ?
> 
> Works for me.
> 
> >
> > To go with its fallocate flag (yeah I've been too busy sorting out -rc1
> > regressions to repost this) FALLOC_FL_ZEROINIT_RANGE that will reset the
> > hardware (whatever that means) and set the contents to the known value
> > zero.
> >
> > Userspace usage model:
> >
> > void handle_media_error(int fd, loff_t pos, size_t len)
> > {
> >         /* yell about this for posterior's sake */
> >
> >         ret = fallocate(fd, FALLOC_FL_ZEROINIT_RANGE, pos, len);
> >
> >         /* yay our disk drive / pmem / stone table engraver is online */
> 
> The fallocate mode can still be error-aware though, right? When the FS
> has knowledge of the error locations the fallocate mode could be
> fallocate(fd, FALLOC_FL_OVERWRITE_ERRORS, pos, len) with the semantics
> of attempting to zero out any known poison extents in the given file
> range? At the risk of going overboard on new fallocate modes there
> could also (or instead of) be FALLOC_FL_PUNCH_ERRORS to skip trying to
> clear them and just ask the FS to throw error extents away.

It /could/ be, but for now I've stuck to what you see is what you get --
if you tell it to 'zero initialize' 1MB of pmem, it'll write zeroes and
clear the poison on all 1MB, regardless of the old contents.

IOWs, you can use it from a poison handler on just the range that it
told you about, or you could use it to bulk-clear a lot of space all at
once.

A dorky thing here is that the dax_zero_page_range function returns EIO
if you tell it to do more than one page...


> 
> > }
> >
> > > > > I'm really worried about both patartitions on DAX and DM passing through
> > > > > DAX because they deeply bind DAX to the block layer, which is just a bad
> > > > > idea.  I think we also need to sort that whole story out before removing
> > > > > the EXPERIMENTAL tags.
> > > >
> > > > I do think it was a mistake to allow for DAX on partitions of a pmemX
> > > > block-device.
> > > >
> > > > DAX-reflink support may be the opportunity to start deprecating that
> > > > support. Only enable DAX-reflink for direct mounting on /dev/pmemX
> > > > without partitions (later add dax-device direct mounting),
> > >
> > > I think we need to fully or almost fully sort this out.
> > >
> > > Here is my bold suggestions:
> > >
> > >  1) drop no drop the EXPERMINTAL on the current block layer overload
> > >     at all
> >
> > I don't understand this.
> >
> > >  2) add direct mounting of the nvdimm namespaces ASAP.  Because all
> > >     the filesystem currently also need the /dev/pmem0 device add a way
> > >     to open the block device by the dax_device instead of our current
> > >     way of doing the reverse
> > >  3) deprecate DAX support through block layer mounts with a say 2 year
> > >     deprecation period
> > >  4) add DAX remapping devices as needed
> >
> > What devices are needed?  linear for lvm, and maybe error so we can
> > actually test all this stuff?
> 
> The proposal would be zero lvm support. The nvdimm namespace
> definition would need to grow support for concatenation + striping.

Ah, ok.

> Soft error injection could be achieved by writing to the badblocks
> interface.

<nod>

I'll send out an RFC of what I have currently.

--D

  reply	other threads:[~2021-09-18  0:07 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-14 23:31 [PATCH 0/3] dax: clear poison on the fly along pwrite Jane Chu
2021-09-14 23:31 ` [PATCH 1/3] dax: introduce dax_operation dax_clear_poison Jane Chu
2021-11-04 17:53   ` Christoph Hellwig
2021-09-14 23:31 ` [PATCH 2/3] dax: introduce dax_clear_poison to dax pwrite operation Jane Chu
2021-11-04 17:53   ` Christoph Hellwig
2021-09-14 23:31 ` [PATCH 2/3] dax: introduce dax clear poison to page aligned " Jane Chu
2021-09-14 23:31 ` [PATCH 3/3] libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation Jane Chu
2021-11-04 17:55   ` Christoph Hellwig
2021-11-04 20:27     ` Jane Chu
2021-09-15  4:44 ` [PATCH 0/3] dax: clear poison on the fly along pwrite Dan Williams
2021-09-15  7:22   ` Jane Chu
2021-09-15 16:15     ` Darrick J. Wong
2021-09-15 20:27       ` Dan Williams
2021-09-16  0:05         ` Darrick J. Wong
2021-09-16  7:11         ` Christoph Hellwig
2021-09-16 18:40           ` Dan Williams
2021-09-17 12:53             ` Christoph Hellwig
2021-09-17 15:27               ` Darrick J. Wong
2021-09-17 20:21                 ` Dan Williams
2021-09-18  0:07                   ` Darrick J. Wong [this message]
2021-09-17 19:37               ` Dan Williams
2021-09-23 20:48         ` Jane Chu
2021-09-23 20:55       ` Jane Chu
2021-09-23 21:42         ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210918000700.GA10182@magnolia \
    --to=djwong@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=hch@infradead.org \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jane.chu@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.