linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	Jeff Moyer <jmoyer@redhat.com>
Subject: Re: [RFC] dax,pmem: Provide a dax operation to zero range of memory
Date: Tue, 4 Feb 2020 15:23:18 -0800	[thread overview]
Message-ID: <20200204232318.GF6874@magnolia> (raw)
In-Reply-To: <CAPcyv4jT3py4gtdJo84i8gPnJo5MO4uGaaO=+fuuAjXQ0gQsHA@mail.gmail.com>

On Fri, Jan 31, 2020 at 03:31:58PM -0800, Dan Williams wrote:
> On Thu, Jan 23, 2020 at 11:07 AM Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> >
> > On Thu, Jan 23, 2020 at 11:52:49AM -0500, Vivek Goyal wrote:
> > > Hi,
> > >
> > > This is an RFC patch to provide a dax operation to zero a range of memory.
> > > It will also clear poison in the process. This is primarily compile tested
> > > patch. I don't have real hardware to test the poison logic. I am posting
> > > this to figure out if this is the right direction or not.
> > >
> > > Motivation from this patch comes from Christoph's feedback that he will
> > > rather prefer a dax way to zero a range instead of relying on having to
> > > call blkdev_issue_zeroout() in __dax_zero_page_range().
> > >
> > > https://lkml.org/lkml/2019/8/26/361
> > >
> > > My motivation for this change is virtiofs DAX support. There we use DAX
> > > but we don't have a block device. So any dax code which has the assumption
> > > that there is always a block device associated is a problem. So this
> > > is more of a cleanup of one of the places where dax has this dependency
> > > on block device and if we add a dax operation for zeroing a range, it
> > > can help with not having to call blkdev_issue_zeroout() in dax path.
> > >
> > > I have yet to take care of stacked block drivers (dm/md).
> > >
> > > Current poison clearing logic is primarily written with assumption that
> > > I/O is sector aligned. With this new method, this assumption is broken
> > > and one can pass any range of memory to zero. I have fixed few places
> > > in existing logic to be able to handle an arbitrary start/end. I am
> > > not sure are there other dependencies which might need fixing or
> > > prohibit us from providing this method.
> > >
> > > Any feedback or comment is welcome.
> >
> > So who gest to use this? :)
> >
> > Should we (XFS) make fallocate(ZERO_RANGE) detect when it's operating on
> > a written extent in a DAX file and call this instead of what it does now
> > (punch range and reallocate unwritten)?
> 
> If it eliminates more block assumptions, then yes. In general I think
> there are opportunities to use "native" direct_access instead of
> block-i/o for other areas too, like metadata i/o.
> 
> > Is this the kind of thing XFS should just do on its own when DAX us that
> > some range of pmem has gone bad and now we need to (a) race with the
> > userland programs to write /something/ to the range to prevent a machine
> > check (b) whack all the programs that think they have a mapping to
> > their data, (c) see if we have a DRAM copy and just write that back, (d)
> > set wb_err so fsyncs fail, and/or (e) regenerate metadata as necessary?
> 
> (a), (b) duplicate what memory error handling already does. So yes,
> could be done but it only helps if machine check handling is broken or
> missing.

<nod> 

> (c) what DRAM copy in the DAX case?

Sorry, I was talking about the fs metadata that we cache in DRAM.

> (d) dax fsync is just cache flush, so it can't fail, or are you
> talking about errors in metadata?

I'm talking about an S_DAX file that someone is doing regular write()s
to:

1. Open file O_RDWR
2. Write something to the file
3. Some time later, something decides the pmem is bad.
4. Program calls fsync(); does it return EIO?

(I shouldn't have mixed the metadata/file data cases, sorry...)

> (e) I thought our solution for dax metadata redundancy is to use a
> realtime data device and raid mirror for the metadata device.

In the end it was set aside on the grounds that reserving space for
a separate metadata device was too costly and too complex for now.
We might get back to it later when the <cough> economics improve.

> > <cough> Will XFS ever get that "your storage went bad" hook that was
> > promised ages ago?
> 
> pmem developers don't scale?

Ah, sorry. :/

> > Though I guess it only does this a single page at a time, which won't be
> > awesome if we're trying to zero (say) 100GB of pmem.  I was expecting to
> > see one big memset() call to zero the entire range followed by
> > pmem_clear_poison() on the entire range, but I guess you did tag this
> > RFC. :)
> 
> Until movdir64b is available the only way to clear poison is by making
> a call to the BIOS. The BIOS may not be efficient at bulk clearing.

Well then let's port XFS to SMM mode. <duck>

(No, please don't...)

--D

  parent reply	other threads:[~2020-02-04 23:24 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-23 16:52 [RFC] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
2020-01-23 19:01 ` Darrick J. Wong
2020-01-24 13:52   ` Vivek Goyal
2020-01-31 23:31   ` Dan Williams
2020-02-03  8:20     ` Christoph Hellwig
2020-02-04 23:23     ` Darrick J. Wong [this message]
2020-01-31  5:36 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200204232318.GF6874@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@infradead.org \
    --cc=jmoyer@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=vgoyal@redhat.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).