From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Ruan Shiyang <ruansy.fnst@cn.fujitsu.com>
Cc: Dave Chinner <david@fromorbit.com>,
Matthew Wilcox <willy@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"hch@lst.de" <hch@lst.de>, "rgoldwyn@suse.de" <rgoldwyn@suse.de>,
"Qi, Fuli" <qi.fuli@fujitsu.com>,
"Gotou, Yasunori" <y-goto@fujitsu.com>
Subject: Re: Reply: Re: [RFC PATCH 0/8] dax: Add a dax-rmap tree to support reflink
Date: Thu, 4 Jun 2020 07:51:07 -0700
Message-ID: <20200604145107.GA1334206@magnolia>
In-Reply-To: <153e13e6-8685-fb0d-6bd3-bb553c06bf51@cn.fujitsu.com>
On Thu, Jun 04, 2020 at 03:37:42PM +0800, Ruan Shiyang wrote:
>
>
> On 2020/4/28 2:43 PM, Dave Chinner wrote:
> > On Tue, Apr 28, 2020 at 06:09:47AM +0000, Ruan, Shiyang wrote:
> > >
> > > On 2020/4/27 20:28:36, "Matthew Wilcox" <willy@infradead.org> wrote:
> > >
> > > > On Mon, Apr 27, 2020 at 04:47:42PM +0800, Shiyang Ruan wrote:
> > > > > > This patchset is an attempt to resolve the shared 'page cache'
> > > > > > problem for fsdax.
> > > > > >
> > > > > > In order to track multiple mappings and indexes on one page, I
> > > > > > introduced a dax-rmap rb-tree to manage the relationship. A dax entry
> > > > > > will be associated more than once if it is shared. The second time we
> > > > > > associate this entry, we create this rb-tree and store its root in
> > > > > > page->private (not used in fsdax). We insert (->mapping, ->index) in
> > > > > > dax_associate_entry() and delete it in dax_disassociate_entry().
> > > >
> > > > Do we really want to track all of this on a per-page basis? I would
> > > > have thought a per-extent basis was more useful. Essentially, create
> > > > a new address_space for each shared extent. Per page just seems like
> > > > a huge overhead.
> > > >
> > > > Per-extent tracking sounds like a good idea to me. I hadn't thought
> > > > of it yet...
> > > >
> > > > But the extent info is maintained by the filesystem. I think we need a
> > > > way to obtain this info from the FS when associating a page. It may be
> > > > a bit complicated. Let me think about it...
> >
> > That's why I want the -user of this association- to do a filesystem
> > callout instead of keeping its own naive tracking infrastructure.
> > The filesystem can do an efficient, on-demand reverse mapping lookup
> > from its own extent tracking infrastructure, and there's zero
> > runtime overhead when there are no errors present.
>
> Hi Dave,
>
> I ran into some difficulties when trying to implement the per-extent rmap
> tracking. So, I re-read your comments and found that I had misunderstood
> what you described here.
>
> I think what you mean is: we don't need the in-memory dax-rmap tracking now.
> Just ask the FS for the owner information associated with a page when a
> memory failure occurs. So the per-page (or even per-extent) dax-rmap is
> unnecessary in this case. Is this right?

Right. XFS already has its own rmap tree.

> Based on this, we only need to store the extent information of an fsdax
> page in its ->mapping (by looking it up from the FS), and then obtain the
> owners of this page (also by looking them up from the FS) when a memory
> failure or another rmap case occurs.

I don't even think you need that much. All you need is the "physical"
offset of that page within the pmem device (e.g. 'this is the 307th 4k
page == offset 1257472 from the start of /dev/pmem0') and xfs can look
up the owner of that range of physical storage and deal with it as
needed.
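
As a rough illustration of the lookup described above (a sketch only, not
XFS code): the byte offset is the page index times the page size, and the
filesystem then searches its reverse-mapping records for every owner of
that physical range. The names struct toy_rmap_rec, pmem_page_offset() and
toy_rmap_query() are hypothetical, the linear scan merely stands in for a
real rmap btree query, and the page index is assumed to be zero-based so
that index 307 gives offset 1257472.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one owner of a range of physical storage. */
struct toy_rmap_rec {
	uint64_t start;    /* byte offset from the start of the device */
	uint64_t len;      /* length of the range in bytes */
	uint64_t owner;    /* e.g. an inode number */
	uint64_t file_off; /* file offset that maps to 'start' */
};

/* Byte offset of a page from the start of the pmem device. */
static uint64_t pmem_page_offset(uint64_t page_index, uint64_t page_size)
{
	assert(page_size != 0);
	return page_index * page_size;
}

/*
 * Find every record overlapping [off, off + len). Stores up to 'max'
 * matches in 'out' and returns the total number of overlapping records,
 * so a shared (reflinked) range yields more than one owner.
 */
static size_t toy_rmap_query(const struct toy_rmap_rec *recs, size_t nrecs,
			     uint64_t off, uint64_t len,
			     const struct toy_rmap_rec **out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < nrecs; i++) {
		const struct toy_rmap_rec *r = &recs[i];

		if (r->start < off + len && off < r->start + r->len) {
			if (found < max)
				out[found] = r;
			found++;
		}
	}
	return found;
}
```

Nothing here has to be kept per page in memory: the offset is computed on
demand, and the owner list is produced only when an error actually occurs.
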
> So, an fsdax page is no longer associated with a specific file, but with an
> FS (or the pmem device). I think that is easier to understand and implement.

Yes. I also suspect this will be necessary to support reflink...

--D

>
> --
> Thanks,
> Ruan Shiyang.
> >
> > At the moment, this "dax association" is used to "report" a storage
> > media error directly to userspace. I say "report" because what it
> > does is kill userspace processes dead. The storage media error
> > actually needs to be reported to the owner of the storage media,
> > which in the case of FS-DAX is the filesystem.
> >
> > That way the filesystem can then look up all the owners of that bad
> > media range (i.e. the filesystem block it corresponds to) and take
> > appropriate action. e.g.
> >
> > - if it falls in filesystem metadata, shut down the filesystem
> > - if it falls in user data, call the "kill userspace dead" routines
> >   for each mapping/index tuple the filesystem finds for the given
> >   LBA at which the media error occurred.
> >
> > Right now if the media error is in filesystem metadata, the
> > filesystem isn't even told about it. The filesystem can't even shut
> > down - the error is just dropped on the floor and it won't be until
> > the filesystem next tries to reference that metadata that we notice
> > there is an issue.
> >
> > Cheers,
> >
> > Dave.
> >
>
>
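
The policy Dave outlines in the quoted text can be sketched roughly as
follows. This is only an illustration under assumptions: the toy_* types
and toy_classify_media_error() are hypothetical names, not existing kernel
interfaces, and the owner list is assumed to come from the filesystem's own
reverse-mapping lookup for the bad media range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: what the filesystem found at the bad media range. */
enum toy_owner_kind { TOY_OWNER_METADATA, TOY_OWNER_FILE_DATA };

struct toy_owner {
	enum toy_owner_kind kind;
	uint64_t ino;   /* stands in for the mapping; unused for metadata */
	uint64_t index; /* page index within that file */
};

enum toy_action { TOY_ACT_NONE, TOY_ACT_SHUTDOWN_FS, TOY_ACT_KILL_USERS };

struct toy_decision {
	enum toy_action action;
	size_t nr_mappings; /* mapping/index tuples to notify */
};

/*
 * Decide what to do about a media error given all owners of the range.
 * Any metadata owner forces a filesystem shutdown; otherwise every
 * file-data owner is a mapping/index tuple whose users must be notified.
 */
static struct toy_decision
toy_classify_media_error(const struct toy_owner *owners, size_t n)
{
	struct toy_decision d = { TOY_ACT_NONE, 0 };

	assert(owners != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (owners[i].kind == TOY_OWNER_METADATA) {
			d.action = TOY_ACT_SHUTDOWN_FS;
			d.nr_mappings = 0;
			return d;
		}
		d.nr_mappings++;
	}
	if (d.nr_mappings > 0)
		d.action = TOY_ACT_KILL_USERS;
	return d;
}
```

With a reflinked extent, the owner list would contain several file-data
entries for the same physical range, and each of them would be passed to
the "kill userspace dead" routines.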