linux-btrfs.vger.kernel.org archive mirror
From: Liu Bo <bo.li.liu@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: Chris Mason <clm@fb.com>,
	linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.cz>
Subject: Re: [PATCH 4/6] Btrfs: add DAX support for nocow btrfs
Date: Fri, 9 Dec 2016 10:38:44 -0800
Message-ID: <20161209183844.GA28006@localhost.localdomain>
In-Reply-To: <20161209123103.GA10957@quack2.suse.cz>

On Fri, Dec 09, 2016 at 01:31:03PM +0100, Jan Kara wrote:
> On Thu 08-12-16 08:45:39, Liu Bo wrote:
> > On Thu, Dec 08, 2016 at 11:47:41AM +0100, Jan Kara wrote:
> > > On Wed 07-12-16 17:15:42, Chris Mason wrote:
> > > > On 12/07/2016 04:45 PM, Liu Bo wrote:
> > > > > >This implements DAX support for btrfs, limited to nocow and a single
> > > > > >device.
> > > > > >
> > > > > >DAX was developed for memory-like block devices to avoid double
> > > > > >buffering in both the page cache and the storage, so DAX can perform reads
> > > > > >and writes directly on the storage device. For those who prefer using a
> > > > > >filesystem, filesystem DAX support can map the storage into userspace
> > > > > >for file mappings.
> > > > > >
> > > > > >Since I haven't figured out how to map multiple devices to userspace without
> > > > > >the page cache, this DAX support is single-device only. And since I don't
> > > > > >think DAX (Direct Access) can work with cow, it is limited to the nocow case.
> > > > > >I did this by setting nodatacow under the dax mount option.
> > > > 
> > > > Interesting, this is a nice small start.  It might make more sense to limit
> > > > snapshots to readonly in DAX mode until we can figure out how to cow
> > > > properly.  I think it can be done, I just need to sit down with the dax code
> > > > to do a good review.
> > > > 
> > > > But bigger picture, if we can't cow and we can't crc and we can't
> > > > multi-device, I'd rather let XFS/ext4 sort out the dax space until we pull
> > > > in more of the btrfs features too.
> > > 
> > > So normal DAX IO (via read(2) and write(2)) is very similar to direct IO so
> > > I don't think there would be any obstacle to support all the features with
> > > that.
> > 
> > For DAX IO via read(2)/write(2), cow is OK, but multiple devices are
> > a problem, as iomap_dax_actor currently takes only one <device, blocknum>
> > pair:
> > 
> > - raid 0: one device is written at a time
> > - raid 1/10 and others: 2 or more devices need to be written each time
> 
> OK, but how do you cope with direct IO for multiple devices then? Do you
> just disallow it? That's the same issue AFAICS.

Direct IO takes advantage of how btrfs maps bios to different devices
before submitting them. I'll try to modify iomap_begin and
iomap_dax_actor to cope with more than one <dev, bno> pair.
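
To illustrate why a single <dev, bno> pair is not enough, here is a
minimal userspace sketch, not btrfs or iomap code. The names (enum
profile, struct dev_extent, map_write) and the simplified striping and
mirroring layout are assumptions made only for this example; real btrfs
chunk mapping is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not btrfs or iomap structures. */
enum profile { PROFILE_SINGLE, PROFILE_RAID0, PROFILE_RAID1 };

struct dev_extent {
	unsigned int dev;	/* index of the target device */
	uint64_t blocknr;	/* block number on that device */
};

#define MAX_COPIES 2

/*
 * Translate a logical block into the device blocks a write must reach.
 * Returns the number of entries filled in out[].
 */
size_t map_write(enum profile p, unsigned int ndevs,
		 uint64_t stripe_blocks, uint64_t logical,
		 struct dev_extent out[MAX_COPIES])
{
	assert(ndevs >= 1 && stripe_blocks >= 1);

	switch (p) {
	case PROFILE_RAID0: {
		/* Striping: each logical block lives on exactly one device. */
		uint64_t stripe = logical / stripe_blocks;
		uint64_t within = logical % stripe_blocks;

		out[0].dev = (unsigned int)(stripe % ndevs);
		out[0].blocknr = (stripe / ndevs) * stripe_blocks + within;
		return 1;
	}
	case PROFILE_RAID1:
		/* Mirroring: every write must reach both copies. */
		assert(ndevs >= MAX_COPIES);
		for (size_t i = 0; i < MAX_COPIES; i++) {
			out[i].dev = (unsigned int)i;
			out[i].blocknr = logical;
		}
		return MAX_COPIES;
	case PROFILE_SINGLE:
	default:
		out[0].dev = 0;
		out[0].blocknr = logical;
		return 1;
	}
}
```

In this sketch the raid 0 case still yields one pair per block (only the
device changes from stripe to stripe), while the raid 1 case yields two
pairs for every write, which is the part a single-pair interface cannot
express.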

> 
> > > For mmap(2) things get more difficult but still: The filesystem gets
> > > normal ->fault notifications when the page is first faulted in. So you
> > > can COW if you need to at that moment.
> > 
> > Right.
> > 
> > > Also DAX PTEs can be write-protected (well, as of the coming merge
> > > window) as normal PTEs and then you'll get ->pfn_mkwrite /
> > > ->page_mkwrite notification when someone tries to write via mmap and
> > > you can do your stuff at that point.
> > 
> > That's right, but I think the problem comes from the fact that only
> > ->fault with FAULT_FLAG_WRITE reaches space allocation, where we could
> > cow to a new location.
> > 
> > For page_mkwrite, btrfs does cow while writing back a dirty page, but
> > dax doesn't do delayed allocation, so dax_writeback_one has no
> > place to do cow.
> 
> Yes, so you'd have to change this logic so that for DAX, COW already happens
> at page_mkwrite() time (when the iomap_begin() handler is called to prepare
> blocks for writing at the given file offset) and not at writeback time.

Right, I just realized I had the wrong impression that we could do
->page_mkwrite on an already dirtied page, so I was worried about a race
between several ->page_mkwrite callers. Now I'm OK and ready to go.
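
A toy userspace model of doing the cow at mkwrite time rather than at
writeback, as a sketch only. struct toy_fs, toy_alloc and toy_mkwrite
are hypothetical names for this example and do not correspond to any
kernel interface; the "shared" flag stands in for a block that a
snapshot still references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8
#define NR_BLOCKS  4

/* Hypothetical toy device and mapping; stands in for persistent memory. */
struct toy_fs {
	uint8_t storage[NR_BLOCKS][BLOCK_SIZE];
	bool in_use[NR_BLOCKS];
	bool shared[NR_BLOCKS];	/* block is also referenced by a snapshot */
	int file_block;		/* block currently backing the file page */
};

/* Find a free block, or return -1 if the device is full. */
int toy_alloc(struct toy_fs *fs)
{
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (!fs->in_use[i]) {
			fs->in_use[i] = true;
			fs->shared[i] = false;
			return i;
		}
	}
	return -1;
}

/*
 * Called on the first write to a write-protected mapping.
 * Returns the block the writer may modify in place, or -1 if no space.
 */
int toy_mkwrite(struct toy_fs *fs)
{
	int old = fs->file_block;
	int copy;

	if (!fs->shared[old])
		return old;	/* already private: write in place */

	copy = toy_alloc(fs);
	if (copy < 0)
		return -1;

	/* Copy before the store lands, since stores go straight to storage. */
	memcpy(fs->storage[copy], fs->storage[old], BLOCK_SIZE);
	fs->file_block = copy;	/* file now points at the private copy */
	return copy;
}
```

Because there is no page cache copy to redirect later, the new block has
to exist before the first store; a second call on the now-private block
returns the same block without allocating again.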

Thank you, Jan, for the suggestion.

Thanks,

-liubo


Thread overview: 25+ messages
2016-12-07 21:45 [PATCH 0/6] btrfs dax IO Liu Bo
2016-12-07 21:45 ` [PATCH 1/6] Btrfs: add mount option for dax Liu Bo
2016-12-08  2:44   ` kbuild test robot
2016-12-09  4:47   ` Dave Chinner
2016-12-09 18:41     ` Liu Bo
2016-12-09 21:58       ` Dave Chinner
2016-12-07 21:45 ` [PATCH 2/6] Btrfs: set single device limit for dax usecase Liu Bo
2016-12-08 13:35   ` David Sterba
2016-12-08 15:19     ` Liu Bo
2016-12-07 21:45 ` [PATCH 3/6] Btrfs: refactor btrfs_file_write_iter Liu Bo
2016-12-08  0:44   ` kbuild test robot
2016-12-07 21:45 ` [PATCH 4/6] Btrfs: add DAX support for nocow btrfs Liu Bo
2016-12-07 22:15   ` Chris Mason
2016-12-07 22:51     ` Liu Bo
2016-12-08 10:47     ` Jan Kara
2016-12-08 16:45       ` Liu Bo
2016-12-09 12:31         ` Jan Kara
2016-12-09 18:38           ` Liu Bo [this message]
2016-12-08  1:16   ` kbuild test robot
2016-12-08  2:19     ` Janos Toth F.
2016-12-08  2:30   ` kbuild test robot
2016-12-09  5:13   ` Dave Chinner
2016-12-09 14:23     ` Chris Mason
2016-12-07 21:45 ` [PATCH 5/6] Btrfs: add mmap_sem to avoid race between page faults and truncate/hole_punch Liu Bo
2016-12-07 21:45 ` [PATCH 6/6] Btrfs: add tracepoint for btrfs_get_blocks_dax_fault Liu Bo
