linux-btrfs.vger.kernel.org archive mirror
From: Liu Bo <bo.li.liu@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: Chris Mason <clm@fb.com>,
	linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.cz>
Subject: Re: [PATCH 4/6] Btrfs: add DAX support for nocow btrfs
Date: Thu, 8 Dec 2016 08:45:39 -0800
Message-ID: <20161208164539.GB20111@localhost.localdomain>
In-Reply-To: <20161208104741.GB4049@quack2.suse.cz>

On Thu, Dec 08, 2016 at 11:47:41AM +0100, Jan Kara wrote:
> On Wed 07-12-16 17:15:42, Chris Mason wrote:
> > On 12/07/2016 04:45 PM, Liu Bo wrote:
> > >This implements DAX support for btrfs with nocow and a single device.
> > >
> > >DAX is developed for block devices that are memory-like, in order to avoid
> > >double buffering in both the page cache and the storage, so DAX can perform
> > >reads and writes directly to the storage device. For those who prefer to use
> > >a filesystem, filesystem DAX support can help map the storage into userspace
> > >for file mapping.
> > >
> > >Since I haven't figured out how to map multiple devices to userspace without
> > >the page cache, this DAX support is only for a single device, and since I
> > >don't think DAX (Direct Access) can work with cow, it is limited to the nocow
> > >case.  I did this by setting nodatacow in the dax mount option.
> > 
> > Interesting, this is a nice small start.  It might make more sense to limit
> > snapshots to readonly in DAX mode until we can figure out how to cow
> > properly.  I think it can be done, I just need to sit down with the dax code
> > to do a good review.
> > 
> > But bigger picture, if we can't cow and we can't crc and we can't
> > multi-device, I'd rather let XFS/ext4 sort out the dax space until we pull
> > in more of the btrfs features too.
> 
> So normal DAX IO (via read(2) and write(2)) is very similar to direct IO so
> I don't think there would be any obstacle to support all the features with
> that.

For DAX IO via read(2)/write(2), cow is OK, while multiple devices are
a problem, as currently iomap_dax_actor only takes one <device, blocknum>
pair:

- raid 0: one device is written at a time
- raid 1/10 and others: two or more devices need to be written each time
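
To make the difference concrete, here is a rough userspace sketch (not
btrfs or iomap code; every toy_* name is made up for illustration, and
the stripe unit is assumed to be a single block). It shows why a striped
write fits a single <device, blocknum> pair while a mirrored write does
not. Real btrfs raid1 keeps two copies regardless of device count; the
sketch simply mirrors to every device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
enum toy_profile { TOY_SINGLE, TOY_RAID0, TOY_RAID1 };

struct toy_target {
	unsigned dev;    /* device index */
	uint64_t block;  /* block number on that device */
};

#define TOY_MAX_TARGETS 4

/*
 * Map one logical block to the <device, block> pairs a write must touch.
 * Returns the number of pairs stored in out[].
 */
static size_t toy_map_write(enum toy_profile p, unsigned ndevs,
			    uint64_t logical,
			    struct toy_target out[TOY_MAX_TARGETS])
{
	assert(ndevs >= 1 && ndevs <= TOY_MAX_TARGETS);

	switch (p) {
	case TOY_SINGLE:
		out[0] = (struct toy_target){ .dev = 0, .block = logical };
		return 1;
	case TOY_RAID0:
		/* Striping: each logical block lives on exactly one device. */
		out[0] = (struct toy_target){
			.dev = (unsigned)(logical % ndevs),
			.block = logical / ndevs,
		};
		return 1;
	case TOY_RAID1:
		/* Mirroring (simplified): every device gets a copy. */
		for (unsigned i = 0; i < ndevs; i++)
			out[i] = (struct toy_target){ .dev = i, .block = logical };
		return ndevs;
	}
	return 0;
}
```

With two devices, toy_map_write(TOY_RAID0, 2, 5, out) yields one pair,
while toy_map_write(TOY_RAID1, 2, 5, out) yields two, and that second
case is the one a single-pair mapping cannot describe.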

> For mmap(2) things get more difficult but still: The filesystem gets
> normal ->fault notifications when the page is first faulted in. So you
> can COW if you need to at that moment.

Right.

> Also DAX PTEs can be write-protected (well, as of the coming merge
> window) as normal PTEs and then you'll get ->pfn_mkwrite /
> ->page_mkwrite notification when someone tries to write via mmap and
> you can do your stuff at that point.

That's right, but I think the problem comes from the fact that only
->fault with FAULT_FLAG_WRITE gets to space allocation, where we could
cow to a new location.

For page_mkwrite, btrfs does cow while writing back a dirty page, but
dax doesn't do delayed allocation, so dax_writeback_one has no place
to do cow.

Also, thank you for the great write-protect patch: another reason
I decided to disable cow is that there was no write protection on DAX
PTEs, so even if we could do cow, we would have no way to update
every PTE pointing to our cow'd dax pfn.

> So DAX mappings are not that
> different from filesystem point of view. There are some differences wrt.
> locking (you don't have page lock, but you use a lock bit in radix tree
> entry instead for that) but that's about it. So I don't see a fundamental
> reason why we cannot support all btrfs features for DAX... But if you see
> some problem, let me know and we can talk if we could somehow help from the
> DAX side.

Yeah, looks like we have at least two problems: one is dax_writeback_one
and the other is iomap.

> 
> BTW, I also don't see how the multiple devices are a problem. Actually XFS
> supports that (with its real-time devices) just fine - your ->iomap_begin()
> returns a <device, blocknumber> pair and that should be all that's needed,
> no?

xfs is a bit different: it only writes to one device at a time, sort of
like raid0.

Thanks,

-liubo


Thread overview: 25+ messages
2016-12-07 21:45 [PATCH 0/6] btrfs dax IO Liu Bo
2016-12-07 21:45 ` [PATCH 1/6] Btrfs: add mount option for dax Liu Bo
2016-12-08  2:44   ` kbuild test robot
2016-12-09  4:47   ` Dave Chinner
2016-12-09 18:41     ` Liu Bo
2016-12-09 21:58       ` Dave Chinner
2016-12-07 21:45 ` [PATCH 2/6] Btrfs: set single device limit for dax usecase Liu Bo
2016-12-08 13:35   ` David Sterba
2016-12-08 15:19     ` Liu Bo
2016-12-07 21:45 ` [PATCH 3/6] Btrfs: refactor btrfs_file_write_iter Liu Bo
2016-12-08  0:44   ` kbuild test robot
2016-12-07 21:45 ` [PATCH 4/6] Btrfs: add DAX support for nocow btrfs Liu Bo
2016-12-07 22:15   ` Chris Mason
2016-12-07 22:51     ` Liu Bo
2016-12-08 10:47     ` Jan Kara
2016-12-08 16:45       ` Liu Bo [this message]
2016-12-09 12:31         ` Jan Kara
2016-12-09 18:38           ` Liu Bo
2016-12-08  1:16   ` kbuild test robot
2016-12-08  2:19     ` Janos Toth F.
2016-12-08  2:30   ` kbuild test robot
2016-12-09  5:13   ` Dave Chinner
2016-12-09 14:23     ` Chris Mason
2016-12-07 21:45 ` [PATCH 5/6] Btrfs: add mmap_sem to avoid race between page faults and truncate/hole_punch Liu Bo
2016-12-07 21:45 ` [PATCH 6/6] Btrfs: add tracepoint for btrfs_get_blocks_dax_fault Liu Bo
