From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750985AbdAURwX (ORCPT ); Sat, 21 Jan 2017 12:52:23 -0500
Received: from verein.lst.de ([213.95.11.211]:49619 "EHLO newverein.lst.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750866AbdAURwO (ORCPT ); Sat, 21 Jan 2017 12:52:14 -0500
Date: Sat, 21 Jan 2017 18:52:12 +0100
From: Christoph Hellwig
To: Matthew Wilcox
Cc: Dan Williams, "linux-nvdimm@lists.01.org", Brian Boylston, Tony Luck,
	Jan Kara, Toshi Kani, Mike Snitzer, "linux-kernel@vger.kernel.org",
	"x86@kernel.org", Jeff Moyer, Christoph Hellwig, Jens Axboe,
	"dm-devel@redhat.com", Ingo Molnar, Al Viro, "H. Peter Anvin",
	"linux-fsdevel@vger.kernel.org", Thomas Gleixner, Linus Torvalds,
	Ross Zwisler
Subject: Re: [PATCH 00/13] dax, pmem: move cpu cache maintenance to libnvdimm
Message-ID: <20170121175212.GA28180@lst.de>
References: <148488421301.37913.12835362165895864897.stgit@dwillia2-desk3.amr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.17 (2007-11-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jan 21, 2017 at 04:28:52PM +0000, Matthew Wilcox wrote:
> Of course, there may not be a backing device either!

s/backing device/block device/ ?  If so, fully agreed.  I like the dax_ops
scheme, but we should go all the way and detangle it from the block
device.  I already brought up this issue with the fallback to direct I/O
on I/O error series.

> I see two possible routes here:
>
> 1. Add a new address_space_operation:
>
> 	const struct dax_operations *(*get_dax_ops)(struct address_space *);
>
> 2. Add two of the dax_operations to address_space_operations:
>
> 	size_t (*copy_from_iter)(struct address_space *, void *, size_t, struct iov_iter *);
> 	void (*flush)(struct address_space *, void *, size_t);
> (we won't need ->direct_access as an address_space op because that'll be handled a different way in the brave new world that supports non-bdev-based filesystems)

And both of them are wrong.  The write_begin/write_end mistake
notwithstanding, address_space ops are operations the VM can call without
knowing things like fs locking contexts.  The above, on the other hand,
are device operations provided by the low-level driver, similar to
block_device operations.

So what we need is a way to mount a dax device as a file system, similar
to how we support that for block or MTD devices, so that we can then call
methods on it.  For now this will be a bit complicated because all
current DAX-aware file systems also still need a block device for the
metadata path, so we can't just say you mount either a DAX or a block
device.  But I think we should aim for mounting a DAX device as the
primary use case, and then deal with block device emulation as a generic
DAX layer thing, similar to how we implement (badly, in the rw case)
block devices on top of MTD.
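
To make the layering argument concrete, here is a minimal userspace sketch of
"the file system is mounted on a dax device and calls methods on it".  It is
not kernel code and not taken from the patch series: every name in it
(struct dax_device, struct dax_operations as laid out here, copy_from_buf,
struct toy_sb, toy_fs_write, toy_demo) is a hypothetical stand-in, and a plain
buffer replaces struct iov_iter so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace mock; none of these are the kernel's definitions. */
struct dax_device;

/* Device operations supplied by the low-level driver. */
struct dax_operations {
	size_t (*copy_from_buf)(struct dax_device *dev, void *dst,
				const void *src, size_t len);
	void (*flush)(struct dax_device *dev, void *addr, size_t len);
};

struct dax_device {
	const struct dax_operations *ops;
	unsigned char mem[64];	/* stands in for the persistent memory range */
	size_t flushed_bytes;	/* demo bookkeeping only */
};

/* A pmem-like driver implementing the device operations. */
static size_t pmem_copy(struct dax_device *dev, void *dst,
			const void *src, size_t len)
{
	(void)dev;
	memcpy(dst, src, len);
	return len;
}

static void pmem_flush(struct dax_device *dev, void *addr, size_t len)
{
	(void)addr;
	/* A real driver would write back CPU caches here. */
	dev->flushed_bytes += len;
}

static const struct dax_operations pmem_dax_ops = {
	.copy_from_buf	= pmem_copy,
	.flush		= pmem_flush,
};

/* The "superblock" remembers which dax device the fs was mounted on. */
struct toy_sb {
	struct dax_device *dax;
};

/*
 * Write path of the toy file system: it reaches the driver through the
 * device it was mounted on, not through address_space operations.
 * Returns the number of bytes written, or 0 if the range is out of bounds.
 */
size_t toy_fs_write(struct toy_sb *sb, size_t off, const void *buf, size_t len)
{
	struct dax_device *dev = sb->dax;
	void *dst;
	size_t n;

	if (off > sizeof(dev->mem) || len > sizeof(dev->mem) - off)
		return 0;

	dst = dev->mem + off;
	n = dev->ops->copy_from_buf(dev, dst, buf, len);
	dev->ops->flush(dev, dst, n);
	return n;
}

/* "Mount" the toy fs on a pmem-like device and write five bytes. */
size_t toy_demo(void)
{
	struct dax_device dev = { .ops = &pmem_dax_ops };
	struct toy_sb sb = { .dax = &dev };

	toy_fs_write(&sb, 8, "hello", 5);
	return dev.flushed_bytes;
}
```

The point of the sketch is only the call direction: the file system holds a
reference to the device it was mounted on and invokes the driver's methods
from its own write path, under its own locking.  Nothing is exposed to the VM
as an address_space operation.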