linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] dax: allow DAX to look up an inode's block device
       [not found]   ` <CAPcyv4gtcL_WwZpmiAUhO5h4q3YyXinkpz9SwKm5SBA9-1kE9Q@mail.gmail.com>
@ 2016-02-02 23:39     ` Dan Williams
  2016-02-02 23:52       ` Matthew Wilcox
  0 siblings, 1 reply; 2+ messages in thread
From: Dan Williams @ 2016-02-02 23:39 UTC (permalink / raw)
  To: Al Viro
  Cc: Ross Zwisler, linux-kernel@vger.kernel.org, J. Bruce Fields,
	Andrew Morton, Dave Chinner, Jan Kara, Jeff Layton,
	Matthew Wilcox, linux-fsdevel, linux-nvdimm, XFS Developers,
	linux-btrfs

[ adding btrfs, resend with the correct list address  ]

On Tue, Feb 2, 2016 at 3:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Feb 02, 2016 at 04:11:42PM -0700, Ross Zwisler wrote:
>
>> However, for raw block devices and for XFS with a real-time device, the
>> value in inode->i_sb->s_bdev is not correct.  With the code as it is
>> currently written, an fsync or msync to a DAX enabled raw block device will
>> cause a NULL pointer dereference kernel BUG.  For this to work correctly we
>> need to ask the block device or filesystem what struct block_device is
>> appropriate for our inode.
>>
>> To that end, add a get_bdev(struct inode *) entry point to struct
>> super_operations.  If this function pointer is non-NULL, this notifies DAX
>> that it needs to use it to look up the correct block_device.  If
>> i_sb->get_bdev() is NULL DAX will default to inode->i_sb->s_bdev.
>
> Umm...  It assumes that bdev will stay pinned for as long as inode is
> referenced, presumably?  If so, that needs to be documented (and verified
> for existing fs instances).  In principle, multi-disk fs might want to
> support things like "silently move the inodes backed by that disk to other
> ones"...

I assume btrfs is the only fs we have that might reassign the bdev for
a given inode on the fly?  Hopefully we don't need anything stronger
than rcu_read_lock() to pin the result as valid.

At least in this case the initial user is dax-fsync where the
->get_bdev() answer should be static for the life of the inode, and
btrfs does not currently interface with dax.  But yes, we need to get
the expected semantics clear.

On Tue, Feb 2, 2016 at 3:38 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> [ adding btrfs ]
>
> On Tue, Feb 2, 2016 at 3:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Tue, Feb 02, 2016 at 04:11:42PM -0700, Ross Zwisler wrote:
>>
>>> However, for raw block devices and for XFS with a real-time device, the
>>> value in inode->i_sb->s_bdev is not correct.  With the code as it is
>>> currently written, an fsync or msync to a DAX enabled raw block device will
>>> cause a NULL pointer dereference kernel BUG.  For this to work correctly we
>>> need to ask the block device or filesystem what struct block_device is
>>> appropriate for our inode.
>>>
>>> To that end, add a get_bdev(struct inode *) entry point to struct
>>> super_operations.  If this function pointer is non-NULL, this notifies DAX
>>> that it needs to use it to look up the correct block_device.  If
>>> i_sb->get_bdev() is NULL DAX will default to inode->i_sb->s_bdev.
>>
>> Umm...  It assumes that bdev will stay pinned for as long as inode is
>> referenced, presumably?  If so, that needs to be documented (and verified
>> for existing fs instances).  In principle, multi-disk fs might want to
>> support things like "silently move the inodes backed by that disk to other
>> ones"...
>
> I assume btrfs is the only fs we have that might reassign the bdev for
> a given inode on the fly?  Hopefully we don't need anything stronger
> than rcu_read_lock() to pin the result as valid.
>
> At least in this case the initial user is dax-fsync where the
> ->get_bdev() answer should be static for the life of the inode, and
> btrfs does not currently interface with dax.  But yes, we need to get
> the expected semantics clear.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] dax: allow DAX to look up an inode's block device
  2016-02-02 23:39     ` [PATCH] dax: allow DAX to look up an inode's block device Dan Williams
@ 2016-02-02 23:52       ` Matthew Wilcox
  0 siblings, 0 replies; 2+ messages in thread
From: Matthew Wilcox @ 2016-02-02 23:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Al Viro, Ross Zwisler, linux-kernel@vger.kernel.org,
	J. Bruce Fields, Andrew Morton, Dave Chinner, Jan Kara,
	Jeff Layton, linux-fsdevel, linux-nvdimm, XFS Developers,
	linux-btrfs

On Tue, Feb 02, 2016 at 03:39:15PM -0800, Dan Williams wrote:
> On Tue, Feb 2, 2016 at 3:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Tue, Feb 02, 2016 at 04:11:42PM -0700, Ross Zwisler wrote:
> >> However, for raw block devices and for XFS with a real-time device, the
> >> value in inode->i_sb->s_bdev is not correct.  With the code as it is
> >> currently written, an fsync or msync to a DAX enabled raw block device will
> >> cause a NULL pointer dereference kernel BUG.  For this to work correctly we
> >> need to ask the block device or filesystem what struct block_device is
> >> appropriate for our inode.
> >>
> >> To that end, add a get_bdev(struct inode *) entry point to struct
> >> super_operations.  If this function pointer is non-NULL, this notifies DAX
> >> that it needs to use it to look up the correct block_device.  If
> >> i_sb->get_bdev() is NULL DAX will default to inode->i_sb->s_bdev.
> >
> > Umm...  It assumes that bdev will stay pinned for as long as inode is
> > referenced, presumably?  If so, that needs to be documented (and verified
> > for existing fs instances).  In principle, multi-disk fs might want to
> > support things like "silently move the inodes backed by that disk to other
> > ones"...
> 
> I assume btrfs is the only fs we have that might reassign the bdev for
> a given inode on the fly?  Hopefully we don't need anything stronger
> than rcu_read_lock() to pin the result as valid.
> 
> At least in this case the initial user is dax-fsync where the
> ->get_bdev() answer should be static for the life of the inode, and
> btrfs does not currently interface with dax.  But yes, we need to get
> the expected semantics clear.

Let's be clear though.  ->get_bdev is a temporary hack.  The need for
it goes away when DAX doesn't rely on being on a block_device any more.
I don't expect it to live longer than six months.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2016-02-02 23:52 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1454454702-11889-1-git-send-email-ross.zwisler@linux.intel.com>
     [not found] ` <20160202231931.GR17997@ZenIV.linux.org.uk>
     [not found]   ` <CAPcyv4gtcL_WwZpmiAUhO5h4q3YyXinkpz9SwKm5SBA9-1kE9Q@mail.gmail.com>
2016-02-02 23:39     ` [PATCH] dax: allow DAX to look up an inode's block device Dan Williams
2016-02-02 23:52       ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).