All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, adilger@dilger.ca,
	sandeen@redhat.com, david@fromorbit.com
Subject: Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls
Date: Mon, 4 Feb 2019 10:27:22 -0800	[thread overview]
Message-ID: <20190204182722.GA32119@magnolia> (raw)
In-Reply-To: <20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com>

On Mon, Feb 04, 2019 at 04:11:47PM +0100, Carlos Maiolino wrote:
> Hi, Sorry for the long delay Darrick.
> 
> > > +	fextent.fe_logical = 0;
> > > +	fextent.fe_physical = 0;
> > > +	fieinfo.fi_extents_max = 1;
> > > +	fieinfo.fi_extents_mapped = 0;
> > > +	fieinfo.fi_extents_start = &fextent;
> > > +	fieinfo.fi_start = start;
> > > +	fieinfo.fi_len = 1 << inode->i_blkbits;
> > > +	fieinfo.fi_flags = 0;
> > > +	fieinfo.fi_cb = fiemap_fill_kernel_extent;
> > > +
> > > +	error = inode->i_op->fiemap(inode, &fieinfo);
> > > +
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN |
> > > +				FIEMAP_EXTENT_ENCODED |
> > > +				FIEMAP_EXTENT_DATA_INLINE |
> > > +				FIEMAP_EXTENT_UNWRITTEN))
> > > +		return -EINVAL;
> > 
> > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2,
> > and btrfs) do not return bmap results for files with shared blocks.
> > This check here should include FIEMAP_EXTENT_SHARED since external
> > overwrites of a COW file block are bad news on btrfs (and ocfs2 and
> > xfs).
> 
> ok, np
> 
> > 
> > > +
> > > +	*block = (fextent.fe_physical +
> > > +		  (start - fextent.fe_logical)) >> inode->i_blkbits;
> > 
> > Hmmm, so there's nothing here checking that the physical device fiemap
> > reports is the same device that was passed into the mount.  This is
> > trivially true for most of the filesystems that implement bmap and
> > fiemap, but definitely not true for xfs or btrfs.  I would bet most
> > userspace callers of bmap (since it's an ext2-era ioctl) make that
> > assumption and don't even know how to find the device.
> 
> Makes sense.
> 
> > 
> > On xfs, the bmap implementation won't return any results for realtime
> > files, but it looks as though we suddenly will start doing that here,
> > because in the new bmap implementation we will use fiemap, and fiemap
> > reports extents without providing any context about which device they're
> > on, and that context-less extent gets passed back to bmap_fiemap.
> > 
> > In any case, I think a better solution to the multi-device problem is to
> > start returning device information via struct fiemap_extent, at least
> > inside the kernel.  Use one of the reserved fields to declare a new
> > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t
> > device number, and then you can check that against inode->i_sb->s_bdev
> > to avoid returning results for the non-primary device of a multi-device
> > filesystem.
> 
> I agree we should address it here, but I don't think fiemap_extent is the right
> place for it, it is linked to the UAPI, and changing it is usually not a good
> idea.

Adding a FIEMAP_EXTENT flag or two to turn one of the fe_reserved fields
into some sort of dev_t/per-device cookie should be fine.  Userspace
shouldn't be expecting any meaning in reserved areas.

> I think I got your idea anyway, but, what if, instead returning the bdev in
> fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to
> idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do
> with such information?

I don't like the idea of adding a FIEMAP_FLAG to distinguish callers.

--D

> > 
> > > +
> > > +	return error;
> > > +}
> > > +
> > >  /**
> > >   *	bmap	- find a block number in a file
> > >   *	@inode:  inode owning the block number being requested
> > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput);
> > >   */
> > >  int bmap(struct inode *inode, sector_t *block)
> > >  {
> > > -	if (!inode->i_mapping->a_ops->bmap)
> > > +	if (inode->i_op->fiemap)
> > > +		return bmap_fiemap(inode, block);
> > > +	else if (inode->i_mapping->a_ops->bmap)
> > > +		*block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
> > > +						       *block);
> > > +	else
> > >  		return -EINVAL;
> > 
> > Waitaminute.  btrfs currently supports fiemap but not bmap, and now
> > suddenly it will support this legacy interface they've never supported
> > before.  Are they on board with this?
> > 
> > --D
> > 
> > >  
> > > -	*block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block);
> > >  	return 0;
> > >  }
> > >  EXPORT_SYMBOL(bmap);
> > > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > > index 6086978fe01e..bfa59df332bf 100644
> > > --- a/fs/ioctl.c
> > > +++ b/fs/ioctl.c
> > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > >  	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > >  }
> > >  
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > > +			    u64 phys, u64 len, u32 flags)
> > > +{
> > > +	struct fiemap_extent *extent = fieinfo->fi_extents_start;
> > > +
> > > +	/* only count the extents */
> > > +	if (fieinfo->fi_extents_max == 0) {
> > > +		fieinfo->fi_extents_mapped++;
> > > +		return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +	}
> > > +
> > > +	if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max)
> > > +		return 1;
> > > +
> > > +	if (flags & SET_UNKNOWN_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_UNKNOWN;
> > > +	if (flags & SET_NO_UNMOUNTED_IO_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_ENCODED;
> > > +	if (flags & SET_NOT_ALIGNED_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_NOT_ALIGNED;
> > > +
> > > +	extent->fe_logical = logical;
> > > +	extent->fe_physical = phys;
> > > +	extent->fe_length = len;
> > > +	extent->fe_flags = flags;
> > > +
> > > +	fieinfo->fi_extents_mapped++;
> > > +
> > > +	if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max)
> > > +		return 1;
> > > +	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +}
> > >  /**
> > >   * fiemap_fill_next_extent - Fiemap helper function
> > >   * @fieinfo:	Fiemap context passed into ->fiemap
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 7a434979201c..28bb523d532a 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info {
> > >  	fiemap_fill_cb	fi_cb;
> > >  };
> > >  
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical,
> > > +			      u64 phys, u64 len, u32 flags);
> > >  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
> > >  			    u64 phys, u64 len, u32 flags);
> > >  int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
> > > -- 
> > > 2.17.2
> > > 
> 
> -- 
> Carlos

  reply	other threads:[~2019-02-04 18:27 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-05  9:17 [PATCH 00/10 V2] New ->fiemap infrastructure and ->bmap removal Carlos Maiolino
2018-12-05  9:17 ` [PATCH 01/10] fs: Enable bmap() function to properly return errors Carlos Maiolino
2018-12-05  9:17 ` [PATCH 02/10] cachefiles: drop direct usage of ->bmap method Carlos Maiolino
2018-12-05  9:17 ` [PATCH 03/10] ecryptfs: drop direct calls to ->bmap Carlos Maiolino
2018-12-05  9:17 ` [PATCH 04/10 V2] fibmap: Use bmap instead of ->bmap method in ioctl_fibmap Carlos Maiolino
2019-01-14 16:49   ` Christoph Hellwig
2019-02-04 11:34     ` Carlos Maiolino
2018-12-05  9:17 ` [PATCH 05/10] fs: Move start and length fiemap fields into fiemap_extent_info Carlos Maiolino
2019-01-14 16:50   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 06/10] iomap: Remove length and start fields from iomap_fiemap Carlos Maiolino
2019-01-14 16:51   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 07/10] fs: Use a void pointer to store fiemap_extent Carlos Maiolino
2019-01-14 16:53   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 08/10 V2] fiemap: Use a callback to fill fiemap extents Carlos Maiolino
2019-01-14 16:53   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls Carlos Maiolino
2018-12-05 17:36   ` Darrick J. Wong
2018-12-07  9:09     ` Carlos Maiolino
2018-12-07 20:14       ` Andreas Dilger
2019-02-04 15:11     ` Carlos Maiolino
2019-02-04 18:27       ` Darrick J. Wong [this message]
2019-02-06 13:37         ` Carlos Maiolino
2019-02-06 20:44           ` Darrick J. Wong
2019-02-06 21:13             ` Andreas Dilger
2019-02-07  9:52               ` Carlos Maiolino
2019-02-08  8:43                 ` Christoph Hellwig
2019-02-11 12:57                   ` Christoph Hellwig
2019-02-11 16:21                     ` Carlos Maiolino
2019-02-11 16:48                       ` Christoph Hellwig
2019-02-07 11:59             ` Carlos Maiolino
2019-02-07 17:02               ` Darrick J. Wong
2019-02-07 21:25                 ` Andreas Dilger
2019-02-08  8:46                   ` Christoph Hellwig
2019-02-08 10:36                     ` Carlos Maiolino
2019-02-08 21:03                       ` Andreas Dilger
2019-02-08  9:08                   ` Carlos Maiolino
2019-02-08  9:03                 ` Carlos Maiolino
2019-02-07 12:36             ` Carlos Maiolino
2019-02-07 18:16               ` Darrick J. Wong
2019-02-08  8:58                 ` Carlos Maiolino
2019-02-06 21:04           ` Andreas Dilger
2019-01-14 16:56   ` Christoph Hellwig
2019-02-05  9:56     ` Carlos Maiolino
2019-02-05 18:25       ` Christoph Hellwig
2019-02-06  9:50         ` Carlos Maiolino
2018-12-05  9:17 ` [PATCH 10/10] xfs: Get rid of ->bmap Carlos Maiolino
2018-12-05 17:37   ` Darrick J. Wong
2018-12-06 13:06     ` Carlos Maiolino
2018-12-06 18:56 ` [PATCH 00/10 V2] New ->fiemap infrastructure and ->bmap removal Andreas Grünbacher
2018-12-07  9:34   ` Carlos Maiolino
2019-01-14 16:50     ` Christoph Hellwig
2019-01-14 17:56       ` Andreas Grünbacher
2019-01-14 17:58         ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190204182722.GA32119@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=adilger@dilger.ca \
    --cc=cmaiolino@redhat.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.