From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Besogonov, Aleksei" <cyberax@amazon.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	xfs <linux-xfs@vger.kernel.org>
Subject: Re: fallocate on XFS for swap
Date: Fri, 9 Mar 2018 17:17:07 -0800
Message-ID: <20180310011707.GA4875@magnolia>
In-Reply-To: <20180310005850.GW18129@dastard>

On Sat, Mar 10, 2018 at 11:58:50AM +1100, Dave Chinner wrote:
> On Fri, Mar 09, 2018 at 03:44:22PM -0800, Darrick J. Wong wrote:
> > [you really ought to cc the xfs list]
> > 
> > On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> > > Hi!
> > > 
> > > We’re working at Amazon on making XFS our default root filesystem for
> > > the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> > > that we’ve encountered is inability to use fallocated files for swap
> > > on XFS. This is really important for us, since we’re shipping our
> > > current Amazon Linux with hibernation support.
> > 
> > <shudder>
> > 
> > > I’ve traced the problem to bmap(), used in generic_swapfile_activate
> > > call, which returns 0 for blocks inside holes created by fallocate and
> > > Dave Chinner confirmed it in a private email. I’m thinking about ways
> > > to fix it, so far I see the following possibilities:
> > > 
> > > 1. Change bmap() to not return zeroes for blocks inside holes. But
> > > this is an ABI change and it likely will break some obscure userspace
> > > utility somewhere.
> > 
> > bmap is a horrible interface, let's leave it to wither and eventually go
> > away.
> > 
> > > 2. Change generic_swapfile_activate to use a more modern interface, by
> > > adding fiemap-like operation to address_space_operations with fallback
> > > on bmap().
> > 
> > Probably the best idea, but see fs/iomap.c since we're basically leasing
> > a chunk of file space to the kernel.  Leasing space to a user that wants
> > direct access is becoming rather common (rdma, map_sync, etc.)
> 
> thing is, we don't want in-kernel users of fiemap. We've got other
> block mapping interfaces that can be used, such as iomap...

Well yes, I was clumsily trying to suggest reimplementing
generic_swapfile_activate with an iomap backend replacing/augmenting the
old get_block thing... :)

> > > 3. Add an XFS-specific implementation of swapfile_activate.
> > 
> > Ugh no.
> 
> What we want is an iomap-based re-implementation of
> generic_swapfile_activate(). One of the ways to plumb that in is to
> use ->swapfile_activate() like so:

Is this the existing ->swap_activate function pointer in
address_space_operations, or a new one?  I think it'd be best to have it
be a separate callback like you suggest:

> iomap_swapfile_activate()
> {
> 	return iomap_apply(... iomap_swapfile_add_extent, ...)
> }
> 
> xfs_vm_swapfile_activate()
> {
> 	return iomap_swapfile_activate(xfs_iomap_ops);
> }
> 
> 	.swapfile_activate = xfs_vm_swapfile_activate()
> 
> And massage the swapfile_activate callout to be friendly to fragmented
> files. i.e. change the nfs caller to run a
> "add_single_swap_extent()" caller rather than have to do it in the
> generic code on return....

But ugh, the names are confusing.  ->swapfile_activate, ->swap_activate,
and generic_swapfile_activate.  Not sure what's needed to clean up the
other filesystems to use a single mapping interface, though.

> IOWs, I think the choices we have are to either re-implement
> generic_swapfile_activate() and then be stuck with using get_block
> style interfaces forever in XFS, or we use the filesystem specific
> callout to implement more advanced generic support using the
> filesystem supplied get_block/iomap interfaces for block mapping
> like we do for everything else that the VM needs the filesystem to
> do....

Yes, that's what I was trying to nudge Mr. Besogonov towards, though not
as clearly as you've put it.  Thanks. :)
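
To make the extent-walking idea concrete, here is a minimal userspace
sketch in C. It is not kernel code and not the implementation proposed in
this thread. Every name in it (ext_type, file_extent, swap_extent,
build_swap_extents) is hypothetical, and it assumes block size equals page
size. It shows the behaviour an extent-based activation would want:
unwritten (fallocated) extents count as usable space, real holes are
rejected, and physically contiguous extents are merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent states, loosely modelled on extent mapping types. */
enum ext_type { EXT_HOLE, EXT_MAPPED, EXT_UNWRITTEN };

/* One file extent as a filesystem might report it (units: blocks). */
struct file_extent {
	uint64_t file_off;	/* offset within the file */
	uint64_t disk_blk;	/* first disk block; ignored for holes */
	uint64_t len;		/* length of the extent */
	enum ext_type type;
};

/* One contiguous run of swap space (assumes block size == page size). */
struct swap_extent {
	uint64_t start_page;	/* first swap page covered */
	uint64_t start_blk;	/* first disk block backing it */
	uint64_t nr;		/* number of pages/blocks */
};

/*
 * Walk the file's extents in file order and build swap extents.
 * Unwritten extents are accepted because they have disk space behind
 * them; holes, or gaps between reported extents, are rejected.
 * Returns the number of swap extents written to out, or -1 on a hole,
 * a gap, or when out is too small.
 */
int build_swap_extents(const struct file_extent *in, size_t n_in,
		       struct swap_extent *out, size_t max_out)
{
	size_t n_out = 0;
	uint64_t next_off = 0;	/* file offset we expect to see next */

	assert(in != NULL && out != NULL);

	for (size_t i = 0; i < n_in; i++) {
		const struct file_extent *e = &in[i];

		if (e->len == 0)
			continue;
		if (e->type == EXT_HOLE || e->file_off != next_off)
			return -1;
		next_off += e->len;

		/* Merge with the previous swap extent if physically adjacent. */
		if (n_out > 0) {
			struct swap_extent *last = &out[n_out - 1];

			if (last->start_blk + last->nr == e->disk_blk) {
				last->nr += e->len;
				continue;
			}
		}

		if (n_out == max_out)
			return -1;
		out[n_out].start_page = e->file_off;
		out[n_out].start_blk = e->disk_blk;
		out[n_out].nr = e->len;
		n_out++;
	}
	return (int)n_out;
}
```

In this model a fragmented file just produces more than one swap_extent,
which is the "friendly to fragmented files" property mentioned above. A
bmap()-style lookup that reports 0 for unwritten blocks cannot tell those
blocks apart from holes.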

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thread overview: 15+ messages
2018-03-09 22:05 fallocate on XFS for swap Besogonov, Aleksei
2018-03-09 23:44 ` Darrick J. Wong
2018-03-09 23:44   ` Darrick J. Wong
2018-03-10  0:58   ` Dave Chinner
2018-03-10  0:58     ` Dave Chinner
2018-03-10  1:17     ` Darrick J. Wong [this message]
2018-03-10  1:17       ` Darrick J. Wong
2018-03-10  1:36       ` Dave Chinner
2018-03-10  1:36         ` Dave Chinner
2018-03-12 22:01         ` Besogonov, Aleksei
2018-03-13  1:31           ` Dave Chinner
2018-03-10  9:38     ` Christoph Hellwig
2018-03-12 21:46       ` Dave Chinner
2018-03-13  7:14         ` Christoph Hellwig
2018-03-12 18:40     ` Besogonov, Aleksei
