linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>, Namjae Jeon <linkinjeon@gmail.com>,
	viro@zeniv.linux.org.uk, bpm@sgi.com, adilger.kernel@dilger.ca,
	jack@suse.cz, mtk.manpages@gmail.com, lczerner@redhat.com,
	linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Namjae Jeon <namjae.jeon@samsung.com>,
	Ashish Sangwan <a.sangwan@samsung.com>
Subject: Re: [PATCH v5 1/10] fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate
Date: Wed, 26 Feb 2014 12:57:47 +1100	[thread overview]
Message-ID: <20140226015747.GN13647@dastard> (raw)
In-Reply-To: <alpine.LSU.2.11.1402251525370.2380@eggly.anvils>

On Tue, Feb 25, 2014 at 03:41:20PM -0800, Hugh Dickins wrote:
> On Mon, 24 Feb 2014, Dave Chinner wrote:
> > On Sat, Feb 22, 2014 at 09:06:25AM -0500, Theodore Ts'o wrote:
> > > On Wed, Feb 19, 2014 at 01:37:43AM +0900, Namjae Jeon wrote:
> > > > +	/*
> > > > +	 * There is no need to overlap collapse range with EOF, in which case
> > > > +	 * it is effectively a truncate operation
> > > > +	 */
> > > > +	if ((mode & FALLOC_FL_COLLAPSE_RANGE) &&
> > > > +	    (offset + len >= i_size_read(inode)))
> > > > +		return -EINVAL;
> > > > +
> > > 
> > > I wonder if we should just translate a collapse range that is
> > > equivalent to a truncate operation to, in fact, be a truncate
> > > operation?
> > 
> > Trying to collapse a range that extends beyond EOF, IMO, is likely
> > to only happen if the DVR/NLE application is buggy. Hence I think
> > that telling the application it is doing something that is likely to
> > be wrong is better than silently truncating the file....
> 
> I do agree with Ted on this point.  This is not an xfs ioctl added
> for one DVR/NLE application, it's a mode of a Linux system call.
> 
> We do not usually reject with an error when one system call happens
> to ask for something which can already be accomplished another way;
> nor nanny our callers.
> 
> It seems natural to me that COLLAPSE_RANGE should support beyond EOF;
> unless that adds significantly to implementation difficulties?

Yes, it does add to the implementation complexity significantly - it
adds data security issues that don't exist with the current API.

That is, Filesystems can have uninitialised blocks beyond EOF so
if we allow COLLAPSE_RANGE to shift them down within EOF, we now
have to ensure they are properly zeroed or marked as unwritten.

It also makes implementations more difficult. For example, XFS can
also have in-memory delayed allocation extents beyond EOF, and they
can't be brought into the range < EOF without either:

	a) inserting zeroed pages with appropriately set up
	and mapped bufferheads into the page cache for the range
	that sits within EOF; or
	b) truncating the delalloc extents beyond EOF before the
	move

So, really, the moment you go beyond EOF filesystems have to do
quite a bit more validation and IO in the context of the system
call. It no longer becomes a pure extent manipulation offload - it
becomes a data security problem.

And, indeed, the specification that we are working to is that the
applications that want to collapse the range of a file are using
this function instead of read/memcpy/write/truncate, which by
definition means they cannot shift ranges of the file beyond EOF
into the new file.

So IMO the API defines the functionality as required by the
applications that require it and *no more*. If you need some
different behaviour - we can add it via additional flags in future
when you have an application that requires it. 

> Actually, is it even correct to fail at EOF?  What if fallocation
> with FALLOC_FL_KEEP_SIZE was used earlier, to allocate beyond EOF:
> shouldn't it be possible to shift that allocation down, along with
> the EOF, rather than leave it behind as a stranded island?

It does get shifted down - it just remains beyond EOF, just like it
was before the operation. And that is part of the specification of
COLLAPSE_RANGE - it was done so that preallocation (physical or
speculative delayed allocation) beyond EOF to avoid fragmentation as
the DVR continues to write is not screwed up by chopping out earlier
parts of the file.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-02-26  1:57 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-18 16:37 [PATCH v5 1/10] fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate Namjae Jeon
2014-02-22 14:06 ` Theodore Ts'o
2014-02-23 21:36   ` Dave Chinner
2014-02-25 23:41     ` Hugh Dickins
2014-02-26  1:57       ` Dave Chinner [this message]
2014-02-26  5:25         ` Hugh Dickins
2014-02-26 10:04           ` Dave Chinner
2014-02-26 23:48             ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140226015747.GN13647@dastard \
    --to=david@fromorbit.com \
    --cc=a.sangwan@samsung.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=bpm@sgi.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=lczerner@redhat.com \
    --cc=linkinjeon@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mtk.manpages@gmail.com \
    --cc=namjae.jeon@samsung.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).