From: Ted Ts'o <tytso-3s7WtUTddSA@public.gmane.org>
To: Yongqiang Yang <xiaoqiangnk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Andreas Dilger <adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>,
Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
Eric Sandeen <sandeen-+82itfer+wXR7s880joybQ@public.gmane.org>,
xfs-oss <xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org>,
"coreutils-mXXj517/zsQ@public.gmane.org"
<coreutils-mXXj517/zsQ@public.gmane.org>,
"linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Markus Trippelsdorf
<markus-xp2qqqlHh3xzoYq+O6RWwA@public.gmane.org>
Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
Date: Mon, 18 Apr 2011 22:59:49 -0400
Message-ID: <20110419025949.GA3030@thunk.org>
In-Reply-To: <BANLkTin=WEpSf6ddiOMNMOpCPP-wiEttSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, Apr 19, 2011 at 09:58:15AM +0800, Yongqiang Yang wrote:
> On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org> wrote:
> > Always passing FIEMAP_FLAG_SYNC is fine in this case. It should
> > only do anything if there is unwritten data, which is the only
> > case we are concerned with at this point. In any case, this is a
> > simple solution for coreutils until such a time that a more
> > complex solution is added in the kernel (if ever).
I would recommend that coreutils check i_blocks and i_size (st_blocks
and st_size as reported by stat(2)) and only try using fiemap (with
FIEMAP_FLAG_SYNC) if the file appears to be sparse. That's because
FIEMAP_FLAG_SYNC effectively does the equivalent of an fsync() system
call. Otherwise, in the case of a freshly untar'ed directory hierarchy
which is then copied using "cp -r", cp would end up calling fsync()
for each file in the directory, with the disastrous performance that
one might expect.
If cp only tries the fiemap optimization on files that appear to be
sparse, it should avoid this problem.
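For illustration, the "appears to be sparse" check might look like this in C. This is a minimal sketch, not coreutils code, and the function name appears_sparse is hypothetical. On Linux, st_blocks is counted in 512-byte units, so a file looks sparse when st_blocks * 512 is smaller than st_size:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: a file "appears sparse" when the space
 * allocated to it is smaller than its apparent size.
 * On Linux, st_blocks is always in 512-byte units. */
bool appears_sparse(const struct stat *st)
{
    return (off_t)st->st_blocks * 512 < st->st_size;
}
```

A copier would fstat() the source and take the FIEMAP path only when this returns true. Either way the output stays correct: a file misclassified as sparse only pays for an unnecessary sync, and a file classified as dense is copied with plain read()/write().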
> > Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
> >
> > I don't see how this will change the problem in any meaningful way. There
> > will still need to be code that is traversing the on-disk mapping, and also
> > keeping it coherent with unwritten data in the page cache.
The advantage of SEEK_HOLE/SEEK_DATA is that we don't need to force an
fsync() of the data.
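As a sketch of what that looks like from userspace, here is a loop that walks a file's data regions with lseek(SEEK_DATA/SEEK_HOLE). It assumes a kernel and C library that support these whence values (on Linux they arrived in 3.1, after this thread); the fallback values 3 and 4 match the Linux ABI. The names for_each_data_region, region_cb and print_region are hypothetical. Note that nothing here calls fsync(): the filesystem is expected to report dirty page-cache data as data.

```c
#define _GNU_SOURCE            /* exposes SEEK_DATA/SEEK_HOLE in glibc */
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* older headers: use the Linux ABI values */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

typedef void (*region_cb)(off_t start, off_t end, void *arg);

/* Example callback: print each data region as a half-open range. */
void print_region(off_t start, off_t end, void *arg)
{
    (void)arg;
    printf("data [%lld, %lld)\n", (long long)start, (long long)end);
}

/* Call cb for every data region of fd, with no fsync() involved.
 * Returns the number of regions found, or -1 if lseek fails for a
 * reason other than "no more data" (e.g. EINVAL on kernels without
 * support). Moves the file offset as a side effect. */
int for_each_data_region(int fd, region_cb cb, void *arg)
{
    off_t pos = 0;
    int count = 0;

    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? count : -1;  /* ENXIO: no data at or after pos */

        off_t hole = lseek(fd, data, SEEK_HOLE); /* EOF always counts as a hole */
        if (hole < 0)
            return -1;

        if (cb)
            cb(data, hole, arg);
        count++;
        pos = hole;
    }
}
```

On filesystems without hole tracking, the kernel's generic implementation reports the whole file as a single data region, so the loop still produces a correct (if non-sparse) copy plan.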
> It seems that we are being confused by the page cache versus the disk.
> The unwritten flag returned from FIEMAP indicates that blocks on disk
> have not been written, but it does not say whether there is data in
> the page cache. So FIEMAP itself just tells the user the map on disk.
> However, there is an exception for delayed allocation: there, FIEMAP
> tells users the data is in the page cache.
>
> Maybe FIEMAP should return all known information for an unwritten
> extent: if unwritten data exists in the page cache, FIEMAP should let
> users know that the data is in the page cache and the space on disk
> has been preallocated, but the data has not been flushed to disk.
> Delayed allocation already works like this. Then user-space
> applications can decide what to do. Taking cp as an example, it would
> copy from the page cache rather than ignore it.
>
> We need a definite definition for FIEMAP; in other words, does it
> tell users the map on disk, or the map of both disk and page cache?
>
> If the former is chosen, then FIEMAP should not report delayed
> allocation. Otherwise, FIEMAP should return all known information for
> the unwritten case, as it does for delayed allocation.
The fact that the FIEMAP interface definition includes a delayed
allocation bit could be a strong indication that, unlike XFS's bmap
interface, this interface is supposed to return information taking
into account both on-disk and page cache information. If this is the
case, then even though there might be a single on-disk (uninitialized)
extent, if there are pages in the page cache that have not yet been
written out, but which are described by that on-disk extent, then
instead of returning a single struct fiemap_extent for that on-disk
extent, the fiemap ioctl would need to return multiple struct
fiemap_extents, where some would have the FIEMAP_EXTENT_UNWRITTEN bit
set, and others would not (since data has been written to the page
cache, even if it hasn't been flushed to disk yet).
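To make the ambiguity concrete, here is a hypothetical classifier for the per-extent decision a copier such as cp has to make. classify_extent and enum extent_action are illustrative names, not an existing API; the flags come from <linux/fiemap.h>. Skipping FIEMAP_EXTENT_UNWRITTEN extents as zeros is only safe if FIEMAP accounts for dirty page-cache data as described above, or if the map was obtained with FIEMAP_FLAG_SYNC. Under on-disk-only semantics it silently drops data still in the page cache, which appears to be the failure mode behind this thread's subject.

```c
#include <linux/fiemap.h>   /* struct fiemap_extent, FIEMAP_EXTENT_* flags */

/* Hypothetical decision a copier makes for one mapped extent. */
enum extent_action {
    EXTENT_READ,          /* read the bytes through the page cache */
    EXTENT_SKIP_AS_ZEROS  /* assume the range reads back as zeros */
};

enum extent_action classify_extent(const struct fiemap_extent *fe)
{
    /* Delayed allocation: the data so far lives only in the page cache. */
    if (fe->fe_flags & FIEMAP_EXTENT_DELALLOC)
        return EXTENT_READ;

    /* Unwritten (preallocated) extent. Treating it as zeros is safe only
     * if FIEMAP reports dirty page-cache data separately, or if the map
     * was taken with FIEMAP_FLAG_SYNC. Under on-disk-only semantics,
     * data written but not yet flushed would be dropped here. */
    if (fe->fe_flags & FIEMAP_EXTENT_UNWRITTEN)
        return EXTENT_SKIP_AS_ZEROS;

    return EXTENT_READ;
}
```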
But yes, if we're going to make the case that the FIEMAP interface is
only intended to reflect the on-disk information, then the DELALLOC
bit shouldn't be returned at all, and we should deprecate it.
Anything else leads us to an inconsistent interface.
> > Since FIEMAP already exists for most Linux filesystems, it probably makes
> > sense to implement SEEK_{HOLE,DATA} by calling FIEMAP to get the disk
> > mapping in the first place.
Not if it means forcing a FIEMAP_FLAG_SYNC, which implies an fsync().
If the only way to get consistent data across ext4, btrfs, xfs,
etc. is to force userspace to issue a FIEMAP_FLAG_SYNC, then we need
a separate SEEK_HOLE/SEEK_DATA interface that doesn't require
flushing data to the disk first.
Maybe coreutils will need to use FIEMAP_FLAG_SYNC initially, since
it's the only way to guarantee correct behaviour for XFS. But I would
really rather that not be the long-term way we leave things!
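For the interim approach, a minimal sketch of requesting the extent map with FIEMAP_FLAG_SYNC through the FS_IOC_FIEMAP ioctl. get_extent_map_synced is a hypothetical name; struct fiemap, FIEMAP_FLAG_SYNC and FIEMAP_MAX_OFFSET come from <linux/fiemap.h>, and FS_IOC_FIEMAP from <linux/fs.h>. The sync flag makes the kernel write back the file's dirty data before mapping it, which is the fsync()-like cost discussed above.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_FLAG_SYNC, FIEMAP_MAX_OFFSET */
#include <linux/fs.h>       /* FS_IOC_FIEMAP */

/* Fetch up to max_extents mappings for the whole file after forcing
 * writeback. Returns a heap buffer the caller must free(), or NULL
 * with errno set (e.g. EOPNOTSUPP if the filesystem lacks FIEMAP). */
struct fiemap *get_extent_map_synced(int fd, unsigned int max_extents)
{
    size_t len = sizeof(struct fiemap) +
                 (size_t)max_extents * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, len);   /* also zeroes fm_reserved */
    if (fm == NULL)
        return NULL;

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;    /* map the whole file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;      /* write back dirty data first */
    fm->fm_extent_count = max_extents;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        int saved = errno;                /* free() may clobber errno */
        free(fm);
        errno = saved;
        return NULL;
    }
    return fm;                            /* fm_mapped_extents entries are valid */
}
```

The caller reads fm->fm_mapped_extents entries from fm->fm_extents[]. If a file has more extents than max_extents, the call has to be repeated starting from the end of the last returned extent until one carries FIEMAP_EXTENT_LAST, and a plain read()/write() fallback is still needed for filesystems that return EOPNOTSUPP.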
- Ted