linux-ext4.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Christoph Hellwig <hch@infradead.org>,
	Eric Blake <eblake@redhat.com>, Jim Meyering <jim@meyering.net>,
	Eric Sandeen <sandeen@sandeen.net>, xfs-oss <xfs@oss.sgi.com>,
	coreutils@gnu.org, linux-ext4@vger.kernel.org,
	Markus Trippelsdorf <markus@trippelsdorf.de>
Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
Date: Sat, 16 Apr 2011 10:25:00 +1000
Message-ID: <20110416002500.GO21395@dastard>
In-Reply-To: <DD59E27A-FD96-4983-A274-B76CCE99AE7A@dilger.ca>

On Fri, Apr 15, 2011 at 04:28:37PM -0600, Andreas Dilger wrote:
> On 2011-04-15, at 11:26 AM, Christoph Hellwig wrote:
> > On Fri, Apr 15, 2011 at 11:24:19AM -0600, Eric Blake wrote:
> >> Would it be worth borrowing from Solaris' semantics and adding SEEK_HOLE
> >> and SEEK_DATA to lseek(2), as a higher level (less-detailed, but easier
> >> to define and easier to use) interface for discovering the regions of a
> >> file that only contain NUL bytes?
> > 
> > Yes, I've already suggested that both in this thread and on IRC.
> > 
> > For efficient copies it's the only usable interface.
> 
> I suspect that these bugs would have still existed whether the
> interface is SEEK_HOLE/SEEK_DATA, or FIEMAP.  The main problem is
> that the delalloc pages were not accounted for correctly during
> layout traversal..

It's not delalloc that is the problem - XFS accounts for them just
fine in the extent map when asked. However, XFS does speculative
delayed allocation over regions that contain no data, so if the
core-utils folk are assuming that delalloc extents contain data and
need to be copied, they're in for a nasty surprise.

However, every example I've seen in this thread has had to do with
unwritten extents not changing state when data is written into the
page cache. i.e. people are struggling with the expected behaviour
of unwritten extents.

That is, unwritten extents remain unwritten extents until data has
been _physically_ written to them. If there is data in the page
cache over the unwritten extent, it is still an unwritten extent.
If the system crashes while in this state, then the extent _must_
remain an unwritten extent after recovery, otherwise it exposes
stale data.

Further, using FIEMAP to determine where the data is that needs
copying is extremely fragile. What happens when FIEMAP grows a
different type of extent that contains data? cp will break, because
it doesn't think it needs to copy data in extents of an unknown
type. Or it will break because it thinks it needs to copy it and
there's something in it that should not be copied.
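
To make the fragility concrete, here is a minimal Python sketch of the
kind of flag-based decision a FIEMAP-driven copy has to make. The two
FIEMAP_EXTENT_* values are the ones defined in <linux/fiemap.h>;
everything else (the function naive_needs_copy, the policy it encodes,
and the FUTURE_EXTENT_TYPE bit) is hypothetical and is not the actual
coreutils logic.

```python
# Extent flag values as defined in <linux/fiemap.h>.
FIEMAP_EXTENT_DELALLOC = 0x0004   # location still pending (delayed allocation)
FIEMAP_EXTENT_UNWRITTEN = 0x0800  # space allocated, but no data written yet

# Hypothetical bit standing in for an extent type added in the future.
# It is made up for this illustration and is not a real FIEMAP flag.
FUTURE_EXTENT_TYPE = 0x4000


def naive_needs_copy(flags: int) -> bool:
    """Hypothetical policy: copy every extent except unwritten ones."""
    return not (flags & FIEMAP_EXTENT_UNWRITTEN)


cases = {
    # Dirty page-cache data over an unwritten extent is still reported as
    # unwritten, so this policy skips it and the copy loses the data.
    "unwritten": FIEMAP_EXTENT_UNWRITTEN,
    # Speculative delalloc may cover a region with no data at all, yet
    # this policy treats it as data that must be copied.
    "delalloc": FIEMAP_EXTENT_DELALLOC,
    # The policy has no idea what an unknown type means; it guesses "copy".
    "future": FUTURE_EXTENT_TYPE,
}

for name, flags in cases.items():
    print(name, "copy" if naive_needs_copy(flags) else "skip")
```

Whichever default is chosen for flags the policy does not recognise,
one of the two failure modes described above remains possible.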

Also, cp should not be trying to replicate the physical layout of
the file when copying it - that's for the filesystem to decide and
having userspace try to do this is a sure recipe for causing severe
filesystem fragmentation. The filesystems already do an excellent
job of optimising allocation - userspace should not be trying to
second-guess what the optimal layout is for the filesystem.

Fundamentally, what the core-utils guys want is FIEMAP to tell them
where data is in the file, regardless of whether it is in memory or
on disk. That is not what FIEMAP is intended for, but it is exactly
what SEEK_HOLE/SEEK_DATA provide.

SEEK_HOLE/SEEK_DATA have very well understood semantics and are
designed specifically for optimising access to sparse files. This
interface abstracts all the details of how different filesystems
store their data so the application doesn't need to care about it.
The API is so, so much simpler to use and understand, too. And if the
filesystem has data in cache over an unwritten extent, then by
definition it's still data to be returned by SEEK_DATA. If it fails
to return the range as such then the implementation is broken.
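
As an illustration of how little the application has to know, a sparse
copy built on this interface could look roughly like the Python sketch
below. It assumes a platform where lseek() supports SEEK_DATA and
SEEK_HOLE and where Python exposes them as os.SEEK_DATA and
os.SEEK_HOLE; the helper names data_ranges and sparse_copy are
hypothetical. On a filesystem without real hole support the whole file
is typically reported as one data range, so the copy is still correct,
just not sparse.

```python
import errno
import os


def data_ranges(fd):
    """Yield (start, end) byte ranges that SEEK_DATA reports as data."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data at or after offset
                return
            raise
        # End of file counts as a hole, so this always finds an end.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy only the data ranges; holes in the source are never written."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src_fd, dst_fd = src.fileno(), dst.fileno()
        size = os.fstat(src_fd).st_size
        for start, end in data_ranges(src_fd):
            pos = start
            while pos < end:
                buf = os.pread(src_fd, min(chunk, end - pos), pos)
                if not buf:  # file shrank underneath us
                    break
                written = 0
                while written < len(buf):  # handle short writes
                    written += os.pwrite(dst_fd, buf[written:], pos + written)
                pos += len(buf)
        # Preserve the file length when the source ends in a hole.
        os.ftruncate(dst_fd, size)
```

Note that the copy never looks at extent types or flags: whether a
range is backed by a written extent, a delalloc extent, or page-cache
data over an unwritten extent is the filesystem's problem, and where
the destination blocks end up is left to the destination filesystem.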

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

