From: Jan Kara <jack@suse.cz>
To: Pavel Raiskup <praiskup@redhat.com>
Cc: bug-tar@gnu.org,
"Andrew J. Schorr" <aschorr@telemetry-investments.com>,
Paul Eggert <eggert@cs.ucla.edu>,
linux-fsdevel@vger.kernel.org, Andreas Dilger <adilger@dilger.ca>
Subject: Re: [Bug-tar] --sparse is broken on filesystems where small files may have zero blocks
Date: Tue, 29 Oct 2013 18:37:13 +0100
Message-ID: <20131029173713.GB6087@quack.suse.cz>
In-Reply-To: <6917534.P50uorHOUu@nb.usersys.redhat.com>

On Tue 29-10-13 16:27:02, Pavel Raiskup wrote:
> On Tuesday, October 29, 2013 09:59:56 Pavel Raiskup wrote:
> > > #define ST_IS_SPARSE(st) \
> > > (ST_NBLOCKS (st) \
> > > - < ((st).st_size / ST_NBLOCKSIZE + ((st).st_size % ST_NBLOCKSIZE != 0)))
> > > + < ((st).st_size / ST_NBLOCKSIZE \
> > > + + ((st).st_size % ST_NBLOCKSIZE != 0 \
> > > + && (st).st_size / ST_NBLOCKSIZE != 0)))
> >
> > May the st.st_size / ST_NBLOCKSIZE be greater than 1 and data still stored
> > in inode directly? Seems like on ext4 filesystem it is not possible [1]
> > but does anybody know about exception?
> >
> > [1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Inline_Data
>
> Well, I now recalled somehow relevant Red Hat bug, sorry I have not
> mentioned it before:
> https://bugzilla.redhat.com/show_bug.cgi?id=757557
>
> CC'ing fs-devel: The question is whether that ^^^^ is not a bug in
> filesystem — whether filesystem should not _always_ return to fstat()
> block count at least 1 if there are at least some data (even if these data
> are inlined in inode)? Just for catching the context, this thread starts
> here: http://lists.gnu.org/archive/html/bug-tar/2013-10/msg00030.html

So 'st_blocks' should be "the number of blocks allocated to the file, in
512-byte units". If we are able to store the whole file within the inode, we
have no blocks allocated, and thus setting st_blocks to 0 looks like a
reasonable thing to do. Looking at the filesystems where this is possible
(ext4, btrfs, reiserfs), they all set st_blocks to 0 if the file body is
inlined and the size is smaller than 512 bytes.

But OTOH btrfs and reiserfs *do* in fact account for the amount of space
consumed by the file. It is just that the stat(2) syscall reports the space in
512-byte units, and for historical reasons we ended up truncating the
byte-precision space counter instead of rounding it up (that is a mistake I
made like 10 years ago :(). It is easy enough to start reporting the value
rounded up, but I'm not sure whether that would break some userspace which
has already developed a dependency on this buggy kernel behavior.

ext4 is a different matter again. It really does report the number of
allocated blocks in st_blocks, so it will report 0 as long as the data fits
into the inode (whose size is configurable at fs creation time; the default
is 256 bytes). In practice that results in non-zero st_blocks for 512-byte
and larger files anyway, so there won't be an observable difference between
what ext4 and btrfs / reiserfs do. But we might still want to fix up ext4
to be consistent with btrfs and reiserfs so that things are more
future-proof.
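
To make the truncation vs. rounding difference concrete, here is a minimal sketch of the two ways a byte-precision space counter could be converted to 512-byte units. The helper names are hypothetical and this is not the actual kernel code, only an illustration of the arithmetic:

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from any kernel source. */

/* Truncating conversion: a file consuming 1..511 bytes is reported as 0 blocks. */
static inline uint64_t bytes_to_blocks_truncated(uint64_t bytes)
{
	return bytes >> 9;
}

/* Rounding-up conversion: any non-zero byte count yields at least 1 block. */
static inline uint64_t bytes_to_blocks_rounded_up(uint64_t bytes)
{
	/* Add one block for a partial tail instead of (bytes + 511) to avoid overflow. */
	return (bytes >> 9) + ((bytes & 511) != 0);
}
```

With the truncating variant a 100-byte inlined file shows up as 0 blocks, which is what confuses the ST_IS_SPARSE check; with the rounding variant it would show up as 1.
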
> If that is not a bug in fs, is there possible to detect that particular
> file is completely sparse?

As Joerg wrote, SEEK_DATA / SEEK_HOLE is the proper interface for this, at
least on systems that support it. Once you have called stat(2) on the
file, the inode will be in cache anyway, so the additional cost of open(2),
lseek(2), close(2) won't be that big. For systems that don't support
SEEK_DATA / SEEK_HOLE, you can use a heuristic like:
	if (st.st_size < st.st_blksize || st.st_blocks > 0)
		/* Bite the bullet and scan the data for non-zero bytes */
	else
		/* Assume the file is sparse */
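
Both approaches could be sketched roughly as follows. This is only an illustration under some assumptions: the function names are made up, it assumes a glibc-style system where SEEK_DATA is exposed via _GNU_SOURCE, and it relies on lseek(2) failing with ENXIO when there is no data at or after the given offset:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes SEEK_DATA on glibc */
#endif
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the file contains no data at all
 * (an empty file also ends up here, so check st_size separately),
 * 0 if some data exists, and -1 if SEEK_DATA cannot tell us
 * (interface missing, bad descriptor, or any other error).
 */
static int fd_has_no_data(int fd)
{
#ifdef SEEK_DATA
	off_t pos = lseek(fd, 0, SEEK_DATA);

	if (pos >= 0)
		return 0;	/* data found at offset pos */
	if (errno == ENXIO)
		return 1;	/* no data at or after offset 0 */
#else
	(void)fd;
#endif
	return -1;
}

/*
 * Hypothetical helper wrapping the fallback heuristic: non-zero means
 * the data has to be scanned for non-zero bytes, zero means the file
 * can be assumed to be completely sparse.
 */
static int must_scan_for_zeros(const struct stat *st)
{
	return st->st_size < st->st_blksize || st->st_blocks > 0;
}
```

A caller would try fd_has_no_data() first and fall back to must_scan_for_zeros() only when it returns -1.
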
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR