From: Jiaying Zhang <jiayingz@google.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>,
Frank Mayhar <fmayhar@google.com>,
Eric Sandeen <sandeen@redhat.com>,
Curt Wohlgemuth <curtw@google.com>,
ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: Question on fallocate/ftruncate sequence
Date: Mon, 31 Aug 2009 12:40:29 -0700 [thread overview]
Message-ID: <5df78e1d0908311240s3205b4bcrb65b2552b4ed579c@mail.gmail.com> (raw)
In-Reply-To: <20090830025250.GA25768@mit.edu>
Thanks a lot for the comments and suggestions!
On Sat, Aug 29, 2009 at 7:52 PM, Theodore Tso<tytso@mit.edu> wrote:
> On Fri, Aug 28, 2009 at 05:40:54PM -0700, Jiaying Zhang wrote:
>> --- .pc/fallocate_keepsizse.patch/fs/attr.c 2009-08-28 15:38:46.000000000 -0700
>> +++ fs/attr.c 2009-08-28 17:01:04.000000000 -0700
>> @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode,
>> unsigned int ia_valid = attr->ia_valid;
>>
>> if (ia_valid & ATTR_SIZE &&
>> - (attr->ia_size != i_size_read(inode)) {
>> + (attr->ia_size != i_size_read(inode) ||
>> + (inode->i_flags & FS_KEEPSIZE_FL))) {
>> int error = vmtruncate(inode, attr->ia_size);
>> if (error)
>> return error;
>
> Instead of doing this in the generic code, it really should be done in
> ext4_setattr. Technically speaking, we don't actually need the
> FS_KEEPSIZE_FL to solve this problem; instead we can simply have the
> ext4 code look in the extent tree to see if there are any blocks
> mapped beyond the logical block:
>
> i_size_read(inode) >> inode->i_sb->s_blocksize_bits
Is it relatively cheap to scan the extent tree? Will this add the
overhead to truncate?
>
> Having a flag as Andreas suggested does help with the issue of e2fsck
> noticing whether or not i_size is incorrect (and should be fixed) or
> the file has been extended. So keeping having the flag is an OK thing
> to do, but we need to be careful about a particularly subtle
> overloading problem. The flags FS_*_FL as defined in
> include/linux/fs.h are technically only for in-memory use. The ext4
> on-disk format flags is EXT4_*_FL, and defined in ext4.h.
>
> The flags were originially defined for use in ext2/3/4, but later on
> other filesystems adopted those flags so that e2fsprogs's chattr and
> lsattr programs could be used for their filesystems as well. It just
> so happens that for ext2/3/4 the on-disk encoding of those flags in
> the in-memory encoding of those flags in i_flags are the same, but
> that means that the flags need to be defined in both places to avoid
> assignment overlaps. We also need to be clear whether the flags are
> internal flags for ext4's use only, or flags meant for use by all
> filesystems. This is why the testing for FS_KEEPSIZE_FL in fs/attr is
> particularly bad, if the flag are going to be set in fs/ext4/extents.c.
>
> It's better to define the flag as EXT4_KEEPSIZE_FL, and to use it as
> EXT4_KEEPSIZE_FL, but make a note of that bitfield position as being
> reserved in include/linux/fs.h.
Here is the modified patch based on your suggestions. I stick with the
KEEPSIZE_FL approach that I think can allow us to handle the special
truncation accordingly during fsck. Other file systems can also re-use
this flag when they want to support fallocate with KEEP_SIZE. As you
suggested, I moved the EXT4_KEEPSIZE_FL checking to ext4_setattr
that now calls vmtruncate if the KEEPSIZE flag is set in the i_flag.
Please let me know what you think about this proposed patch.
Thanks a lot!
Jiaying
--- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-31
12:08:10.000000000 -0700
+++ fs/ext4/extents.c 2009-08-31 12:12:16.000000000 -0700
@@ -3095,7 +3095,13 @@ static void ext4_falloc_update_inode(str
i_size_write(inode, new_size);
if (new_size > EXT4_I(inode)->i_disksize)
ext4_update_i_disksize(inode, new_size);
+ inode->i_flags &= ~EXT4_KEEPSIZE_FL;
} else {
+ /*
+ * Mark that we allocate beyond EOF so the subsequent truncate
+ * can proceed even if the new size is the same as i_size.
+ */
+ inode->i_flags |= EXT4_KEEPSIZE_FL;
}
}
--- .pc/fallocate_keepsizse.patch/fs/ext4/inode.c 2009-08-31
12:08:10.000000000 -0700
+++ fs/ext4/inode.c 2009-08-31 12:12:16.000000000 -0700
@@ -3973,6 +3973,8 @@ void ext4_truncate(struct inode *inode)
if (!ext4_can_truncate(inode))
return;
+ inode->i_flags &= ~EXT4_KEEPSIZE_FL;
+
if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC))
ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE;
@@ -4807,7 +4809,9 @@ int ext4_setattr(struct dentry *dentry,
}
if (S_ISREG(inode->i_mode) &&
- attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) {
+ attr->ia_valid & ATTR_SIZE &&
+ (attr->ia_size < inode->i_size ||
+ (inode->i_flags & EXT4_KEEPSIZE_FL))) {
handle_t *handle;
handle = ext4_journal_start(inode, 3);
@@ -4838,6 +4842,11 @@ int ext4_setattr(struct dentry *dentry,
goto err_out;
}
}
+ if ((inode->i_flags & EXT4_KEEPSIZE_FL)) {
+ rc = vmtruncate(inode, attr->ia_size);
+ if (rc)
+ goto err_out;
+ }
}
rc = inode_setattr(inode, attr);
--- .pc/fallocate_keepsizse.patch/include/linux/fs.h 2009-08-31
12:08:10.000000000 -0700
+++ include/linux/fs.h 2009-08-31 12:12:16.000000000 -0700
@@ -343,6 +343,7 @@ struct inodes_stat_t {
#define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
#define FS_EXTENT_FL 0x00080000 /* Extents */
#define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
+#define FS_KEEPSIZE_FL 0x00200000 /* Blocks allocated beyond EOF */
#define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
--- .pc/fallocate_keepsizse.patch/fs/ext4/ext4.h 2009-08-31
12:08:10.000000000 -0700
+++ fs/ext4/ext4.h 2009-08-31 12:12:16.000000000 -0700
@@ -235,6 +235,7 @@ struct flex_groups {
#define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
#define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
#define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */
+#define EXT4_KEEPSIZE_FL 0x00200000 /* Blocks allocated beyond EOF
(bit reserved in fs.h) */
#define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
#define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */
>
> - Ted
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2009-08-31 19:40 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-20 16:36 Question on fallocate/ftruncate sequence Curt Wohlgemuth
2009-07-20 22:45 ` Eric Sandeen
2009-07-21 21:29 ` Frank Mayhar
2009-07-21 21:54 ` Andreas Dilger
2009-07-22 16:24 ` Frank Mayhar
2009-07-22 23:10 ` Frank Mayhar
2009-07-23 3:05 ` Eric Sandeen
2009-07-23 16:27 ` Frank Mayhar
2009-07-23 17:00 ` Eric Sandeen
2009-07-23 18:05 ` Frank Mayhar
2009-07-23 21:56 ` Andreas Dilger
2009-07-23 22:46 ` Frank Mayhar
2009-08-28 18:42 ` Jiaying Zhang
2009-08-28 19:40 ` Andreas Dilger
2009-08-28 21:44 ` Jiaying Zhang
2009-08-28 22:14 ` Andreas Dilger
2009-08-29 0:40 ` Jiaying Zhang
2009-08-30 2:52 ` Theodore Tso
2009-08-31 19:40 ` Jiaying Zhang [this message]
2009-08-31 21:56 ` Andreas Dilger
2009-08-31 23:33 ` Jiaying Zhang
2009-09-02 8:41 ` Andreas Dilger
2009-09-03 5:20 ` Jiaying Zhang
2009-09-03 5:32 ` Jiaying Zhang
2009-09-24 5:27 ` Jiaying Zhang
2009-09-25 7:35 ` Andreas Dilger
2009-09-25 22:08 ` Jiaying Zhang
2009-09-29 19:15 ` Eric Sandeen
2009-09-29 19:38 ` Jiaying Zhang
2009-09-29 19:55 ` Eric Sandeen
2009-09-30 8:10 ` Andreas Dilger
2009-10-02 22:10 ` Jiaying Zhang
2009-10-02 22:29 ` Eric Sandeen
2009-10-02 23:21 ` Jiaying Zhang
2009-07-23 19:48 ` Question on fallocate/ftruncate sequence (and flags) Frank Mayhar
2009-07-23 20:37 ` Eric Sandeen
2009-07-23 21:01 ` Frank Mayhar
2009-07-29 15:29 ` Jan Kara
2009-07-29 15:59 ` Frank Mayhar
2009-07-23 21:53 ` Andreas Dilger
2009-07-23 23:33 ` Greg Freemyer
2009-07-21 22:03 ` Question on fallocate/ftruncate sequence Eric Sandeen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5df78e1d0908311240s3205b4bcrb65b2552b4ed579c@mail.gmail.com \
--to=jiayingz@google.com \
--cc=adilger@sun.com \
--cc=curtw@google.com \
--cc=fmayhar@google.com \
--cc=linux-ext4@vger.kernel.org \
--cc=sandeen@redhat.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).