From: Jiaying Zhang <jiayingz@google.com>
To: Andreas Dilger <adilger@sun.com>
Cc: Frank Mayhar <fmayhar@google.com>,
Eric Sandeen <sandeen@redhat.com>,
Curt Wohlgemuth <curtw@google.com>,
ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: Question on fallocate/ftruncate sequence
Date: Fri, 28 Aug 2009 17:40:54 -0700
Message-ID: <5df78e1d0908281740w7bc0f283x5004ca5b231b3af5@mail.gmail.com>
In-Reply-To: <20090828221432.GS4197@webber.adilger.int>
On Fri, Aug 28, 2009 at 3:14 PM, Andreas Dilger<adilger@sun.com> wrote:
> On Aug 28, 2009 14:44 -0700, Jiaying Zhang wrote:
>> On Fri, Aug 28, 2009 at 12:40 PM, Andreas Dilger<adilger@sun.com> wrote:
>> > This isn't really correct, however, because i_blocks also contains
>> > non-data blocks (indirect/index, EA, etc) blocks, so even with small
>> > files with ACLs i_blocks may always be larger than ia_size >> 9, and
>> > for ext2/3 at least this will ALWAYS be true for files > 48kB in size.
>>
>> I see. I guess we need to use a special flag then. Or is there any
>> other suggestions? I also have another question related to this
>> problem. Why those fallocated blocks are not marked as preallocated
>> blocks that will then be automatically freed in ext4_release_file?
>
> Because fallocate() means "persistent allocation on disk", not "in memory
> preallocation". The "in memory" preallocation already happens in ext4,
> and it is released when the inode is cleaned up.
Right. Thanks for pointing this out!
Here is an RFC patch that Frank and I have been working on. It introduces
a new fs flag that marks a file as having blocks allocated beyond its EOF,
as discussed previously in this thread. The flag is cleared by the next
vmtruncate, or by an fallocate without KEEPSIZE. A vmtruncate may be called
unnecessarily if the file has since been written beyond the allocated size,
but I think it is ok to pay this cost to get correctness.
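As a rough illustration of the flag lifecycle described above, here is a minimal userspace model. It is only a sketch and not kernel code: `struct model_inode`, `MODEL_KEEPSIZE_FL`, and the `model_*` functions are hypothetical names invented for this example, and block allocation is reduced to a single byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed FS_KEEPSIZE_FL bit. */
#define MODEL_KEEPSIZE_FL 0x00200000u

/* Toy inode: only the fields needed to follow the flag's lifecycle. */
struct model_inode {
	int64_t i_size;     /* logical size (EOF) in bytes */
	int64_t allocated;  /* bytes backed by allocated blocks */
	uint32_t i_flags;
};

/* Models the patched test in inode_setattr(): truncate if the size
 * changes OR if blocks may exist beyond EOF. */
bool model_needs_truncate(const struct model_inode *inode, int64_t new_size)
{
	return new_size != inode->i_size ||
	       (inode->i_flags & MODEL_KEEPSIZE_FL) != 0;
}

/* Models the flag handling added to ext4_falloc_update_inode(). */
void model_fallocate(struct model_inode *inode, int64_t end, bool keep_size)
{
	if (end > inode->allocated)
		inode->allocated = end;
	if (!keep_size) {
		if (end > inode->i_size)
			inode->i_size = end;
		inode->i_flags &= ~MODEL_KEEPSIZE_FL;
	} else {
		inode->i_flags |= MODEL_KEEPSIZE_FL;
	}
}

/* Models setattr with ATTR_SIZE; returns true if a truncate ran. */
bool model_setattr_size(struct model_inode *inode, int64_t new_size)
{
	if (!model_needs_truncate(inode, new_size))
		return false;
	inode->i_size = new_size;
	if (inode->allocated > new_size)
		inode->allocated = new_size;
	inode->i_flags &= ~MODEL_KEEPSIZE_FL;  /* as in ext4_truncate() */
	return true;
}

/* Walks the fallocate(KEEP_SIZE) + same-size truncate sequence and
 * returns the bytes still allocated afterwards. */
int64_t model_demo(void)
{
	struct model_inode ino = { .i_size = 4096, .allocated = 4096, .i_flags = 0 };

	model_fallocate(&ino, 1 << 20, true);
	assert(ino.i_size == 4096 && ino.allocated == (1 << 20));
	model_setattr_size(&ino, 4096);
	return ino.allocated;
}
```

In this model a 4096-byte file preallocated to 1 MiB with keep_size set keeps i_size at 4096. A following model_setattr_size(&ino, 4096) still runs the truncate and trims the allocation back to 4096, and a second identical call does nothing because the flag has been cleared.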
--- .pc/fallocate_keepsizse.patch/fs/attr.c 2009-08-28 15:38:46.000000000 -0700
+++ fs/attr.c 2009-08-28 17:01:04.000000000 -0700
@@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode,
 	unsigned int ia_valid = attr->ia_valid;
 	if (ia_valid & ATTR_SIZE &&
-	    attr->ia_size != i_size_read(inode)) {
+	    (attr->ia_size != i_size_read(inode) ||
+	     (inode->i_flags & FS_KEEPSIZE_FL))) {
 		int error = vmtruncate(inode, attr->ia_size);
 		if (error)
 			return error;
--- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c 2009-08-28 15:37:45.000000000 -0700
+++ fs/ext4/extents.c 2009-08-28 17:27:27.000000000 -0700
@@ -3095,7 +3095,13 @@ static void ext4_falloc_update_inode(str
 			i_size_write(inode, new_size);
 		if (new_size > EXT4_I(inode)->i_disksize)
 			ext4_update_i_disksize(inode, new_size);
+		inode->i_flags &= ~FS_KEEPSIZE_FL;
 	} else {
+		/*
+		 * Mark that we allocate beyond EOF so the subsequent truncate
+		 * can proceed even if the new size is the same as i_size.
+		 */
+		inode->i_flags |= FS_KEEPSIZE_FL;
 	}
 }
--- .pc/fallocate_keepsizse.patch/fs/ext4/inode.c 2009-08-16 14:19:38.000000000 -0700
+++ fs/ext4/inode.c 2009-08-28 16:59:42.000000000 -0700
@@ -3973,6 +3973,8 @@ void ext4_truncate(struct inode *inode)
 	if (!ext4_can_truncate(inode))
 		return;
+	inode->i_flags &= ~FS_KEEPSIZE_FL;
+
 	if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC))
 		ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE;
--- .pc/fallocate_keepsizse.patch/include/linux/fs.h 2009-08-28 15:44:27.000000000 -0700
+++ include/linux/fs.h 2009-08-28 17:00:47.000000000 -0700
@@ -343,6 +343,7 @@ struct inodes_stat_t {
 #define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
 #define FS_EXTENT_FL 0x00080000 /* Extents */
 #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
+#define FS_KEEPSIZE_FL 0x00200000 /* Blocks allocated beyond EOF */
 #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
 #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
Jiaying
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>