From: Theodore Tso <tytso@mit.edu>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linux-ext4@vger.kernel.org, Andreas Dilger <adilger@sun.com>,
Alex Tomas <bzzz@sun.com>
Subject: Re: [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files
Date: Fri, 21 Aug 2009 08:23:13 -0400
Message-ID: <20090821122313.GB9529@mit.edu>
In-Reply-To: <20090821024545.GA9529@mit.edu>
On Thu, Aug 20, 2009 at 10:45:45PM -0400, Theodore Tso wrote:
> Um, oops. Yeah, good point. It should be !ext4_fs_is_busy(). I also
> have the logic backwards in ext4_lock_group as well, so the two errors
> mostly cancel each other out. I'll fix that in the patch.
... and here's the revised patch.
- Ted
ext4: Avoid group preallocation for closed files
Currently the group preallocation code tries to find a large (512 block)
free extent from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented. In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example), we forget the fact
that we were using a particular (now partially filled) 512-block
extent. So the next time we try to allocate space for a small file,
we will find *another* completely free 512-block chunk from which to
allocate small files. Given that there are 32,768 blocks in a block
group, after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any completely free
512-block chunks left for group preallocation space.
So if we are allocating blocks for a file that has already been
closed, such that we know its final size, and the filesystem is not
busy, avoid using group preallocation.
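As an aside, the reason the mballoc.c hunk below rounds i_size up to a
whole number of blocks is that a file rarely ends exactly on a block
boundary, so the old truncating shift understated the file's size in
blocks and the size == isize test could never fire for such files.
A minimal standalone illustration (not part of the patch; the file
size and blocksize here are hypothetical):

#include <stdio.h>

int main(void)
{
	unsigned long long i_size = 5000;	/* file size in bytes */
	unsigned int bsbits = 12;		/* log2 of 4096-byte blocks */
	unsigned long long blocksize = 1ULL << bsbits;

	/* Old calculation: truncates, reporting 1 block. */
	unsigned long long isize_old = i_size >> bsbits;

	/*
	 * New calculation: rounds up, reporting the 2 blocks the file
	 * actually occupies, so a 2-block allocation request for this
	 * closed file can pass the size == isize test.
	 */
	unsigned long long isize_new = (i_size + blocksize - 1) >> bsbits;

	printf("truncated: %llu block(s), rounded up: %llu block(s)\n",
	       isize_old, isize_new);
	return 0;
}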
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 70aa951..b989920 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -952,6 +952,7 @@ struct ext4_sb_info {
atomic_t s_mb_lost_chunks;
atomic_t s_mb_preallocated;
atomic_t s_mb_discarded;
+ atomic_t s_lock_busy;
/* locality groups */
struct ext4_locality_group *s_locality_groups;
@@ -1593,15 +1594,42 @@ struct ext4_group_info {
#define EXT4_MB_GRP_NEED_INIT(grp) \
(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
+#define EXT4_MAX_CONTENTION 8
+#define EXT4_CONTENTION_THRESHOLD 2
+
static inline spinlock_t *ext4_group_lock_ptr(struct super_block *sb,
ext4_group_t group)
{
return bgl_lock_ptr(EXT4_SB(sb)->s_blockgroup_lock, group);
}
+/*
+ * Returns true if the filesystem is busy enough that attempts to
+ * access the block group locks have run into contention.
+ */
+static inline int ext4_fs_is_busy(struct ext4_sb_info *sbi)
+{
+ return (atomic_read(&sbi->s_lock_busy) > EXT4_CONTENTION_THRESHOLD);
+}
+
static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
{
- spin_lock(ext4_group_lock_ptr(sb, group));
+ spinlock_t *lock = ext4_group_lock_ptr(sb, group);
+ if (spin_trylock(lock))
+ /*
+ * We're able to grab the lock right away, so drop the
+ * lock contention counter.
+ */
+ atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, -1, 0);
+ else {
+ /*
+ * The lock is busy, so bump the contention counter,
+ * and then wait on the spin lock.
+ */
+ atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, 1,
+ EXT4_MAX_CONTENTION);
+ spin_lock(lock);
+ }
}
static inline void ext4_unlock_group(struct super_block *sb,
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a103cb0..ee563e2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4191,9 +4191,17 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
return;
size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
- isize = i_size_read(ac->ac_inode) >> bsbits;
+ isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1)
+ >> bsbits;
size = max(size, isize);
+ if ((size == isize) &&
+ !ext4_fs_is_busy(sbi) &&
+ (atomic_read(&ac->ac_inode->i_writecount) == 0)) {
+ ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC;
+ return;
+ }
+
/* don't use group allocation for large files */
if (size >= sbi->s_mb_stream_request) {
ac->ac_flags |= EXT4_MB_STREAM_ALLOC;
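For readers following along outside the kernel tree, the adaptive
locking scheme in the ext4.h hunk can be sketched in ordinary
userspace C. This is a minimal illustration under stated assumptions,
not the kernel code: lock_group(), unlock_group(), and fs_is_busy()
mirror ext4_lock_group(), ext4_unlock_group(), and ext4_fs_is_busy(),
a pthread mutex stands in for the block group spinlock, and the CAS
loops open-code the kernel's atomic_add_unless():

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define MAX_CONTENTION		8
#define CONTENTION_THRESHOLD	2

static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int lock_busy;		/* decays while uncontended */

static void lock_group(void)
{
	if (pthread_mutex_trylock(&group_lock) == 0) {
		/* Got the lock right away: drop the contention
		 * counter, but never below zero. */
		int v = atomic_load(&lock_busy);
		while (v > 0 &&
		       !atomic_compare_exchange_weak(&lock_busy, &v, v - 1))
			;
	} else {
		/* Lock is busy: bump the counter, capped at
		 * MAX_CONTENTION, then wait for the lock. */
		int v = atomic_load(&lock_busy);
		while (v < MAX_CONTENTION &&
		       !atomic_compare_exchange_weak(&lock_busy, &v, v + 1))
			;
		pthread_mutex_lock(&group_lock);
	}
}

static void unlock_group(void)
{
	pthread_mutex_unlock(&group_lock);
}

static int fs_is_busy(void)
{
	return atomic_load(&lock_busy) > CONTENTION_THRESHOLD;
}

int main(void)			/* build with: cc -pthread sketch.c */
{
	lock_group();
	unlock_group();
	printf("fs_is_busy() = %d\n", fs_is_busy());
	return 0;
}

The point of the design is that the counter can only climb while
trylock keeps failing, so a burst of contention marks the filesystem
"busy" and suppresses the NOPREALLOC shortcut; once locks start being
acquired uncontended again, the counter decays back below the
threshold.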