From: Jan Kara <jack@suse.cz>
To: Ted Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, Zheng Liu <wenqing.lz@taobao.com>,
Jan Kara <jack@suse.cz>
Subject: [PATCH 1/7] ext4: Fix block reservation for bigalloc filesystems
Date: Tue, 25 Nov 2014 15:55:30 +0100 [thread overview]
Message-ID: <1416927336-9328-2-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <1416927336-9328-1-git-send-email-jack@suse.cz>
For bigalloc filesystems we have to check whether newly requested inode
block isn't already part of a cluster for which we already have delayed
allocation reservation. This check happens in ext4_ext_map_blocks() and
that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However if
ext4_da_map_blocks() finds in extent cache information about the block,
we don't call into ext4_ext_map_blocks() and thus we always end up
getting new reservation even if the space for cluster is already
reserved. This results in overreservation and premature ENOSPC reports.
Fix the problem by checking for existing cluster reservation already in
ext4_da_map_blocks(). That simplifies the logic and actually allows us
to get rid of the EXT4_MAP_FROM_CLUSTER flag completely.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 21 +--------------------
fs/ext4/extents.c | 12 ++++--------
fs/ext4/inode.c | 27 ++++-----------------------
include/trace/events/ext4.h | 3 +--
4 files changed, 10 insertions(+), 53 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c55a1faaed58..b28c916340b9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -158,17 +158,8 @@ struct ext4_allocation_request {
#define EXT4_MAP_MAPPED (1 << BH_Mapped)
#define EXT4_MAP_UNWRITTEN (1 << BH_Unwritten)
#define EXT4_MAP_BOUNDARY (1 << BH_Boundary)
-/* Sometimes (in the bigalloc case, from ext4_da_get_block_prep) the caller of
- * ext4_map_blocks wants to know whether or not the underlying cluster has
- * already been accounted for. EXT4_MAP_FROM_CLUSTER conveys to the caller that
- * the requested mapping was from previously mapped (or delayed allocated)
- * cluster. We use BH_AllocFromCluster only for this flag. BH_AllocFromCluster
- * should never appear on buffer_head's state flags.
- */
-#define EXT4_MAP_FROM_CLUSTER (1 << BH_AllocFromCluster)
#define EXT4_MAP_FLAGS (EXT4_MAP_NEW | EXT4_MAP_MAPPED |\
- EXT4_MAP_UNWRITTEN | EXT4_MAP_BOUNDARY |\
- EXT4_MAP_FROM_CLUSTER)
+ EXT4_MAP_UNWRITTEN | EXT4_MAP_BOUNDARY)
struct ext4_map_blocks {
ext4_fsblk_t m_pblk;
@@ -2791,16 +2782,6 @@ extern int ext4_bio_write_page(struct ext4_io_submit *io,
extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t);
/*
- * Note that these flags will never ever appear in a buffer_head's state flag.
- * See EXT4_MAP_... to see where this is used.
- */
-enum ext4_state_bits {
- BH_AllocFromCluster /* allocated blocks were part of already
- * allocated cluster. */
- = BH_JBDPrivateStart
-};
-
-/*
* Add new method to test whether block and inode bitmaps are properly
* initialized. With uninit_bg reading the block from disk is not enough
* to mark the bitmap uptodate. We need to also zero-out the bitmap
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 37043d0b2be8..d8cd229b5c2b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4268,6 +4268,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
ext4_io_end_t *io = ext4_inode_aio(inode);
ext4_lblk_t cluster_offset;
int set_unwritten = 0;
+ bool map_from_cluster = false;
ext_debug("blocks %u/%u requested for inode %lu\n",
map->m_lblk, map->m_len, inode->i_ino);
@@ -4344,10 +4345,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
}
}
- if ((sbi->s_cluster_ratio > 1) &&
- ext4_find_delalloc_cluster(inode, map->m_lblk))
- map->m_flags |= EXT4_MAP_FROM_CLUSTER;
-
/*
* requested block isn't allocated yet;
* we couldn't try to create block if create flag is zero
@@ -4365,7 +4362,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
/*
* Okay, we need to do block allocation.
*/
- map->m_flags &= ~EXT4_MAP_FROM_CLUSTER;
newex.ee_block = cpu_to_le32(map->m_lblk);
cluster_offset = EXT4_LBLK_COFF(sbi, map->m_lblk);
@@ -4377,7 +4373,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
get_implied_cluster_alloc(inode->i_sb, map, ex, path)) {
ar.len = allocated = map->m_len;
newblock = map->m_pblk;
- map->m_flags |= EXT4_MAP_FROM_CLUSTER;
+ map_from_cluster = true;
goto got_allocated_blocks;
}
@@ -4398,7 +4394,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
get_implied_cluster_alloc(inode->i_sb, map, ex2, path)) {
ar.len = allocated = map->m_len;
newblock = map->m_pblk;
- map->m_flags |= EXT4_MAP_FROM_CLUSTER;
+ map_from_cluster = true;
goto got_allocated_blocks;
}
@@ -4524,7 +4520,7 @@ got_allocated_blocks:
*/
reserved_clusters = get_reserved_cluster_alloc(inode,
map->m_lblk, allocated);
- if (map->m_flags & EXT4_MAP_FROM_CLUSTER) {
+ if (map_from_cluster) {
if (reserved_clusters) {
/*
* We have clusters reserved for this range.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e9777f93cf05..08ccb344a46f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -416,11 +416,6 @@ static void ext4_map_blocks_es_recheck(handle_t *handle,
}
if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
up_read((&EXT4_I(inode)->i_data_sem));
- /*
- * Clear EXT4_MAP_FROM_CLUSTER and EXT4_MAP_BOUNDARY flag
- * because it shouldn't be marked in es_map->m_flags.
- */
- map->m_flags &= ~(EXT4_MAP_FROM_CLUSTER | EXT4_MAP_BOUNDARY);
/*
* We don't check m_len because extent will be collpased in status
@@ -1434,19 +1429,9 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
* file system block.
*/
down_read(&EXT4_I(inode)->i_data_sem);
- if (ext4_has_inline_data(inode)) {
- /*
- * We will soon create blocks for this page, and let
- * us pretend as if the blocks aren't allocated yet.
- * In case of clusters, we have to handle the work
- * of mapping from cluster so that the reserved space
- * is calculated properly.
- */
- if ((EXT4_SB(inode->i_sb)->s_cluster_ratio > 1) &&
- ext4_find_delalloc_cluster(inode, map->m_lblk))
- map->m_flags |= EXT4_MAP_FROM_CLUSTER;
+ if (ext4_has_inline_data(inode))
retval = 0;
- } else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+ else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
retval = ext4_ext_map_blocks(NULL, inode, map,
EXT4_GET_BLOCKS_NO_PUT_HOLE);
else
@@ -1465,7 +1450,8 @@ add_delayed:
* then we don't need to reserve it again. However we still need
* to reserve metadata for every block we're going to write.
*/
- if (!(map->m_flags & EXT4_MAP_FROM_CLUSTER)) {
+ if (EXT4_SB(inode->i_sb)->s_cluster_ratio <= 1 ||
+ !ext4_find_delalloc_cluster(inode, map->m_lblk)) {
ret = ext4_da_reserve_space(inode, iblock);
if (ret) {
/* not enough space to reserve */
@@ -1481,11 +1467,6 @@ add_delayed:
goto out_unlock;
}
- /* Clear EXT4_MAP_FROM_CLUSTER flag since its purpose is served
- * and it should not appear on the bh->b_state.
- */
- map->m_flags &= ~EXT4_MAP_FROM_CLUSTER;
next prev parent reply other threads:[~2014-11-25 14:55 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-25 14:55 [PATCH 0/7 v4] ext4: Extent status tree shrinker improvements Jan Kara
2014-11-25 14:55 ` Jan Kara [this message]
2014-11-25 14:55 ` [PATCH 2/7] ext4: cache extent hole in extent status tree for ext4_da_map_blocks() Jan Kara
2014-11-25 14:55 ` [PATCH 3/7] ext4: change LRU to round-robin in extent status tree shrinker Jan Kara
2014-11-25 14:55 ` [PATCH 4/7] ext4: Move handling of list of shrinkable inodes into extent status code Jan Kara
2014-11-25 14:55 ` [PATCH 5/7] ext4: Limit number of scanned extents in status tree shrinker Jan Kara
2014-11-25 14:55 ` [PATCH 6/7] ext4: Cleanup flag definitions for extent status tree Jan Kara
2014-11-25 14:55 ` [PATCH 7/7] ext4: Introduce aging to " Jan Kara
2014-12-03 2:20 ` [PATCH 0/7 v4] ext4: Extent status tree shrinker improvements Theodore Ts'o
2014-12-03 8:21 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1416927336-9328-2-git-send-email-jack@suse.cz \
--to=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=wenqing.lz@taobao.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).