linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Ts'o <tytso@mit.edu>
To: stable@vger.kernel.org
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Eric Sandeen <sandeen@redhat.com>,
	"Theodore Ts'o" <tytso@mit.edu>
Subject: [PATCH v2.6.34.y 16/28] ext4: don't use quota reservation for speculative metadata
Date: Tue,  1 Jun 2010 12:13:03 -0400	[thread overview]
Message-ID: <1275408795-17487-16-git-send-email-tytso@mit.edu> (raw)
In-Reply-To: <1275408795-17487-1-git-send-email-tytso@mit.edu>

From: Eric Sandeen <sandeen@redhat.com>

commit 72b8ab9dde211ea518ff27e631b2046ef90c29a2 upstream (as of 2.6.34-git13)

Because we can badly over-reserve metadata when we
calculate worst-case, it complicates things for quota, since
we must reserve and then claim later, retry on EDQUOT, etc.
Quota is also a generally smaller pool than fs free blocks,
so this over-reservation hurts more, and more often.

I'm of the opinion that it's not the worst thing to allow
metadata to push a user slightly over quota.  This simplifies
the code and avoids the false quota rejections that result
from worst-case speculation.

This patch stops the speculative quota-charging for
worst-case metadata requirements, and just charges quota
when the blocks are allocated at writeout.  It also is
able to remove the try-again loop on EDQUOT.

This patch has been tested indirectly by running the xfstests
suite with a hack to mount & enable quota prior to the test.

I also did a more specific test of fragmenting freespace
and then doing a large delalloc write under quota; quota
stopped me at the right amount of file IO, and then the
writeout generated enough metadata (due to the fragmentation)
that it put me slightly over quota, as expected.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/balloc.c |    5 ++-
 fs/ext4/inode.c  |   69 ++++++++++++++++++------------------------------------
 2 files changed, 26 insertions(+), 48 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index d2f37a5..95b7594 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -591,14 +591,15 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
 	ret = ext4_mb_new_blocks(handle, &ar, errp);
 	if (count)
 		*count = ar.len;
-
 	/*
-	 * Account for the allocated meta blocks
+	 * Account for the allocated meta blocks.  We will never
+	 * fail EDQUOT for metdata, but we do account for it.
 	 */
 	if (!(*errp) && EXT4_I(inode)->i_delalloc_reserved_flag) {
 		spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
 		EXT4_I(inode)->i_allocated_meta_blocks += ar.len;
 		spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+		dquot_alloc_block_nofail(inode, ar.len);
 	}
 	return ret;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 89a31e8..df43217 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1076,7 +1076,6 @@ void ext4_da_update_reserve_space(struct inode *inode,
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	int mdb_free = 0, allocated_meta_blocks = 0;
 
 	spin_lock(&ei->i_block_reservation_lock);
 	trace_ext4_da_update_reserve_space(inode, used);
@@ -1091,11 +1090,10 @@ void ext4_da_update_reserve_space(struct inode *inode,
 
 	/* Update per-inode reservations */
 	ei->i_reserved_data_blocks -= used;
-	used += ei->i_allocated_meta_blocks;
 	ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks;
-	allocated_meta_blocks = ei->i_allocated_meta_blocks;
+	percpu_counter_sub(&sbi->s_dirtyblocks_counter,
+			   used + ei->i_allocated_meta_blocks);
 	ei->i_allocated_meta_blocks = 0;
-	percpu_counter_sub(&sbi->s_dirtyblocks_counter, used);
 
 	if (ei->i_reserved_data_blocks == 0) {
 		/*
@@ -1103,31 +1101,23 @@ void ext4_da_update_reserve_space(struct inode *inode,
 		 * only when we have written all of the delayed
 		 * allocation blocks.
 		 */
-		mdb_free = ei->i_reserved_meta_blocks;
+		percpu_counter_sub(&sbi->s_dirtyblocks_counter,
+				   ei->i_reserved_meta_blocks);
 		ei->i_reserved_meta_blocks = 0;
 		ei->i_da_metadata_calc_len = 0;
-		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
 	}
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
 
-	/* Update quota subsystem */
-	if (quota_claim) {
+	/* Update quota subsystem for data blocks */
+	if (quota_claim)
 		dquot_claim_block(inode, used);
-		if (mdb_free)
-			dquot_release_reservation_block(inode, mdb_free);
-	} else {
+	else {
 		/*
 		 * We did fallocate with an offset that is already delayed
 		 * allocated. So on delayed allocated writeback we should
-		 * not update the quota for allocated blocks. But then
-		 * converting an fallocate region to initialized region would
-		 * have caused a metadata allocation. So claim quota for
-		 * that
+		 * not re-claim the quota for fallocated blocks.
 		 */
-		if (allocated_meta_blocks)
-			dquot_claim_block(inode, allocated_meta_blocks);
-		dquot_release_reservation_block(inode, mdb_free + used -
-						allocated_meta_blocks);
+		dquot_release_reservation_block(inode, used);
 	}
 
 	/*
@@ -1861,7 +1851,7 @@ static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
 	int retries = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	unsigned long md_needed, md_reserved;
+	unsigned long md_needed;
 	int ret;
 
 	/*
@@ -1871,22 +1861,24 @@ static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
 	 */
 repeat:
 	spin_lock(&ei->i_block_reservation_lock);
-	md_reserved = ei->i_reserved_meta_blocks;
 	md_needed = ext4_calc_metadata_amount(inode, lblock);
 	trace_ext4_da_reserve_space(inode, md_needed);
 	spin_unlock(&ei->i_block_reservation_lock);
 
 	/*
-	 * Make quota reservation here to prevent quota overflow
-	 * later. Real quota accounting is done at pages writeout
-	 * time.
+	 * We will charge metadata quota at writeout time; this saves
+	 * us from metadata over-estimation, though we may go over by
+	 * a small amount in the end.  Here we just reserve for data.
 	 */
-	ret = dquot_reserve_block(inode, md_needed + 1);
+	ret = dquot_reserve_block(inode, 1);
 	if (ret)
 		return ret;
-
+	/*
+	 * We do still charge estimated metadata to the sb though;
+	 * we cannot afford to run out of free blocks.
+	 */
 	if (ext4_claim_free_blocks(sbi, md_needed + 1)) {
-		dquot_release_reservation_block(inode, md_needed + 1);
+		dquot_release_reservation_block(inode, 1);
 		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
 			yield();
 			goto repeat;
@@ -1933,12 +1925,13 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
 		 * only when we have written all of the delayed
 		 * allocation blocks.
 		 */
-		to_free += ei->i_reserved_meta_blocks;
+		percpu_counter_sub(&sbi->s_dirtyblocks_counter,
+				   ei->i_reserved_meta_blocks);
 		ei->i_reserved_meta_blocks = 0;
 		ei->i_da_metadata_calc_len = 0;
 	}
 
-	/* update fs dirty blocks counter */
+	/* update fs dirty data blocks counter */
 	percpu_counter_sub(&sbi->s_dirtyblocks_counter, to_free);
 
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
@@ -3086,7 +3079,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 			       loff_t pos, unsigned len, unsigned flags,
 			       struct page **pagep, void **fsdata)
 {
-	int ret, retries = 0, quota_retries = 0;
+	int ret, retries = 0;
 	struct page *page;
 	pgoff_t index;
 	unsigned from, to;
@@ -3145,22 +3138,6 @@ retry:
 
 	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
 		goto retry;

  parent reply	other threads:[~2010-06-01 16:13 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-01 16:12 [PATCH v2.6.34.y 01/28] ext4: check missed return value in ext4_sync_file() Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 02/28] ext4: fix memory leaks in error path handling of ext4_ext_zeroout() Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 03/28] ext4: Remove unnecessary call to ext4_get_group_desc() in mballoc Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 04/28] ext4: rename ext4_mb_release_desc() to ext4_mb_unload_buddy() Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 05/28] ext4: allow defrag (EXT4_IOC_MOVE_EXT) in 32bit compat mode Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 06/28] ext4: fix quota accounting in case of fallocate Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 07/28] ext4: check s_log_groups_per_flex in online resize code Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 08/28] ext4: don't return to userspace after freezing the fs with a mutex held Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 09/28] ext4: stop issuing discards if not supported by device Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 10/28] ext4: don't scan/accumulate more pages than mballoc will allocate Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 11/28] ext4: Do not zero out uninitialized extents beyond i_size Theodore Ts'o
2010-06-01 16:12 ` [PATCH v2.6.34.y 12/28] ext4: clean up inode bitmaps manipulation in ext4_free_inode Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 13/28] ext4: init statistics after journal recovery Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 14/28] quota: use flags interface for dquot alloc/free space Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 15/28] quota: add the option to not fail with EDQUOT in block Theodore Ts'o
2010-06-01 16:13 ` Theodore Ts'o [this message]
2010-06-01 16:13 ` [PATCH v2.6.34.y 17/28] ext4: Remove extraneous newlines in ext4_msg() calls Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 18/28] ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 19/28] ext4: check for a good block group before loading buddy pages Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 20/28] ext4: Show journal_checksum option Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 21/28] ext4: Use our own write_cache_pages() Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 22/28] ext4: Use bitops to read/modify i_flags in struct ext4_inode_info Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 23/28] ext4: Avoid crashing on NULL ptr dereference on a filesystem error Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 24/28] ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 25/28] ext4: restart ext4_ext_remove_space() after transaction restart Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 26/28] ext4: Conditionally define compat ioctl numbers Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 27/28] ext4: Fix compat EXT4_IOC_ADD_GROUP Theodore Ts'o
2010-06-01 16:13 ` [PATCH v2.6.34.y 28/28] ext4: Make fsync sync new parent directories in no-journal mode Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1275408795-17487-16-git-send-email-tytso@mit.edu \
    --to=tytso@mit.edu \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sandeen@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).