linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Metadata reservation for unwritten extent conversion
@ 2013-03-13 15:48 Lukáš Czerner
  2013-03-13 16:09 ` Lukáš Czerner
  0 siblings, 1 reply; 7+ messages in thread
From: Lukáš Czerner @ 2013-03-13 15:48 UTC (permalink / raw)
  To: linux-ext4; +Cc: Theodore Ts'o

Hi Ted,

turns out that I've been wrong with my assumption that we can count
with metadata reservations for the unwritten extent conversion. So
the patch I've proposed:

 ext4: Reserve metadata if writing into uninitialized

which I think you've already merged into the devel branch is not
particularly useful and it should be reverted. Reverting this will
not cause any problem to reappear and in fact it should get rid of
some of the warnings about more metadata blocks being allocated than
reserved.

The problem is that even though we're able to reserve the metadata
for the possible conversion, we can never know when to free the
metadata reservation. Currently we're freeing the metadata
reservation (which is usually overestimated because we never know
for sure how much metadata will actually be needed) when the number
of reserved data blocks drop to zero. However since we only need to
reserve metadata (and not data) blocks when writing into unwritten
extent, this approach does not work and we would free reserved
metadata even if we might still need it - there is no way to tell
whether we're going to need it or not in writeback path.

However even after the revert we're still left with the problem
with unwrittent extent conversion in dealloc path. I've attempted
to solved it with similar mechanism which xfs has - block reserve pool.

Blocks from this reserve pool should be removed from the global pool
and no one would be able to allocate from it, unless:

1. we're in delalloc path and we need space for metadata allocation
   in unwritten extent conversion,

   This should solve the problem with not having enough metadata in
   ENOSPC condition in delayed allocation writeback path.

2. we're in punch hole path and we need space for metadata
   allocation when splitting extents

   This should solve the problem when we do not have _any_ space at
   all and we attempt to punch a hole into the file resulting in the
   need for new extent tree block. Punch hole should succeed even in
   ENOSPC conditions since we're actually freeing space.

3. we're writing into unwritten (preallocated) extent and we're in
   ENOSPC condition - not having any blocks to allocated metadata
   for unwritten extent conversion.

   This should solve the problem when we can return ENOSPC even when
   writing into preallocated space which is certainly unexpected
   (xfstest 274).


I already have a patch which implements that, but it need some
tweaks and some more testing, so I'll send it probably later this
week, as well as more metadata reservation fixes (not related to
unwritten extent conversion).



Let me know what do you think.

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Metadata reservation for unwritten extent conversion
  2013-03-13 15:48 Metadata reservation for unwritten extent conversion Lukáš Czerner
@ 2013-03-13 16:09 ` Lukáš Czerner
  2013-03-13 17:25   ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Lukáš Czerner @ 2013-03-13 16:09 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: linux-ext4, Theodore Ts'o

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3697 bytes --]

On Wed, 13 Mar 2013, Lukáš Czerner wrote:

> Date: Wed, 13 Mar 2013 16:48:49 +0100 (CET)
> From: Lukáš Czerner <lczerner@redhat.com>
> To: linux-ext4@vger.kernel.org
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: Metadata reservation for unwritten extent conversion
> 
> Hi Ted,
> 
> turns out that I've been wrong with my assumption that we can count
> with metadata reservations for the unwritten extent conversion. So
> the patch I've proposed:
> 
>  ext4: Reserve metadata if writing into uninitialized
> 
> which I think you've already merged into the devel branch is not
> particularly useful and it should be reverted. Reverting this will
> not cause any problem to reappear and in fact it should get rid of
> some of the warnings about more metadata blocks being allocated than
> reserved.

Well, we can not actually revert that patch because it introduces
da_reserve_metadata() which is used elsewhere in the commit:

ext4: reserve metadata block for every delayed write


I can resend the patch which uses da_reserve_metadata() so you can use
it instead the old version and just drop the

ext4: Reserve metadata if writing into uninitialized


or I can fix it in separate patch. Which would you prefer ?

-Lukas

> 
> The problem is that even though we're able to reserve the metadata
> for the possible conversion, we can never know when to free the
> metadata reservation. Currently we're freeing the metadata
> reservation (which is usually overestimated because we never know
> for sure how much metadata will actually be needed) when the number
> of reserved data blocks drop to zero. However since we only need to
> reserve metadata (and not data) blocks when writing into unwritten
> extent, this approach does not work and we would free reserved
> metadata even if we might still need it - there is no way to tell
> whether we're going to need it or not in writeback path.
> 
> However even after the revert we're still left with the problem
> with unwrittent extent conversion in dealloc path. I've attempted
> to solved it with similar mechanism which xfs has - block reserve pool.
> 
> Blocks from this reserve pool should be removed from the global pool
> and no one would be able to allocate from it, unless:
> 
> 1. we're in delalloc path and we need space for metadata allocation
>    in unwritten extent conversion,
> 
>    This should solve the problem with not having enough metadata in
>    ENOSPC condition in delayed allocation writeback path.
> 
> 2. we're in punch hole path and we need space for metadata
>    allocation when splitting extents
> 
>    This should solve the problem when we do not have _any_ space at
>    all and we attempt to punch a hole into the file resulting in the
>    need for new extent tree block. Punch hole should succeed even in
>    ENOSPC conditions since we're actually freeing space.
> 
> 3. we're writing into unwritten (preallocated) extent and we're in
>    ENOSPC condition - not having any blocks to allocated metadata
>    for unwritten extent conversion.
> 
>    This should solve the problem when we can return ENOSPC even when
>    writing into preallocated space which is certainly unexpected
>    (xfstest 274).
> 
> 
> I already have a patch which implements that, but it need some
> tweaks and some more testing, so I'll send it probably later this
> week, as well as more metadata reservation fixes (not related to
> unwritten extent conversion).
> 
> 
> 
> Let me know what do you think.
> 
> Thanks!
> -Lukas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Metadata reservation for unwritten extent conversion
  2013-03-13 16:09 ` Lukáš Czerner
@ 2013-03-13 17:25   ` Theodore Ts'o
  2013-03-13 17:31     ` Lukáš Czerner
  0 siblings, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2013-03-13 17:25 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: linux-ext4

On Wed, Mar 13, 2013 at 05:09:51PM +0100, Lukáš Czerner wrote:
> 
> Well, we can not actually revert that patch because it introduces
> da_reserve_metadata() which is used elsewhere in the commit:
> 
> ext4: reserve metadata block for every delayed write
> 
> I can resend the patch which uses da_reserve_metadata() so you can use
> it instead the old version

That sounds like the best thing to do.  I assume you will be simply
taking the da_reserve_metadata() function introduced in the
to-be-dropped commit:

> ext4: Reserve metadata if writing into uninitialized

... and moving it into the "reserve metadata block for every delayed
write" patch?

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Metadata reservation for unwritten extent conversion
  2013-03-13 17:25   ` Theodore Ts'o
@ 2013-03-13 17:31     ` Lukáš Czerner
  2013-03-14  3:21       ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Lukáš Czerner @ 2013-03-13 17:31 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Lukáš Czerner, linux-ext4

[-- Attachment #1: Type: TEXT/PLAIN, Size: 993 bytes --]

On Wed, 13 Mar 2013, Theodore Ts'o wrote:

> Date: Wed, 13 Mar 2013 13:25:21 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: Metadata reservation for unwritten extent conversion
> 
> On Wed, Mar 13, 2013 at 05:09:51PM +0100, Lukáš Czerner wrote:
> > 
> > Well, we can not actually revert that patch because it introduces
> > da_reserve_metadata() which is used elsewhere in the commit:
> > 
> > ext4: reserve metadata block for every delayed write
> > 
> > I can resend the patch which uses da_reserve_metadata() so you can use
> > it instead the old version
> 
> That sounds like the best thing to do.  I assume you will be simply
> taking the da_reserve_metadata() function introduced in the
> to-be-dropped commit:
> 
> > ext4: Reserve metadata if writing into uninitialized
> 
> ... and moving it into the "reserve metadata block for every delayed
> write" patch?

Exactly.

-Lukas

> 
> 					- Ted

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Metadata reservation for unwritten extent conversion
  2013-03-13 17:31     ` Lukáš Czerner
@ 2013-03-14  3:21       ` Theodore Ts'o
  2013-03-14  6:38         ` Metadata reservation for unwritten extent conversionjkj Lukáš Czerner
  0 siblings, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2013-03-14  3:21 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: linux-ext4

On Wed, Mar 13, 2013 at 06:31:25PM +0100, Lukáš Czerner wrote:
> > That sounds like the best thing to do.  I assume you will be simply
> > taking the da_reserve_metadata() function introduced in the
> > to-be-dropped commit:
> > 
> > > ext4: Reserve metadata if writing into uninitialized
> > 
> > ... and moving it into the "reserve metadata block for every delayed
> > write" patch?

I wnated to kick off some testing tonight, so I've made this change to
the git tree.  So the new version of "reserve metadata block for every
delayed write" is here:

https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=c7cf7abbf754f16b78a8b9641d06baac6be60524

... and the dev branch has been modified to drop the "reserve metadata
if writing into unitialized.." commit.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Metadata reservation for unwritten extent conversionjkj
  2013-03-14  3:21       ` Theodore Ts'o
@ 2013-03-14  6:38         ` Lukáš Czerner
  2013-03-14  7:08           ` [PATCH v2] ext4: reserve metadata block for every delayed write Lukas Czerner
  0 siblings, 1 reply; 7+ messages in thread
From: Lukáš Czerner @ 2013-03-14  6:38 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Lukáš Czerner, linux-ext4

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1276 bytes --]

On Wed, 13 Mar 2013, Theodore Ts'o wrote:

> Date: Wed, 13 Mar 2013 23:21:18 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: Metadata reservation for unwritten extent conversion
> 
> On Wed, Mar 13, 2013 at 06:31:25PM +0100, Lukáš Czerner wrote:
> > > That sounds like the best thing to do.  I assume you will be simply
> > > taking the da_reserve_metadata() function introduced in the
> > > to-be-dropped commit:
> > > 
> > > > ext4: Reserve metadata if writing into uninitialized
> > > 
> > > ... and moving it into the "reserve metadata block for every delayed
> > > write" patch?
> 
> I wnated to kick off some testing tonight, so I've made this change to
> the git tree.  So the new version of "reserve metadata block for every
> delayed write" is here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=c7cf7abbf754f16b78a8b9641d06baac6be60524
> 
> ... and the dev branch has been modified to drop the "reserve metadata
> if writing into unitialized.." commit.
> 
> 						- Ted

Great, thanks. It was quite late yesterday so I have not had time to
actually send the patch, sorry.

I'll send it today just for the "record-keeping" :)

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] ext4: reserve metadata block for every delayed write
  2013-03-14  6:38         ` Metadata reservation for unwritten extent conversionjkj Lukáš Czerner
@ 2013-03-14  7:08           ` Lukas Czerner
  0 siblings, 0 replies; 7+ messages in thread
From: Lukas Czerner @ 2013-03-14  7:08 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, Lukas Czerner

Currently we only reserve space (data+metadata) in delayed allocation if
we're allocating from new cluster (which is always in non-bigalloc file
system) which is ok for data blocks, because we reserve whole cluster.

However we have to reserve metadata for every delayed block we're going
to write because every block could potentially require metedata block
when we need to grow the extent tree.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
v2: Move ext4_da_reserve_metadata() function into this patch

 fs/ext4/inode.c |   63 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9ea0cde..878ace9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1216,6 +1216,55 @@ static int ext4_journalled_write_end(struct file *file,
 }
 
 /*
+ * Reserve a metadata for a single block located at lblock
+ */
+static int ext4_da_reserve_metadata(struct inode *inode, ext4_lblk_t lblock)
+{
+	int retries = 0;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	unsigned int md_needed;
+	ext4_lblk_t save_last_lblock;
+	int save_len;
+
+	/*
+	 * recalculate the amount of metadata blocks to reserve
+	 * in order to allocate nrblocks
+	 * worse case is one extent per block
+	 */
+repeat:
+	spin_lock(&ei->i_block_reservation_lock);
+	/*
+	 * ext4_calc_metadata_amount() has side effects, which we have
+	 * to be prepared undo if we fail to claim space.
+	 */
+	save_len = ei->i_da_metadata_calc_len;
+	save_last_lblock = ei->i_da_metadata_calc_last_lblock;
+	md_needed = EXT4_NUM_B2C(sbi,
+				 ext4_calc_metadata_amount(inode, lblock));
+	trace_ext4_da_reserve_space(inode, md_needed);
+
+	/*
+	 * We do still charge estimated metadata to the sb though;
+	 * we cannot afford to run out of free blocks.
+	 */
+	if (ext4_claim_free_clusters(sbi, md_needed, 0)) {
+		ei->i_da_metadata_calc_len = save_len;
+		ei->i_da_metadata_calc_last_lblock = save_last_lblock;
+		spin_unlock(&ei->i_block_reservation_lock);
+		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
+			cond_resched();
+			goto repeat;
+		}
+		return -ENOSPC;
+	}
+	ei->i_reserved_meta_blocks += md_needed;
+	spin_unlock(&ei->i_block_reservation_lock);
+
+	return 0;       /* success */
+}
+
+/*
  * Reserve a single cluster located at lblock
  */
 static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock)
@@ -1843,8 +1892,11 @@ add_delayed:
 		 * XXX: __block_prepare_write() unmaps passed block,
 		 * is it OK?
 		 */
-		/* If the block was allocated from previously allocated cluster,
-		 * then we dont need to reserve it again. */
+		/*
+		 * If the block was allocated from previously allocated cluster,
+		 * then we don't need to reserve it again. However we still need
+		 * to reserve metadata for every block we're going to write.
+		 */
 		if (!(map->m_flags & EXT4_MAP_FROM_CLUSTER)) {
 			ret = ext4_da_reserve_space(inode, iblock);
 			if (ret) {
@@ -1852,6 +1904,13 @@ add_delayed:
 				retval = ret;
 				goto out_unlock;
 			}
+		} else {
+			ret = ext4_da_reserve_metadata(inode, iblock);
+			if (ret) {
+				/* not enough space to reserve */
+				retval = ret;
+				goto out_unlock;
+			}
 		}
 
 		ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-03-14  7:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-13 15:48 Metadata reservation for unwritten extent conversion Lukáš Czerner
2013-03-13 16:09 ` Lukáš Czerner
2013-03-13 17:25   ` Theodore Ts'o
2013-03-13 17:31     ` Lukáš Czerner
2013-03-14  3:21       ` Theodore Ts'o
2013-03-14  6:38         ` Metadata reservation for unwritten extent conversionjkj Lukáš Czerner
2013-03-14  7:08           ` [PATCH v2] ext4: reserve metadata block for every delayed write Lukas Czerner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).