From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Mingming Cao <cmm@us.ibm.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [RFC][PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full
Date: Fri, 29 Feb 2008 16:39:24 +0530 [thread overview]
Message-ID: <20080229110924.GA16757@skywalker> (raw)
In-Reply-To: <1204240440.3609.26.camel@localhost.localdomain>
On Thu, Feb 28, 2008 at 03:14:00PM -0800, Mingming Cao wrote:
> On Thu, 2008-02-28 at 23:35 +0530, Aneesh Kumar K.V wrote:
> > A write to prealloc area cause the split of unititalized extent into a initialized
> > and uninitialized extent. If we don't have space to add new extent information instead
> > of returning error convert the existing uninitialized extent to initialized one. We
> > need to zero out the blocks corresponding to the extent to prevent wrong data reaching
> > userspace.
> >
> > +
....
>
> The complexity added to the code to handle the corner case seems not
> worth the effort.
>
> One simple solution is submit bio directly to zero out the blocks on
> disk, and wait for that to finish before clear the uninitialized bit. On
> a 4K block size case, the max size of an uninitialized extents is 128MB,
> and since the blocks are all contigous on disk, a single IO could done
> the job, the latency should not be a too big issue. After all when a
> filesystem is full, it's already performs slowly.
This is the change that i have now. Yet to run the full test on that.
But seems to be working for simple tests.
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index d315cc1..26396e2 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2136,6 +2136,55 @@ void ext4_ext_release(struct super_block *sb)
#endif
}
+static void bi_complete(struct bio *bio, int error)
+{
+ complete((struct completion*)bio->bi_private);
+}
+
+/* FIXME!! we need to try to merge to left or right after zerout */
+static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
+{
+ int ret = -EIO;
+ struct bio *bio;
+ int blkbits, blocksize;
+ ext4_fsblk_t ee_pblock;
+ unsigned int ee_len, i;
+ struct completion event;
+
+
+ blkbits = inode->i_blkbits;
+ blocksize = inode->i_sb->s_blocksize;
+ ee_len = ext4_ext_get_actual_len(ex);
+ ee_pblock = ext_pblock(ex);
+
+ bio = bio_alloc(GFP_NOIO, ee_len);
+ if (!bio)
+ return -ENOMEM;
+
+ bio->bi_sector = ee_pblock << (blkbits >> 9);
+ bio->bi_bdev = inode->i_sb->s_bdev;
+
+ for (i = 0; i < ee_len; i++) {
+ ret = bio_add_page(bio, ZERO_PAGE(0), blocksize, 0);
+ if (ret != blocksize) {
+ ret = -EIO;
+ goto err_out;
+ }
+ }
+
+ init_completion(&event);
+ bio->bi_private = &event;
+ bio->bi_end_io = bi_complete;
+ submit_bio(WRITE, bio);
+ wait_for_completion(&event);
+
+ if (test_bit(BIO_UPTODATE, &bio->bi_flags))
+ ret = 0;
+err_out:
+ bio_put(bio);
+ return ret;
+}
+
/*
* This function is called by ext4_ext_get_blocks() if someone tries to write
* to an uninitialized extent. It may result in splitting the uninitialized
@@ -2202,14 +2251,19 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex3->ee_len = cpu_to_le16(allocated - max_blocks);
ext4_ext_mark_uninitialized(ex3);
err = ext4_ext_insert_extent(handle, inode, path, ex3);
- if (err) {
+ if (err == -ENOSPC) {
+ err = ext4_ext_zeroout(inode, &orig_ex);
+ if (err)
+ goto fix_extent_len;
+ /* update the extent length and mark as initialized */
ex->ee_block = orig_ex.ee_block;
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
- ext4_ext_mark_uninitialized(ex);
ext4_ext_dirty(handle, inode, path + depth);
- goto out;
- }
+ return le16_to_cpu(ex->ee_len);
+
+ } else if (err)
+ goto fix_extent_len;
/*
* The depth, and hence eh & ex might change
* as part of the insert above.
@@ -2295,15 +2349,28 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
goto out;
insert:
err = ext4_ext_insert_extent(handle, inode, path, &newex);
- if (err) {
+ if (err == -ENOSPC) {
+ err = ext4_ext_zeroout(inode, &orig_ex);
+ if (err)
+ goto fix_extent_len;
+ /* update the extent length and mark as initialized */
ex->ee_block = orig_ex.ee_block;
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
- ext4_ext_mark_uninitialized(ex);
ext4_ext_dirty(handle, inode, path + depth);
- }
+ return le16_to_cpu(ex->ee_len);
+ } else if (err)
+ goto fix_extent_len;
out:
return err ? err : allocated;
+
+fix_extent_len:
+ ex->ee_block = orig_ex.ee_block;
+ ex->ee_len = orig_ex.ee_len;
+ ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
+ ext4_ext_mark_uninitialized(ex);
+ ext4_ext_dirty(handle, inode, path + depth);
+ return err;
}
/*
I am not invalidating the inode mapping after zeroing out the block. I
guess that is the right thing to do considering that pages already
mapped in via read or mmap would contain same value (zero).
>
> > /*
> > * This function is called by ext4_ext_get_blocks() if someone tries to write
> > * to an uninitialized extent. It may result in splitting the uninitialized
> > @@ -2202,14 +2333,20 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> > ex3->ee_len = cpu_to_le16(allocated - max_blocks);
> > ext4_ext_mark_uninitialized(ex3);
> > + } else if (err)
......
> > + goto fix_extent_len;
> > out:
> > return err ? err : allocated;
> > +
> > +fix_extent_len:
> > + ex->ee_block = orig_ex.ee_block;
> > + ex->ee_len = orig_ex.ee_len;
> > + ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
> > + ext4_ext_mark_uninitialized(ex);
> > + ext4_ext_dirty(handle, inode, path + depth);
> > + return err;
> > }
> >
> It would be nice to detect if fs is full or almost full before convert
> the uninitialized extents. If the total number of free blocks left are
> not enough for the split(plan for the worse case, 3 extents adds), just
> go ahead to do the zero out the one single chunk ahead, in stead of
> possible zeroing out two chucks later on the error path. I feel it's
> much cleaner that way.
>
We don't zero out two chunks. The uninit extent can possibly get split
into three extent.
[ 1st uninit] [ 2 init ] [ 3rd uninit]
Now first we attempt to insert 3. And if we fail due to ENOSPC we
zero out the full extent [1 2 3]. Now if we are successful in inserting 3 then
we attempt to insert 2. If we fail, we zero out [1 2]. That should also
reduce the number blocks that we are zeroing out. For example if we have
uninit extent len of 32767 blocks and we try to write the third block within
the extent and failed in the second step above we will zero out only 3
blocks. If we want to zero out the full extent that would imply zero out
32767 blocks.
-aneesh
next prev parent reply other threads:[~2008-02-29 11:10 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-28 18:05 [RFC][PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
2008-02-28 18:05 ` [RFC][PATCH] ext4: Fix fallocate error path Aneesh Kumar K.V
2008-02-28 18:05 ` [RFC][PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full Aneesh Kumar K.V
2008-02-28 18:05 ` [RFC][PATCH] ext4: Enable extent format for symlink Aneesh Kumar K.V
2008-02-28 23:14 ` [RFC][PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full Mingming Cao
2008-02-29 11:09 ` Aneesh Kumar K.V [this message]
2008-02-29 19:21 ` Andreas Dilger
2008-03-01 17:30 ` Aneesh Kumar K.V
2008-03-02 18:51 ` Andreas Dilger
2008-02-29 18:05 ` Andreas Dilger
-- strict thread matches above, loose matches on Subject: below --
2008-02-21 19:17 Aneesh Kumar K.V
2008-02-21 21:07 ` Mingming Cao
2008-02-22 14:31 ` Aneesh Kumar K.V
2008-02-22 15:42 ` Aneesh Kumar K.V
2008-02-22 17:28 ` Mingming Cao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080229110924.GA16757@skywalker \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=cmm@us.ibm.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox