From: Andreas Dilger <adilger@sun.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>, linux-ext4@vger.kernel.org
Subject: Re: [RFC][PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full
Date: Fri, 29 Feb 2008 11:21:42 -0800
Message-ID: <20080229192142.GJ2997@webber.adilger.int>
In-Reply-To: <20080229110924.GA16757@skywalker>

On Feb 29, 2008  16:39 +0530, Aneesh Kumar K.V wrote:
> > One simple solution is to submit a bio directly to zero out the blocks
> > on disk, and wait for that to finish before clearing the uninitialized
> > bit. With a 4K block size, the max size of an uninitialized extent is
> > 128MB, and since the blocks are all contiguous on disk, a single IO
> > could do the job, so the latency should not be too big an issue. After
> > all, when a filesystem is full, it already performs slowly.
> 
> This is the change that I have now. I have yet to run the full tests on it,
> but it seems to be working for simple tests.
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index d315cc1..26396e2 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2136,6 +2136,55 @@ void ext4_ext_release(struct super_block *sb)
>  #endif
>  }
>  
> +static void bi_complete(struct bio *bio, int error)
> +{
> +	complete((struct completion*)bio->bi_private);
> +}

Note that the completion event can be called multiple times if there are
block device errors...  Our similar completion code in Lustre is like:

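static int dio_complete_routine(struct bio *bio, unsigned int done, int error)
{
        /* CAVEAT EMPTOR: possibly in IRQ context  */
        if (bio->bi_size)                       /* Not complete */
                return 1;

        /* Simplified: bi_private is a void * and needs a cast to the
         * caller's private I/O state before it can be dereferenced. */
        bio->bi_private->data.error = error;

        return 0;
}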
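
To make the "called multiple times" point concrete, here is a minimal
userspace sketch of the same pattern. Everything in it (struct fake_bio,
struct zero_io_state, fake_end_io) is a hypothetical stand-in, not a kernel
or Lustre type. Unlike the snippet above it also remembers the first error
from a partial completion, which is my own choice rather than what the
Lustre code does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct zero_io_state {
	int error;      /* first error reported by any callback */
	int finished;   /* set once, when no bytes remain */
};

struct fake_bio {
	unsigned int remaining;          /* plays the role of bio->bi_size */
	struct zero_io_state *private;   /* plays the role of bio->bi_private */
};

/*
 * Called once per partial completion. Returns 1 while bytes are still
 * outstanding and 0 once the whole request is done.
 */
static int fake_end_io(struct fake_bio *bio, unsigned int done, int error)
{
	struct zero_io_state *st = bio->private;

	assert(done <= bio->remaining);
	bio->remaining -= done;

	if (error && !st->error)
		st->error = error;       /* keep the first error seen */

	if (bio->remaining)              /* not complete yet */
		return 1;

	st->finished = 1;                /* would be complete() in the kernel */
	return 0;
}
```

The point is only that the waiter is woken once, after the final callback,
and not on every partial completion.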


> +/* FIXME!! we need to try to merge to left or right after zerout  */
> +static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
> +{
> +	bio = bio_alloc(GFP_NOIO, ee_len);
> +	if (!bio)
> +		return -ENOMEM;

I don't think it will be possible to allocate a bio large enough for a
maximum-sized unwritten extent.  BIO_MAX_PAGES is only 256 (1MB on x86),
but an unwritten extent can be up to 128MB.
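
As a rough check of those numbers, the sketch below computes how many
maximum-sized bios a given extent would need. The constants are the values
quoted in this thread (256 pages per bio, 4kB pages), hard-coded here as
assumptions rather than read from kernel headers, and bios_needed() is a
hypothetical helper name.

```c
#include <assert.h>

/* Assumed values taken from the discussion, not read from kernel headers. */
#define SKETCH_BIO_MAX_PAGES   256u     /* pages per bio */
#define SKETCH_PAGE_SIZE       4096u    /* bytes, x86 */

/* Minimum number of bios needed to cover nr_blocks of block_size bytes. */
static unsigned long bios_needed(unsigned long nr_blocks,
				 unsigned long block_size)
{
	unsigned long long bytes, per_bio;

	assert(block_size > 0);
	bytes = (unsigned long long)nr_blocks * block_size;
	per_bio = (unsigned long long)SKETCH_BIO_MAX_PAGES * SKETCH_PAGE_SIZE;

	/* Round up so a partial last bio is counted. */
	return (unsigned long)((bytes + per_bio - 1) / per_bio);
}
```

With 4kB blocks, a 32767-block unwritten extent comes out at 128 bios under
these assumptions, so a single bio_alloc() cannot cover it.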

> +	bio->bi_bdev   = inode->i_sb->s_bdev;
> +
> +	for (i = 0; i < ee_len; i++) {
> +		ret = bio_add_page(bio, ZERO_PAGE(0), blocksize, 0);
> +		if (ret != blocksize) {
> +			ret = -EIO;
> +			goto err_out;

This shouldn't be considered an error.  Rather, it just means that the
bio is full or would cross some storage boundary, so it should be
submitted, a new bio created, and the zeroing continued from there.

Please move most of this function into a generic helper that can be used
elsewhere.  It might even go into the VFS like:

int bio_zero_blocks(struct block_device *bdev, sector_t start, sector_t len,
		    bio_end_io_t completion);

and then have ext4_ext_zeroout() call that routine after decoding the extent.
The only real error case is when the bio completion routine reports one,
in which case the saved "data.error" value is returned to the caller.
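
The control flow I have in mind can be sketched in userspace as below. This
is not the proposed bio_zero_blocks() itself: the sketch_* names, the submit
hook, and the max_per_bio parameter are all hypothetical, and the hook
stands in for "build a bio of zero pages, submit it, and wait for it".

```c
#include <assert.h>

typedef unsigned long long sketch_sector_t;

/* Hypothetical submit hook: returns 0 on success or a negative errno. */
typedef int (*sketch_submit_fn)(sketch_sector_t start, sketch_sector_t nr,
				void *ctx);

/*
 * Zero `len` sectors starting at `start`, at most `max_per_bio` sectors
 * per request. Hitting the per-request limit is not an error: the
 * current request is submitted and a new one is started.
 * Returns 0 on success or the first error reported by `submit`.
 */
static int sketch_zero_blocks(sketch_sector_t start, sketch_sector_t len,
			      sketch_sector_t max_per_bio,
			      sketch_submit_fn submit, void *ctx)
{
	assert(max_per_bio > 0);

	while (len > 0) {
		sketch_sector_t nr = len < max_per_bio ? len : max_per_bio;
		int err = submit(start, nr, ctx);

		if (err)
			return err;
		start += nr;
		len -= nr;
	}
	return 0;
}

/* Example hook that only counts requests and sectors. */
struct sketch_counter {
	unsigned int requests;
	sketch_sector_t sectors;
};

static int sketch_count_submit(sketch_sector_t start, sketch_sector_t nr,
			       void *ctx)
{
	struct sketch_counter *c = ctx;

	(void)start;
	c->requests++;
	c->sectors += nr;
	return 0;
}
```

The only failure path is an error coming back from the submit hook, which
matches the idea that a full bio is a reason to start another one and not a
reason to return -EIO.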

> > It would be nice to detect whether the fs is full or almost full before
> > converting the uninitialized extents. If the total number of free blocks
> > left is not enough for the split (plan for the worst case, 3 extent
> > adds), just go ahead and zero out the one single chunk up front, instead
> > of possibly zeroing out two chunks later on the error path. I feel it's
> > much cleaner that way.
> 
> We don't zero out two chunks. The uninit extent can possibly get split
> into three extents.
> [ 1st uninit] [ 2 init ] [ 3rd uninit]
> 
> 
> Now first we attempt to insert 3. And if we fail due to ENOSPC we
> zero out the full extent [1 2 3]. Now if we are successful in inserting 3 then
> we attempt to insert 2. If we fail, we zero out [1 2]. That should also
> reduce the number of blocks that we are zeroing out. For example, if we
> have an uninit extent of 32767 blocks and we try to write the third block
> within the extent and fail in the second step above, we will zero out only
> 3 blocks. If we wanted to zero out the full extent, that would mean
> zeroing out 32767 blocks.

A related optimization is to look at the size of the remaining split
extents.  I propose that if either of the remaining extents is < 7
blocks long (or whatever, possibly 15 blocks to get a nice 64kB write) we
should just zero out those blocks and create a single initialized extent.
This would avoid the "write every alternate block" problem, which could
grow the number of extents dramatically.
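
One possible reading of that heuristic, as a userspace sketch: given the
unwritten extent and the range being written, widen the initialized range
to swallow any leftover piece shorter than the threshold. The struct, the
function name, and the choice of 7 as the constant are all hypothetical and
only illustrate the idea.

```c
#include <assert.h>

#define SKETCH_ZEROOUT_THRESHOLD 7u   /* blocks; the "< 7" figure above */

struct sketch_range {
	unsigned int start;   /* first logical block */
	unsigned int len;     /* number of blocks */
};

/*
 * Given an uninitialized extent `ext` and a write `wr` that lies inside
 * it, return the range that should end up initialized. Leftover pieces
 * shorter than the threshold are absorbed (and would be zeroed out)
 * instead of being kept as separate uninitialized extents.
 */
static struct sketch_range sketch_init_range(struct sketch_range ext,
					     struct sketch_range wr)
{
	struct sketch_range out = wr;
	unsigned int head, tail;

	assert(wr.len > 0 && wr.start >= ext.start);
	assert(wr.start + wr.len <= ext.start + ext.len);

	head = wr.start - ext.start;                          /* blocks before */
	tail = (ext.start + ext.len) - (wr.start + wr.len);   /* blocks after */

	if (head > 0 && head < SKETCH_ZEROOUT_THRESHOLD) {
		out.start = ext.start;
		out.len += head;
	}
	if (tail > 0 && tail < SKETCH_ZEROOUT_THRESHOLD)
		out.len += tail;

	return out;
}
```

For the example quoted above (a write to the third block of a 32767-block
extent) this yields a 3-block initialized range starting at the beginning
of the extent, with the long tail left uninitialized.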

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

