From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: Ted Tso <tytso@mit.edu>, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 22/29] ext4: Defer clearing of PageWriteback after extent conversion
Date: Wed, 8 May 2013 15:08:26 +0800 [thread overview]
Message-ID: <20130508070826.GD20599@gmail.com> (raw)
In-Reply-To: <1365456754-29373-23-git-send-email-jack@suse.cz>
On Mon, Apr 08, 2013 at 11:32:27PM +0200, Jan Kara wrote:
> Currently PageWriteback bit gets cleared from put_io_page() called from
> ext4_end_bio(). This is somewhat inconvenient as extent tree is not
> fully updated at that time (unwritten extents are not marked as written)
> so we cannot read the data back yet. This design was dictated by lock
> ordering as we cannot start a transaction while PageWriteback bit is set
> (we could easily deadlock with ext4_da_writepages()). But now that we
> use transaction reservation for extent conversion, locking issues are
> solved and we can move PageWriteback bit clearing after extent
> conversion is done. As a result we can remove wait for unwritt en extent
> conversion from ext4_sync_file() because it already implicitely happe ns
> through wait_on_page_writeback().
>
> We implement deferring of PageWriteback clearing by queueing completed
> bios to appropriate io_end and processing all the pages when io_end is
> going to be freed instead of at the moment ext4_io_end() is called.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Regards,
- Zheng
> ---
> fs/ext4/ext4.h | 2 +
> fs/ext4/fsync.c | 4 --
> fs/ext4/page-io.c | 132 +++++++++++++++++++++++++++++------------------------
> 3 files changed, 74 insertions(+), 64 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index a594a94..2b0dd9a 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -190,6 +190,8 @@ typedef struct ext4_io_end {
> handle_t *handle; /* handle reserved for extent
> * conversion */
> struct inode *inode; /* file being written to */
> + struct bio *bio; /* Linked list of completed
> + * bios covering the extent */
> unsigned int flag; /* unwritten or not */
> loff_t offset; /* offset in the file */
> ssize_t size; /* size of the extent */
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 3278e64..e02ba1b 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -132,10 +132,6 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> if (inode->i_sb->s_flags & MS_RDONLY)
> goto out;
>
> - ret = ext4_flush_unwritten_io(inode);
> - if (ret < 0)
> - goto out;
> -
> if (!journal) {
> ret = __sync_inode(inode, datasync);
> if (!ret && !hlist_empty(&inode->i_dentry))
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 8bff3b3..2967794 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -51,14 +51,83 @@ void ext4_ioend_wait(struct inode *inode)
> wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_ioend_count) == 0));
> }
>
> +/*
> + * Print an buffer I/O error compatible with the fs/buffer.c. This
> + * provides compatibility with dmesg scrapers that look for a specific
> + * buffer I/O error message. We really need a unified error reporting
> + * structure to userspace ala Digital Unix's uerf system, but it's
> + * probably not going to happen in my lifetime, due to LKML politics...
> + */
> +static void buffer_io_error(struct buffer_head *bh)
> +{
> + char b[BDEVNAME_SIZE];
> + printk(KERN_ERR "Buffer I/O error on device %s, logical block %llu\n",
> + bdevname(bh->b_bdev, b),
> + (unsigned long long)bh->b_blocknr);
> +}
> +
> +static void ext4_finish_bio(struct bio *bio)
> +{
> + int i;
> + int error = !test_bit(BIO_UPTODATE, &bio->bi_flags);
> +
> + for (i = 0; i < bio->bi_vcnt; i++) {
> + struct bio_vec *bvec = &bio->bi_io_vec[i];
> + struct page *page = bvec->bv_page;
> + struct buffer_head *bh, *head;
> + unsigned bio_start = bvec->bv_offset;
> + unsigned bio_end = bio_start + bvec->bv_len;
> + unsigned under_io = 0;
> + unsigned long flags;
> +
> + if (!page)
> + continue;
> +
> + if (error) {
> + SetPageError(page);
> + set_bit(AS_EIO, &page->mapping->flags);
> + }
> + bh = head = page_buffers(page);
> + /*
> + * We check all buffers in the page under BH_Uptodate_Lock
> + * to avoid races with other end io clearing async_write flags
> + */
> + local_irq_save(flags);
> + bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
> + do {
> + if (bh_offset(bh) < bio_start ||
> + bh_offset(bh) + bh->b_size > bio_end) {
> + if (buffer_async_write(bh))
> + under_io++;
> + continue;
> + }
> + clear_buffer_async_write(bh);
> + if (error)
> + buffer_io_error(bh);
> + } while ((bh = bh->b_this_page) != head);
> + bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
> + local_irq_restore(flags);
> + if (!under_io)
> + end_page_writeback(page);
> + }
> +}
> +
> static void ext4_release_io_end(ext4_io_end_t *io_end)
> {
> + struct bio *bio, *next_bio;
> +
> BUG_ON(!list_empty(&io_end->list));
> BUG_ON(io_end->flag & EXT4_IO_END_UNWRITTEN);
> WARN_ON(io_end->handle);
>
> if (atomic_dec_and_test(&EXT4_I(io_end->inode)->i_ioend_count))
> wake_up_all(ext4_ioend_wq(io_end->inode));
> +
> + for (bio = io_end->bio; bio; bio = next_bio) {
> + next_bio = bio->bi_private;
> + ext4_finish_bio(bio);
> + bio_put(bio);
> + }
> if (io_end->flag & EXT4_IO_END_DIRECT)
> inode_dio_done(io_end->inode);
> if (io_end->iocb)
> @@ -254,76 +323,20 @@ ext4_io_end_t *ext4_get_io_end(ext4_io_end_t *io_end)
> return io_end;
> }
>
> -/*
> - * Print an buffer I/O error compatible with the fs/buffer.c. This
> - * provides compatibility with dmesg scrapers that look for a specific
> - * buffer I/O error message. We really need a unified error reporting
> - * structure to userspace ala Digital Unix's uerf system, but it's
> - * probably not going to happen in my lifetime, due to LKML politics...
> - */
> -static void buffer_io_error(struct buffer_head *bh)
> -{
> - char b[BDEVNAME_SIZE];
> - printk(KERN_ERR "Buffer I/O error on device %s, logical block %llu\n",
> - bdevname(bh->b_bdev, b),
> - (unsigned long long)bh->b_blocknr);
> -}
> -
> static void ext4_end_bio(struct bio *bio, int error)
> {
> ext4_io_end_t *io_end = bio->bi_private;
> struct inode *inode;
> - int i;
> - int blocksize;
> sector_t bi_sector = bio->bi_sector;
>
> BUG_ON(!io_end);
> inode = io_end->inode;
> - blocksize = 1 << inode->i_blkbits;
> - bio->bi_private = NULL;
> bio->bi_end_io = NULL;
> + /* Link bio into list hanging from io_end */
> + bio->bi_private = io_end->bio;
> + io_end->bio = bio;
> if (test_bit(BIO_UPTODATE, &bio->bi_flags))
> error = 0;
> - for (i = 0; i < bio->bi_vcnt; i++) {
> - struct bio_vec *bvec = &bio->bi_io_vec[i];
> - struct page *page = bvec->bv_page;
> - struct buffer_head *bh, *head;
> - unsigned bio_start = bvec->bv_offset;
> - unsigned bio_end = bio_start + bvec->bv_len;
> - unsigned under_io = 0;
> - unsigned long flags;
> -
> - if (!page)
> - continue;
> -
> - if (error) {
> - SetPageError(page);
> - set_bit(AS_EIO, &page->mapping->flags);
> - }
> - bh = head = page_buffers(page);
> - /*
> - * We check all buffers in the page under BH_Uptodate_Lock
> - * to avoid races with other end io clearing async_write flags
> - */
> - local_irq_save(flags);
> - bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
> - do {
> - if (bh_offset(bh) < bio_start ||
> - bh_offset(bh) + blocksize > bio_end) {
> - if (buffer_async_write(bh))
> - under_io++;
> - continue;
> - }
> - clear_buffer_async_write(bh);
> - if (error)
> - buffer_io_error(bh);
> - } while ((bh = bh->b_this_page) != head);
> - bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
> - local_irq_restore(flags);
> - if (!under_io)
> - end_page_writeback(page);
> - }
> - bio_put(bio);
>
> if (error) {
> io_end->flag |= EXT4_IO_END_ERROR;
> @@ -335,7 +348,6 @@ static void ext4_end_bio(struct bio *bio, int error)
> (unsigned long long)
> bi_sector >> (inode->i_blkbits - 9));
> }
> -
> ext4_put_io_end_defer(io_end);
> }
>
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2013-05-08 6:50 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-08 21:32 [PATCH 00/22 v1] Fixes and improvements in ext4 writeback path Jan Kara
2013-04-08 21:32 ` [PATCH 01/29] ext4: Make ext4_bio_write_page() use BH_Async_Write flags instead page pointers from ext4_io_end Jan Kara
2013-04-10 18:05 ` Dmitry Monakhov
2013-04-11 13:38 ` Zheng Liu
2013-04-12 3:50 ` Theodore Ts'o
2013-04-08 21:32 ` [PATCH 02/29] ext4: Use io_end for multiple bios Jan Kara
2013-04-11 5:10 ` Dmitry Monakhov
2013-04-11 14:04 ` Zheng Liu
2013-04-12 3:55 ` Theodore Ts'o
2013-04-08 21:32 ` [PATCH 03/29] ext4: Clear buffer_uninit flag when submitting IO Jan Kara
2013-04-11 14:08 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 04/29] jbd2: Reduce journal_head size Jan Kara
2013-04-11 14:10 ` Zheng Liu
2013-04-12 4:04 ` Theodore Ts'o
2013-04-08 21:32 ` [PATCH 05/29] jbd2: Don't create journal_head for temporary journal buffers Jan Kara
2013-04-12 8:01 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 06/29] jbd2: Remove journal_head from descriptor buffers Jan Kara
2013-04-12 8:10 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 07/29] jbd2: Refine waiting for shadow buffers Jan Kara
2013-05-03 14:16 ` Zheng Liu
2013-05-03 20:44 ` Jan Kara
2013-04-08 21:32 ` [PATCH 08/29] jbd2: Remove outdated comment Jan Kara
2013-05-03 14:20 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 09/29] jbd2: Cleanup needed free block estimates when starting a transaction Jan Kara
2013-05-05 8:17 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 10/29] jbd2: Fix race in t_outstanding_credits update in jbd2_journal_extend() Jan Kara
2013-05-05 8:37 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 11/29] jbd2: Remove unused waitqueues Jan Kara
2013-05-05 8:41 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 12/29] jbd2: Transaction reservation support Jan Kara
2013-05-05 9:39 ` Zheng Liu
2013-05-06 12:49 ` Jan Kara
2013-05-07 5:22 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 13/29] ext4: Provide wrappers for transaction reservation calls Jan Kara
2013-05-05 11:51 ` Zheng Liu
2013-05-05 11:58 ` Zheng Liu
2013-05-06 12:51 ` Jan Kara
2013-04-08 21:32 ` [PATCH 14/29] ext4: Stop messing with nr_to_write in ext4_da_writepages() Jan Kara
2013-05-05 12:40 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 15/29] ext4: Deprecate max_writeback_mb_bump sysfs attribute Jan Kara
2013-05-05 12:47 ` Zheng Liu
2013-05-06 12:55 ` Jan Kara
2013-04-08 21:32 ` [PATCH 16/29] ext4: Improve writepage credit estimate for files with indirect blocks Jan Kara
2013-05-07 5:39 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 17/29] ext4: Better estimate credits needed for ext4_da_writepages() Jan Kara
2013-05-07 6:33 ` Zheng Liu
2013-05-07 14:17 ` Jan Kara
2013-04-08 21:32 ` [PATCH 18/29] ext4: Restructure writeback path Jan Kara
2013-05-08 3:48 ` Zheng Liu
2013-05-08 11:20 ` Jan Kara
2013-04-08 21:32 ` [PATCH 19/29] ext4: Remove buffer_uninit handling Jan Kara
2013-05-08 6:56 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 20/29] ext4: Use transaction reservation for extent conversion in ext4_end_io Jan Kara
2013-05-08 6:57 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 21/29] ext4: Split extent conversion lists to reserved & unreserved parts Jan Kara
2013-05-08 7:03 ` Zheng Liu
2013-05-08 11:23 ` Jan Kara
2013-05-08 11:49 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 22/29] ext4: Defer clearing of PageWriteback after extent conversion Jan Kara
2013-05-08 7:08 ` Zheng Liu [this message]
2013-04-08 21:32 ` [PATCH 23/29] ext4: Protect extent conversion after DIO with i_dio_count Jan Kara
2013-05-08 7:08 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 24/29] ext4: Remove wait for unwritten extent conversion from ext4_ext_truncate() Jan Kara
2013-05-08 7:35 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 25/29] ext4: Use generic_file_fsync() in ext4_file_fsync() in nojournal mode Jan Kara
2013-05-08 7:37 ` Zheng Liu
2013-05-08 11:29 ` Jan Kara
2013-04-08 21:32 ` [PATCH 26/29] ext4: Remove i_mutex from ext4_file_sync() Jan Kara
2013-05-08 7:41 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 27/29] ext4: Remove wait for unwritten extents in ext4_ind_direct_IO() Jan Kara
2013-05-08 7:55 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 28/29] ext4: Don't wait for extent conversion in ext4_ext_punch_hole() Jan Kara
2013-05-08 7:56 ` Zheng Liu
2013-04-08 21:32 ` [PATCH 29/29] ext4: Remove ext4_ioend_wait() Jan Kara
2013-05-08 7:57 ` Zheng Liu
2013-05-08 11:32 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130508070826.GD20599@gmail.com \
--to=gnehzuil.liu@gmail.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.