linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Mingming Cao <cmm@us.ibm.com>
Cc: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>,
	jack@suse.cz, akpm@linux-foundation.org,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails
Date: Wed, 6 Aug 2008 14:47:29 +0200	[thread overview]
Message-ID: <20080806124728.GC9233@duck.suse.cz> (raw)
In-Reply-To: <1217970194.7516.13.camel@mingming-laptop>

  Hi,

On Tue 05-08-08 14:03:14, Mingming Cao wrote:
> 在 2008-08-04一的 20:10 +0900,Hisashi Hifumi写道:
> > Hi
> > 
> > Dio write returns EIO when try_to_release_page fails because bh is
> > still referenced.
> > The patch 
> > "commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
> > Author: Mingming Cao <cmm@us.ibm.com>
> > Date:   Fri Jul 25 01:46:22 2008 -0700
> > 
> >     jbd: fix race between free buffer and commit transaction
> > " 
> > was merged into 2.6.27-rc1, but I noticed that this patch is not enough
> > to fix the race.
> > I did fsstress test heavily to 2.6.27-rc1, and found that dio write still 
> > sometimes got EIO through this test.
> 
> :(  thought we beat that race pretty hard already.T
> 
> Could you send me the fsstree command to reproduce the race?
  It is a part of ext3-tools package Andrew has somewhere and also LTP has
it I think.

> > The patch above fixed race between freeing buffer(dio) and committing 
> > transaction(jbd) but I discovered that there is another race, 
> > freeing buffer(dio) and ext3/4_ordered_writepage.
> > : background_writeout()
> >      ->write_cache_pages()
> >        ->ext3_ordered_writepage()
> >      	   walk_page_buffers() <- take a bh ref
> >  	   block_write_full_page() <- unlock_page
> > 		: <- end_page_writeback
> >                 : <- race! (dio write->try_to_release_page fails)
> >       	   walk_page_buffers() <-release a bh ref
> > 
> > ext3_ordered_writepage holds bh ref and does unlock_page remaining 
> > taking a bh ref, so this causes the race and failure of 
> > try_to_release_page.
> > 
> 
> I thought about this before, the race seems unlikely to me. Perhaps I
> missed something, but DIO code already waiting for all the pending IO to
> finish before calling try_to_release_page which eventually called
> journal_try_to_free_buffers(). During this call, the inode mutx is hold
> to prevent the new writer (buffered/DIO) to re-dirty the pages. If there
> is background writeout happens when DIO is kicked in, DIO will wait for
> all the pages writeback bit clear first. here is the stack
  Yes, but in principle, nothing assures that writeback of buffers does not
finish even before block_write_full_page() returns. So there is possibly a
window after PageWriteback is cleared (and thus filemap_fdatawait()
finishes) and before buffer references are dropped.
  Now what is more likely in practice is that all buffers of the page are
written during transaction commit. So they are clean but the page remains
dirty. Now background writeback happens, sees dirty page, calls:
  ext3_ordered_writepage()
    block_write_full_page()
      - finds all buffers are clean -> end_page_writeback()
- and at this point direct IO happens which happily proceeds upto a point
  where try_to_release_page() fails because ext3_ordered_writepage() has
  not yet dropped its references to buffers. Nasty.

  So we really need some nice and effective way in which ->writepage()
calls (and possibly others) could synchronize with try_to_release_page()
(which has __GFP_WAIT and __GFP_FS and is thus willing to wait a bit). But
I don't have a good candidate...

								Honza
> generic_file_aio_write()
>   -> mutex_lock(&inode->i_mutex);
>   -> __generic_file_aio_write_nolock()
>      -> generic_file_direct_IO()
>         ->filemap_write_and_wait()
>            -> filemap_fdatawait()
>               -> wait_on_page_writeback_range()
>                                                 (==== waiting for
> pending IO to finish ====)
>       ->invalidate_inode_pages2_range()
>           ->invalidate_inode_pages2()
>              ->try_to_releasepage()
>                 ->ext3_releasepage()
>                     ->journal_try_to_free_buffers()
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

      reply	other threads:[~2008-08-06 12:47 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-04 11:10 [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails Hisashi Hifumi
2008-08-04 21:50 ` Andrew Morton
2008-08-05  2:36   ` [PATCH] jbd jbd2: fix dio write returning EIO whentry_to_release_page fails Hisashi Hifumi
2008-08-05 21:35     ` Mingming Cao
2008-08-06  2:04       ` [PATCH] jbd jbd2: fix dio write returning EIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-05  3:35   ` [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails Chris Mason
2008-08-05  4:51     ` [PATCH] jbd jbd2: fix dio write returning EIO whentry_to_release_page fails Hisashi Hifumi
2008-08-05 16:17       ` Chris Mason
2008-08-05 21:17         ` Mingming Cao
2008-08-06  6:55           ` [PATCH] jbd jbd2: fix dio write returning EIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-06  8:39             ` [PATCH] jbd jbd2: fix dio write returningEIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-06 13:25           ` [PATCH] jbd jbd2: fix dio write returning EIO whentry_to_release_page fails Chris Mason
2008-08-06 13:53             ` Jan Kara
2008-08-06 22:57               ` Mingming Cao
2008-08-07  1:07                 ` Chris Mason
2008-08-07  3:15                 ` [PATCH] jbd jbd2: fix dio write returning EIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-07 10:21                   ` Chris Mason
2008-08-08  3:28                     ` [PATCH] jbd jbd2: fix dio write returningEIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-08 12:54                       ` Chris Mason
2008-08-11  6:25                         ` [PATCH] jbd jbd2: fix dio writereturningEIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-12 13:28                           ` Chris Mason
2008-08-12 16:38                             ` Zach Brown
2008-08-12 20:06                             ` Mingming Cao
2008-08-13  6:02                               ` [PATCH] jbd jbd2: fix diowritereturningEIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-13 10:56                               ` [PATCH] jbd jbd2: fix dio writereturningEIOwhentry_to_release_page fails Jan Kara
2008-08-13 10:16                             ` Jan Kara
2008-08-13 12:59                               ` Chris Mason
2008-08-19  7:03                                 ` [PATCH] jbd jbd2: fix diowritereturningEIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-19  7:16                                   ` Andrew Morton
2008-08-20  2:50                                     ` [PATCH] jbd jbd2: fixdiowritereturningEIOwhentry_to_release_page fails Hisashi Hifumi
2008-08-21  7:47                                     ` Hisashi Hifumi
2008-08-05 21:03 ` [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails Mingming Cao
2008-08-06 12:47   ` Jan Kara [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080806124728.GC9233@duck.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=cmm@us.ibm.com \
    --cc=hifumi.hisashi@oss.ntt.co.jp \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).