From: Mingming Cao <cmm@us.ibm.com>
To: Jan Kara <jack@suse.cz>
Cc: Badari Pulavarty <pbadari@us.ibm.com>,
akpm@linux-foundation.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: Possible race between direct IO and JBD?
Date: Tue, 29 Apr 2008 10:49:37 -0700 [thread overview]
Message-ID: <1209491377.4751.21.camel@localhost.localdomain> (raw)
In-Reply-To: <20080429124321.GD1987@duck.suse.cz>
On Tue, 2008-04-29 at 14:43 +0200, Jan Kara wrote:
> On Mon 28-04-08 12:09:23, Mingming Cao wrote:
> > On Mon, 2008-04-28 at 20:09 +0200, Jan Kara wrote:
> > > On Mon 28-04-08 10:11:34, Badari Pulavarty wrote:
> > > >
> > > > On Mon, 2008-04-28 at 14:26 +0200, Jan Kara wrote:
> > > > > Hi,
> > > > >
> > > > > On Fri 25-04-08 16:38:23, Mingming Cao wrote:
> > > > > > While looking at a bug related to direct IO returns to EIO, after
> > > > > > looking at the code, I found there is a window that
> > > > > > try_to_free_buffers() from direct IO could race with JBD, which holds
> > > > > > the reference to the data buffers before journal_commit_transaction()
> > > > > > ensures the data buffers has reached to the disk.
> > > > > >
> > > > > > A little more detail: to prepare for direct IO, generic_file_direct_IO()
> > > > > > calls invalidate_inode_pages2_range() to invalidate the pages in the
> > > > > > cache before performaning direct IO. invalidate_inode_pages2_range()
> > > > > > tries to free the buffers via try_to free_buffers(), but sometimes it
> > > > > > can't, due to the buffers is possible still on some transaction's
> > > > > > t_sync_datalist or t_locked_list waiting for
> > > > > > journal_commit_transaction() to process it.
> > > > > >
> > > > > > Currently Direct IO simply returns EIO if try_to_free_buffers() finds
> > > > > > the buffer is busy, as it has no clue that JBD is referencing it.
> > > > > >
> > > > > > Is this a known issue and expected behavior? Any thoughts?
> > > > > Are you seeing this in data=ordered mode? As Andrew pointed out we do
> > > > > filemap_write_and_wait() so all the relevant data buffers of the inode
> > > > > should be already on disk. In __journal_try_to_free_buffer() we check
> > > > > whether the buffer is already-written-out data buffer and unfile and free
> > > > > it in that case. It shouldn't happen that a data buffer has
> > > > > b_next_transaction set so really the only idea why try_to_free_buffers()
> > > > > could fail is that somebody manages to write to a page via mmap before
> > > > > invalidate_inode_pages2_range() gets to it. Under which kind of load do you
> > > > > observe the problem? Do you know exactly because of which condition does
> > > > > journal_try_to_free_buffers() fail?
> > > > >
> > > >
> > > > Thank you for your reply.
> > > >
> > > > What we are noticing is invalidate_inode_pages2_range() fails with -EIO
> > > > (from try_to_free_buffers() since b_count > 0).
> > > >
> > > > I don't think the file is being updated through mmap(). Previous
> > > > writepage() added these buffers to t_sync_data list (data=ordered).
> > > > filemap_write_and_wait() waits for pagewrite back to be cleared.
> > > > So, buffers are no longer dirty, but still on the t_sync_data and
> > > > kjournald didn't get chance to process them yet :(
> > > >
> > > > Since we have elevated b_count on these buffers, try_to_free_buffers()
> > > > fails. How can we make filemap_write_and_wait() to wait for kjournald
> > > > to unfile these buffers ?
> > > Hmm, I don't get one thing:
> > > The call chain is invalidate_inode_pages2_range() ->
> > > invalidate_complete_page2() -> try_to_release_page() -> ext3_releasepage()
> > > -> journal_try_to_free_buffers() -> __journal_try_to_free_buffer() and this
> > > function should remove the buffer from the committing transaction.
> >
> > Thanks, yes I noticed that after you pointing this out.
> >
> > But __journal_try_to_free_buffer() only unfile the buffer from
> > t_sync_datalist or t_locked list, the journal head is not removed in
> > journal_remove_journal_head() there, at that time,
> > journal_remove_journal_head() just check if counter b_jcount is 0. But
> > before calling __journal_try_to_free_buffer(), since
> > journal_try_to_free_buffers() already increase the b_jcount in
> > journal_grab_journal_head(), so the journal head is not removed in
> > __journal_try_to_free_buffer-> journal_remove_journal_head()
> >
> > > So who's
> > > holding the reference to those buffers?
> >
> > Looking at the code, it seems the it's the journal_put_journal_head(jh)
> > who remove the journal head and decrease the bh
> >
> > journal_try_to_free_buffers()
> > {
> > ...
> >
> > jh = journal_grab_journal_head(bh);
> > if (!jh)
> > continue;
> >
> > jbd_lock_bh_state(bh);
> > __journal_try_to_free_buffer(journal, bh);
> > journal_put_journal_head(jh);
> > jbd_unlock_bh_state(bh);
> >
> > ...
> >
> > }
> > so when journal_put_journal_head()-> __journal_remove_journal_head(),
> > now the b_jcount is zero, but is
> > jh->b_transaction is NULL? So it seems possible that bh ref count is non
> > zero when exit from journal_put_journal_head() if jh_b_transaction is
> > not cleared.
> >
> > I miss where jh->b_transaction is clear to NULL?
> __journal_unfile_buffer() called from __journal_try_to_free_buffer() sets
> jh->b_transaction to NULL. So as soon as journal_put_journal_head() is
> called, it results in freeing of journal head and releasing buffer
> reference.
Thanks, I saw this piece of code after I post it.
> So really the only possible race I see is what I describe
> below...
>
> > > Hmm, maybe I have one idea - in theory we could call
> > > __journal_try_to_free_buffer() exactly at the moment commit code inspects
> > > the buffer. Then we'd release the buffer from the transaction but
> > > try_to_free_buffers() would fail because of elevated b_count exactly as you
> > > described. Could you maybe verify this? Not that I'd know how to easily fix
> > > this ;)...
>
here are some details:
The customer workload involves direct IO and buffered IO. The saw EIO
gets returned without any log messages. Initial probing via SystemTap
shows:
drop_buffers returns 0
try_to_free_buffers returns 0
try_to_release_page returns 0
drop_buffers returns 0
try_to_free_buffers returns 0
try_to_release_page returns 0
drop_buffers returns 0
try_to_free_buffers returns 0
try_to_release_page returns 0
invalidate_inode_pages2_range returns -5 (EIO)
drop_buffers returns 0
try_to_free_buffers returns 0
try_to_release_page returns 0
Which indicating that the EIO is from the
invalidate_inode_pages2_range(), which tries to free buffers but
failed.
We will try to add more debug information. Thanks for the suggestions.
However, Since ext3 has releasepge method defined, so the
try_to_free_buffer() failure should from
try_to_release_page()->ext3_releasepage()->journal_try_to_free_buffers()->try_to_free_buffer(), instead of try_to_release_page() calling try_to_free_buffer() directly.
If journal_try_to_free_buffers() calls try to free_buffer(), that means
the journal head is already successfully removed by
journal_remove_journal_head(), so buffer_jbd() safty checking after it
is false as expected. Otherwise try_to_free_buffer() won't be called.
In that case, I am not sure if it is possible to have race with commit
code?? we seems have j_list_lock protected when
__journal_try_to_free_buffer() is trying to take the buffer off the
list.
There are many other try_to_release_page() failure before the DIO EIO,
not sure where those coming from.
Fortunately Badari is able to reproduce this problem via simple buffered
write and direct write to the same file on 2.6.25-git12. We could add
more debug info there to see if we could get the counter and the jh
values out when try to free a busy buffer.
> Honza
next prev parent reply other threads:[~2008-04-29 17:49 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-06 17:42 [RFC] JBD ordered mode rewrite Jan Kara
2008-03-06 19:05 ` Josef Bacik
2008-03-10 16:30 ` Jan Kara
2008-03-06 23:53 ` Andrew Morton
2008-03-10 17:38 ` Jan Kara
2008-03-07 1:34 ` Mark Fasheh
2008-03-10 18:00 ` Jan Kara
2008-03-07 10:55 ` Mingming Cao
2008-03-10 18:29 ` Jan Kara
2008-03-07 23:52 ` Andreas Dilger
2008-03-08 0:08 ` Mingming Cao
2008-03-08 12:14 ` Christoph Hellwig
2008-03-10 19:54 ` Jan Kara
2008-03-10 21:37 ` Andreas Dilger
2008-04-25 23:38 ` Possible race between direct IO and JBD? Mingming Cao
2008-04-26 10:41 ` Andrew Morton
2008-04-28 12:26 ` Jan Kara
2008-04-28 17:11 ` Badari Pulavarty
2008-04-28 18:09 ` Jan Kara
2008-04-28 19:09 ` Mingming Cao
2008-04-29 12:43 ` Jan Kara
2008-04-29 17:49 ` Mingming Cao [this message]
2008-05-01 15:16 ` [PATCH] jbd_commit_transaction() races with journal_try_to_drop_buffers() causing DIO failures Badari Pulavarty
2008-05-01 22:08 ` Mingming Cao
2008-05-05 17:06 ` Jan Kara
2008-05-05 17:53 ` Mingming Cao
2008-05-06 0:10 ` Badari Pulavarty
2008-05-09 22:27 ` Mingming Cao
2008-05-09 22:39 ` [PATCH] JBD:need hold j_state_lock to updates to transaction t_state to T_COMMIT Mingming Cao
2008-05-12 9:34 ` Jan Kara
2008-05-12 15:54 ` [PATCH] jbd_commit_transaction() races with journal_try_to_drop_buffers() causing DIO failures Jan Kara
2008-05-12 19:23 ` Mingming Cao
2008-05-13 14:20 ` Jan Kara
2008-05-13 0:39 ` Mingming Cao
2008-05-13 14:54 ` Jan Kara
2008-05-13 16:37 ` Mingming Cao
2008-05-13 22:23 ` Mingming Cao
2008-05-14 17:08 ` Jan Kara
2008-05-14 17:41 ` Mingming Cao
2008-05-14 18:14 ` Jan Kara
2008-05-16 14:13 ` Mingming Cao
2008-05-16 14:14 ` [PATCH] Fix DIO EIO error caused by race between jbd_commit_transaction() and journal_try_to_drop_buffers() Mingming Cao
2008-05-16 15:01 ` Josef Bacik
2008-05-16 17:11 ` Mingming Cao
2008-05-16 17:17 ` Badari Pulavarty
2008-05-16 17:30 ` Mingming Cao
2008-05-16 17:12 ` Badari Pulavarty
2008-05-16 21:01 ` [PATCH] JBD: Fix DIO EIO error caused by race between free buffer and commit trasanction Mingming Cao
2008-05-18 22:37 ` Jan Kara
2008-05-19 19:59 ` Mingming Cao
2008-05-19 20:25 ` Andrew Morton
2008-05-19 22:07 ` Mingming Cao
2008-05-20 9:30 ` Jens Axboe
2008-05-20 17:47 ` Mingming Cao
2008-05-20 18:02 ` [PATCH-v2] JBD: Fix " Mingming Cao
2008-05-20 23:53 ` Jan Kara
2008-05-21 17:14 ` Mingming
2008-05-24 22:44 ` Jan Kara
2008-05-28 18:18 ` Mingming Cao
2008-05-28 18:55 ` Jan Kara
2008-05-29 0:15 ` Mingming Cao
2008-05-29 0:16 ` [PATCH][take 5] " Mingming Cao
2008-05-29 0:18 ` [PATCH][take 5] JBD2: " Mingming Cao
2008-05-30 6:24 ` Aneesh Kumar K.V
2008-05-30 15:17 ` Mingming Cao
2008-05-21 23:38 ` [PATCH 1/2][TAKE3] JBD: " Mingming
2008-05-22 5:57 ` Andrew Morton
2008-05-21 23:39 ` [PATCH 2/2][TAKE3] JBD2: " Mingming
2008-05-20 18:03 ` [PATCH -v2] JBD2: Fix race between journal " Mingming Cao
2008-05-16 21:01 ` [PATCH] JBD2: Fix DIO EIO error caused by race between " Mingming Cao
2008-05-09 22:39 ` [PATCH] JBD2:need hold j_state_lock to updates to transaction t_state to T_COMMIT Mingming Cao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1209491377.4751.21.camel@localhost.localdomain \
--to=cmm@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbadari@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).