From: Younger Liu <younger.liu@huawei.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>,
<linux-ext4@vger.kernel.org>,
Ocfs2-Devel <ocfs2-devel@oss.oracle.com>,
Li Zefan <lizefan@huawei.com>, <jack@suse.cz>
Subject: Re: [PATCH] fs/jbd2: t_updates should increase when start_this_handle() failed in jbd2__journal_restart()
Date: Fri, 21 Jun 2013 21:29:31 +0800 [thread overview]
Message-ID: <51C4553B.7010809@huawei.com> (raw)
In-Reply-To: <20130620155555.GE28309@thunk.org>
Hi Ted,
On 2013/6/20 23:55, Theodore Ts'o wrote:
> [ LKML and linux-fsdevel BCC'ed ]
>
> On Wed, Jun 19, 2013 at 12:48:26PM +0800, Younger Liu wrote:
>> jbd2_journal_restart() would restart a handle. In this function, it
>> calls start_this_handle(). Before calling start_this_handle(),subtract
>> 1 from transaction->t_updates.
>> If start_this_handle() succeeds, transaction->t_updates increases by 1
>> in it. But if start_this_handle() fails, transaction->t_updates does
>> not increase.
>> So, when commit the handle's transaction in jbd2_journal_stop(), the
>> assertion is false, and then trigger a bug.
>> The assertion is as follows:
>> J_ASSERT(atomic_read(&transaction->t_updates) > 0)
>>
>> Signed-off-by: Younger Liu <younger.liu@huawei.com>
>
> Thanks for pointing out this potential problem. Your fix isn't quite
> the right one, however.
>
> The problem is once we get to this point, the transaction pointer may
> no longer be valid, since once we decrement t_updates, the transaction
> could start commiting, and so we should not actually dereference the
> transaction pointer after we unlock transaction->t_handle_lock. (We
> are referencing t_tid two lines later, and technically that's a bug.
> We've just been getting lucky.)
>
> The real issue is that by the time we call start_this_handle() in
> jbd2__journal_restart, the handle is not attached to any transaction.
> So if jbd2_journal_restart() fails, the handle has to be considered
> invalid, and the calling code should not try to use the handle at all,
> including calling jbd2_journal_stop().
>
> Jan Kara is I believe currently on vacation but I'd really like him to
> chime in with his opinion about the best way to fix this, since he's
> also quite familiar with the jbd2 code.
>
> Also, Jan has recently submitted changes to implement reserved handles
> (to be submitted in the next merge window), and in these new
> functions, if start_this_handle() fails when called from
> jbd2_journal_start_reserved(), the handle is left invalidated, and the
> caller of jbd2_journal_start_reserved() must not touch the handle
> again, including calling jbd2_journal_stop() --- in fact, because
> jbd2_journal_start_reserved() clears current->journal_info on failure,
> an attempt to call jbd2_journal_stop() will result in the kernel oops
> due to an assertion failure.
>
> My inclination is to fix this in the same way, but it will require
> changing the current code paths that use jbd2_journal_restart(), and
> in some cases passing back the state that the handle is now invalid
> and should not be released via jbd2_journal_stop() is going to be
> tricky indeed.
>
>
> Another possible fix is to set the handle to be aborted, via
> jbd2_journal_abort_handle(). This function isn't used at all at the
> moment, but from what I can tell this should do the right thing. The
> one unfortunate thing about this is that when jbd2_journal_stop() gets
> called, it will return EROFS, which is a misleading error code. I'm
> guessing you're seeing this because start_this_handle() returned
> ENOMEM, correct? We could hack around this by stashing the real error
> in the handle, and then change jbd2_journal_stop() to return that
> error instead of EROFS if it is set.
>
> This second solution is hacky all-around, and it's also inconsistent
> with how we are doing things with jbd2_journal_start_reserved(). So
> I'm not so happy with this solution. But it would require a lot less
> work because the fix would be isolated in the jbd2 layer. OTOH, right
> now if the code calls jbd2_journal_stop() on the handle after a
> failure in jbd2_journal_start_reserved(), they are crashing anyway, so
> changing the code so it changes with an assertion failure doesn't make
> things any worse, and then we fix things in ext4 and ocfs2 without any
> patch interdependencies --- and this is a problem which appears to
> happen very rarely in practice.
>
> (How did you manage to trigger this, BTW? Was this something you
> noticed by manual code inspection? Or are you instrumenting the
> kernel's memory allocators to occasionally fail to test our error
> paths? Or were you running in a system with very heavy memory
> pressure?)
>
> - Ted
>
This bug was triggered by the following scenario:
In ocfs2 file system, allocate a very large disk space for a small file
with ocfs2_fallocate(), while the journal file size is 32M.
Because there are much many journal blocks needed by jbd2_journal_restart(),
so that nblocks is greater than journal->j_max_transaction_buffers
in start_this_handle(), and then return -ENOSPC.
In start_this_handle():
if (nblocks > journal->j_max_transaction_buffers) {
printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n",
current->comm, nblocks,
journal->j_max_transaction_buffers);
return -ENOSPC;
}
Dump stack:
<ffffffffa0b04620>{jbd2:jbd2_journal_stop+0x50}
<ffffffffa0b671a3>{ocfs2:ocfs2_commit_trans+0x23}
<ffffffffa0b5c8ce>{ocfs2:__ocfs2_extend_allocation+0x66e}
<ffffffffa0b5cc55>{ocfs2:ocfs2_allocate_unwritten_extents+0xc5}
<ffffffffa0b5d315>{ocfs2:__ocfs2_change_file_space+0x3f5}
<ffffffffa0b5d67a>{ocfs2:ocfs2_fallocate+0x7a}
<ffffffff80127689>{do_fallocate+0x129}
<ffffffff801276d6>{sys_fallocate+0x46}
<ffffffff803fd423>{system_call_fastpath+0x16}
[<00007f7283b25030>]
This problem may be because jbd2_journal_restart() was called incorrectly
in __ocfs2_extend_allocation() which was called by ocfs2_fallocate().
Although I solved this question by modifing __ocfs2_extend_allocation(),
there is also a risk in jbd2_journal_restart() for jbd2 system.
So I put this potential risk to see if there is a better ideas.
>
>
>> ---
>> fs/jbd2/transaction.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index 325bc01..9ddb444 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -530,6 +530,8 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
>> lock_map_release(&handle->h_lockdep_map);
>> handle->h_buffer_credits = nblocks;
>> ret = start_this_handle(journal, handle, gfp_mask);
>> + if (ret < 0)
>> + atomic_inc(&transaction->t_updates);
>> return ret;
>> }
>> EXPORT_SYMBOL(jbd2__journal_restart);
>> --
>> 1.7.9.7
>>
>
> .
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
WARNING: multiple messages have this Message-ID (diff)
From: Younger Liu <younger.liu@huawei.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-ext4@vger.kernel.org,
Ocfs2-Devel <ocfs2-devel@oss.oracle.com>,
Li Zefan <lizefan@huawei.com>,
jack@suse.cz
Subject: [Ocfs2-devel] [PATCH] fs/jbd2: t_updates should increase when start_this_handle() failed in jbd2__journal_restart()
Date: Fri, 21 Jun 2013 21:29:31 +0800 [thread overview]
Message-ID: <51C4553B.7010809@huawei.com> (raw)
In-Reply-To: <20130620155555.GE28309@thunk.org>
Hi Ted,
On 2013/6/20 23:55, Theodore Ts'o wrote:
> [ LKML and linux-fsdevel BCC'ed ]
>
> On Wed, Jun 19, 2013 at 12:48:26PM +0800, Younger Liu wrote:
>> jbd2_journal_restart() would restart a handle. In this function, it
>> calls start_this_handle(). Before calling start_this_handle()?subtract
>> 1 from transaction->t_updates.
>> If start_this_handle() succeeds, transaction->t_updates increases by 1
>> in it. But if start_this_handle() fails, transaction->t_updates does
>> not increase.
>> So, when commit the handle's transaction in jbd2_journal_stop(), the
>> assertion is false, and then trigger a bug.
>> The assertion is as follows:
>> J_ASSERT(atomic_read(&transaction->t_updates) > 0)
>>
>> Signed-off-by: Younger Liu <younger.liu@huawei.com>
>
> Thanks for pointing out this potential problem. Your fix isn't quite
> the right one, however.
>
> The problem is once we get to this point, the transaction pointer may
> no longer be valid, since once we decrement t_updates, the transaction
> could start commiting, and so we should not actually dereference the
> transaction pointer after we unlock transaction->t_handle_lock. (We
> are referencing t_tid two lines later, and technically that's a bug.
> We've just been getting lucky.)
>
> The real issue is that by the time we call start_this_handle() in
> jbd2__journal_restart, the handle is not attached to any transaction.
> So if jbd2_journal_restart() fails, the handle has to be considered
> invalid, and the calling code should not try to use the handle at all,
> including calling jbd2_journal_stop().
>
> Jan Kara is I believe currently on vacation but I'd really like him to
> chime in with his opinion about the best way to fix this, since he's
> also quite familiar with the jbd2 code.
>
> Also, Jan has recently submitted changes to implement reserved handles
> (to be submitted in the next merge window), and in these new
> functions, if start_this_handle() fails when called from
> jbd2_journal_start_reserved(), the handle is left invalidated, and the
> caller of jbd2_journal_start_reserved() must not touch the handle
> again, including calling jbd2_journal_stop() --- in fact, because
> jbd2_journal_start_reserved() clears current->journal_info on failure,
> an attempt to call jbd2_journal_stop() will result in the kernel oops
> due to an assertion failure.
>
> My inclination is to fix this in the same way, but it will require
> changing the current code paths that use jbd2_journal_restart(), and
> in some cases passing back the state that the handle is now invalid
> and should not be released via jbd2_journal_stop() is going to be
> tricky indeed.
>
>
> Another possible fix is to set the handle to be aborted, via
> jbd2_journal_abort_handle(). This function isn't used at all at the
> moment, but from what I can tell this should do the right thing. The
> one unfortunate thing about this is that when jbd2_journal_stop() gets
> called, it will return EROFS, which is a misleading error code. I'm
> guessing you're seeing this because start_this_handle() returned
> ENOMEM, correct? We could hack around this by stashing the real error
> in the handle, and then change jbd2_journal_stop() to return that
> error instead of EROFS if it is set.
>
> This second solution is hacky all-around, and it's also inconsistent
> with how we are doing things with jbd2_journal_start_reserved(). So
> I'm not so happy with this solution. But it would require a lot less
> work because the fix would be isolated in the jbd2 layer. OTOH, right
> now if the code calls jbd2_journal_stop() on the handle after a
> failure in jbd2_journal_start_reserved(), they are crashing anyway, so
> changing the code so it changes with an assertion failure doesn't make
> things any worse, and then we fix things in ext4 and ocfs2 without any
> patch interdependencies --- and this is a problem which appears to
> happen very rarely in practice.
>
> (How did you manage to trigger this, BTW? Was this something you
> noticed by manual code inspection? Or are you instrumenting the
> kernel's memory allocators to occasionally fail to test our error
> paths? Or were you running in a system with very heavy memory
> pressure?)
>
> - Ted
>
This bug was triggered by the following scenario:
In ocfs2 file system, allocate a very large disk space for a small file
with ocfs2_fallocate(), while the journal file size is 32M.
Because there are much many journal blocks needed by jbd2_journal_restart(),
so that nblocks is greater than journal->j_max_transaction_buffers
in start_this_handle(), and then return -ENOSPC.
In start_this_handle():
if (nblocks > journal->j_max_transaction_buffers) {
printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n",
current->comm, nblocks,
journal->j_max_transaction_buffers);
return -ENOSPC;
}
Dump stack:
<ffffffffa0b04620>{jbd2:jbd2_journal_stop+0x50}
<ffffffffa0b671a3>{ocfs2:ocfs2_commit_trans+0x23}
<ffffffffa0b5c8ce>{ocfs2:__ocfs2_extend_allocation+0x66e}
<ffffffffa0b5cc55>{ocfs2:ocfs2_allocate_unwritten_extents+0xc5}
<ffffffffa0b5d315>{ocfs2:__ocfs2_change_file_space+0x3f5}
<ffffffffa0b5d67a>{ocfs2:ocfs2_fallocate+0x7a}
<ffffffff80127689>{do_fallocate+0x129}
<ffffffff801276d6>{sys_fallocate+0x46}
<ffffffff803fd423>{system_call_fastpath+0x16}
[<00007f7283b25030>]
This problem may be because jbd2_journal_restart() was called incorrectly
in __ocfs2_extend_allocation() which was called by ocfs2_fallocate().
Although I solved this question by modifing __ocfs2_extend_allocation(),
there is also a risk in jbd2_journal_restart() for jbd2 system.
So I put this potential risk to see if there is a better ideas.
>
>
>> ---
>> fs/jbd2/transaction.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index 325bc01..9ddb444 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -530,6 +530,8 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
>> lock_map_release(&handle->h_lockdep_map);
>> handle->h_buffer_credits = nblocks;
>> ret = start_this_handle(journal, handle, gfp_mask);
>> + if (ret < 0)
>> + atomic_inc(&transaction->t_updates);
>> return ret;
>> }
>> EXPORT_SYMBOL(jbd2__journal_restart);
>> --
>> 1.7.9.7
>>
>
> .
>
next prev parent reply other threads:[~2013-06-21 13:29 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-19 4:48 [PATCH] fs/jbd2: t_updates should increase when start_this_handle() failed in jbd2__journal_restart() Younger Liu
2013-06-19 4:48 ` [Ocfs2-devel] " Younger Liu
2013-06-20 15:55 ` Theodore Ts'o
2013-06-20 15:55 ` Theodore Ts'o
2013-06-20 15:55 ` [Ocfs2-devel] " Theodore Ts'o
2013-06-20 16:08 ` [PATCH] jbd2: fix theoretical race in jbd2__journal_restart Theodore Ts'o
2013-06-20 16:08 ` [Ocfs2-devel] " Theodore Ts'o
2013-06-20 17:26 ` [PATCH] fs/jbd2: t_updates should increase when start_this_handle() failed in jbd2__journal_restart() Josef Bacik
2013-06-20 17:26 ` [Ocfs2-devel] " Josef Bacik
2013-06-20 18:12 ` Theodore Ts'o
2013-06-20 18:12 ` [Ocfs2-devel] " Theodore Ts'o
2013-06-21 23:26 ` Jan Kara
2013-06-21 23:26 ` [Ocfs2-devel] " Jan Kara
2013-06-21 13:29 ` Younger Liu [this message]
2013-06-21 13:29 ` Younger Liu
2013-06-23 17:36 ` Theodore Ts'o
2013-06-23 17:36 ` [Ocfs2-devel] " Theodore Ts'o
2013-06-23 17:44 ` [PATCH] jbd2: invalidate handle if jbd2_journal_restart() fails Theodore Ts'o
2013-06-24 9:53 ` Jan Kara
2013-06-25 9:42 ` Younger Liu
2013-06-29 23:46 ` Theodore Ts'o
2013-07-03 12:22 ` Younger Liu
2013-07-03 12:36 ` Younger Liu
2013-06-25 8:30 ` [PATCH] fs/jbd2: t_updates should increase when start_this_handle() failed in jbd2__journal_restart() Younger Liu
2013-06-25 8:30 ` [Ocfs2-devel] " Younger Liu
2013-06-29 13:22 ` Joel Becker
2013-06-29 13:22 ` [Ocfs2-devel] " Joel Becker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51C4553B.7010809@huawei.com \
--to=younger.liu@huawei.com \
--cc=akpm@linux-foundation.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=ocfs2-devel@oss.oracle.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.