* Re: Re: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+
@ 2013-05-13 9:53 EUNBONG SONG
2013-05-13 11:26 ` Zheng Liu
0 siblings, 1 reply; 6+ messages in thread
From: EUNBONG SONG @ 2013-05-13 9:53 UTC (permalink / raw)
To: Zheng Liu, Tony Luck
Cc: Dmitry Monakhov, Theodore Ts'o, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org
> Hi all,
> First of all I couldn't reproduce this regression in my sand box. So
> the following speculation is only my guess. I suspect that the commit
> (ae4647fb) isn't root cause. It just uncover a potential bug that has
> been there for a long time. I look at the code, and found two
> suspicious stuff in jbd2. The first one is in do_get_write_access().
> In this function we forgot to lock bh state when we check b_jlist ==
> BJ_Shadow. I generate a patch to fix it, and I really think it is the
> root cause. Further, in __journal_remove_journal_head() we check
> b_jlist == BJ_None. But, when this function is called, bh state won't
> be locked sometimes. So I suspect this is why we hit a BUG in
> jbd2_journal_put_journal_head(). But I don't have a good solution to
> fix this until now because I don't know whether we need to lock bh state
> here, or maybe we should remove this assertation.
>
> So, generally, Tony, Eunbong, could you please try the following patch?
>
> Thanks in advance,
> - Zheng
Hi, I tested your patch. Unfortunately, the same problem was reproduced.
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Re: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+
2013-05-13 9:53 Re: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+ EUNBONG SONG
@ 2013-05-13 11:26 ` Zheng Liu
2013-05-13 12:07 ` Dmitry Monakhov
0 siblings, 1 reply; 6+ messages in thread
From: Zheng Liu @ 2013-05-13 11:26 UTC (permalink / raw)
To: EUNBONG SONG
Cc: Tony Luck, Dmitry Monakhov, Theodore Ts'o,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
On Mon, May 13, 2013 at 09:53:25AM +0000, EUNBONG SONG wrote:
>
>
> > Hi all,
>
> > First of all I couldn't reproduce this regression in my sand box. So
> > the following speculation is only my guess. I suspect that the commit
> > (ae4647fb) isn't root cause. It just uncover a potential bug that has
> > been there for a long time. I look at the code, and found two
> > suspicious stuff in jbd2. The first one is in do_get_write_access().
> > In this function we forgot to lock bh state when we check b_jlist ==
> > BJ_Shadow. I generate a patch to fix it, and I really think it is the
> > root cause. Further, in __journal_remove_journal_head() we check
> > b_jlist == BJ_None. But, when this function is called, bh state won't
> > be locked sometimes. So I suspect this is why we hit a BUG in
> > jbd2_journal_put_journal_head(). But I don't have a good solution to
> > fix this until now because I don't know whether we need to lock bh state
> > here, or maybe we should remove this assertation.
> >
> > So, generally, Tony, Eunbong, could you please try the following patch?
> >
> > Thanks in advance,
> > - Zheng
>
>
> Hi, I tested your patch. Unfortunately, the same problem was reproduced.
> Thanks.
Thanks for trying this patch. Could you please repost the dmesg log for
me? I want to make sure whether the second suspicious stuff causes this
regression or not. Further, that would be great if you could try to
comment this line as the following?
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 886ec2f..a9e3779 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2453,7 +2453,7 @@ static void __journal_remove_journal_head(struct
buffer_head *bh)
J_ASSERT_JH(jh, jh->b_transaction == NULL);
J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
- J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
+ /*J_ASSERT_JH(jh, jh->b_jlist == BJ_None);*/
J_ASSERT_BH(bh, buffer_jbd(bh));
J_ASSERT_BH(bh, jh2bh(jh) == bh);
BUFFER_TRACE(bh, "remove journal_head");
Really thanks,
- Zheng
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: Re: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+
2013-05-13 11:26 ` Zheng Liu
@ 2013-05-13 12:07 ` Dmitry Monakhov
2013-05-13 12:54 ` Eunbong Song
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Monakhov @ 2013-05-13 12:07 UTC (permalink / raw)
To: Zheng Liu, EUNBONG SONG
Cc: Tony Luck, Theodore Ts'o, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org
On Mon, 13 May 2013 19:26:34 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> On Mon, May 13, 2013 at 09:53:25AM +0000, EUNBONG SONG wrote:
> >
> >
> > > Hi all,
> >
> > > First of all I couldn't reproduce this regression in my sand box. So
> > > the following speculation is only my guess. I suspect that the commit
> > > (ae4647fb) isn't root cause. It just uncover a potential bug that has
> > > been there for a long time. I look at the code, and found two
> > > suspicious stuff in jbd2. The first one is in do_get_write_access().
> > > In this function we forgot to lock bh state when we check b_jlist ==
> > > BJ_Shadow. I generate a patch to fix it, and I really think it is the
> > > root cause. Further, in __journal_remove_journal_head() we check
> > > b_jlist == BJ_None. But, when this function is called, bh state won't
> > > be locked sometimes. So I suspect this is why we hit a BUG in
> > > jbd2_journal_put_journal_head(). But I don't have a good solution to
> > > fix this until now because I don't know whether we need to lock bh state
> > > here, or maybe we should remove this assertation.
> > >
> > > So, generally, Tony, Eunbong, could you please try the following patch?
> > >
> > > Thanks in advance,
> > > - Zheng
> >
> >
> > Hi, I tested your patch. Unfortunately, the same problem was reproduced.
> > Thanks.
>
> Thanks for trying this patch. Could you please repost the dmesg log for
> me? I want to make sure whether the second suspicious stuff causes this
> regression or not. Further, that would be great if you could try to
> comment this line as the following?
AFAIK following assertion was triggered jh->b_transaction != NULL
>
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 886ec2f..a9e3779 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2453,7 +2453,7 @@ static void __journal_remove_journal_head(struct
> buffer_head *bh)
> J_ASSERT_JH(jh, jh->b_transaction == NULL);
> J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
> J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
> - J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
> + /*J_ASSERT_JH(jh, jh->b_jlist == BJ_None);*/
> J_ASSERT_BH(bh, buffer_jbd(bh));
> J_ASSERT_BH(bh, jh2bh(jh) == bh);
> BUFFER_TRACE(bh, "remove journal_head");
>
> Really thanks,
> - Zheng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Re: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+
2013-05-13 12:07 ` Dmitry Monakhov
@ 2013-05-13 12:54 ` Eunbong Song
0 siblings, 0 replies; 6+ messages in thread
From: Eunbong Song @ 2013-05-13 12:54 UTC (permalink / raw)
To: Dmitry Monakhov
Cc: Zheng Liu, EUNBONG SONG, Tony Luck, Theodore Ts'o,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Hi, I just wonder. Is there no problem with endianess.
I mean usually bit field is defined with __BIG_ENDIAN_BITFIELD or
__LITTLE_ENDIAN_BITFIELD. But b_jlist and b_modfied is defined with no
pad.
It seems to be good but i just want to make sure.
Thanks.
2013/5/13 Dmitry Monakhov <dmonakhov@openvz.org>:
> On Mon, 13 May 2013 19:26:34 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
>> On Mon, May 13, 2013 at 09:53:25AM +0000, EUNBONG SONG wrote:
>> >
>> >
>> > > Hi all,
>> >
>> > > First of all I couldn't reproduce this regression in my sand box. So
>> > > the following speculation is only my guess. I suspect that the commit
>> > > (ae4647fb) isn't root cause. It just uncover a potential bug that has
>> > > been there for a long time. I look at the code, and found two
>> > > suspicious stuff in jbd2. The first one is in do_get_write_access().
>> > > In this function we forgot to lock bh state when we check b_jlist ==
>> > > BJ_Shadow. I generate a patch to fix it, and I really think it is the
>> > > root cause. Further, in __journal_remove_journal_head() we check
>> > > b_jlist == BJ_None. But, when this function is called, bh state won't
>> > > be locked sometimes. So I suspect this is why we hit a BUG in
>> > > jbd2_journal_put_journal_head(). But I don't have a good solution to
>> > > fix this until now because I don't know whether we need to lock bh state
>> > > here, or maybe we should remove this assertation.
>> > >
>> > > So, generally, Tony, Eunbong, could you please try the following patch?
>> > >
>> > > Thanks in advance,
>> > > - Zheng
>> >
>> >
>> > Hi, I tested your patch. Unfortunately, the same problem was reproduced.
>> > Thanks.
>>
>> Thanks for trying this patch. Could you please repost the dmesg log for
>> me? I want to make sure whether the second suspicious stuff causes this
>> regression or not. Further, that would be great if you could try to
>> comment this line as the following?
> AFAIK following assertion was triggered jh->b_transaction != NULL
>>
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index 886ec2f..a9e3779 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -2453,7 +2453,7 @@ static void __journal_remove_journal_head(struct
>> buffer_head *bh)
>> J_ASSERT_JH(jh, jh->b_transaction == NULL);
>> J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
>> J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
>> - J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
>> + /*J_ASSERT_JH(jh, jh->b_jlist == BJ_None);*/
>> J_ASSERT_BH(bh, buffer_jbd(bh));
>> J_ASSERT_BH(bh, jh2bh(jh) == bh);
>> BUFFER_TRACE(bh, "remove journal_head");
>>
>> Really thanks,
>> - Zheng
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Re: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+
@ 2013-05-13 2:21 EUNBONG SONG
2013-05-13 3:11 ` Tony Luck
0 siblings, 1 reply; 6+ messages in thread
From: EUNBONG SONG @ 2013-05-13 2:21 UTC (permalink / raw)
To: Tony Luck, Dmitry Monakhov
Cc: Theodore Ts'o, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org
> CONFIG_IA64_PAGE_SIZE_64KB=y
> fsblock size is whatever is the default for SLES11SP2 on ia64 - which
> tool will tell me?
> My git bisect finally competed and points the a finger at:
> bisect> git bisect good
> ae4647fb7654676fc44a97e86eb35f9f06b99f66 is first bad commit
> commit ae4647fb7654676fc44a97e86eb35f9f06b99f66
> Author: Jan Kara
> Date: Fri Apr 12 00:03:42 2013 -0400
> jbd2: reduce journal_head size
> Remove unused t_cow_tid field (ext4 copy-on-write support doesn't seem
> to be happening) and change b_modified and b_jlist to bitfields thus
> saving 8 bytes in the structure.
> Signed-off-by: Jan Kara
> Signed-off-by: "Theodore Ts'o"
> Reviewed-by: Zheng Liu
> :040000 040000 c39ece4341894b3daf84764ba425a87ffb90fe50
> d4e8d9185c2a1b740c235ca8ed05d496a442fce3 M include
Hi, my git bisect result is same yours. And i reported that to community yesterday.
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Re: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+
2013-05-13 2:21 EUNBONG SONG
@ 2013-05-13 3:11 ` Tony Luck
0 siblings, 0 replies; 6+ messages in thread
From: Tony Luck @ 2013-05-13 3:11 UTC (permalink / raw)
To: eunb.song
Cc: Dmitry Monakhov, Theodore Ts'o, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org
On Sun, May 12, 2013 at 7:21 PM, EUNBONG SONG <eunb.song@samsung.com> wrote:
> Hi, my git bisect result is same yours. And i reported that to community yesterday.
Ah. Good to have some confirmation (I was never sure how long to keep
running before deciding that a test was "good". My slowest "bad" test took
about 2.5 hours. I mostly let the tests run for >6 hours before deciding.
I just confirmed that 3.10-rc1 still fails (30 minutes). Now running a test
on 3.10-rc1 with just this commit reverted. Only been going for about
15 minutes, so no useful information yet.
My best guess as to why this commit causes problems is that there are places
where updates to individual fields in this structure used to be independent
because they were to whole words. Now we have bitfileds there are races
between access to different fields in the same word.
-Tony
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-05-13 12:54 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-13 9:53 Re: Re: EXT4 panic at jbd2_journal_put_journal_head() in 3.9+ EUNBONG SONG
2013-05-13 11:26 ` Zheng Liu
2013-05-13 12:07 ` Dmitry Monakhov
2013-05-13 12:54 ` Eunbong Song
-- strict thread matches above, loose matches on Subject: below --
2013-05-13 2:21 EUNBONG SONG
2013-05-13 3:11 ` Tony Luck
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).