* Is there some bug in ext3 in 2.4.25?
@ 2004-03-04 6:50 Daniel Fenert
2004-03-04 7:03 ` Daniel Fenert
2004-03-05 14:06 ` Marcelo Tosatti
0 siblings, 2 replies; 9+ messages in thread
From: Daniel Fenert @ 2004-03-04 6:50 UTC
To: linux-kernel
Message from syslogd@lazy at Thu Mar 4 08:31:58 2004 ...
lazy kernel: Assertion failure in __journal_drop_transaction() at
checkpoint.c:587: "transaction->t_ilist == NULL"
Networking still works. I tried to log in, but no luck.
I've got one ssh console open and tried to reboot, but nothing happened; it
looks like it lost its connection with hda :(
Where should I look for the cause?
The machine is faaar away, and this is the second or third time it has hung
mysteriously; the only difference is that this time I've got some console output.
--
Daniel Fenert --==> daniel@fenert.net <==--
==-P o w e r e d--b y--S l a c k w a r e-=-ICQ #37739641-==
=======- http://daniel.fenert.net/ -=======< +48604628083 >
* Re: Is there some bug in ext3 in 2.4.25?
2004-03-04 6:50 Is there some bug in ext3 in 2.4.25? Daniel Fenert
@ 2004-03-04 7:03 ` Daniel Fenert
2004-03-05 14:06 ` Marcelo Tosatti
1 sibling, 0 replies; 9+ messages in thread
From: Daniel Fenert @ 2004-03-04 7:03 UTC
To: linux-kernel
On Thu, Mar 04, 2004 at 07:50:38AM +0100, Daniel Fenert wrote:
>Message from syslogd@lazy at Thu Mar 4 08:31:58 2004 ...
>lazy kernel: Assertion failure in __journal_drop_transaction() at
>checkpoint.c:587: "transaction->t_ilist == NULL"
One more thing - it has happened when /var got full.
--
Daniel Fenert --==> daniel@fenert.net <==--
==-P o w e r e d--b y--S l a c k w a r e-=-ICQ #37739641-==
Absurdity: a belief inconsistent with your own views - [Ambrose Bierce]
=======- http://daniel.fenert.net/ -=======< +48604628083 >
* Re: Is there some bug in ext3 in 2.4.25?
2004-03-04 6:50 Is there some bug in ext3 in 2.4.25? Daniel Fenert
2004-03-04 7:03 ` Daniel Fenert
@ 2004-03-05 14:06 ` Marcelo Tosatti
2004-03-05 14:14 ` Michael Frank
2004-03-05 14:25 ` Stephen C. Tweedie
1 sibling, 2 replies; 9+ messages in thread
From: Marcelo Tosatti @ 2004-03-05 14:06 UTC
To: Daniel Fenert; +Cc: linux-kernel, sct, Michelle Konzack
Hi,
This sounds like memory corruption (which could be caused by a misbehaving
driver or by flaky hardware) because transaction->t_ilist is not used at
all by the kernel code. Did this box run stable with other kernels?
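The reasoning here can be illustrated with a small user-space sketch. It is not the kernel's code: `struct demo_transaction` and the `demo_*` helpers are hypothetical names, and the sketch assumes a platform where a NULL pointer is all-bits-zero. It only shows why a field that is zeroed once and never written again should still be NULL at teardown, and how a single stray bit flip breaks that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the journal's transaction struct. */
struct demo_transaction {
    int   t_tid;      /* transaction id */
    void *t_buffers;  /* stands for a list the code really uses */
    void *t_ilist;    /* zeroed at init, never written afterwards */
};

/* Zero the whole struct, then fill in the fields that are actually used.
 * Assumes NULL is represented as all-bits-zero. */
static void demo_transaction_init(struct demo_transaction *t, int tid)
{
    memset(t, 0, sizeof(*t));
    t->t_tid = tid;
}

/* Mirrors the condition in the failed assertion: the unused field
 * must still be NULL when the transaction is dropped. */
static bool demo_drop_check(const struct demo_transaction *t)
{
    return t->t_ilist == NULL;
}

/* Simulate a stray write (bad RAM or a wild pointer) flipping one bit
 * of the unused field. 'bit' must be below 8 * sizeof(void *). */
static void demo_flip_bit(struct demo_transaction *t, unsigned bit)
{
    unsigned char *raw = (unsigned char *)&t->t_ilist;
    raw[bit / 8] ^= (unsigned char)(1u << (bit % 8));
}
```

After `demo_transaction_init()` the check passes; after one call to `demo_flip_bit()` it fails, although none of the "real" code ever touched `t_ilist`.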
I found a similar report from Michelle (CCed), which can be found at:
http://marc.theaimsgroup.com/?l=linux-kernel&m=107529754608448&w=2
Searching a bit more, I found another message from Michelle with
topic "[SOLVED] Kernel-Bug (at checkpoint.c 587)"
http://lists.debian.org/debian-user-german/2004/debian-user-german-200401/msg04404.html
Unfortunately the said message is in German, which I can't understand.
Michelle, can you clarify it for me?
Stephen, Andrew, any idea how can transaction->t_ilist become not NULL?
On Thu, 4 Mar 2004, Daniel Fenert wrote:
> Message from syslogd@lazy at Thu Mar 4 08:31:58 2004 ...
> lazy kernel: Assertion failure in __journal_drop_transaction() at
> checkpoint.c:587: "transaction->t_ilist == NULL"
>
> Networking still works. I tried to log in, but no luck.
> I've got one ssh console open and tried to reboot, but nothing happened; it
> looks like it lost its connection with hda :(
> Where should I look for the cause?
> The machine is faaar away, and this is the second or third time it has hung
> mysteriously; the only difference is that this time I've got some console output.
>
>From daniel@fenert.net Fri Mar 5 10:48:26 2004
Date: Thu, 4 Mar 2004 08:03:29 +0100
From: Daniel Fenert <daniel@fenert.net>
To: linux-kernel@vger.kernel.org
Subject: Re: Is there some bug in ext3 in 2.4.25?
>Message from syslogd@lazy at Thu Mar 4 08:31:58 2004 ...
>lazy kernel: Assertion failure in __journal_drop_transaction() at
>checkpoint.c:587: "transaction->t_ilist == NULL"
One more thing - it has happened when /var got full.
* Re: Is there some bug in ext3 in 2.4.25?
2004-03-05 14:06 ` Marcelo Tosatti
@ 2004-03-05 14:14 ` Michael Frank
2004-03-05 14:26 ` Stephen C. Tweedie
2004-03-05 14:25 ` Stephen C. Tweedie
1 sibling, 1 reply; 9+ messages in thread
From: Michael Frank @ 2004-03-05 14:14 UTC
To: Marcelo Tosatti, Daniel Fenert; +Cc: linux-kernel, sct, Michelle Konzack
On Fri, 5 Mar 2004 11:06:02 -0300 (BRT), Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
>
> Hi,
>
> This sounds like memory corruption (which could be caused by a misbehaving
> driver or by flaky hardware) because transaction->t_ilist is not used at
> all by the kernel code. Did this box run stable with other kernels?
>
> I found a similar report from Michelle (CCed), which can be found at:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107529754608448&w=2
>
> Searching a bit more, I found another message from Michelle with
> topic "[SOLVED] Kernel-Bug (at checkpoint.c 587)"
> http://lists.debian.org/debian-user-german/2004/debian-user-german-200401/msg04404.html
>
> Unfortunately the said message is in German, which I can't understand.
> Michelle, can you clarify it for me?
> Hallo Leute,
> Auch wenn ich von der kernel-list@vger.kernel.org keine Antwort erhalten habe, handelt es sich definitiv um einen echten
> Kernel-Bug in 2.4.22, der in 2.4.24 offensichtlich nicht mehr vorhanden ist.
Although I haven't received a reply from LKML, it is definitely
a real kernel bug in 2.4.22, which is apparently no longer present in 2.4.24.
Ein weiterer Fehler trat mehrfach in 'exit.c' auf, der ebenfalls
nach der Installation von Linux 2.4.24 verschwunden war.
A further bug, which occurred several times in 'exit.c', also vanished
after installing 2.4.24.
>
> Stephen, Andrew, any idea how can transaction->t_ilist become not NULL?
>
>
> On Thu, 4 Mar 2004, Daniel Fenert wrote:
>
>> Message from syslogd@lazy at Thu Mar 4 08:31:58 2004 ...
>> lazy kernel: Assertion failure in __journal_drop_transaction() at
>> checkpoint.c:587: "transaction->t_ilist == NULL"
>>
>> Networking still works. I tried to log in, but no luck.
>> I've got one ssh console open and tried to reboot, but nothing happened; it
>> looks like it lost its connection with hda :(
>> Where should I look for the cause?
>> The machine is faaar away, and this is the second or third time it has hung
>> mysteriously; the only difference is that this time I've got some console output.
>>
>
>> From daniel@fenert.net Fri Mar 5 10:48:26 2004
> Date: Thu, 4 Mar 2004 08:03:29 +0100
> From: Daniel Fenert <daniel@fenert.net>
> To: linux-kernel@vger.kernel.org
> Subject: Re: Is there some bug in ext3 in 2.4.25?
>
>> Message from syslogd@lazy at Thu Mar 4 08:31:58 2004 ...
>> lazy kernel: Assertion failure in __journal_drop_transaction() at
>> checkpoint.c:587: "transaction->t_ilist == NULL"
>
> One more thing - it has happened when /var got full.
>
>
* Re: Is there some bug in ext3 in 2.4.25?
2004-03-05 14:06 ` Marcelo Tosatti
2004-03-05 14:14 ` Michael Frank
@ 2004-03-05 14:25 ` Stephen C. Tweedie
2004-03-08 13:44 ` Daniel Fenert
2004-04-02 10:20 ` Daniel Fenert
1 sibling, 2 replies; 9+ messages in thread
From: Stephen C. Tweedie @ 2004-03-05 14:25 UTC
To: Marcelo Tosatti
Cc: Daniel Fenert, linux-kernel, Michelle Konzack, Stephen Tweedie
Hi,
On Fri, 2004-03-05 at 14:06, Marcelo Tosatti wrote:
> This sounds like memory corruption (which could be caused by a misbehaving
> driver or by flaky hardware) because transaction->t_ilist is not used at
> all by the kernel code. Did this box run stable with other kernels?
Sounds like bad memory to me. The only other report of this I've seen
was at
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=115935
and that machine didn't pass memtest86.
> Stephen, Andrew, any idea how can transaction->t_ilist become not NULL?
Bad hardware is about the only way I can think of. If it was a random
kernel memory scribble, you'd expect it to show up in other places too:
the transaction struct is a very very long-lived struct, you wouldn't
expect it to be the only place to show up slab corruptions.
Cheers,
Stephen
* Re: Is there some bug in ext3 in 2.4.25?
2004-03-05 14:14 ` Michael Frank
@ 2004-03-05 14:26 ` Stephen C. Tweedie
0 siblings, 0 replies; 9+ messages in thread
From: Stephen C. Tweedie @ 2004-03-05 14:26 UTC
To: Michael Frank
Cc: Marcelo Tosatti, Daniel Fenert, linux-kernel, Michelle Konzack,
Stephen Tweedie
Hi,
On Fri, 2004-03-05 at 14:14, Michael Frank wrote:
> Although I haven't received a reply from LKML, it is definitely
> a real kernel bug in 2.4.22, which is apparently no longer present in 2.4.24.
>
> Ein weiterer Fehler trat mehrfach in 'exit.c' auf, der ebenfalls
> nach der Installation von Linux 2.4.24 verschwunden war.
>
> A further bug, which occurred several times in 'exit.c', also vanished
> after installing 2.4.24.
Sounds like bad memory. It's quite possible for a bad memory module
to show up a problem in one kernel but not in another, simply because
kernels store their active data in slightly different memory
locations from one release to another (or even from one compiler, or one
set of config options, to another).
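A toy model of this point, as a hedged sketch rather than anything taken from the kernel: one faulty cell sits at a fixed offset in memory, and two builds place the same long-lived object at different offsets. All names and addresses below are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* One faulty cell at a fixed byte offset inside a simulated block of RAM. */
struct demo_bad_cell {
    size_t offset;
};

/* True if an object occupying [start, start + len) covers the bad cell. */
static bool demo_object_hits_bad_cell(size_t start, size_t len,
                                      const struct demo_bad_cell *cell)
{
    return cell->offset >= start && cell->offset - start < len;
}
```

With a bad cell at offset 0x1040, a 128-byte object placed at 0x1000 by one build is hit, while the same object placed at 0x1400 by another build is not, so only the first build ever shows the fault.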
I'd definitely be running memtest86 as the next step here.
Cheers,
Stephen
* Re: Is there some bug in ext3 in 2.4.25?
2004-03-05 14:25 ` Stephen C. Tweedie
@ 2004-03-08 13:44 ` Daniel Fenert
2004-04-02 10:20 ` Daniel Fenert
1 sibling, 0 replies; 9+ messages in thread
From: Daniel Fenert @ 2004-03-08 13:44 UTC
To: Stephen C. Tweedie; +Cc: linux-kernel
On Fri, Mar 05, 2004 at 02:25:13PM +0000, Stephen C. Tweedie wrote:
>> This sounds like memory corruption (which could be caused by a misbehaving
>> driver or by flaky hardware) because transaction->t_ilist is not used at
>> all by the kernel code. Did this box run stable with other kernels?
>
>Sounds like bad memory to me. The only other report of this I've seen
>was at
>
>https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=115935
>
>and that machine didn't pass memtest86.
I'll check this later this week. BIG thanks for the replies.
(The machine was stable for a few years, AFAIR 3 years.)
--
Daniel Fenert --==> daniel@fenert.net <==--
==-P o w e r e d--b y--S l a c k w a r e-=-ICQ #37739641-==
Who does not love wine, women, and song, remains a fool his whole life long.
=======- http://daniel.fenert.net/ -=======< +48604628083 >
* Re: Is there some bug in ext3 in 2.4.25?
2004-03-05 14:25 ` Stephen C. Tweedie
2004-03-08 13:44 ` Daniel Fenert
@ 2004-04-02 10:20 ` Daniel Fenert
2004-04-02 10:37 ` Stephen C. Tweedie
1 sibling, 1 reply; 9+ messages in thread
From: Daniel Fenert @ 2004-04-02 10:20 UTC
To: Stephen C. Tweedie; +Cc: Marcelo Tosatti, linux-kernel, Michelle Konzack
Old thread, but I've managed to test the machine.
>> This sounds like memory corruption (which could be caused by a misbehaving
>> driver or by flaky hardware) because transaction->t_ilist is not used at
>> all by the kernel code. Did this box run stable with other kernels?
>
>Sounds like bad memory to me. The only other report of this I've seen
>was at
>
>https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=115935
>
>and that machine didn't pass memtest86.
It passed memtest86, 6 or 7 hours, any further hints?
--
Daniel Fenert --==> daniel@fenert.net <==--
==-P o w e r e d--b y--S l a c k w a r e-=-ICQ #37739641-==
It is easiest to ask why, hardest to find the answer --J. Szczawiński
=======- http://daniel.fenert.net/ -=======< +48604628083 >
* Re: Is there some bug in ext3 in 2.4.25?
2004-04-02 10:20 ` Daniel Fenert
@ 2004-04-02 10:37 ` Stephen C. Tweedie
0 siblings, 0 replies; 9+ messages in thread
From: Stephen C. Tweedie @ 2004-04-02 10:37 UTC
To: Daniel Fenert
Cc: Marcelo Tosatti, linux-kernel, Michelle Konzack, Stephen Tweedie
Hi,
On Fri, 2004-04-02 at 11:20, Daniel Fenert wrote:
> >> This sounds like memory corruption (which could be caused by a misbehaving
> >> driver or by flaky hardware) because transaction->t_ilist is not used at
> >> all by the kernel code. Did this box run stable with other kernels?
> >
> >Sounds like bad memory to me. The only other report of this I've seen
> >was at
> >
> >https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=115935
> >
> >and that machine didn't pass memtest86.
>
> It passed memtest86, 6 or 7 hours, any further hints?
Well, 7 hours is often not enough for memtest86, I usually recommend 24
hours if there are signs of bad hardware. But other than that, I can't
think of anything ext3-related --- ext3 simply doesn't ever set that
field. If it's being set, something is stomping on ext3's transaction
struct. That _could_ be the kernel, but it could be just about anything
touching memory after it's freed; or it could be bad hardware.
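To see why run time matters, here is a rough sketch of a write-then-read-back pattern pass, in the spirit of what a memory tester does. It is a simplified user-space illustration, not memtest86's actual algorithm; `demo_walking_ones` is a hypothetical name, and a user-space buffer cannot exercise physical RAM the way a standalone tester can.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write each single-bit pattern to every word, then read it back.
 * Returns false and reports the first mismatching index on failure. */
static bool demo_walking_ones(volatile uint32_t *buf, size_t words,
                              size_t *bad_index)
{
    for (unsigned bit = 0; bit < 32; bit++) {
        uint32_t pattern = (uint32_t)1 << bit;

        for (size_t i = 0; i < words; i++)
            buf[i] = pattern;

        for (size_t i = 0; i < words; i++) {
            if (buf[i] != pattern) {
                if (bad_index)
                    *bad_index = i;
                return false;
            }
        }
    }
    return true;
}
```

A single pass like this only catches a cell that misbehaves during that pass. An intermittent fault may need many passes before it shows up, which is consistent with the advice to test for much longer than a few hours.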
What modules are you using? Is there anything unusual in common between
your machine or its use and that in #115935?
Rebuilding the kernel to enable slab debugging may well be useful if
there's something stomping on transaction structs.
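The idea behind that kind of debugging can be sketched in user space as follows. This is a minimal illustration of poisoning under stated assumptions, not the kernel's slab code: the `demo_*` names are hypothetical and the fill byte is chosen only for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_POISON 0x6b  /* illustrative fill byte for freed objects */

/* On free: overwrite the whole object with the poison byte. */
static void demo_poison_on_free(void *obj, size_t size)
{
    memset(obj, DEMO_POISON, size);
}

/* Before reuse: any byte that is no longer poison was written after
 * the object was freed. Reports the first such offset. */
static bool demo_poison_intact(const void *obj, size_t size,
                               size_t *bad_offset)
{
    const unsigned char *p = obj;

    for (size_t i = 0; i < size; i++) {
        if (p[i] != DEMO_POISON) {
            if (bad_offset)
                *bad_offset = i;
            return false;
        }
    }
    return true;
}
```

If something writes into an object after it has been freed, the check fails at the offset that was touched, which narrows down what is doing the stomping.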
Cheers,
Stephen