Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

public inbox for linux-kernel@vger.kernel.org

* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
@ 2004-09-08  0:16 Richard A Nelson
  2004-09-08  9:04 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08  0:16 UTC (permalink / raw)
  To: linux-kernel

I've received a few of these already - always during *very* heavy disk
activity. After the Oops, the disk becomes strangely idle :), and a reboot
is required.

 Unable to handle kernel paging request at virtual address 6b6b6b93
  printing eip:
 c01ae727
 *pde = 00000000
 Oops: 0000 [#1]
 PREEMPT
 Modules linked in: ppp_generic slhc radeon msr ds lp binfmt_misc autofs4
	thermal fan button ac battery af_packet sch_ingress cls_u32 sch_sfq
	sch_htb ipt_MASQUERADE ip6t_multiport ipt_multiport ipt_TOS ipt_state
	ipt_TARPIT ip6t_limit ipt_limit ipt_REJECT ip6t_LOG ipt_LOG ipt_pkttype
	ipt_recent ip6table_mangle iptable_mangle ip6table_filter ip6_tables
	iptable_filter eepro100 snd_intel8x0m hw_random usbhid uhci_hcd usbcore
	parport_pc parport irtty_sir sir_dev irda crc_ccitt pcspkr yenta_socket
	pcmcia_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss
	snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi
	snd_seq_device snd soundcore nls_iso8859_1 nls_cp437 vfat fat dm_mod
	joydev evdev psmouse nvram capability commoncap intel_agp agpgart tun
	e100 mii ip_nat_tftp ip_nat_irc ip_conntrack_irc ip_nat_ftp iptable_nat
	ip_conntrack_ftp ip_conntrack ip_tables md5 ipv6 proc_intf acpi
	freq_table processor microcode cpuid rtc unix
 CPU:    0
 EIP: 0060:[__journal_clean_checkpoint_list+199/240]    Not tainted VLI
 EFLAGS: 00010202   (2.6.9-rc1-mm4)
 EIP is at __journal_clean_checkpoint_list+0xc7/0xf0
 eax: ce70e650   ebx: 6b6b6b6b   ecx: 00000000   edx: cf5aa000
 esi: cf5aa000   edi: c322f7a8   ebp: cf5aadb8   esp: cf5aad90
 ds: 007b   es: 007b   ss: 0068
 Process kjournald (pid: 1351, threadinfo=cf5aa000 task=cf588000)
 Stack: cf5aa000 c322f7a8 0000017f ce70e578 c3f2550c ce70e650 cfcbe9a8 cf5aa000
        00000000 00000000 cf5aaf58 c01abc6e 00000000 5a5a5a5a 5a5a5a5a 5a5a5a5a
        5a5a5a5a cfcbea04 cf5aa000 5a5a5a5a 5a5a5a5a 00000000 00000000 00000000
 Call Trace:
  [show_stack+122/144] show_stack+0x7a/0x90
  [show_registers+329/432] show_registers+0x149/0x1b0
  [die+221/368] die+0xdd/0x170
  [do_page_fault+565/1463] do_page_fault+0x235/0x5b7
  [error_code+45/56] error_code+0x2d/0x38
  [journal_commit_transaction+670/6480] journal_commit_transaction+0x29e/0x1950
  [kjournald+342/992] kjournald+0x156/0x3e0
  [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
 Code: 45 e0 83 c4 1c 5b 5e 5f 5d c3 e8 f5 1a 14 00 eb ee 8b 45 d8 ff 48
14 8b 55 d8 8b 42 08 a8 08 75 2b 8b 45 ec 8b 58 28 85 db 74 09 <8b> 43
28 8b 55 ec 89 42 28 8b 45 f0 8b 40 30 85 c0 89 45 ec 74
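
One reading of the register dump, offered as a sketch rather than a
diagnosis: the marked bytes "<8b> 43 28" appear to decode as
"mov 0x28(%ebx),%eax", and ebx holds 6b6b6b6b, so the faulting access
would be at 6b6b6b6b + 0x28 = 6b6b6b93, which is the reported address.
0x6b is the fill byte the slab debug code writes into freed objects
(0x5a, also visible on the stack, is the fill for freshly allocated
ones), so this looks like a pointer read out of already-freed memory.
That assumes slab debugging was enabled in this build, which the oops
does not state. The helper names below are made up for illustration and
the decoder handles only this one instruction form.

```python
# Values copied from the oops above.
REGISTERS = {"eax": 0xCE70E650, "ebx": 0x6B6B6B6B,
             "ecx": 0x00000000, "edx": 0xCF5AA000}
FAULT_ADDRESS = 0x6B6B6B93
FAULTING_INSN = bytes([0x8B, 0x43, 0x28])  # "<8b> 43 28" from the Code: line

# Slab debug fill bytes (assumption: slab debugging was enabled here).
POISON_FREE = 0x6B   # written over an object when it is freed
POISON_INUSE = 0x5A  # written over an object when it is allocated

# x86 32-bit register numbering used by the ModRM byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_load(insn):
    """Decode only `8b /r disp8`, i.e. mov disp8(%base), %dst."""
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    if opcode != 0x8B:
        raise ValueError("not a 32-bit register load")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only base+disp8 without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


def is_poison(value, fill_byte):
    """True if a 32-bit value consists of four copies of fill_byte."""
    return value == int.from_bytes(bytes([fill_byte]) * 4, "little")


dst, base, disp = decode_load(FAULTING_INSN)
computed = (REGISTERS[base] + disp) & 0xFFFFFFFF

print(f"mov {disp:#x}(%{base}), %{dst} -> access at {computed:#010x}")
print("matches reported fault address:", computed == FAULT_ADDRESS)
print("base register holds free-poison:", is_poison(REGISTERS[base], POISON_FREE))
```

If that reading is right, the oops by itself only says that a freed
object was dereferenced; it does not say which code path freed it.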
-- 
Rick Nelson
<gholam> well I'm impressed
<gholam> win98 managed to crash X from within vmware.
* gholam applauds.


* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-08  0:16 2.6.9-rc1-mm4 kjournald oops (repeatable) Richard A Nelson
@ 2004-09-08  9:04 ` Andrew Morton
  2004-09-08  9:23   ` Stephen C. Tweedie
  2004-09-08 17:11   ` Richard A Nelson
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2004-09-08  9:04 UTC (permalink / raw)
  To: Richard A Nelson; +Cc: linux-kernel

Richard A Nelson <cowboy@debian.org> wrote:
>
>  I've received a few of these already - always during *very* heavy disk
>  activity. After the Oops, the disk becomes strangely idle :), and a reboot
>  is required.
> 
>   Unable to handle kernel paging request at virtual address 6b6b6b93
> ...
>   EIP: 0060:[__journal_clean_checkpoint_list+199/240]    Not tainted VLI

This might have been caused by a fishy latency-reduction patch.  I today
dropped that patch so could you please test next -mm and let me know?

Alternatively, revert ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm4/broken-out/journal_clean_checkpoint_list-latency-fix.patch

Thanks.


* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-08  9:04 ` Andrew Morton
@ 2004-09-08  9:23   ` Stephen C. Tweedie
  2004-09-08 17:12     ` Richard A Nelson
  2004-09-08 17:11   ` Richard A Nelson
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen C. Tweedie @ 2004-09-08  9:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stephen Tweedie, Richard A Nelson, linux-kernel

Hi,

On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:

> >   Unable to handle kernel paging request at virtual address 6b6b6b93
> > ...
> >   EIP: 0060:[__journal_clean_checkpoint_list+199/240]    Not tainted VLI
> 
> This might have been caused by a fishy latency-reduction patch.  I today
> dropped that patch so could you please test next -mm and let me know?

That, or preempt.  If the next -mm still breaks, time to hunt for the
preempt problem, I guess.

--Stephen




* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-08  9:04 ` Andrew Morton
  2004-09-08  9:23   ` Stephen C. Tweedie
@ 2004-09-08 17:11   ` Richard A Nelson
  1 sibling, 0 replies; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 17:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Wed, 8 Sep 2004, Andrew Morton wrote:

> Richard A Nelson <cowboy@debian.org> wrote:
> >
> >  I've received a few of these already - always during *very* heavy disk
> >  activity. After the Oops, the disk becomes strangely idle :), and a reboot
> >  is required.
> >
> >   Unable to handle kernel paging request at virtual address 6b6b6b93
> > ...
> >   EIP: 0060:[__journal_clean_checkpoint_list+199/240]    Not tainted VLI
>
> This might have been caused by a fishy latency-reduction patch.  I today
> dropped that patch so could you please test next -mm and let me know?
>
> Alternatively, revert ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm4/broken-out/journal_clean_checkpoint_list-latency-fix.patch

Reverted and building now, will reboot and test asap
-- 
Rick Nelson
<Addi> Alter.net seems to have replaced one of its router with a zucchini.


* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-08  9:23   ` Stephen C. Tweedie
@ 2004-09-08 17:12     ` Richard A Nelson
  2004-09-08 23:07       ` Richard A Nelson
  0 siblings, 1 reply; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 17:12 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Andrew Morton, linux-kernel

On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:

> On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
>
> > >   Unable to handle kernel paging request at virtual address 6b6b6b93
> > > ...
> > >   EIP: 0060:[__journal_clean_checkpoint_list+199/240]    Not tainted VLI
> >
> > This might have been caused by a fishy latency-reduction patch.  I today
> > dropped that patch so could you please test next -mm and let me know?
>
> That, or preempt.  If the next -mm still breaks, time to hunt for the
> preempt problem, I guess.

Ok, if it still fails (I'll have to wait until this afternoon for the
true test - dpkg breaks it every time), I'll check out preempt.

Thanks,
-- 
Rick Nelson
<Knghtbrd> learn to love Window Maker.
<Knghtbrd> a little NeXTStep is good for the soul.


* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-08 17:12     ` Richard A Nelson
@ 2004-09-08 23:07       ` Richard A Nelson
  2004-09-08 23:16         ` Lee Revell
  0 siblings, 1 reply; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 23:07 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Andrew Morton, linux-kernel

On Wed, 8 Sep 2004, Richard A Nelson wrote:

> On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
>
> > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> >
> > > >   Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > ...
> > > >   EIP: 0060:[__journal_clean_checkpoint_list+199/240]    Not tainted VLI
> > >
> > > This might have been caused by a fishy latency-reduction patch.  I today
> > > dropped that patch so could you please test next -mm and let me know?
> >
> > That, or preempt.  If the next -mm still breaks, time to hunt for the
> > preempt problem, I guess.
>
> Ok, if it still fails (I'll have to wait until this afternoon for the
> true test - dpkg breaks it every time), I'll check out preempt.

Well, it looks like backing out the patch was sufficient, I've made it
through the torture that is a dpkg install (70+meg).

So we needn't (at this time) look to preempt.

-- 
Rick Nelson
<LackOfKan> What are 'bots'?
<``Erik> rsg is a bot, not a human, not a human usable client, just a bot.
<``Erik> about the same as a quake bot, except irc bots are (usually)
         built to help, not shoot your ass full of holes


* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-08 23:07       ` Richard A Nelson
@ 2004-09-08 23:16         ` Lee Revell
  2004-09-08 23:51           ` Richard A Nelson
  0 siblings, 1 reply; 12+ messages in thread
From: Lee Revell @ 2004-09-08 23:16 UTC (permalink / raw)
  To: Richard A Nelson; +Cc: Stephen C. Tweedie, Andrew Morton, linux-kernel

On Wed, 2004-09-08 at 19:07, Richard A Nelson wrote:
> On Wed, 8 Sep 2004, Richard A Nelson wrote:
> 
> > On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
> >
> > > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> > >
> > > > >   Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > > ...
> > > > >   EIP: 0060:[__journal_clean_checkpoint_list+199/240]    Not tainted VLI
> > > >
> > > > This might have been caused by a fishy latency-reduction patch.  I today
> > > > dropped that patch so could you please test next -mm and let me know?
> > >
> > > That, or preempt.  If the next -mm still breaks, time to hunt for the
> > > preempt problem, I guess.
> >
> > Ok, if it still fails (I'll have to wait until this afternoon for the
> > true test - dpkg breaks it every time), I'll check out preempt.
> 
> Well, it looks like backing out the patch was sufficient, I've made it
> through the torture that is a dpkg install (70+meg).
> 
> So we needn't (at this time) look to preempt.

Hmm, I have been running this patch for weeks as part of the voluntary
preemption patches, and put it through every torture test I can think
of, with nary an Oops.  None of the other VP testers have reported
problems either.  Maybe this is some interaction between that patch and
something else in -mm.

Lee



* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-08 23:16         ` Lee Revell
@ 2004-09-08 23:51           ` Richard A Nelson
  2004-09-09 21:56             ` Bongani Hlope
  0 siblings, 1 reply; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 23:51 UTC (permalink / raw)
  To: Lee Revell; +Cc: Stephen C. Tweedie, Andrew Morton, linux-kernel

On Wed, 8 Sep 2004, Lee Revell wrote:

> On Wed, 2004-09-08 at 19:07, Richard A Nelson wrote:
> > On Wed, 8 Sep 2004, Richard A Nelson wrote:
> >
> > > On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
> > >
> > > > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> > > >
> > > > > >   Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > > > ...
> > > > > >   EIP: 0060:[__journal_clean_checkpoint_list+199/240]    Not tainted VLI
> > > > >
> > > > > This might have been caused by a fishy latency-reduction patch.  I today
> > > > > dropped that patch so could you please test next -mm and let me know?
> > > >
> > > > That, or preempt.  If the next -mm still breaks, time to hunt for the
> > > > preempt problem, I guess.
> > >
> > > Ok, if it still fails (I'll have to wait until this afternoon for the
> > > true test - dpkg breaks it every time), I'll check out preempt.
> >
> > Well, it looks like backing out the patch was sufficient, I've made it
> > through the torture that is a dpkg install (70+meg).
> >
> > So we needn't (at this time) look to preempt.
>
> Hmm, I have been running this patch for weeks as part of the voluntary
> preemption patches, and put it through every torture test I can think
> of, with nary an Oops.  None of the other VP testers have reported
> problems either.  Maybe this is some interaction between that patch and
> something else in -mm.

Interestingly, I notice Zwane had a very similar oops, posted on the
7th: Oops in __journal_clean_checkpoint_list
He also had preempt enabled...

I've found upgrading my Debian system using dselect to be a *very* good
stress test of the filesystem...

If you have candidates, I'll try to test them - I've typically had no
problem reproducing the issue :)

-- 
Rick Nelson
 * Equivalent code is available from RSA Data Security, Inc.
 * This code has been tested against that, and is equivalent,
 * except that you don't need to include two pages of legalese
 * with every copy.
        -- public domain MD5 source


* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-09 21:56             ` Bongani Hlope
@ 2004-09-09 20:20               ` Stephen C. Tweedie
  2004-09-09 20:27                 ` Lee Revell
  2004-09-09 22:39                 ` Bongani Hlope
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen C. Tweedie @ 2004-09-09 20:20 UTC (permalink / raw)
  To: Bongani Hlope
  Cc: Richard A Nelson, Lee Revell, Andrew Morton, linux-kernel,
	Stephen Tweedie

Hi,

On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:

> OK, it seems I'm not the only one. I've been trying to find this for a
> while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> mm[13]). I was only able to capture the Oops this morning (pen and
> paper). I also have preempt enabled. This only happens on my PII, though
> (Mandrake cooker updates and kernel compiles); my dual Opteron has
> been running this since last night without any problems (gentoo sync
> and kernel compile), also with preempt.

The journal_clean_checkpoint_list-latency-fix.patch was added in
2.6.9rc1-mm2 and is still there in mm4, so your problem is also
consistent with a bug in that patch; could you try backing that one diff
out and seeing if it fixes it for you too?

Thanks,
 Stephen



* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-09 20:20               ` Stephen C. Tweedie
@ 2004-09-09 20:27                 ` Lee Revell
  2004-09-09 22:39                 ` Bongani Hlope
  1 sibling, 0 replies; 12+ messages in thread
From: Lee Revell @ 2004-09-09 20:27 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Bongani Hlope, Richard A Nelson, Andrew Morton, linux-kernel

On Thu, 2004-09-09 at 16:20, Stephen C. Tweedie wrote:
> Hi,
> 
> On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:
> 
> > OK, it seems I'm not the only one. I've been trying to find this for a
> > while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> > mm[13]). I was only able to capture the Oops this morning (pen and
> > paper). I also have preempt enabled. This only happens on my PII, though
> > (Mandrake cooker updates and kernel compiles); my dual Opteron has
> > been running this since last night without any problems (gentoo sync
> > and kernel compile), also with preempt.
> 
> The journal_clean_checkpoint_list-latency-fix.patch was added in
> 2.6.9rc1-mm2 and is still there in mm4, so your problem is also
> consistent with a bug in that patch; could you try backing that one diff
> out and seeing if it fixes it for you too?
> 

This is not in fact the same journal_clean_checkpoint latency fix that
is in the VP patches, looks like that one is just a simple lock break. 
So, disregard my previous comment, all the evidence does in fact point
to journal_clean_checkpoint_list-latency-fix.patch.
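
For readers unfamiliar with the term, a "lock break" in general means
releasing a long-held lock partway through a scan so that other threads
can run, then taking it again. The sketch below is only a generic
illustration of that idea in Python, not the code of either patch
discussed here; the function name, the batch parameter, and the
restart-from-head policy are all assumptions made for the example.
Restarting from the head after each break is one conservative way to
avoid trusting a cursor saved before the lock was dropped, since the
list may have changed in the meantime.

```python
import threading


def prune_with_lock_break(entries, lock, is_done, batch=16):
    """Remove finished entries from `entries`, dropping `lock` between batches.

    After each batch the lock is released (the "lock break"), and the next
    pass starts again from the head of the list rather than from a saved
    position, because other threads may have modified the list meanwhile.
    Returns the number of entries removed.
    """
    removed = 0
    while True:
        with lock:
            freed = 0
            i = 0
            while i < len(entries) and freed < batch:
                if is_done(entries[i]):
                    del entries[i]   # next entry shifts into slot i
                    freed += 1
                else:
                    i += 1
        # The lock is not held here, so waiters get a chance to run.
        removed += freed
        if freed < batch:
            # The pass reached the end of the list without filling a batch.
            return removed


# Hypothetical usage: even numbers stand in for "finished" entries.
entries = list(range(100))
lock = threading.Lock()
removed = prune_with_lock_break(entries, lock, lambda n: n % 2 == 0, batch=7)
print(removed, len(entries))
```

The cost of this policy is repeated rescanning of entries that are kept;
keeping a cursor across the break is cheaper but is only safe if the
cursor is revalidated after the lock is retaken.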

Lee



* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-08 23:51           ` Richard A Nelson
@ 2004-09-09 21:56             ` Bongani Hlope
  2004-09-09 20:20               ` Stephen C. Tweedie
  0 siblings, 1 reply; 12+ messages in thread
From: Bongani Hlope @ 2004-09-09 21:56 UTC (permalink / raw)
  To: Richard A Nelson
  Cc: Lee Revell, Stephen C. Tweedie, Andrew Morton, linux-kernel


On Wed, 8 Sep 2004 16:51:35 -0700 (PDT)
Richard A Nelson <cowboy@debian.org> wrote:

> On Wed, 8 Sep 2004, Lee Revell wrote:

8<

> >
> > Hmm, I have been running this patch for weeks as part of the voluntary
> > preemption patches, and put it through every torture test I can think
> > of, with nary an Oops.  None of the other VP testers have reported
> > problems either.  Maybe this is some interaction between that patch and
> > something else in -mm.
> 
> Interestingly, I notice Zwane had a very similar oops, posted on the
> 7th: Oops in __journal_clean_checkpoint_list
> He also had preempt enabled...
> 
> I've found upgrading my Debian system using dselect to be a *very* good
> stress test of the filesystem...
> 
> If you have candidates, I'll try to test them - I've typically had no
> problem reproducing the issue :)
> 

OK, it seems I'm not the only one. I've been trying to find this for a while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried mm[13]). I was only able to capture the Oops this morning (pen and paper). I also have preempt enabled. This only happens on my PII, though (Mandrake cooker updates and kernel compiles); my dual Opteron has been running this since last night without any problems (gentoo sync and kernel compile), also with preempt.



* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
  2004-09-09 20:20               ` Stephen C. Tweedie
  2004-09-09 20:27                 ` Lee Revell
@ 2004-09-09 22:39                 ` Bongani Hlope
  1 sibling, 0 replies; 12+ messages in thread
From: Bongani Hlope @ 2004-09-09 22:39 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Richard A Nelson, Lee Revell, Andrew Morton, linux-kernel,
	Stephen Tweedie


On 09 Sep 2004 21:20:40 +0100
"Stephen C. Tweedie" <sct@redhat.com> wrote:

> Hi,
> 
> On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:
> 
> > OK, it seems I'm not the only one. I've been trying to find this for a
> > while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> > mm[13]). I was only able to capture the Oops this morning (pen and
> > paper). I also have preempt enabled. This only happens on my PII, though
> > (Mandrake cooker updates and kernel compiles); my dual Opteron has
> > been running this since last night without any problems (gentoo sync
> > and kernel compile), also with preempt.
> 
> The journal_clean_checkpoint_list-latency-fix.patch was added in
> 2.6.9rc1-mm2 and is still there in mm4, so your problem is also
> consistent with a bug in that patch; could you try backing that one diff
> out and seeing if it fixes it for you too?
> 
> Thanks,
>  Stephen
> 

Busy compiling, I'll let you know how it goes.
Thanx


