* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
@ 2004-09-08 0:16 Richard A Nelson
2004-09-08 9:04 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 0:16 UTC (permalink / raw)
To: linux-kernel
I've received a few of these already - always during *very* heavy disk
activity. After the Oops, the disk becomes strangely idle :), and a reboot
is required.
Unable to handle kernel paging request at virtual address 6b6b6b93
printing eip:
c01ae727
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: ppp_generic slhc radeon msr ds lp binfmt_misc autofs4
thermal fan button ac battery af_packet sch_ingress cls_u32 sch_sfq
sch_htb ipt_MASQUERADE ip6t_multiport ipt_multiport ipt_TOS ipt_state
ipt_TARPIT ip6t_limit ipt_limit ipt_REJECT ip6t_LOG ipt_LOG ipt_pkttype
ipt_recent ip6table_mangle iptable_mangle ip6table_filter ip6_tables
iptable_filter eepro100 snd_intel8x0m hw_random usbhid uhci_hcd usbcore
parport_pc parport irtty_sir sir_dev irda crc_ccitt pcspkr yenta_socket
pcmcia_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi
snd_seq_device snd soundcore nls_iso8859_1 nls_cp437 vfat fat dm_mod
joydev evdev psmouse nvram capability commoncap intel_agp agpgart tun
e100 mii ip_nat_tftp ip_nat_irc ip_conntrack_irc ip_nat_ftp iptable_nat
ip_conntrack_ftp ip_conntrack ip_tables md5 ipv6 proc_intf acpi
freq_table processor microcode cpuid rtc unix
CPU: 0
EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
EFLAGS: 00010202 (2.6.9-rc1-mm4)
EIP is at __journal_clean_checkpoint_list+0xc7/0xf0
eax: ce70e650 ebx: 6b6b6b6b ecx: 00000000 edx: cf5aa000
esi: cf5aa000 edi: c322f7a8 ebp: cf5aadb8 esp: cf5aad90
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 1351, threadinfo=cf5aa000 task=cf588000)
Stack: cf5aa000 c322f7a8 0000017f ce70e578 c3f2550c ce70e650 cfcbe9a8 cf5aa000
00000000 00000000 cf5aaf58 c01abc6e 00000000 5a5a5a5a 5a5a5a5a 5a5a5a5a
5a5a5a5a cfcbea04 cf5aa000 5a5a5a5a 5a5a5a5a 00000000 00000000 00000000
Call Trace:
[show_stack+122/144] show_stack+0x7a/0x90
[show_registers+329/432] show_registers+0x149/0x1b0
[die+221/368] die+0xdd/0x170
[do_page_fault+565/1463] do_page_fault+0x235/0x5b7
[error_code+45/56] error_code+0x2d/0x38
[journal_commit_transaction+670/6480] journal_commit_transaction+0x29e/0x1950
[kjournald+342/992] kjournald+0x156/0x3e0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Code: 45 e0 83 c4 1c 5b 5e 5f 5d c3 e8 f5 1a 14 00 eb ee 8b 45 d8 ff 48
14 8b 55 d8 8b 42 08 a8 08 75 2b 8b 45 ec 8b 58 28 85 db 74 09 <8b> 43
28 8b 55 ec 89 42 28 8b 45 f0 8b 40 30 85 c0 89 45 ec 74
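Reading note on the dump above (an annotation, not part of the original report): the faulting address 6b6b6b93 is ebx (6b6b6b6b) plus 0x28, and the marked bytes <8b> 43 28 decode on i386 to a load from 0x28(%ebx). 0x6b is the byte that slab debugging writes over freed objects, and 0x5a is the matching pattern for allocated but uninitialised memory, so the dump is consistent with a pointer read out of an already-freed object. A minimal Python sketch of that arithmetic follows; the helper name poison_base and the 0x100 offset window are assumptions chosen for illustration, and the constant names follow the kernel's slab debugging code.

```python
# Slab debug poison bytes (names as used in the kernel's slab debugging code).
POISON_INUSE = 0x5A  # allocated, not yet initialised
POISON_FREE = 0x6B   # object has been freed


def poison_base(addr, pattern, max_offset=0x100):
    """Return (base, offset) if addr is a small offset past a 32-bit word
    filled with the poison byte `pattern`, otherwise None.

    The max_offset window is an arbitrary illustrative bound, not a kernel value.
    """
    base = int.from_bytes(bytes([pattern]) * 4, "little")
    offset = addr - base
    if 0 <= offset < max_offset:
        return base, offset
    return None


fault_addr = 0x6B6B6B93  # "virtual address" line of the oops
eip = 0xC01AE727         # "printing eip" line of the oops

hit = poison_base(fault_addr, POISON_FREE)
if hit is not None:
    base, offset = hit
    print(f"fault = {base:#x} + {offset:#x}: dereference through freed memory")

# An ordinary kernel text address does not match the pattern.
print(poison_base(eip, POISON_FREE))
```

The offset it reports, 0x28, matches the displacement in the faulting instruction, which is what ties the bad address to the poisoned value in ebx.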
--
Rick Nelson
<gholam> well I'm impressed
<gholam> win98 managed to crash X from within vmware.
* gholam applauds.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-08 0:16 2.6.9-rc1-mm4 kjournald oops (repeatable) Richard A Nelson
@ 2004-09-08 9:04 ` Andrew Morton
2004-09-08 9:23 ` Stephen C. Tweedie
2004-09-08 17:11 ` Richard A Nelson
0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2004-09-08 9:04 UTC (permalink / raw)
To: Richard A Nelson; +Cc: linux-kernel
Richard A Nelson <cowboy@debian.org> wrote:
>
> I've received a few of these already - always during *very* heavy disk
> activity. After the Oops, the disk becomes strangely idle :), and a reboot
> is required.
>
> Unable to handle kernel paging request at virtual address 6b6b6b93
> ...
> EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI

This might have been caused by a fishy latency-reduction patch. I today
dropped that patch so could you please test next -mm and let me know?

Alternatively, revert ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm4/broken-out/journal_clean_checkpoint_list-latency-fix.patch

Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-08 9:04 ` Andrew Morton
@ 2004-09-08 9:23 ` Stephen C. Tweedie
2004-09-08 17:12 ` Richard A Nelson
2004-09-08 17:11 ` Richard A Nelson
1 sibling, 1 reply; 12+ messages in thread
From: Stephen C. Tweedie @ 2004-09-08 9:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: Stephen Tweedie, Richard A Nelson, linux-kernel
Hi,

On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> > Unable to handle kernel paging request at virtual address 6b6b6b93
> > ...
> > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
>
> This might have been caused by a fishy latency-reduction patch. I today
> dropped that patch so could you please test next -mm and let me know?

That, or preempt. If the next -mm still breaks, time to hunt for the
preempt problem, I guess.

--Stephen
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-08 9:23 ` Stephen C. Tweedie
@ 2004-09-08 17:12 ` Richard A Nelson
2004-09-08 23:07 ` Richard A Nelson
0 siblings, 1 reply; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 17:12 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Andrew Morton, linux-kernel
On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
> On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
>
> > > Unable to handle kernel paging request at virtual address 6b6b6b93
> > > ...
> > > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
> >
> > This might have been caused by a fishy latency-reduction patch. I today
> > dropped that patch so could you please test next -mm and let me know?
>
> That, or preempt. If the next -mm still breaks, time to hunt for the
> preempt problem, I guess.

Ok, if it still fails (I'll have to wait until this afternoon for the
true test - dpkg breaks it every time), I'll check out preempt.

Thanks,
--
Rick Nelson
<Knghtbrd> learn to love Window Maker.
<Knghtbrd> a little NeXTStep is good for the soul.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-08 17:12 ` Richard A Nelson
@ 2004-09-08 23:07 ` Richard A Nelson
2004-09-08 23:16 ` Lee Revell
0 siblings, 1 reply; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 23:07 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Andrew Morton, linux-kernel
On Wed, 8 Sep 2004, Richard A Nelson wrote:
> On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
>
> > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> >
> > > > Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > ...
> > > > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
> > >
> > > This might have been caused by a fishy latency-reduction patch. I today
> > > dropped that patch so could you please test next -mm and let me know?
> >
> > That, or preempt. If the next -mm still breaks, time to hunt for the
> > preempt problem, I guess.
>
> Ok, if it still fails (I'll have to wait until this afternoon for the
> true test - dpkg breaks it every time), I'll check out preempt.

Well, it looks like backing out the patch was sufficient, I've made it
through the torture that is a dpkg install (70+meg).

So we needn't (at this time) look to preempt.
--
Rick Nelson
<LackOfKan> What are 'bots'?
<``Erik> rsg is a bot, not a human, not a human usable client, just a bot.
<``Erik> about the same as a quake bot, except irc bots are (usually)
built to help, not shoot your ass full of holes
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-08 23:07 ` Richard A Nelson
@ 2004-09-08 23:16 ` Lee Revell
2004-09-08 23:51 ` Richard A Nelson
0 siblings, 1 reply; 12+ messages in thread
From: Lee Revell @ 2004-09-08 23:16 UTC (permalink / raw)
To: Richard A Nelson; +Cc: Stephen C. Tweedie, Andrew Morton, linux-kernel
On Wed, 2004-09-08 at 19:07, Richard A Nelson wrote:
> On Wed, 8 Sep 2004, Richard A Nelson wrote:
>
> > On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
> >
> > > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> > >
> > > > > Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > > ...
> > > > > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
> > > >
> > > > This might have been caused by a fishy latency-reduction patch. I today
> > > > dropped that patch so could you please test next -mm and let me know?
> > >
> > > That, or preempt. If the next -mm still breaks, time to hunt for the
> > > preempt problem, I guess.
> >
> > Ok, if it still fails (I'll have to wait until this afternoon for the
> > true test - dpkg breaks it every time), I'll check out preempt.
>
> Well, it looks like backing out the patch was sufficient, I've made it
> through the torture that is a dpkg install (70+meg).
>
> So we needn't (at this time) look to preempt.

Hmm, I have been running this patch for weeks as part of the voluntary
preemption patches, and put it through every torture test I can think
of, with nary an Oops. None of the other VP testers have reported
problems either. Maybe this is some interaction between that patch and
something else in -mm.

Lee
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-08 23:16 ` Lee Revell
@ 2004-09-08 23:51 ` Richard A Nelson
2004-09-09 21:56 ` Bongani Hlope
0 siblings, 1 reply; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 23:51 UTC (permalink / raw)
To: Lee Revell; +Cc: Stephen C. Tweedie, Andrew Morton, linux-kernel
On Wed, 8 Sep 2004, Lee Revell wrote:
> On Wed, 2004-09-08 at 19:07, Richard A Nelson wrote:
> > On Wed, 8 Sep 2004, Richard A Nelson wrote:
> >
> > > On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
> > >
> > > > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> > > >
> > > > > > Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > > > ...
> > > > > > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
> > > > >
> > > > > This might have been caused by a fishy latency-reduction patch. I today
> > > > > dropped that patch so could you please test next -mm and let me know?
> > > >
> > > > That, or preempt. If the next -mm still breaks, time to hunt for the
> > > > preempt problem, I guess.
> > >
> > > Ok, if it still fails (I'll have to wait until this afternoon for the
> > > true test - dpkg breaks it every time), I'll check out preempt.
> >
> > Well, it looks like backing out the patch was sufficient, I've made it
> > through the torture that is a dpkg install (70+meg).
> >
> > So we needn't (at this time) look to preempt.
>
> Hmm, I have been running this patch for weeks as part of the voluntary
> preemption patches, and put it through every torture test I can think
> of, with nary an Oops. None of the other VP testers have reported
> problems either. Maybe this is some interaction between that patch and
> something else in -mm.

Interestingly, I notice Zwane had a very similar oops, posted on the
7th: Oops in __journal_clean_checkpoint_list
He also had preempt enabled...

I've found upgrading my Debian system using dselect to be a *very* good
stress test of the filesystem...

If you have candidates, I'll try to test them - I've typically had no
problem reproducing the issue :)
--
Rick Nelson
* Equivalent code is available from RSA Data Security, Inc.
* This code has been tested against that, and is equivalent,
* except that you don't need to include two pages of legalese
* with every copy.
-- public domain MD5 source
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-08 23:51 ` Richard A Nelson
@ 2004-09-09 21:56 ` Bongani Hlope
2004-09-09 20:20 ` Stephen C. Tweedie
0 siblings, 1 reply; 12+ messages in thread
From: Bongani Hlope @ 2004-09-09 21:56 UTC (permalink / raw)
To: Richard A Nelson
Cc: Lee Revell, Stephen C. Tweedie, Andrew Morton, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]
On Wed, 8 Sep 2004 16:51:35 -0700 (PDT)
Richard A Nelson <cowboy@debian.org> wrote:
> On Wed, 8 Sep 2004, Lee Revell wrote:
8<
> >
> > Hmm, I have been running this patch for weeks as part of the voluntary
> > preemption patches, and put it through every torture test I can think
> > of, with nary an Oops. None of the other VP testers have reported
> > problems either. Maybe this is some interaction between that patch and
> > something else in -mm.
>
> Interestingly, I notice Zwane had a very similar oops, posted on the
> 7th: Oops in __journal_clean_checkpoint_list
> He also had preempt enabled...
>
> I've found upgrading my Debian system using dselect to be a *very* good
> stress test of the filesystem...
>
> If you have candidates, I'll try to test them - I've typically had no
> problem reproducing the issue :)
>

Ok, it seems I'm not the only one. I've been trying to find this for a
while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
mm[13]). I was only able to capture the Oops this morning (pen and
paper). I also have preempt enabled. This only happens on my PII though
(Mandrake cooker updates and kernel compiles); my dual opteron has
been running this since last night without any problems (gentoo sync
and kernel compile), also with preempt.
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-09 21:56 ` Bongani Hlope
@ 2004-09-09 20:20 ` Stephen C. Tweedie
2004-09-09 20:27 ` Lee Revell
2004-09-09 22:39 ` Bongani Hlope
0 siblings, 2 replies; 12+ messages in thread
From: Stephen C. Tweedie @ 2004-09-09 20:20 UTC (permalink / raw)
To: Bongani Hlope
Cc: Richard A Nelson, Lee Revell, Andrew Morton, linux-kernel, Stephen Tweedie
Hi,

On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:
> Ok, it seems I'm not the only one. I've been trying to find this for a
> while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> mm[13]). I was only able to capture the Oops this morning (pen and
> paper). I also have preempt enabled. This only happens on my PII though
> (Mandrake cooker updates and kernel compiles); my dual opteron has
> been running this since last night without any problems (gentoo sync
> and kernel compile), also with preempt.

The journal_clean_checkpoint_list-latency-fix.patch was added in
2.6.9rc1-mm2 and is still there in mm4, so your problem is also
consistent with a bug in that patch; could you try backing that one diff
out and seeing if it fixes it for you too?

Thanks,
Stephen
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-09 20:20 ` Stephen C. Tweedie
@ 2004-09-09 20:27 ` Lee Revell
2004-09-09 22:39 ` Bongani Hlope
1 sibling, 0 replies; 12+ messages in thread
From: Lee Revell @ 2004-09-09 20:27 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Bongani Hlope, Richard A Nelson, Andrew Morton, linux-kernel
On Thu, 2004-09-09 at 16:20, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:
>
> > Ok, it seems I'm not the only one. I've been trying to find this for a
> > while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> > mm[13]). I was only able to capture the Oops this morning (pen and
> > paper). I also have preempt enabled. This only happens on my PII though
> > (Mandrake cooker updates and kernel compiles); my dual opteron has
> > been running this since last night without any problems (gentoo sync
> > and kernel compile), also with preempt.
>
> The journal_clean_checkpoint_list-latency-fix.patch was added in
> 2.6.9rc1-mm2 and is still there in mm4, so your problem is also
> consistent with a bug in that patch; could you try backing that one diff
> out and seeing if it fixes it for you too?
>

This is not in fact the same journal_clean_checkpoint latency fix that
is in the VP patches; it looks like that one is just a simple lock break.

So, disregard my previous comment; all the evidence does in fact point
to journal_clean_checkpoint_list-latency-fix.patch.

Lee
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-09 20:20 ` Stephen C. Tweedie
2004-09-09 20:27 ` Lee Revell
@ 2004-09-09 22:39 ` Bongani Hlope
1 sibling, 0 replies; 12+ messages in thread
From: Bongani Hlope @ 2004-09-09 22:39 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Richard A Nelson, Lee Revell, Andrew Morton, linux-kernel, Stephen Tweedie
[-- Attachment #1: Type: text/plain, Size: 957 bytes --]
On 09 Sep 2004 21:20:40 +0100
"Stephen C. Tweedie" <sct@redhat.com> wrote:
> Hi,
>
> On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:
>
> > Ok, it seems I'm not the only one. I've been trying to find this for a
> > while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> > mm[13]). I was only able to capture the Oops this morning (pen and
> > paper). I also have preempt enabled. This only happens on my PII though
> > (Mandrake cooker updates and kernel compiles); my dual opteron has
> > been running this since last night without any problems (gentoo sync
> > and kernel compile), also with preempt.
>
> The journal_clean_checkpoint_list-latency-fix.patch was added in
> 2.6.9rc1-mm2 and is still there in mm4, so your problem is also
> consistent with a bug in that patch; could you try backing that one diff
> out and seeing if it fixes it for you too?
>
> Thanks,
> Stephen
>

Busy compiling, I'll let you know how it goes.

Thanx
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)
2004-09-08 9:04 ` Andrew Morton
2004-09-08 9:23 ` Stephen C. Tweedie
@ 2004-09-08 17:11 ` Richard A Nelson
1 sibling, 0 replies; 12+ messages in thread
From: Richard A Nelson @ 2004-09-08 17:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Wed, 8 Sep 2004, Andrew Morton wrote:
> Richard A Nelson <cowboy@debian.org> wrote:
> >
> > I've received a few of these already - always during *very* heavy disk
> > activity. After the Oops, the disk becomes strangely idle :), and a reboot
> > is required.
> >
> > Unable to handle kernel paging request at virtual address 6b6b6b93
> > ...
> > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
>
> This might have been caused by a fishy latency-reduction patch. I today
> dropped that patch so could you please test next -mm and let me know?
>
> Alternatively, revert ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm4/broken-out/journal_clean_checkpoint_list-latency-fix.patch

Reverted and building now, will reboot and test asap
--
Rick Nelson
<Addi> Alter.net seems to have replaced one of its router with a zucchini.
^ permalink raw reply [flat|nested] 12+ messages in thread