* kernel panic on 2.6.24/iTCO_wdt not rebooting machine
@ 2008-02-01 15:12 Denys Fedoryshchenko
  2008-02-01 17:11 ` Len Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Denys Fedoryshchenko @ 2008-02-01 15:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: wim, lenb

Hi

I already sent a report to netdev, but the most interesting question I have
is why the machine was not rebooted (kernel.panic was set over sysctl) and
why the watchdog didn't reboot it either.

I set:

kernel.panic = 10
kernel.panic_on_oops = 10

plus the iTCO_wdt watchdog driver and the watchdog daemon from busybox, and
the machine still didn't come back online after the panic! It came back only
after a guy on location pressed the reset button (it is very far in the
mountains, the roads are blocked by snow now, and there is not even a
keyboard/screen to check what's happening).

After testing I noticed that iTCO_wdt is not working on this motherboard.

In dmesg:
Feb 1 19:34:17 10.184.184.1 kernel: [ 58.112496] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.02 (26-Jul-2007)
Feb 1 19:34:17 10.184.184.1 kernel: [ 58.113114] iTCO_wdt: Found a ICH9R TCO device (Version=2, TCOBASE=0x0460)
Feb 1 19:34:17 10.184.184.1 kernel: [ 58.113654] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)

1) I launch the busybox watchdog:
watchdog -t 5 /dev/watchdog
I can see it in the process list.

2) Then I do
killall -9 watchdog
and I can see in dmesg:
Feb 2 00:55:23 10.184.184.1 kernel: [ 6400.419418] iTCO_wdt: Unexpected close, not stopping watchdog!

The machine is not rebooting. It is also not rebooting on panic (over the
sysctl value). Motherboard: Intel DP35DP

Here is the panic message, just for information.

Feb 1 09:08:50 SERVER [12380.067104] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
Feb 1 09:08:50 SERVER [12380.067140] printing eip: c01f10ed *pde = 00000000
Feb 1 09:08:50 SERVER [12380.067162] Oops: 0000 [#1] SMP
Feb 1 09:08:50 SERVER [12380.067181] Modules linked in: netconsole configfs iTCO_wdt nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre sch_esfq xt_tcpudp ipt_TTL ipt_ttl xt_NOTRACK iptable_raw iptable_mangle ifb e1000e em_nbyte cls_tcindex act_gact cls_rsvp sch_htb cls_fw act_mirred em_u32 sch_red sch_sfq sch_tbf sch_teql cls_basic act_police sch_gred act_pedit sch_hfsc cls_rsvp6 sch_ingress em_meta em_text act_ipt sch_dsmark sch_prio sch_netem act_simple cls_u32 em_cmp sch_cbq cls_route xt_TCPMSS iptable_nat nf_conntrack_ipv4 ipt_LOG ipt_MASQUERADE ipt_REDIRECT nf_nat nf_conntrack nfnetlink iptable_filter ip_tables x_tables 8021q tun tulip r8169 sky2 via_velocity via_rhine sis900 ne2k_pci 8390 skge tg3 8139too e1000 e100 usb_storage mtdblock mtd_blkdevs usbhid uhci_hcd ehci_hcd ohci_hcd usbcore
Feb 1 09:08:50 SERVER [12380.067515]
Feb 1 09:08:50 SERVER [12380.067530] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
Feb 1 09:08:50 SERVER [12380.067550] EIP: 0060:[<c01f10ed>] EFLAGS: 00010086 CPU: 0
Feb 1 09:08:50 SERVER [12380.067571] EIP is at rb_erase+0x110/0x22f
Feb 1 09:08:50 SERVER [12380.067589] EAX: f52bbea0 EBX: 00000000 ECX: 00000000 EDX: f52bbea0
Feb 1 09:08:50 SERVER [12380.067608] ESI: f717df50 EDI: c1fed000 EBP: c1fecf80 ESP: c037fda8
Feb 1 09:08:50 SERVER [12380.067628] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 1 09:08:50 SERVER [12380.067647] Process swapper (pid: 0, ti=c037e000 task=c03533a0 task.ti=c037e000)
Feb 1 09:08:50 SERVER [12380.067668] Stack: 00000001 c1fed000 c1fecf78 00000002 00000001 c0134663 c1fed000 c1fecf78
Feb 1 09:08:50 SERVER [12380.067714]        c1fecf40 c013515b 00000000 4f3f473e 000002d0 ffffffff 7fffffff 4f3f473e
Feb 1 09:08:50 SERVER [12380.067760]        000002d0 00000000 c1fec120 c037ff84 c037fe70 f76ae880 c0113963 c1ff5f78
Feb 1 09:08:50 SERVER [12380.067806] Call Trace:
Feb 1 09:08:50 SERVER [12380.067839] [<c0134663>] __remove_hrtimer+0x5d/0x64
Feb 1 09:08:50 SERVER [12380.067861] [<c013515b>] hrtimer_interrupt+0x10c/0x19a
Feb 1 09:08:50 SERVER [12380.067883] [<c0113963>] smp_apic_timer_interrupt+0x6f/0x80
Feb 1 09:08:50 SERVER [12380.067905] [<c0105838>] apic_timer_interrupt+0x28/0x30
Feb 1 09:08:50 SERVER [12380.067928] [<c02be6d7>] _spin_lock_irqsave+0x13/0x27
Feb 1 09:08:50 SERVER [12380.067949] [<c0134bc7>] lock_hrtimer_base+0x15/0x2f
Feb 1 09:08:50 SERVER [12380.067970] [<c0134ca0>] hrtimer_start+0x16/0xf4
Feb 1 09:08:50 SERVER [12380.067991] [<c027ec43>] qdisc_watchdog_schedule+0x1e/0x21
Feb 1 09:08:50 SERVER [12380.068013] [<f89f8fe6>] htb_dequeue+0x6ef/0x6fb [sch_htb]
Feb 1 09:08:50 SERVER [12380.068036] [<c028ac4d>] ip_rcv+0x1fc/0x237
Feb 1 09:08:50 SERVER [12380.068057] [<c0135297>] hrtimer_get_next_event+0xae/0xbb
Feb 1 09:08:50 SERVER [12380.068078] [<c0135297>] hrtimer_get_next_event+0xae/0xbb
Feb 1 09:08:50 SERVER [12380.068099] [<c0136e26>] getnstimeofday+0x2b/0xb5
Feb 1 09:08:50 SERVER [12380.068118] [<c0138d70>] clockevents_program_event+0xe0/0xee
Feb 1 09:08:50 SERVER [12380.068140] [<c027da0e>] __qdisc_run+0x2a/0x163
Feb 1 09:08:50 SERVER [12380.068161] [<c02722d8>] net_tx_action+0xa8/0xcc
Feb 1 09:08:50 SERVER [12380.068180] [<c027ec65>] qdisc_watchdog+0x0/0x1b
Feb 1 09:08:50 SERVER [12380.068199] [<c027ec7d>] qdisc_watchdog+0x18/0x1b
Feb 1 09:08:50 SERVER [12380.068218] [<c0135007>] run_hrtimer_softirq+0x4e/0x96
Feb 1 09:08:50 SERVER [12380.068241] [<c0126a82>] __do_softirq+0x5d/0xc1
Feb 1 09:08:50 SERVER [12380.068260] [<c0126b18>] do_softirq+0x32/0x36
Feb 1 09:08:50 SERVER [12380.068279] [<c0126d6a>] irq_exit+0x38/0x6b
Feb 1 09:08:50 SERVER [12380.068298] [<c0113968>] smp_apic_timer_interrupt+0x74/0x80
Feb 1 09:08:50 SERVER [12380.068319] [<c0105838>] apic_timer_interrupt+0x28/0x30
Feb 1 09:08:50 SERVER [12380.068343] [<c0103243>] mwait_idle_with_hints+0x3c/0x40
Feb 1 09:08:50 SERVER [12380.068365] [<c0103247>] mwait_idle+0x0/0xa
Feb 1 09:08:50 SERVER [12380.068384] [<c010357e>] cpu_idle+0x98/0xb9
Feb 1 09:08:50 SERVER [12380.068403] [<c03848c2>] start_kernel+0x2d7/0x2df
Feb 1 09:08:50 SERVER [12380.068422] [<c03840e0>] unknown_bootoption+0x0/0x195
Feb 1 09:08:50 SERVER [12380.068444] =======================
Feb 1 09:08:50 SERVER [12380.068460] Code: 01 00 00 8b 4e 08 39 d9 0f 85 85 00 00 00 8b 4e 04 8b 01 a8 01 75 14 83 c8 01 89 ea 89 01 89 f0 83 26 fe e8 1e fd ff ff 8b 4e 04 <8b> 59 08 85 db 74 06 8b 03 a8 01 74 15 8b 41 04 85 c0 0f 84 c6
Feb 1 09:08:50 SERVER [12380.068753] EIP: [<c01f10ed>] rb_erase+0x110/0x22f SS:ESP 0068:c037fda8
Feb 1 09:08:50 SERVER [12380.068978] Kernel panic - not syncing: Fatal exception in interrupt

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.

^ permalink raw reply [flat|nested] 7+ messages in thread
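
For anyone repeating the busybox test from the report: killall -9 gives the daemon no chance to send the magic character 'V' before the device is closed, so "Unexpected close, not stopping watchdog!" is the message the driver is expected to print in that case, and the timer should keep counting. A minimal userspace sketch of the two operations follows. It is not code from this thread, and the helper names wdt_feed and wdt_stop are made up for illustration.

```c
#include <unistd.h>

/* Keepalive: the driver comment quoted later in this thread says
 * "someone wrote to us, we should reload the timer", so any write will do. */
int wdt_feed(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Orderly shutdown: send the magic character 'V' right before close(),
 * which lets a driver loaded with nowayout=0 stop the timer. */
int wdt_stop(int fd)
{
    int ok = (write(fd, "V", 1) == 1);
    return (close(fd) == 0 && ok) ? 0 : -1;
}
```

A daemon would open /dev/watchdog once, call wdt_feed() more often than the heartbeat, and call wdt_stop() only on a clean exit.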

* Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine
  2008-02-01 15:12 kernel panic on 2.6.24/iTCO_wdt not rebooting machine Denys Fedoryshchenko
@ 2008-02-01 17:11 ` Len Brown
  2008-02-01 19:15   ` Denys Fedoryshchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Len Brown @ 2008-02-01 17:11 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: linux-kernel, wim

On Friday 01 February 2008 10:12, Denys Fedoryshchenko wrote:
> Hi
>
> I already sent a report to netdev, but the most interesting question I have
> is why the machine was not rebooted (kernel.panic was set over sysctl) and
> why the watchdog didn't reboot it either.
>
> I set:
>
> kernel.panic = 10
> kernel.panic_on_oops = 10
>
> plus the iTCO_wdt watchdog driver and the watchdog daemon from busybox, and
> the machine still didn't come back online after the panic! It came back only
> after a guy on location pressed the reset button (it is very far in the
> mountains, the roads are blocked by snow now, and there is not even a
> keyboard/screen to check what's happening).
>
> After testing I noticed that iTCO_wdt is not working on this motherboard.
>
> In dmesg:
> Feb 1 19:34:17 10.184.184.1 kernel: [ 58.112496] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.02 (26-Jul-2007)
> Feb 1 19:34:17 10.184.184.1 kernel: [ 58.113114] iTCO_wdt: Found a ICH9R TCO device (Version=2, TCOBASE=0x0460)
> Feb 1 19:34:17 10.184.184.1 kernel: [ 58.113654] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
>
> 1) I launch the busybox watchdog:
> watchdog -t 5 /dev/watchdog
> I can see it in the process list.
>
> 2) Then I do
> killall -9 watchdog
> and I can see in dmesg:
> Feb 2 00:55:23 10.184.184.1 kernel: [ 6400.419418] iTCO_wdt: Unexpected close, not stopping watchdog!
>
> The machine is not rebooting. It is also not rebooting on panic (over the
> sysctl value). Motherboard: Intel DP35DP
>
> Here is the panic message, just for information.
> ...
> Feb 1 09:08:50 SERVER [12380.067806] Call Trace:
> Feb 1 09:08:50 SERVER [12380.067839] [<c0134663>] __remove_hrtimer+0x5d/0x64
> Feb 1 09:08:50 SERVER [12380.067861] [<c013515b>] hrtimer_interrupt+0x10c/0x19a
> Feb 1 09:08:50 SERVER [12380.067883] [<c0113963>] smp_apic_timer_interrupt+0x6f/0x80
> Feb 1 09:08:50 SERVER [12380.067905] [<c0105838>] apic_timer_interrupt+0x28/0x30
> Feb 1 09:08:50 SERVER [12380.067928] [<c02be6d7>] _spin_lock_irqsave+0x13/0x27
> Feb 1 09:08:50 SERVER [12380.067949] [<c0134bc7>] lock_hrtimer_base+0x15/0x2f
> Feb 1 09:08:50 SERVER [12380.067970] [<c0134ca0>] hrtimer_start+0x16/0xf4
> Feb 1 09:08:50 SERVER [12380.067991] [<c027ec43>] qdisc_watchdog_schedule+0x1e/0x21
> Feb 1 09:08:50 SERVER [12380.068013] [<f89f8fe6>] htb_dequeue+0x6ef/0x6fb [sch_htb]
> Feb 1 09:08:50 SERVER [12380.068036] [<c028ac4d>] ip_rcv+0x1fc/0x237
> Feb 1 09:08:50 SERVER [12380.068057] [<c0135297>] hrtimer_get_next_event+0xae/0xbb
> Feb 1 09:08:50 SERVER [12380.068078] [<c0135297>] hrtimer_get_next_event+0xae/0xbb
> Feb 1 09:08:50 SERVER [12380.068099] [<c0136e26>] getnstimeofday+0x2b/0xb5
> Feb 1 09:08:50 SERVER [12380.068118] [<c0138d70>] clockevents_program_event+0xe0/0xee
> Feb 1 09:08:50 SERVER [12380.068140] [<c027da0e>] __qdisc_run+0x2a/0x163
> Feb 1 09:08:50 SERVER [12380.068161] [<c02722d8>] net_tx_action+0xa8/0xcc
> Feb 1 09:08:50 SERVER [12380.068180] [<c027ec65>] qdisc_watchdog+0x0/0x1b
> Feb 1 09:08:50 SERVER [12380.068199] [<c027ec7d>] qdisc_watchdog+0x18/0x1b
> Feb 1 09:08:50 SERVER [12380.068218] [<c0135007>] run_hrtimer_softirq+0x4e/0x96
> Feb 1 09:08:50 SERVER [12380.068241] [<c0126a82>] __do_softirq+0x5d/0xc1
> Feb 1 09:08:50 SERVER [12380.068260] [<c0126b18>] do_softirq+0x32/0x36
> Feb 1 09:08:50 SERVER [12380.068279] [<c0126d6a>] irq_exit+0x38/0x6b
> Feb 1 09:08:50 SERVER [12380.068298] [<c0113968>] smp_apic_timer_interrupt+0x74/0x80
> Feb 1 09:08:50 SERVER [12380.068319] [<c0105838>] apic_timer_interrupt+0x28/0x30
> Feb 1 09:08:50 SERVER [12380.068343] [<c0103243>] mwait_idle_with_hints+0x3c/0x40
> Feb 1 09:08:50 SERVER [12380.068365] [<c0103247>] mwait_idle+0x0/0xa
> Feb 1 09:08:50 SERVER [12380.068384] [<c010357e>] cpu_idle+0x98/0xb9
> Feb 1 09:08:50 SERVER [12380.068403] [<c03848c2>] start_kernel+0x2d7/0x2df
> Feb 1 09:08:50 SERVER [12380.068422] [<c03840e0>] unknown_bootoption+0x0/0x195
> Feb 1 09:08:50 SERVER [12380.068444] =======================

What do you see if you build with CONFIG_HIGH_RES_TIMERS=n?

Does it work better if you boot with "acpi=off"?
if yes, how about with just pnpacpi=off?

thanks,
-Len

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine
  2008-02-01 17:11 ` Len Brown
@ 2008-02-01 19:15   ` Denys Fedoryshchenko
  2008-02-01 20:39     ` Len Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Denys Fedoryshchenko @ 2008-02-01 19:15 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-kernel, wim

On Fri, 1 Feb 2008 12:11:41 -0500, Len Brown wrote
>
> What do you see if you build with CONFIG_HIGH_RES_TIMERS=n?
>
> Does it work better if you boot with "acpi=off"?
> if yes, how about with just pnpacpi=off?
>
> thanks,
> -Len

It is not very easy to test. About the bug: most probably it is related to
the third-party ESFQ patch. I will drop it and then test more properly once
I am able to make the watchdog work. But the more important thing I noticed
is that iTCO_wdt is not working at all. I don't think hrtimers change
anything there.
About testing: I cannot take even a small risk now (and for the next 3-5
days) by changing kernel options. I have now set up the maximum available
set of watchdogs, because there is no one to maintain the server; the area
is unreachable because of snow and bad weather.

Do you think it is reasonable to try acpi / pnpacpi with iTCO_wdt to make it
work? Maybe just the register addresses, or the way the TCO watchdog is
activated, changed on this chipset?

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine
  2008-02-01 19:15   ` Denys Fedoryshchenko
@ 2008-02-01 20:39     ` Len Brown
  2008-02-02  0:44       ` Denys Fedoryshchenko
  2008-02-02  4:18       ` Denys Fedoryshchenko
  0 siblings, 2 replies; 7+ messages in thread
From: Len Brown @ 2008-02-01 20:39 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: linux-kernel, wim

On Friday 01 February 2008 14:15, Denys Fedoryshchenko wrote:
>
> On Fri, 1 Feb 2008 12:11:41 -0500, Len Brown wrote
> >
> > What do you see if you build with CONFIG_HIGH_RES_TIMERS=n?
> >
> > Does it work better if you boot with "acpi=off"?
> > if yes, how about with just pnpacpi=off?
> >
> > thanks,
> > -Len
>
> It is not very easy to test. About the bug: most probably it is related to
> the third-party ESFQ patch. I will drop it and then test more properly once
> I am able to make the watchdog work. But the more important thing I noticed
> is that iTCO_wdt is not working at all. I don't think hrtimers change
> anything there.
> About testing: I cannot take even a small risk now (and for the next 3-5
> days) by changing kernel options. I have now set up the maximum available
> set of watchdogs, because there is no one to maintain the server; the area
> is unreachable because of snow and bad weather.
>
> Do you think it is reasonable to try acpi / pnpacpi with iTCO_wdt to make it
> work? Maybe just the register addresses, or the way the TCO watchdog is
> activated, changed on this chipset?

yes, i'm wondering if the changes in IO resource reservations
in the PNPACPI layer are interfering with the native driver.

unfortunately, if you boot with acpi=off or pnpacpi=off, you may
run into other, unrelated, issues (or not).

one way to isolate the problem is to revert these two lines
from their 2.6.24 values to their 2.6.23 values by applying this patch:
---
diff --git a/include/linux/pnp.h b/include/linux/pnp.h
index 2a6d62c..16b46aa 100644
--- a/include/linux/pnp.h
+++ b/include/linux/pnp.h
@@ -13,8 +13,8 @@
 #include <linux/errno.h>
 #include <linux/mod_devicetable.h>
 
-#define PNP_MAX_PORT 40
-#define PNP_MAX_MEM 12
+#define PNP_MAX_PORT 8
+#define PNP_MAX_MEM 4
 #define PNP_MAX_IRQ 2
 #define PNP_MAX_DMA 2
 #define PNP_NAME_LEN 50

^ permalink raw reply related [flat|nested] 7+ messages in thread

* Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine
  2008-02-01 20:39     ` Len Brown
@ 2008-02-02  0:44       ` Denys Fedoryshchenko
  2008-02-02  4:18       ` Denys Fedoryshchenko
  1 sibling, 0 replies; 7+ messages in thread
From: Denys Fedoryshchenko @ 2008-02-02  0:44 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-kernel, wim

I checked: the watchdog still doesn't work with acpi=off, nor with
pnpacpi=off.

I will try to check the technical documents for the chipset to find any
reference to the watchdog registers; maybe I will see something useful there.

On Fri, 1 Feb 2008 15:39:08 -0500, Len Brown wrote
> On Friday 01 February 2008 14:15, Denys Fedoryshchenko wrote:
> >
> > On Fri, 1 Feb 2008 12:11:41 -0500, Len Brown wrote
> > >
> > > What do you see if you build with CONFIG_HIGH_RES_TIMERS=n?
> > >
> > > Does it work better if you boot with "acpi=off"?
> > > if yes, how about with just pnpacpi=off?
> > >
> > > thanks,
> > > -Len
> >
> > It is not very easy to test. About the bug: most probably it is related to
> > the third-party ESFQ patch. I will drop it and then test more properly once
> > I am able to make the watchdog work. But the more important thing I noticed
> > is that iTCO_wdt is not working at all. I don't think hrtimers change
> > anything there.
> > About testing: I cannot take even a small risk now (and for the next 3-5
> > days) by changing kernel options. I have now set up the maximum available
> > set of watchdogs, because there is no one to maintain the server; the area
> > is unreachable because of snow and bad weather.
> >
> > Do you think it is reasonable to try acpi / pnpacpi with iTCO_wdt to make it
> > work? Maybe just the register addresses, or the way the TCO watchdog is
> > activated, changed on this chipset?
>
> yes, i'm wondering if the changes in IO resource reservations
> in the PNPACPI layer are interfering with the native driver.
>
> unfortunately, if you boot with acpi=off or pnpacpi=off, you may
> run into other, unrelated, issues (or not).
>
> one way to isolate the problem is to revert these two lines
> from their 2.6.24 values to their 2.6.23 values by applying this patch:
> ---
> diff --git a/include/linux/pnp.h b/include/linux/pnp.h
> index 2a6d62c..16b46aa 100644
> --- a/include/linux/pnp.h
> +++ b/include/linux/pnp.h
> @@ -13,8 +13,8 @@
>  #include <linux/errno.h>
>  #include <linux/mod_devicetable.h>
>
> -#define PNP_MAX_PORT 40
> -#define PNP_MAX_MEM 12
> +#define PNP_MAX_PORT 8
> +#define PNP_MAX_MEM 4
>  #define PNP_MAX_IRQ 2
>  #define PNP_MAX_DMA 2
>  #define PNP_NAME_LEN 50

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine
  2008-02-01 20:39     ` Len Brown
  2008-02-02  0:44       ` Denys Fedoryshchenko
@ 2008-02-02  4:18       ` Denys Fedoryshchenko
  2008-02-02 18:38         ` Wim Van Sebroeck
  1 sibling, 1 reply; 7+ messages in thread
From: Denys Fedoryshchenko @ 2008-02-02  4:18 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-kernel, wim

I think I found the issue, but I am not able to understand how to fix it.

I made a small patch to check whether the code is able to change TCO_EN
(bit 13) to 0. It cannot change it, because the TCO_LOCK bit is set.

For example, I made this patch to see that:
--- /usr/src/linux-2.6.24/drivers/watchdog/iTCO_wdt.c 2008-01-25 00:58:37.000000000 +0200
+++ /WORK/globalosii/linux-embedded/drivers/watchdog/iTCO_wdt.c 2008-02-02 05:11:46.000000000 +0200
@@ -659,8 +659,13 @@
 		goto out;
 	}
 	val32 = inl(SMI_EN);
+	printk(KERN_INFO PFX "TCO_EN was %04lX\n", val32);
 	val32 &= 0xffffdfff;	/* Turn off SMI clearing watchdog */
+	printk(KERN_INFO PFX "TCO_EN will try to set %04lX\n", val32);
 	outl(val32, SMI_EN);
+	val32 = inl(SMI_EN);
+	printk(KERN_INFO PFX "TCO_EN after set %04lX\n", val32);
+
 	release_region(SMI_EN, 4);
 
 	/* The TCO I/O registers reside in a 32-byte range pointed to by the TCOBASE value */

and I got in dmesg:
[ 589.913354] iTCO_wdt: TCO_EN was 0000203B
[ 589.913356] iTCO_wdt: TCO_EN will try to set 0000003B
[ 589.913360] iTCO_wdt: TCO_EN after set 0000203B

So this code will not work under some conditions, for example in my
situation. It is a bit dangerous because, as I understand it, this code is
supposed to prevent unexpected reboots during watchdog setup, so maybe a
check for the TCO_LOCK bit should be added, or at least a check that the
value has really been changed.

Also, I don't understand this code: for TCO1_STS, for example, clearing a bit
requires writing 1 to it (almost all of the bits are write-1-to-clear rather
than plain writable, except bit 0 of TCO1_STS, which is read-only; I read
that in the ICH9 datasheet).
So outb(0, TCO1_STS) will simply not do anything.
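
The two register behaviours described above can be modelled without touching hardware. The sketch below is an illustration only, not driver code: the helper names are hypothetical, the mask comes from the 0xffffdfff constant in the patch, and the write-1-to-clear rule is the one the message attributes to the ICH9 datasheet.

```c
#include <stdint.h>

#define SMI_EN_TCO_EN (1u << 13) /* bit 13, the bit cleared by "&= 0xffffdfff" */

/* Value the driver tries to write back to SMI_EN. */
uint32_t smi_en_without_tco(uint32_t smi_en)
{
    return smi_en & ~SMI_EN_TCO_EN;
}

/* Readback check suggested in the message: did the bit really change? */
int tco_en_cleared(uint32_t readback)
{
    return (readback & SMI_EN_TCO_EN) == 0;
}

/* Write-1-to-clear model: only bits written as 1 are cleared,
 * so writing 0 leaves the status register unchanged. */
uint8_t w1c_write(uint8_t reg, uint8_t written)
{
    return (uint8_t)(reg & (uint8_t)~written);
}
```

With the values from the dmesg output, smi_en_without_tco(0x203B) gives 0x3B, while tco_en_cleared(0x203B) is false for the value actually read back, which is the condition a readback check would flag.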
In TCO2_STS, bit 0 is responsible for Intruder Detect on ICH8 and ICH9!
It is probably not good to reset this bit.
Code:
	/* Clear out the (probably old) status */
	outb(0, TCO1_STS);
	outb(3, TCO2_STS);

But those are all small issues, and they don't explain why it doesn't work.

I made a small patch so that, instead of resetting the timer, I read the
current value of the timer.
The patch looks like this:
@@ -483,6 +484,7 @@
 static ssize_t iTCO_wdt_write (struct file *file, const char __user *data,
 			      size_t len, loff_t * ppos)
 {
+	unsigned int val16;
 	/* See if we got the magic character 'V' and reload the timer */
 	if (len) {
 		if (!nowayout) {
@@ -503,7 +505,14 @@
 		}
 		/* someone wrote to us, we should reload the timer */
-		iTCO_wdt_keepalive();
+		//iTCO_wdt_keepalive();
+		spin_lock(&iTCO_wdt_private.io_lock);
+		val16 = inw(TCO_RLD);
+		val16 &= 0x3ff;
+		spin_unlock(&iTCO_wdt_private.io_lock);
+
+		printk(KERN_INFO PFX "Remaining time %d\n", (val16 * 6) / 10);
+

[ 2505.979453] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.02 (26-Jul-2007)
[ 2505.980073] iTCO_wdt: TCO_EN was 0000203B
[ 2505.980076] iTCO_wdt: TCO_EN will try to set 0000003B
[ 2505.980083] iTCO_wdt: TCO_EN after set 0000203B
[ 2505.980085] iTCO_wdt: Found a ICH9R TCO device (Version=2, TCOBASE=0x0460)
[ 2505.980088] iTCO_wdt: TCO1_STS was 0000
[ 2505.980090] iTCO_wdt: TCO2_STS was 0000
[ 2505.980664] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
[ 2515.908192] iTCO_wdt: Remaining time 30
[ 2516.408459] iTCO_wdt: Remaining time 29
[ 2516.908687] iTCO_wdt: Remaining time 28
[ 2517.408917] iTCO_wdt: Remaining time 28
[ 2517.909144] iTCO_wdt: Remaining time 28
[ 2518.409373] iTCO_wdt: Remaining time 27
[ 2518.909601] iTCO_wdt: Remaining time 27
[ 2519.409829] iTCO_wdt: Remaining time 26
[ 2519.910057] iTCO_wdt: Remaining time 25
[ 2520.410287] iTCO_wdt: Remaining time 25
[ 2520.910515] iTCO_wdt: Remaining time 25
[ 2521.410745] iTCO_wdt: Remaining time 24
[ 2521.910972] iTCO_wdt: Remaining time 24
[ 2522.411201] iTCO_wdt: Remaining time 23
[ 2522.911429] iTCO_wdt: Remaining time 22
[ 2523.411658] iTCO_wdt: Remaining time 22
[ 2523.911886] iTCO_wdt: Remaining time 21
[ 2524.412115] iTCO_wdt: Remaining time 21
[ 2524.912343] iTCO_wdt: Remaining time 21
[ 2525.412573] iTCO_wdt: Remaining time 20
[ 2525.912801] iTCO_wdt: Remaining time 19
[ 2526.413030] iTCO_wdt: Remaining time 19
[ 2526.913258] iTCO_wdt: Remaining time 18
[ 2527.413487] iTCO_wdt: Remaining time 18
[ 2527.913715] iTCO_wdt: Remaining time 18
[ 2528.413944] iTCO_wdt: Remaining time 17
[ 2528.914172] iTCO_wdt: Remaining time 16
[ 2529.414401] iTCO_wdt: Remaining time 16
[ 2529.914629] iTCO_wdt: Remaining time 15
[ 2530.414859] iTCO_wdt: Remaining time 15
[ 2530.915087] iTCO_wdt: Remaining time 14
[ 2531.415315] iTCO_wdt: Remaining time 14
[ 2531.915544] iTCO_wdt: Remaining time 13
[ 2532.415773] iTCO_wdt: Remaining time 13
[ 2532.916001] iTCO_wdt: Remaining time 12
[ 2533.416230] iTCO_wdt: Remaining time 12
[ 2533.916459] iTCO_wdt: Remaining time 11
[ 2534.416688] iTCO_wdt: Remaining time 10
[ 2534.916916] iTCO_wdt: Remaining time 10
[ 2535.417144] iTCO_wdt: Remaining time 10
[ 2535.917373] iTCO_wdt: Remaining time 9
[ 2536.417602] iTCO_wdt: Remaining time 9
[ 2536.917830] iTCO_wdt: Remaining time 8
[ 2537.418059] iTCO_wdt: Remaining time 7
[ 2537.918287] iTCO_wdt: Remaining time 7
[ 2538.418516] iTCO_wdt: Remaining time 7
[ 2538.918744] iTCO_wdt: Remaining time 6
[ 2539.418973] iTCO_wdt: Remaining time 6
[ 2539.919201] iTCO_wdt: Remaining time 5
[ 2540.419431] iTCO_wdt: Remaining time 4
[ 2540.919658] iTCO_wdt: Remaining time 4
[ 2541.419888] iTCO_wdt: Remaining time 4
[ 2541.920116] iTCO_wdt: Remaining time 3
[ 2542.420345] iTCO_wdt: Remaining time 3
[ 2542.920573] iTCO_wdt: Remaining time 2
[ 2543.420802] iTCO_wdt: Remaining time 1
[ 2543.921030] iTCO_wdt: Remaining time 1
[ 2544.421259] iTCO_wdt: Remaining time 0
[ 2544.921487] iTCO_wdt: Remaining time 0
[ 2545.421716] iTCO_wdt: Remaining time 2
[ 2545.921945] iTCO_wdt: Remaining time 1
[ 2546.422173] iTCO_wdt: Remaining time 1
[ 2546.922402] iTCO_wdt: Remaining time 0
[ 2547.422631] iTCO_wdt: Remaining time 2
[ 2547.922859] iTCO_wdt: Remaining time 1
[ 2548.423088] iTCO_wdt: Remaining time 1

I tried to watch the register every 100 ms:
[ 3525.608533] iTCO_wdt: Remaining ticks 3
[ 3525.709376] iTCO_wdt: Remaining ticks 3
[ 3525.810220] iTCO_wdt: Remaining ticks 3
[ 3525.911065] iTCO_wdt: Remaining ticks 3
[ 3526.011909] iTCO_wdt: Remaining ticks 2
[ 3526.112753] iTCO_wdt: Remaining ticks 2
[ 3526.213598] iTCO_wdt: Remaining ticks 2
[ 3526.314443] iTCO_wdt: Remaining ticks 2
[ 3526.415287] iTCO_wdt: Remaining ticks 2
[ 3526.516135] iTCO_wdt: Remaining ticks 2
[ 3526.616977] iTCO_wdt: Remaining ticks 1
[ 3526.717820] iTCO_wdt: Remaining ticks 1
[ 3526.818665] iTCO_wdt: Remaining ticks 1
[ 3526.919510] iTCO_wdt: Remaining ticks 1
[ 3527.020354] iTCO_wdt: Remaining ticks 1
[ 3527.121199] iTCO_wdt: Remaining ticks 4
[ 3527.222043] iTCO_wdt: Remaining ticks 4
[ 3527.322890] iTCO_wdt: Remaining ticks 4
[ 3527.423732] iTCO_wdt: Remaining ticks 4
[ 3527.524577] iTCO_wdt: Remaining ticks 4
[ 3527.625422] iTCO_wdt: Remaining ticks 4

Which means the timer reaches 0... and nothing happens! It goes back to 2 and
then to 0 again. I even checked the STS registers; they are still zero!
The register is just set back to the default value 0004h.
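
The arithmetic behind the "Remaining time" lines can be checked in isolation. This is a sketch under the assumptions of the debug patch above (a 10-bit counter masked with 0x3ff, and 0.6 seconds per tick as implied by the `* 6 / 10`); the helper names are hypothetical and not part of the driver.

```c
#include <stdint.h>

#define TCO_RLD_MASK 0x3ffu /* low 10 bits, as masked in the debug patch */

/* Extract the tick count from a raw 16-bit TCO_RLD read. */
unsigned tco_ticks(uint16_t raw)
{
    return raw & TCO_RLD_MASK;
}

/* Same integer conversion as the printk in the patch: ticks * 6 / 10. */
unsigned tco_ticks_to_seconds(unsigned ticks)
{
    return (ticks * 6) / 10;
}
```

Under these assumptions the 30-second heartbeat corresponds to 50 ticks, and the reload value 0004h corresponds to the "Remaining time 2" seen right after the counter hits 0, which matches the 100 ms trace where the tick count jumps from 1 back to 4.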

Maybe someone can help me with this? Or is it a hardware bug in the chipset?
I will try to look through more docs; maybe I will be able to find what's
wrong there.

On Fri, 1 Feb 2008 15:39:08 -0500, Len Brown wrote
> On Friday 01 February 2008 14:15, Denys Fedoryshchenko wrote:
> >
> > On Fri, 1 Feb 2008 12:11:41 -0500, Len Brown wrote
> > >
> > > What do you see if you build with CONFIG_HIGH_RES_TIMERS=n?
> > >
> > > Does it work better if you boot with "acpi=off"?
> > > if yes, how about with just pnpacpi=off?
> > >
> > > thanks,
> > > -Len
> >
> > It is not very easy to test. About the bug: most probably it is related to
> > the third-party ESFQ patch. I will drop it and then test more properly once
> > I am able to make the watchdog work. But the more important thing I noticed
> > is that iTCO_wdt is not working at all. I don't think hrtimers change
> > anything there.
> > About testing: I cannot take even a small risk now (and for the next 3-5
> > days) by changing kernel options. I have now set up the maximum available
> > set of watchdogs, because there is no one to maintain the server; the area
> > is unreachable because of snow and bad weather.
> >
> > Do you think it is reasonable to try acpi / pnpacpi with iTCO_wdt to make it
> > work? Maybe just the register addresses, or the way the TCO watchdog is
> > activated, changed on this chipset?
>
> yes, i'm wondering if the changes in IO resource reservations
> in the PNPACPI layer are interfering with the native driver.
>
> unfortunately, if you boot with acpi=off or pnpacpi=off, you may
> run into other, unrelated, issues (or not).
>
> one way to isolate the problem is to revert these two lines
> from their 2.6.24 values to their 2.6.23 values by applying this patch:
> ---
> diff --git a/include/linux/pnp.h b/include/linux/pnp.h
> index 2a6d62c..16b46aa 100644
> --- a/include/linux/pnp.h
> +++ b/include/linux/pnp.h
> @@ -13,8 +13,8 @@
>  #include <linux/errno.h>
>  #include <linux/mod_devicetable.h>
>
> -#define PNP_MAX_PORT 40
> -#define PNP_MAX_MEM 12
> +#define PNP_MAX_PORT 8
> +#define PNP_MAX_MEM 4
>  #define PNP_MAX_IRQ 2
>  #define PNP_MAX_DMA 2
>  #define PNP_NAME_LEN 50

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine
  2008-02-02  4:18       ` Denys Fedoryshchenko
@ 2008-02-02 18:38         ` Wim Van Sebroeck
  0 siblings, 0 replies; 7+ messages in thread
From: Wim Van Sebroeck @ 2008-02-02 18:38 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Len Brown, linux-kernel

Hi Denys,

> Maybe someone can help me with this? Or is it a hardware bug in the chipset?
> I will try to look through more docs; maybe I will be able to find what's
> wrong there.

I'll have a look at it next week.

Greetings,
Wim.

^ permalink raw reply [flat|nested] 7+ messages in thread