* 4.1.12 kernel crash in rtnetlink_put_metrics
From: Andrew @ 2015-11-04 16:00 UTC
  To: netdev

Hi all.

Today I got a crash on one of our servers (PPPoE BRAS with BGP/OSPF). This server became unstable after updating from a 3.2.x kernel to 4.1.x (other servers with slightly different CPUs/motherboards also have trouble, but they hang less frequently).

Location in the kernel code:

(gdb) list *rtnetlink_put_metrics+0x50
0xc131c7d0 is in rtnetlink_put_metrics (/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/core/rtnetlink.c:672).
667		mx = nla_nest_start(skb, RTA_METRICS);
668		if (mx == NULL)
669			return -ENOBUFS;
670	
671		for (i = 0; i < RTAX_MAX; i++) {
672			if (metrics[i]) {
673				if (i == RTAX_CC_ALGO - 1) {
674					char tmp[TCP_CA_NAME_MAX], *name;
675	
676					name = tcp_ca_get_name_by_key(metrics[i], tmp);

Here's the trace:

[41358.475254] BUG: unable to handle kernel NULL pointer dereference at (null)
[41358.475333] IP: [<c131c7d0>] rtnetlink_put_metrics+0x50/0x180
[41358.475376] *pdpt = 0000000026d58001 *pde = 0000000000000000
[41358.475413] Oops: 0000 [#1] SMP
[41358.475453] Modules linked in: act_mirred pppoe pppox ppp_generic slhc iptable_filter xt_length xt_TCPMSS xt_tcpudp xt_mark xt_dscp iptable_mangle ip_tables x_tables ipv6 sch_sfq sch_htb cls_u32 sch_ingress sch_prio sch_tbf cls_flow cls_fw act_police ifb 8021q mrp garp stp llc softdog parport_pc parport acpi_cpufreq processor thermal_sys igb(O) k10temp hwmon dca ohci_pci ohci_hcd ptp pps_core i2c_piix4 i2c_core sp5100_tco sd_mod pata_acpi pata_atiixp pcspkr ata_generic ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common ext4 mbcache jbd2 crc16 vfat fat isofs
[41358.475807] CPU: 2 PID: 10877 Comm: bird Tainted: G O 4.1.12-i686 #1
[41358.475880] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.3 01/12/2012
[41358.475955] task: f5302da0 ti: e1364000 task.ti: e1364000
[41358.475993] EIP: 0060:[<c131c7d0>] EFLAGS: 00010282 CPU: 2
[41358.476030] EIP is at rtnetlink_put_metrics+0x50/0x180
[41358.476066] EAX: 00000000 EBX: 00000001 ECX: 00000004 EDX: 00000000
[41358.476106] ESI: 00000000 EDI: e0b38000 EBP: e1365ca8 ESP: e1365c78
[41358.476143]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[41358.476179] CR0: 8005003b CR2: 00000000 CR3: 34966ac0 CR4: 000006f0
[41358.476216] Stack:
[41358.476249]  00000000 c1213873 d4316f64 00000000 e0b38000 e1365d00 c1213989 00000fe4
[41358.476330]  e0b38000 00000000 d4316f30 e0b38000 e1365d00 c138362e e1365cd8 0000000c
[41358.476405]  00000002 00000002 00000000 00000000 c13bba01 e0b38000 000000fe 007d8196
[41358.476482] Call Trace:
[41358.476522]  [<c1213873>] ? __nla_reserve+0x23/0xe0
[41358.476557]  [<c1213989>] ? __nla_put+0x9/0xb0
[41358.476595]  [<c138362e>] ? fib_dump_info+0x15e/0x3e0
[41358.476636]  [<c13bba01>] ? irq_entries_start+0x639/0x678
[41358.476671]  [<c1386823>] ? fib_table_dump+0xf3/0x180
[41358.476708]  [<c138053d>] ? inet_dump_fib+0x7d/0x100
[41358.476746]  [<c1337ef1>] ? netlink_dump+0x121/0x270
[41358.476781]  [<c1303572>] ? skb_free_datagram+0x12/0x40
[41358.476818]  [<c1338284>] ? netlink_recvmsg+0x244/0x360
[41358.476855]  [<c12f3f8d>] ? sock_recvmsg+0x1d/0x30
[41358.476890]  [<c12f3f70>] ? sock_recvmsg_nosec+0x30/0x30
[41358.476924]  [<c12f5cec>] ? ___sys_recvmsg+0x9c/0x120
[41358.476958]  [<c12f3f70>] ? sock_recvmsg_nosec+0x30/0x30
[41358.476994]  [<c10740e4>] ? update_cfs_rq_blocked_load+0xc4/0x130
[41358.477030]  [<c1094bb4>] ? hrtimer_forward+0xa4/0x1c0
[41358.477065]  [<c12f4cdd>] ? sockfd_lookup_light+0x1d/0x80
[41358.477099]  [<c12f6c5e>] ? __sys_recvmsg+0x3e/0x80
[41358.477134]  [<c12f6ff1>] ? SyS_socketcall+0xb1/0x2a0
[41358.477168]  [<c108657c>] ? handle_irq_event+0x3c/0x60
[41358.477203]  [<c1088efd>] ? handle_edge_irq+0x7d/0x100
[41358.477238]  [<c130a2e6>] ? rps_trigger_softirq+0x26/0x30
[41358.477273]  [<c10a88e3>] ? flush_smp_call_function_queue+0x83/0x120
[41358.477307]  [<c13bb2be>] ? syscall_call+0x7/0x7
[41358.477341] Code: 00 89 45 d8 89 c3 89 f8 e8 7e 72 ef ff 85 c0 0f 88 9e 00 00 00 85 db 0f 84 96 00 00 00 bb 01 00 00 00 c7 45 dc 00 00 00 00 66 90 <8b> 44 9e fc 85 c0 74 2b 83 fb 10 0f 84 84 00 00 00 89 45 e0 8d
[41358.477509] EIP: [<c131c7d0>] rtnetlink_put_metrics+0x50/0x180 SS:ESP 0068:e1365c78
[41358.477576] CR2: 0000000000000000
[41358.477880] ---[ end trace 6e3e7e6b81407c0a ]---
[41358.499813] ------------[ cut here ]------------
[41358.499879] WARNING: CPU: 2 PID: 0 at /var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/netlink/af_netlink.c:944 netlink_sock_destruct+0xa8/0xc0()
[41358.500003] Modules linked in: act_mirred pppoe pppox ppp_generic slhc iptable_filter xt_length xt_TCPMSS xt_tcpudp xt_mark xt_dscp iptable_mangle ip_tables x_tables ipv6 sch_sfq sch_htb cls_u32 sch_ingress sch_prio sch_tbf cls_flow cls_fw act_police ifb 8021q mrp garp stp llc softdog parport_pc parport acpi_cpufreq processor thermal_sys igb(O) k10temp hwmon dca ohci_pci ohci_hcd ptp pps_core i2c_piix4 i2c_core sp5100_tco sd_mod pata_acpi pata_atiixp pcspkr ata_generic ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common ext4 mbcache jbd2 crc16 vfat fat isofs
[41358.502110] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D O 4.1.12-i686 #1
[41358.502213] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.3 01/12/2012
[41358.502305]  c14b0540 f5259f40 c13b6ee2 00000000 c104b5a3 c1475fd4 00000002 00000000
[41358.502610]  c14b0540 000003b0 c13373e8 00000009 c13373e8 f2204c00 0000000a 0000000a
[41358.502920]  f5259f50 c104b680 00000009 00000000 f5259f64 c13373e8 c108f4d7 c108f4d7
[41358.503230] Call Trace:
[41358.503292]  [<c13b6ee2>] ? dump_stack+0x3e/0x4e
[41358.503357]  [<c104b5a3>] ? warn_slowpath_common+0x93/0xd0
[41358.503420]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503484]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503548]  [<c104b680>] ? warn_slowpath_null+0x20/0x30
[41358.503609]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503671]  [<c108f4d7>] ? rcu_process_callbacks+0x1b7/0x4e0
[41358.503732]  [<c108f4d7>] ? rcu_process_callbacks+0x1b7/0x4e0
[41358.503794]  [<c12f9b88>] ? __sk_free+0x18/0xf0
[41358.503862]  [<c108f513>] ? rcu_process_callbacks+0x1f3/0x4e0
[41358.503929]  [<c104e753>] ? __do_softirq+0xc3/0x240
[41358.503992]  [<c104e690>] ? __tasklet_hrtimer_trampoline+0x50/0x50
[41358.504056]  [<c1004729>] ? do_softirq_own_stack+0x29/0x40
[41358.504117]  <IRQ>  [<c104ea9e>] ? irq_exit+0x6e/0x90
[41358.504208]  [<c13bc3f8>] ? smp_apic_timer_interrupt+0x38/0x50
[41358.504270]  [<c13bbcd9>] ? apic_timer_interrupt+0x2d/0x34
[41358.504332]  [<c100bfc9>] ? default_idle+0x19/0xb0
[41358.504395]  [<c100cd2e>] ? arch_cpu_idle+0xe/0x10
[41358.504458]  [<c107ec55>] ? cpu_startup_entry+0x215/0x310
[41358.504519] ---[ end trace 6e3e7e6b81407c0b ]---
* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
From: Daniel Borkmann @ 2015-11-04 19:55 UTC
  To: Andrew; +Cc: netdev

Hi Andrew,

thanks for the report!

On 11/04/2015 05:00 PM, Andrew wrote:
> Hi all.
>
> Today I got a crash on one of our servers (PPPoE BRAS with BGP/OSPF). This server became unstable after updating from a 3.2.x kernel to 4.1.x (other servers with slightly different CPUs/motherboards also have trouble, but they hang less frequently).
>
> Location in the kernel code:
> (gdb) list *rtnetlink_put_metrics+0x50
> 0xc131c7d0 is in rtnetlink_put_metrics (/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/core/rtnetlink.c:672).
> 667		mx = nla_nest_start(skb, RTA_METRICS);
> 668		if (mx == NULL)
> 669			return -ENOBUFS;
> 670	
> 671		for (i = 0; i < RTAX_MAX; i++) {
> 672			if (metrics[i]) {

( Making the trace a bit more readable ... )

[41358.475254] BUG: unable to handle kernel NULL pointer dereference at (null)
[41358.475333] IP: [<c131c7d0>] rtnetlink_put_metrics+0x50/0x180
[...]
Call Trace:
[41358.476522]  [<c1213873>] ? __nla_reserve+0x23/0xe0
[41358.476557]  [<c1213989>] ? __nla_put+0x9/0xb0
[41358.476595]  [<c138362e>] ? fib_dump_info+0x15e/0x3e0
[41358.476636]  [<c13bba01>] ? irq_entries_start+0x639/0x678
[41358.476671]  [<c1386823>] ? fib_table_dump+0xf3/0x180
[41358.476708]  [<c138053d>] ? inet_dump_fib+0x7d/0x100
[41358.476746]  [<c1337ef1>] ? netlink_dump+0x121/0x270
[41358.476781]  [<c1303572>] ? skb_free_datagram+0x12/0x40
[41358.476818]  [<c1338284>] ? netlink_recvmsg+0x244/0x360
[41358.476855]  [<c12f3f8d>] ? sock_recvmsg+0x1d/0x30
[41358.476890]  [<c12f3f70>] ? sock_recvmsg_nosec+0x30/0x30
[41358.476924]  [<c12f5cec>] ? ___sys_recvmsg+0x9c/0x120
[41358.476958]  [<c12f3f70>] ? sock_recvmsg_nosec+0x30/0x30
[41358.476994]  [<c10740e4>] ? update_cfs_rq_blocked_load+0xc4/0x130
[41358.477030]  [<c1094bb4>] ? hrtimer_forward+0xa4/0x1c0
[41358.477065]  [<c12f4cdd>] ? sockfd_lookup_light+0x1d/0x80
[41358.477099]  [<c12f6c5e>] ? __sys_recvmsg+0x3e/0x80
[41358.477134]  [<c12f6ff1>] ? SyS_socketcall+0xb1/0x2a0
[41358.477168]  [<c108657c>] ? handle_irq_event+0x3c/0x60
[41358.477203]  [<c1088efd>] ? handle_edge_irq+0x7d/0x100
[41358.477238]  [<c130a2e6>] ? rps_trigger_softirq+0x26/0x30
[41358.477273]  [<c10a88e3>] ? flush_smp_call_function_queue+0x83/0x120
[41358.477307]  [<c13bb2be>] ? syscall_call+0x7/0x7
[...]

Strange that rtnetlink_put_metrics() itself is not part of the above
call trace (it's an exported symbol).

So, your analysis suggests that metrics itself is NULL in this case?
(Can you confirm that?)

How frequently does this trigger? Are all the call traces you have seen
of the same kind?

Is there an easy way to reproduce this?

I presume you don't use any per-route congestion control settings, right?
Thanks,
Daniel
* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
From: subashab @ 2016-03-07 22:15 UTC
  To: Daniel Borkmann; +Cc: Andrew, netdev, netdev-owner

On , Daniel Borkmann wrote:
> Hi Andrew,
>
> thanks for the report!
>
> [...]
>
> Strange that rtnetlink_put_metrics() itself is not part of the above
> call trace (it's an exported symbol).
>
> So, your analysis suggests that metrics itself is NULL in this case?
> (Can you confirm that?)
>
> How frequently does this trigger? Are all the call traces you have seen
> of the same kind?
>
> Is there an easy way to reproduce this?
>
> I presume you don't use any per-route congestion control settings, right?
>
> Thanks,
> Daniel

Hi Daniel,

I am observing a similar crash as well, on a 3.10-based ARM64 kernel.
Unfortunately, the crash occurs in a regression test rack, so I do not
know the exact test case that reproduces it. It seems to have occurred
twice so far, and in both cases metrics was NULL:

| rt_=_0xFFFFFFC012DA4300 -> (
|   dst = (
|     callback_head = (next = 0x0, func = 0xFFFFFF800262D040),
|     child = 0xFFFFFFC03B8BC2B0,
|     dev = 0xFFFFFFC012DA4318,
|     ops = 0xFFFFFFC012DA4318,
|     _metrics = 0,
|     expires = 0,
|     path = 0x0,
|     from = 0x0,
|     xfrm = 0x0,
|     input = 0xFFFFFFC0AD498000,
|     output = 0x000000010401C411,
|     flags = 0,
|     pending_confirm = 0,
|     error = 0,
|     obsolete = 0,
|     header_len = 3,
|     trailer_len = 0,
|     __pad2 = 4096,

168539.549000: <6> Process ip (pid: 28473, stack limit = 0xffffffc04b584060)
168539.549006: <2> Call trace:
168539.549016: <2> [<ffffffc000a95900>] rtnetlink_put_metrics+0x4c/0xec
168539.549027: <2> [<ffffffc000b5e198>] rt6_fill_node.isra.34+0x2b8/0x3c8
168539.549035: <2> [<ffffffc000b5e6e0>] rt6_dump_route+0x68/0x7c
168539.549043: <2> [<ffffffc000b5edec>] fib6_dump_node+0x2c/0x74
168539.549051: <2> [<ffffffc000b5ec24>] fib6_walk_continue+0xf8/0x1b4
168539.549059: <2> [<ffffffc000b5f140>] fib6_walk+0x5c/0xb8
168539.549067: <2> [<ffffffc000b5f2a0>] inet6_dump_fib+0x104/0x234
168539.549076: <2> [<ffffffc000ab1510>] netlink_dump+0x7c/0x1cc
168539.549084: <2> [<ffffffc000ab22f0>] __netlink_dump_start+0x128/0x170
168539.549093: <2> [<ffffffc000a98ddc>] rtnetlink_rcv_msg+0x12c/0x1a0
168539.549101: <2> [<ffffffc000ab3a80>] netlink_rcv_skb+0x64/0xc8
168539.549110: <2> [<ffffffc000a97644>] rtnetlink_rcv+0x1c/0x2c
168539.549117: <2> [<ffffffc000ab34cc>] netlink_unicast+0x108/0x1b8
168539.549125: <2> [<ffffffc000ab38b8>] netlink_sendmsg+0x27c/0x2d4
168539.549134: <2> [<ffffffc000a73f04>] sock_sendmsg+0x8c/0xb0
168539.549143: <2> [<ffffffc000a75f04>] SyS_sendto+0xcc/0x110

I am using the following patch as a workaround for now. I do not have any
per-route congestion control settings enabled. Any pointers on how to debug
this would be greatly appreciated.

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a67310e..c63098e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -566,7 +566,7 @@ int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
 	int i, valid = 0;
 
 	mx = nla_nest_start(skb, RTA_METRICS);
-	if (mx == NULL)
+	if (mx == NULL || metrics == NULL)
 		return -ENOBUFS;
 
 	for (i = 0; i < RTAX_MAX; i++) {
* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
From: Daniel Borkmann @ 2016-03-07 23:39 UTC
  To: subashab; +Cc: Andrew, netdev, kafai

On 03/07/2016 11:15 PM, subashab@codeaurora.org wrote:
> [...]
> I am using the following patch as a workaround for now. I do not have any
> per-route congestion control settings enabled. Any pointers on how to debug
> this would be greatly appreciated.

Hmm, if it were 4.1.x as in the original reporter's case, I would have
suspected something like commit 0a1f59620068 ("ipv6: Initialize rt6_info
properly in ip6_blackhole_route()") ... any chance of reproducing this on
the latest kernel?
* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
From: subashab @ 2016-03-08  4:27 UTC
  To: Daniel Borkmann; +Cc: Andrew, netdev, kafai

> Hmm, if it were 4.1.x as in the original reporter's case, I would have
> suspected something like commit 0a1f59620068 ("ipv6: Initialize rt6_info
> properly in ip6_blackhole_route()") ... any chance of reproducing this on
> the latest kernel?

Unfortunately, I haven't encountered a similar crash on newer kernels so far.
Thread overview: 5+ messages
2015-11-04 16:00 4.1.12 kernel crash in rtnetlink_put_metrics Andrew
2015-11-04 19:55 ` Daniel Borkmann
2016-03-07 22:15   ` subashab
2016-03-07 23:39     ` Daniel Borkmann
2016-03-08  4:27       ` subashab