* 3.17-rc1 oops during network interface configuration
@ 2014-08-18 12:18 Bart Van Assche
2014-08-19 14:56 ` Chuck Lever
[not found] ` <53F1EF18.7010909-HInyCGIudOg@public.gmane.org>
0 siblings, 2 replies; 12+ messages in thread
From: Bart Van Assche @ 2014-08-18 12:18 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: linux-rdma
Hello,
Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The
following call trace is triggered during boot on a system on which kernel
3.16 runs fine:
BUG: unable to handle kernel paging request at ffff88090000007e
IP: __dev_queue_xmit+0x519
Call Trace:
? __dev_queue_xmit+0x49
dev_queue_xmit+0x10
neigh_connected_output
? ip_finish_output
ip_finish_output
? ip_finish_output
? netif_rx_ni
ip_mc_output
ip_local_out_sk
ip_send_skb
udp_send_skb
udp_sendmsg
? ip_reply_glue_bits
? __lock_is_held
inet_sendmsg
? inet_sendmsg
sock_sendmsg
? might_fault
? might_fault
? move_addr_to_kernel.part.38
SYSC_sendto
? sysret_check
? trace_hardirqs_on_caller
? trace_hardirqs_on_thunk
SyS_sendto
system_call_fastpath
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
A screenshot of this kernel oops can be found here:
https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
gdb translates the crash address into the following (not sure this makes sense
since offset 0x519 is past the end of __dev_queue_xmit()):
(gdb) list *(__dev_queue_xmit+0x519)
0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
5162 void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
5163 {
5164 struct netdev_adjacent *iter;
5165
5166 list_for_each_entry(iter, &dev->adj_list.upper, list) {
5167 netdev_adjacent_sysfs_del(iter->dev, oldname,
5168 &iter->dev->adj_list.lower);
5169 netdev_adjacent_sysfs_add(iter->dev, dev,
5170 &iter->dev->adj_list.lower);
5171 }
And the address __dev_queue_xmit+0x49 is translated by gdb into:
(gdb) list *(__dev_queue_xmit+0x49)
0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
70 * The various preempt_count add/sub methods
71 */
72
73 static __always_inline void __preempt_count_add(int val)
74 {
75 raw_cpu_add_4(__preempt_count, val);
76 }
77
78 static __always_inline void __preempt_count_sub(int val)
79 {
Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: 3.17-rc1 oops during network interface configuration 2014-08-18 12:18 3.17-rc1 oops during network interface configuration Bart Van Assche @ 2014-08-19 14:56 ` Chuck Lever [not found] ` <BA3DEE00-C035-41B9-8ECD-614F04483395-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> [not found] ` <53F1EF18.7010909-HInyCGIudOg@public.gmane.org> 1 sibling, 1 reply; 12+ messages in thread From: Chuck Lever @ 2014-08-19 14:56 UTC (permalink / raw) To: Bart Van Assche; +Cc: netdev, linux-rdma On Aug 18, 2014, at 8:18 AM, Bart Van Assche <bvanassche@acm.org> wrote: > Hello, > > Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? After updating to 3.17-rc1 this morning, I hit the same issue. > The > following call trace is triggered during boot on a system on which kernel > 3.16 runs fine: > > BUG: unable to handle kernel paging request at ffff88090000007e > IP: __dev_queue_xmit+0x519 > Call Trace: > ? __dev_queue_xmit+0x49 > dev_queue_xmit+0x10 > neigh_connected_output > ? ip_finish_output > ip_finish_output > ? ip_finish_output > ? netif_rx_ni > ip_mc_output > ip_local_out_sk > ip_send_skb > udp_send_skb > udp_sendmsg > ? ip_reply_glue_bits > ? __lock_is_held > inet_sendmsg > ? inet_sendmsg > sock_sendmsg > ? might_fault > ? might_fault > ? move_addr_to_kernel.part.38 > SYSC_sendto > ? sysret_check > ? trace_hardirqs_on_caller > ? trace_hardirqs_on_thunk > SyS_sendto > system_call_fastpath > > Kernel panic - not syncing: Fatal exception in interrupt > Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) > drm_kms_helper: panic occurred, switching back to text console > > A screenshot of this kernel oops can be found here: > https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/ > > gdb translates the crash address into the following (not sure this makes sense > since offset 0x519 is past the end of __dev_queue_xmit()): > > (gdb) list *(__dev_queue_xmit+0x519) > 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167). > 5162 void netdev_adjacent_rename_links(struct net_device *dev, char *oldname) > 5163 { > 5164 struct netdev_adjacent *iter; > 5165 > 5166 list_for_each_entry(iter, &dev->adj_list.upper, list) { > 5167 netdev_adjacent_sysfs_del(iter->dev, oldname, > 5168 &iter->dev->adj_list.lower); > 5169 netdev_adjacent_sysfs_add(iter->dev, dev, > 5170 &iter->dev->adj_list.lower); > 5171 } > > And the address __dev_queue_xmit+0x49 is translated by gdb into: > > (gdb) list *(__dev_queue_xmit+0x49) > 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75). > 70 * The various preempt_count add/sub methods > 71 */ > 72 > 73 static __always_inline void __preempt_count_add(int val) > 74 { > 75 raw_cpu_add_4(__preempt_count, val); > 76 } > 77 > 78 static __always_inline void __preempt_count_sub(int val) > 79 { > > Bart. > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <BA3DEE00-C035-41B9-8ECD-614F04483395-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* RE: 3.17-rc1 oops during network interface configuration [not found] ` <BA3DEE00-C035-41B9-8ECD-614F04483395-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2014-09-09 16:24 ` Steve Wise 2014-09-09 16:41 ` Chuck Lever 0 siblings, 1 reply; 12+ messages in thread From: Steve Wise @ 2014-09-09 16:24 UTC (permalink / raw) To: 'Chuck Lever', 'Bart Van Assche' Cc: netdev-u79uwXL29TY76Z2rM5mHXA, 'linux-rdma' > -----Original Message----- > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever > Sent: Tuesday, August 19, 2014 9:56 AM > To: Bart Van Assche > Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma > Subject: Re: 3.17-rc1 oops during network interface configuration > > > On Aug 18, 2014, at 8:18 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote: > > > Hello, > > > > Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? > > After updating to 3.17-rc1 this morning, I hit the same issue. > I still see this with 3.17-rc4. Perhaps somebody should fix it. :) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.17-rc1 oops during network interface configuration 2014-09-09 16:24 ` Steve Wise @ 2014-09-09 16:41 ` Chuck Lever 0 siblings, 0 replies; 12+ messages in thread From: Chuck Lever @ 2014-09-09 16:41 UTC (permalink / raw) To: Steve Wise; +Cc: Bart Van Assche, netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma On Sep 9, 2014, at 12:24 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote: > > >> -----Original Message----- >> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Chuck Lever >> Sent: Tuesday, August 19, 2014 9:56 AM >> To: Bart Van Assche >> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma >> Subject: Re: 3.17-rc1 oops during network interface configuration >> >> >> On Aug 18, 2014, at 8:18 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote: >> >>> Hello, >>> >>> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? >> >> After updating to 3.17-rc1 this morning, I hit the same issue. >> > > I still see this with 3.17-rc4. Perhaps somebody should fix it. :) I’m working on it right now. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <53F1EF18.7010909-HInyCGIudOg@public.gmane.org>]
* Re: 3.17-rc1 oops during network interface configuration [not found] ` <53F1EF18.7010909-HInyCGIudOg@public.gmane.org> @ 2014-08-20 10:31 ` Or Gerlitz 2014-09-09 19:30 ` Chuck Lever 0 siblings, 1 reply; 12+ messages in thread From: Or Gerlitz @ 2014-08-20 10:31 UTC (permalink / raw) To: Bart Van Assche Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Saeed Mahameed, Tal Alon, Yevgeny Petrilin On 18/08/2014 15:18, Bart Van Assche wrote: > Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The > following call trace is triggered during boot on a system on which kernel > 3.16 runs fine: Yep, I see it on my systems too. I narrowed this down a bit to happen only when the port link type (these nodes have ConnectX) is IB and IPoIB gets to load. I reverted (below) all the IPoIB changes since 3.16 (except for the trivial commit c835a67) and the crash still exists. I guess this needs to go through systematic bisection. Or. > net.git]# git log --oneline --no-merges v3.16.. drivers/infiniband/ulp/ipoib/ > 8a118a4 Revert "IB/ipoib: Use P_Key change event instead of P_Key polling mechanism" > 90e6f39 Revert "IB/ipoib: Avoid flushing the workqueue from worker context" > 030ade7 Revert "IB/ipoib: Avoid multicast join attempts with invalid P_key" > 97ba2ff Revert "IPoIB: Remove unnecessary test for NULL before debugfs_remove()" > e42fa20 IPoIB: Remove unnecessary test for NULL before debugfs_remove() > dd57c93 IB/ipoib: Avoid multicast join attempts with invalid P_key > 4eae374 IB/ipoib: Avoid flushing the workqueue from worker context > db84f88 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism > c835a67 net: set name_assign_type in alloc_netdev() > BUG: unable to handle kernel paging request at ffff88090000007e > IP: __dev_queue_xmit+0x519 > Call Trace: > ? __dev_queue_xmit+0x49 > dev_queue_xmit+0x10 > neigh_connected_output > ? ip_finish_output > ip_finish_output > ? ip_finish_output > ? netif_rx_ni > ip_mc_output > ip_local_out_sk > ip_send_skb > udp_send_skb > udp_sendmsg > ? ip_reply_glue_bits > ? __lock_is_held > inet_sendmsg > ? inet_sendmsg > sock_sendmsg > ? might_fault > ? might_fault > ? move_addr_to_kernel.part.38 > SYSC_sendto > ? sysret_check > ? trace_hardirqs_on_caller > ? trace_hardirqs_on_thunk > SyS_sendto > system_call_fastpath > > Kernel panic - not syncing: Fatal exception in interrupt > Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) > drm_kms_helper: panic occurred, switching back to text console > > A screenshot of this kernel oops can be found here: > https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/ > > gdb translates the crash address into the following (not sure this makes sense > since offset 0x519 is past the end of __dev_queue_xmit()): > > (gdb) list *(__dev_queue_xmit+0x519) > 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167). > 5162 void netdev_adjacent_rename_links(struct net_device *dev, char *oldname) > 5163 { > 5164 struct netdev_adjacent *iter; > 5165 > 5166 list_for_each_entry(iter, &dev->adj_list.upper, list) { > 5167 netdev_adjacent_sysfs_del(iter->dev, oldname, > 5168 &iter->dev->adj_list.lower); > 5169 netdev_adjacent_sysfs_add(iter->dev, dev, > 5170 &iter->dev->adj_list.lower); > 5171 } > > And the address __dev_queue_xmit+0x49 is translated by gdb into: > > (gdb) list *(__dev_queue_xmit+0x49) > 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75). > 70 * The various preempt_count add/sub methods > 71 */ > 72 > 73 static __always_inline void __preempt_count_add(int val) > 74 { > 75 raw_cpu_add_4(__preempt_count, val); > 76 } > 77 > 78 static __always_inline void __preempt_count_sub(int val) > 79 { -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.17-rc1 oops during network interface configuration 2014-08-20 10:31 ` Or Gerlitz @ 2014-09-09 19:30 ` Chuck Lever [not found] ` <F580F466-28A5-4BDD-A338-D5C065A760C1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 12+ messages in thread From: Chuck Lever @ 2014-09-09 19:30 UTC (permalink / raw) To: netdev, LKML Kernel Cc: linux-rdma, Bart Van Assche, Or Gerlitz, Saeed Mahameed, Tal Alon, Yevgeny Petrilin, _govind On Aug 20, 2014, at 6:31 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote: > On 18/08/2014 15:18, Bart Van Assche wrote: >> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The >> following call trace is triggered during boot on a system on which kernel >> 3.16 runs fine: > > Yep, I see it on my systems too. > > I narrowed this down a bit to happen only when the port link type (these nodes have ConnectX) is IB and IPoIB gets to load. > > I reverted (below) all the IPoIB changes since 3.16 (except for the trivial commit c835a67) and the crash still exists. > > I guess this needs to go through systematic bisection. This crash happens when booting v3.17-rcN on any of my IB-enabled systems. I have both ConnectX-2 and mthca systems, all are affected. I bisected this to: commit e0f31d8498676fda36289603a054d0d490aa2679 Author: Govindarajulu Varadarajan <_govind@gmx.com> AuthorDate: Mon Jun 23 16:07:58 2014 +0530 Commit: David S. Miller <davem@davemloft.net> CommitDate: Mon Jun 23 14:32:19 2014 -0700 flow_keys: Record IP layer protocol in skb_flow_dissect() skb_flow_dissect() dissects only transport header type in ip_proto. It dose not give any information about IPv4 or IPv6. This patch adds new member, n_proto, to struct flow_keys. Which records the IP layer type. i.e IPv4 or IPv6. This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow. Adding new member to flow_keys increases the struct size by around 4 bytes. This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in qdisc_cb_private_validate() So increase data size by 4 Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> This commit includes a hunk that increases the size of struct qdisc_skb_cb by at least 4 bytes: > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h > index 624f985..a3cfb8e 100644 > --- a/include/net/sch_generic.h > +++ b/include/net/sch_generic.h > @@ -231,7 +231,7 @@ struct qdisc_skb_cb { > unsigned int pkt_len; > u16 slave_dev_queue_mapping; > u16 _pad; > - unsigned char data[20]; > + unsigned char data[24]; > }; > > static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz) IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h: > struct ipoib_cb { > struct qdisc_skb_cb qdisc_cb; > u8 hwaddr[INFINIBAND_ALEN]; > }; IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes. After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes 52 bytes. Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst field, which contains a pointer. By the time we get into __dev_queue_xmit() and try to use the result of skb_dst(), that pointer is garbage, and we oops. Obviously, cb[] could be increased to 56 bytes to accommodate struct ipoib_cb. I tried this, and it is effective in preventing the oops on one of my systems. But I suspect there is an historical reason I’m not aware of that it has remained 48 bytes for years. > Or. > >> net.git]# git log --oneline --no-merges v3.16.. drivers/infiniband/ulp/ipoib/ >> 8a118a4 Revert "IB/ipoib: Use P_Key change event instead of P_Key polling mechanism" >> 90e6f39 Revert "IB/ipoib: Avoid flushing the workqueue from worker context" >> 030ade7 Revert "IB/ipoib: Avoid multicast join attempts with invalid P_key" >> 97ba2ff Revert "IPoIB: Remove unnecessary test for NULL before debugfs_remove()" >> e42fa20 IPoIB: Remove unnecessary test for NULL before debugfs_remove() >> dd57c93 IB/ipoib: Avoid multicast join attempts with invalid P_key >> 4eae374 IB/ipoib: Avoid flushing the workqueue from worker context >> db84f88 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism >> c835a67 net: set name_assign_type in alloc_netdev() > > >> BUG: unable to handle kernel paging request at ffff88090000007e >> IP: __dev_queue_xmit+0x519 >> Call Trace: >> ? __dev_queue_xmit+0x49 >> dev_queue_xmit+0x10 >> neigh_connected_output >> ? ip_finish_output >> ip_finish_output >> ? ip_finish_output >> ? netif_rx_ni >> ip_mc_output >> ip_local_out_sk >> ip_send_skb >> udp_send_skb >> udp_sendmsg >> ? ip_reply_glue_bits >> ? __lock_is_held >> inet_sendmsg >> ? inet_sendmsg >> sock_sendmsg >> ? might_fault >> ? might_fault >> ? move_addr_to_kernel.part.38 >> SYSC_sendto >> ? sysret_check >> ? trace_hardirqs_on_caller >> ? trace_hardirqs_on_thunk >> SyS_sendto >> system_call_fastpath >> >> Kernel panic - not syncing: Fatal exception in interrupt >> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) >> drm_kms_helper: panic occurred, switching back to text console >> >> A screenshot of this kernel oops can be found here: >> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/ >> >> gdb translates the crash address into the following (not sure this makes sense >> since offset 0x519 is past the end of __dev_queue_xmit()): >> >> (gdb) list *(__dev_queue_xmit+0x519) >> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167). >> 5162 void netdev_adjacent_rename_links(struct net_device *dev, char *oldname) >> 5163 { >> 5164 struct netdev_adjacent *iter; >> 5165 >> 5166 list_for_each_entry(iter, &dev->adj_list.upper, list) { >> 5167 netdev_adjacent_sysfs_del(iter->dev, oldname, >> 5168 &iter->dev->adj_list.lower); >> 5169 netdev_adjacent_sysfs_add(iter->dev, dev, >> 5170 &iter->dev->adj_list.lower); >> 5171 } >> >> And the address __dev_queue_xmit+0x49 is translated by gdb into: >> >> (gdb) list *(__dev_queue_xmit+0x49) >> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75). >> 70 * The various preempt_count add/sub methods >> 71 */ >> 72 >> 73 static __always_inline void __preempt_count_add(int val) >> 74 { >> 75 raw_cpu_add_4(__preempt_count, val); >> 76 } >> 77 >> 78 static __always_inline void __preempt_count_sub(int val) >> 79 { > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <F580F466-28A5-4BDD-A338-D5C065A760C1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: 3.17-rc1 oops during network interface configuration [not found] ` <F580F466-28A5-4BDD-A338-D5C065A760C1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2014-09-10 7:42 ` Or Gerlitz 2014-09-10 14:24 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Or Gerlitz @ 2014-09-10 7:42 UTC (permalink / raw) To: Chuck Lever, Govindarajulu Varadarajan, David S. Miller Cc: netdev-u79uwXL29TY76Z2rM5mHXA, LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed, Tal Alon, Yevgeny Petrilin, Roland Dreier On 9/9/2014 10:30 PM, Chuck Lever wrote: > This crash happens when booting v3.17-rcN on any of my IB-enabled > systems. I have both ConnectX-2 and mthca systems, all are affected. > > I bisected this to: > > commit e0f31d8498676fda36289603a054d0d490aa2679 > Author: Govindarajulu Varadarajan <_govind-KK0ffGbhmjU@public.gmane.org> > AuthorDate: Mon Jun 23 16:07:58 2014 +0530 > Commit: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> > CommitDate: Mon Jun 23 14:32:19 2014 -0700 > > flow_keys: Record IP layer protocol in skb_flow_dissect() > > skb_flow_dissect() dissects only transport header type in ip_proto. It dose not > give any information about IPv4 or IPv6. > This patch adds new member, n_proto, to struct flow_keys. Which records the > IP layer type. i.e IPv4 or IPv6. > This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow. > Adding new member to flow_keys increases the struct size by around 4 bytes. > This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in > qdisc_cb_private_validate() > So increase data size by 4 > > Signed-off-by: Govindarajulu Varadarajan <_govind-KK0ffGbhmjU@public.gmane.org> > Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> > > > This commit includes a hunk that increases the size of struct qdisc_skb_cb > by at least 4 bytes: > >> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h >> index 624f985..a3cfb8e 100644 >> --- a/include/net/sch_generic.h >> +++ b/include/net/sch_generic.h >> @@ -231,7 +231,7 @@ struct qdisc_skb_cb { >> unsigned int pkt_len; >> u16 slave_dev_queue_mapping; >> u16 _pad; >> - unsigned char data[20]; >> + unsigned char data[24]; >> }; >> >> static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz) > > > IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h: > >> struct ipoib_cb { >> struct qdisc_skb_cb qdisc_cb; >> u8 hwaddr[INFINIBAND_ALEN]; >> }; > IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes. > After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes > 52 bytes. > > Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst > field, which contains a pointer. By the time we get into > __dev_queue_xmit() and try to use the result of skb_dst(), that pointer > is garbage, and we oops. > > Obviously, cb[] could be increased to 56 bytes to accommodate struct > ipoib_cb. I tried this, and it is effective in preventing the oops on > one of my systems. > > But I suspect there is an historical reason I’m not aware of that it > has remained 48 bytes for years. Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses" we are using the skb->cb field to enable proper work under GRO and avoid another historical quirk we had there... so I think we can definetly consider commit e0f31d849 to introduce a severe regression... Govindarajulu, Dave - what's your thinking here? any quick idea on how to fix? Also, I was thinking we have the mechanics in the kernel, e.g commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to catch such over-flows? Or. >>> BUG: unable to handle kernel paging request at ffff88090000007e >>> IP: __dev_queue_xmit+0x519 >>> Call Trace: >>> ? __dev_queue_xmit+0x49 >>> dev_queue_xmit+0x10 >>> neigh_connected_output >>> ? ip_finish_output >>> ip_finish_output >>> ? ip_finish_output >>> ? netif_rx_ni >>> ip_mc_output >>> ip_local_out_sk >>> ip_send_skb >>> udp_send_skb >>> udp_sendmsg >>> ? ip_reply_glue_bits >>> ? __lock_is_held >>> inet_sendmsg >>> ? inet_sendmsg >>> sock_sendmsg >>> ? might_fault >>> ? might_fault >>> ? move_addr_to_kernel.part.38 >>> SYSC_sendto >>> ? sysret_check >>> ? trace_hardirqs_on_caller >>> ? trace_hardirqs_on_thunk >>> SyS_sendto >>> system_call_fastpath >>> >>> Kernel panic - not syncing: Fatal exception in interrupt >>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) >>> drm_kms_helper: panic occurred, switching back to text console >>> >>> A screenshot of this kernel oops can be found here: >>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/ >>> >>> gdb translates the crash address into the following (not sure this makes sense >>> since offset 0x519 is past the end of __dev_queue_xmit()): >>> >>> (gdb) list *(__dev_queue_xmit+0x519) >>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167). >>> 5162 void netdev_adjacent_rename_links(struct net_device *dev, char *oldname) >>> 5163 { >>> 5164 struct netdev_adjacent *iter; >>> 5165 >>> 5166 list_for_each_entry(iter, &dev->adj_list.upper, list) { >>> 5167 netdev_adjacent_sysfs_del(iter->dev, oldname, >>> 5168 &iter->dev->adj_list.lower); >>> 5169 netdev_adjacent_sysfs_add(iter->dev, dev, >>> 5170 &iter->dev->adj_list.lower); >>> 5171 } >>> >>> And the address __dev_queue_xmit+0x49 is translated by gdb into: >>> >>> (gdb) list *(__dev_queue_xmit+0x49) >>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75). >>> 70 * The various preempt_count add/sub methods >>> 71 */ >>> 72 >>> 73 static __always_inline void __preempt_count_add(int val) >>> 74 { >>> 75 raw_cpu_add_4(__preempt_count, val); >>> 76 } >>> 77 >>> 78 static __always_inline void __preempt_count_sub(int val) >>> 79 { -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.17-rc1 oops during network interface configuration 2014-09-10 7:42 ` Or Gerlitz @ 2014-09-10 14:24 ` Eric Dumazet [not found] ` <541000F1.5000805-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2014-09-29 18:52 ` Chuck Lever 2 siblings, 0 replies; 12+ messages in thread From: Eric Dumazet @ 2014-09-10 14:24 UTC (permalink / raw) To: Or Gerlitz Cc: Chuck Lever, Govindarajulu Varadarajan, David S. Miller, netdev, LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed, Tal Alon, Yevgeny Petrilin, Roland Dreier On Wed, 2014-09-10 at 10:42 +0300, Or Gerlitz wrote: > Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 > commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb > to stash LL addresses" we are using the skb->cb field to enable proper > work under GRO and avoid another historical quirk we had there... so I > think we can definetly consider commit e0f31d849 to introduce a severe > regression... Govindarajulu, Dave - what's your thinking here? any quick > idea on how to fix? I mentioned the IB stuff when patch was posted, and David replied : "I think this is fine, IPOIB's control block will need still just 44 bytes after these changes, so there will still be 4 bytes to spare." http://patchwork.ozlabs.org/patch/357584/ My suggestion was to reduce sch_choke usage and not store the whole flow_keys. ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <541000F1.5000805-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: 3.17-rc1 oops during network interface configuration [not found] ` <541000F1.5000805-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2014-09-10 20:04 ` David Miller 0 siblings, 0 replies; 12+ messages in thread From: David Miller @ 2014-09-10 20:04 UTC (permalink / raw) To: ogerlitz-VPRAkNaXOzVWk0Htik3J/w Cc: chuck.lever-QHcLZuEGTsvQT0dZR+AlfA, _govind-KK0ffGbhmjU, netdev-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA, bvanassche-HInyCGIudOg, saeedm-VPRAkNaXOzVWk0Htik3J/w, talal-VPRAkNaXOzVWk0Htik3J/w, yevgenyp-VPRAkNaXOzVWk0Htik3J/w, roland-DgEjT+Ai2ygdnm+yROfE0A From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Date: Wed, 10 Sep 2014 10:42:41 +0300 > Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 > commit 936d7de "IPoIB: Stop lying about hard_header_len and use > skb->cb to stash LL addresses" we are using the skb->cb field to > enable proper work under GRO and avoid another historical quirk we had > there... so I think we can definetly consider commit e0f31d849 to > introduce a severe regression... Govindarajulu, Dave - what's your > thinking here? any quick idea on how to fix? Eric mentioned that we could reduce the amount of flow state stored in the qdisc cb in order to handle this better. Making skb->cb[] larger is basically out of the question as far as I'm concerned. > Also, I was thinking we have the mechanics in the kernel, e.g commit > a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to > catch such over-flows? Yes we should have added a build-time check so that we would discover this issue more quickly. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.17-rc1 oops during network interface configuration 2014-09-10 7:42 ` Or Gerlitz 2014-09-10 14:24 ` Eric Dumazet [not found] ` <541000F1.5000805-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2014-09-29 18:52 ` Chuck Lever [not found] ` <6C43129E-4365-4BEF-ADAD-963B8F386789-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 2 siblings, 1 reply; 12+ messages in thread From: Chuck Lever @ 2014-09-29 18:52 UTC (permalink / raw) To: Or Gerlitz Cc: Govindarajulu Varadarajan, David S. Miller, netdev, LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed, Tal Alon, Yevgeny Petrilin, Roland Dreier On Sep 10, 2014, at 3:42 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote: > > On 9/9/2014 10:30 PM, Chuck Lever wrote: >> This crash happens when booting v3.17-rcN on any of my IB-enabled >> systems. I have both ConnectX-2 and mthca systems, all are affected. >> >> I bisected this to: >> >> commit e0f31d8498676fda36289603a054d0d490aa2679 >> Author: Govindarajulu Varadarajan <_govind@gmx.com> >> AuthorDate: Mon Jun 23 16:07:58 2014 +0530 >> Commit: David S. Miller <davem@davemloft.net> >> CommitDate: Mon Jun 23 14:32:19 2014 -0700 >> >> flow_keys: Record IP layer protocol in skb_flow_dissect() >> >> skb_flow_dissect() dissects only transport header type in ip_proto. It dose not >> give any information about IPv4 or IPv6. >> This patch adds new member, n_proto, to struct flow_keys. Which records the >> IP layer type. i.e IPv4 or IPv6. >> This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow. >> Adding new member to flow_keys increases the struct size by around 4 bytes. >> This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in >> qdisc_cb_private_validate() >> So increase data size by 4 >> >> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> >> Signed-off-by: David S. Miller <davem@davemloft.net> >> >> >> This commit includes a hunk that increases the size of struct qdisc_skb_cb >> by at least 4 bytes: >> >>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h >>> index 624f985..a3cfb8e 100644 >>> --- a/include/net/sch_generic.h >>> +++ b/include/net/sch_generic.h >>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb { >>> unsigned int pkt_len; >>> u16 slave_dev_queue_mapping; >>> u16 _pad; >>> - unsigned char data[20]; >>> + unsigned char data[24]; >>> }; >>> static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz) >> >> >> IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h: >> >>> struct ipoib_cb { >>> struct qdisc_skb_cb qdisc_cb; >>> u8 hwaddr[INFINIBAND_ALEN]; >>> }; >> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes. >> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes >> 52 bytes. >> >> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst >> field, which contains a pointer. By the time we get into >> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer >> is garbage, and we oops. >> >> Obviously, cb[] could be increased to 56 bytes to accommodate struct >> ipoib_cb. I tried this, and it is effective in preventing the oops on >> one of my systems. >> >> But I suspect there is an historical reason I’m not aware of that it >> has remained 48 bytes for years. > > Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses" we are using the skb->cb field to enable proper work under GRO and avoid another historical quirk we had there... so I think we can definetly consider commit e0f31d849 to introduce a severe regression... Govindarajulu, Dave - what's your thinking here? any quick idea on how to fix? Hi Or- Is there a bugzilla report filed for this issue? Has there been any progress towards a fix? > Also, I was thinking we have the mechanics in the kernel, e.g commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to catch such over-flows? > > Or. > >>>> BUG: unable to handle kernel paging request at ffff88090000007e >>>> IP: __dev_queue_xmit+0x519 >>>> Call Trace: >>>> ? __dev_queue_xmit+0x49 >>>> dev_queue_xmit+0x10 >>>> neigh_connected_output >>>> ? ip_finish_output >>>> ip_finish_output >>>> ? ip_finish_output >>>> ? netif_rx_ni >>>> ip_mc_output >>>> ip_local_out_sk >>>> ip_send_skb >>>> udp_send_skb >>>> udp_sendmsg >>>> ? ip_reply_glue_bits >>>> ? __lock_is_held >>>> inet_sendmsg >>>> ? inet_sendmsg >>>> sock_sendmsg >>>> ? might_fault >>>> ? might_fault >>>> ? move_addr_to_kernel.part.38 >>>> SYSC_sendto >>>> ? sysret_check >>>> ? trace_hardirqs_on_caller >>>> ? trace_hardirqs_on_thunk >>>> SyS_sendto >>>> system_call_fastpath >>>> >>>> Kernel panic - not syncing: Fatal exception in interrupt >>>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) >>>> drm_kms_helper: panic occurred, switching back to text console >>>> >>>> A screenshot of this kernel oops can be found here: >>>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/ >>>> >>>> gdb translates the crash address into the following (not sure this makes sense >>>> since offset 0x519 is past the end of __dev_queue_xmit()): >>>> >>>> (gdb) list *(__dev_queue_xmit+0x519) >>>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167). >>>> 5162 void netdev_adjacent_rename_links(struct net_device *dev, char *oldname) >>>> 5163 { >>>> 5164 struct netdev_adjacent *iter; >>>> 5165 >>>> 5166 list_for_each_entry(iter, &dev->adj_list.upper, list) { >>>> 5167 netdev_adjacent_sysfs_del(iter->dev, oldname, >>>> 5168 &iter->dev->adj_list.lower); >>>> 5169 netdev_adjacent_sysfs_add(iter->dev, dev, >>>> 5170 &iter->dev->adj_list.lower); >>>> 5171 } >>>> >>>> And the address __dev_queue_xmit+0x49 is translated by gdb into: >>>> >>>> (gdb) list *(__dev_queue_xmit+0x49) >>>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75). >>>> 70 * The various preempt_count add/sub methods >>>> 71 */ >>>> 72 >>>> 73 static __always_inline void __preempt_count_add(int val) >>>> 74 { >>>> 75 raw_cpu_add_4(__preempt_count, val); >>>> 76 } >>>> 77 >>>> 78 static __always_inline void __preempt_count_sub(int val) >>>> 79 { > -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <6C43129E-4365-4BEF-ADAD-963B8F386789-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: 3.17-rc1 oops during network interface configuration [not found] ` <6C43129E-4365-4BEF-ADAD-963B8F386789-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2014-09-29 19:00 ` Eric Dumazet 2014-09-30 14:56 ` Chuck Lever 0 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2014-09-29 19:00 UTC (permalink / raw) To: Chuck Lever Cc: Or Gerlitz, Govindarajulu Varadarajan, David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA, LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed, Tal Alon, Yevgeny Petrilin, Roland Dreier On Mon, 2014-09-29 at 14:52 -0400, Chuck Lever wrote: > Is there a bugzilla report filed for this issue? Has there been any progress > towards a fix? This is fixed in Linus tree. http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=257117862634d89de33fec74858b1a0ba5ab444b http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=b49fe36208b45f76dfbcfcd3afd952a33fa9f5ce -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.17-rc1 oops during network interface configuration 2014-09-29 19:00 ` Eric Dumazet @ 2014-09-30 14:56 ` Chuck Lever 0 siblings, 0 replies; 12+ messages in thread From: Chuck Lever @ 2014-09-30 14:56 UTC (permalink / raw) To: Eric Dumazet Cc: Or Gerlitz, Govindarajulu Varadarajan, David S. Miller, netdev, LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed, Tal Alon, Yevgeny Petrilin, Roland Dreier On Sep 29, 2014, at 3:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2014-09-29 at 14:52 -0400, Chuck Lever wrote: > >> Is there a bugzilla report filed for this issue? Has there been any progress >> towards a fix? > > This is fixed in Linus tree. > > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=257117862634d89de33fec74858b1a0ba5ab444b > > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=b49fe36208b45f76dfbcfcd3afd952a33fa9f5ce Tested 3.17-rc7, fix confirmed. Many thanks, Eric! -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-09-30 14:56 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-18 12:18 3.17-rc1 oops during network interface configuration Bart Van Assche
2014-08-19 14:56 ` Chuck Lever
[not found] ` <BA3DEE00-C035-41B9-8ECD-614F04483395-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-09-09 16:24 ` Steve Wise
2014-09-09 16:41 ` Chuck Lever
[not found] ` <53F1EF18.7010909-HInyCGIudOg@public.gmane.org>
2014-08-20 10:31 ` Or Gerlitz
2014-09-09 19:30 ` Chuck Lever
[not found] ` <F580F466-28A5-4BDD-A338-D5C065A760C1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-09-10 7:42 ` Or Gerlitz
2014-09-10 14:24 ` Eric Dumazet
[not found] ` <541000F1.5000805-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-09-10 20:04 ` David Miller
2014-09-29 18:52 ` Chuck Lever
[not found] ` <6C43129E-4365-4BEF-ADAD-963B8F386789-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-09-29 19:00 ` Eric Dumazet
2014-09-30 14:56 ` Chuck Lever
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).