public inbox for linux-kernel@vger.kernel.org
* Re: 3.17-rc1 oops during network interface configuration
       [not found] ` <53F47917.8080003@mellanox.com>
@ 2014-09-09 19:30   ` Chuck Lever
  2014-09-10  7:42     ` Or Gerlitz
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2014-09-09 19:30 UTC (permalink / raw)
  To: netdev, LKML Kernel
  Cc: linux-rdma, Bart Van Assche, Or Gerlitz, Saeed Mahameed, Tal Alon,
	Yevgeny Petrilin, _govind


On Aug 20, 2014, at 6:31 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:

> On 18/08/2014 15:18, Bart Van Assche wrote:
>> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The
>> following call trace is triggered during boot on a system on which kernel
>> 3.16 runs fine:
> 
> Yep, I see it on my systems too.
> 
> I narrowed this down a bit to happen only when the port link type (these nodes have ConnectX) is IB and IPoIB gets to load.
> 
> I reverted (below) all the IPoIB changes since 3.16 (except for the trivial commit c835a67) and the crash still exists.
> 
> I guess this needs to go through systematic bisection.

This crash happens when booting v3.17-rcN on any of my IB-enabled
systems. I have both ConnectX-2 and mthca systems, all are affected.

I bisected this to:

commit e0f31d8498676fda36289603a054d0d490aa2679
Author:     Govindarajulu Varadarajan <_govind@gmx.com>
AuthorDate: Mon Jun 23 16:07:58 2014 +0530
Commit:     David S. Miller <davem@davemloft.net>
CommitDate: Mon Jun 23 14:32:19 2014 -0700

    flow_keys: Record IP layer protocol in skb_flow_dissect()

    skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
    give any information about IPv4 or IPv6.
    This patch adds new member, n_proto, to struct flow_keys. Which records the
    IP layer type. i.e IPv4 or IPv6.
    This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
    Adding new member to flow_keys increases the struct size by around 4 bytes.
    This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
    qdisc_cb_private_validate()
    So increase data size by 4

    Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


This commit includes a hunk that increases the size of struct qdisc_skb_cb
by 4 bytes (data[] grows from 20 to 24 bytes):

> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 624f985..a3cfb8e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>         unsigned int            pkt_len;
>         u16                     slave_dev_queue_mapping;
>         u16                     _pad;
> -       unsigned char           data[20];
> +       unsigned char           data[24];
>  };
>  
>  static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)



IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:

> struct ipoib_cb {
>         struct qdisc_skb_cb     qdisc_cb;
>         u8                      hwaddr[INFINIBAND_ALEN];
> };

IPoIB keeps this in the sk_buff::cb field, which is exactly 48 bytes.
After commit e0f31d84, struct qdisc_skb_cb is 32 bytes and hwaddr is
INFINIBAND_ALEN (20) bytes, so the size of struct ipoib_cb on x86_64
becomes 52 bytes.

Thus IPoIB overruns sk_buff::cb, and trashes the sk_buff::_skb_refdst
field, which contains a pointer. By the time we get into
__dev_queue_xmit() and try to use the result of skb_dst(), that pointer
is garbage, and we oops.
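
The size arithmetic above can be checked with a minimal userspace sketch. This is not kernel code: the `*_mock` structs, `SKB_CB_SIZE`, and `cb_overrun()` are illustrative stand-ins that mirror the definitions quoted in this message, with fixed-width types assumed in place of the kernel's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INFINIBAND_ALEN 20	/* IPoIB link-layer address length */
#define SKB_CB_SIZE     48	/* size of sk_buff::cb[] quoted in the thread */

/* Mock of struct qdisc_skb_cb after commit e0f31d84 (data[] is 24, was 20). */
struct qdisc_skb_cb_mock {
	uint32_t pkt_len;
	uint16_t slave_dev_queue_mapping;
	uint16_t _pad;
	uint8_t  data[24];
};

/* Mock of struct ipoib_cb as quoted from ipoib.h. */
struct ipoib_cb_mock {
	struct qdisc_skb_cb_mock qdisc_cb;
	uint8_t                  hwaddr[INFINIBAND_ALEN];
};

/* 8 + 24 + 20 = 52 bytes; no padding is needed at 4-byte alignment. */
static_assert(sizeof(struct ipoib_cb_mock) == 52, "unexpected mock layout");

/* Number of bytes by which a struct of the given size spills past cb[]. */
size_t cb_overrun(size_t cb_struct_size)
{
	return cb_struct_size > SKB_CB_SIZE ? cb_struct_size - SKB_CB_SIZE : 0;
}
```

With data[20] the mock would be 48 bytes and cb_overrun() would return 0. With data[24], the last 4 bytes of hwaddr land in whatever follows cb[] in struct sk_buff, which this message identifies as _skb_refdst.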

Obviously, cb[] could be increased to 56 bytes to accommodate struct
ipoib_cb. I tried this, and it is effective in preventing the oops on
one of my systems.

But I suspect there is a historical reason I’m not aware of that it
has remained 48 bytes for years.


> Or.
> 
>> net.git]# git log --oneline --no-merges v3.16.. drivers/infiniband/ulp/ipoib/
>> 8a118a4 Revert "IB/ipoib: Use P_Key change event instead of P_Key polling mechanism"
>> 90e6f39 Revert "IB/ipoib: Avoid flushing the workqueue from worker context"
>> 030ade7 Revert "IB/ipoib: Avoid multicast join attempts with invalid P_key"
>> 97ba2ff Revert "IPoIB: Remove unnecessary test for NULL before debugfs_remove()"
>> e42fa20 IPoIB: Remove unnecessary test for NULL before debugfs_remove()
>> dd57c93 IB/ipoib: Avoid multicast join attempts with invalid P_key
>> 4eae374 IB/ipoib: Avoid flushing the workqueue from worker context
>> db84f88 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
>> c835a67 net: set name_assign_type in alloc_netdev()
> 
> 
>> BUG: unable to handle kernel paging request at ffff88090000007e
>> IP: __dev_queue_xmit+0x519
>> Call Trace:
>> ? __dev_queue_xmit+0x49
>> dev_queue_xmit+0x10
>> neigh_connected_output
>> ? ip_finish_output
>> ip_finish_output
>> ? ip_finish_output
>> ? netif_rx_ni
>> ip_mc_output
>> ip_local_out_sk
>> ip_send_skb
>> udp_send_skb
>> udp_sendmsg
>> ? ip_reply_glue_bits
>> ? __lock_is_held
>> inet_sendmsg
>> ? inet_sendmsg
>> sock_sendmsg
>> ? might_fault
>> ? might_fault
>> ? move_addr_to_kernel.part.38
>> SYSC_sendto
>> ? sysret_check
>> ? trace_hardirqs_on_caller
>> ? trace_hardirqs_on_thunk
>> SyS_sendto
>> system_call_fastpath
>> 
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>> drm_kms_helper: panic occurred, switching back to text console
>> 
>> A screenshot of this kernel oops can be found here:
>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>> 
>> gdb translates the crash address into the following (not sure this makes sense
>> since offset 0x519 is past the end of __dev_queue_xmit()):
>> 
>> (gdb) list *(__dev_queue_xmit+0x519)
>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>> 5163    {
>> 5164            struct netdev_adjacent *iter;
>> 5165
>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>> 5168                                              &iter->dev->adj_list.lower);
>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>> 5170                                              &iter->dev->adj_list.lower);
>> 5171            }
>> 
>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>> 
>> (gdb) list *(__dev_queue_xmit+0x49)
>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
>> 70       * The various preempt_count add/sub methods
>> 71       */
>> 72
>> 73      static __always_inline void __preempt_count_add(int val)
>> 74      {
>> 75              raw_cpu_add_4(__preempt_count, val);
>> 76      }
>> 77
>> 78      static __always_inline void __preempt_count_sub(int val)
>> 79      {
> 

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: 3.17-rc1 oops during network interface configuration
  2014-09-09 19:30   ` 3.17-rc1 oops during network interface configuration Chuck Lever
@ 2014-09-10  7:42     ` Or Gerlitz
  2014-09-10 14:24       ` Eric Dumazet
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Or Gerlitz @ 2014-09-10  7:42 UTC (permalink / raw)
  To: Chuck Lever, Govindarajulu Varadarajan, David S. Miller
  Cc: netdev, LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, Roland Dreier


On 9/9/2014 10:30 PM, Chuck Lever wrote:
> This crash happens when booting v3.17-rcN on any of my IB-enabled
> systems. I have both ConnectX-2 and mthca systems, all are affected.
>
> I bisected this to:
>
> commit e0f31d8498676fda36289603a054d0d490aa2679
> Author:     Govindarajulu Varadarajan <_govind@gmx.com>
> AuthorDate: Mon Jun 23 16:07:58 2014 +0530
> Commit:     David S. Miller <davem@davemloft.net>
> CommitDate: Mon Jun 23 14:32:19 2014 -0700
>
>      flow_keys: Record IP layer protocol in skb_flow_dissect()
>
>      skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
>      give any information about IPv4 or IPv6.
>      This patch adds new member, n_proto, to struct flow_keys. Which records the
>      IP layer type. i.e IPv4 or IPv6.
>      This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>      Adding new member to flow_keys increases the struct size by around 4 bytes.
>      This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
>      qdisc_cb_private_validate()
>      So increase data size by 4
>
>      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> This commit includes a hunk that increases the size of struct qdisc_skb_cb
> by at least 4 bytes:
>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 624f985..a3cfb8e 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>          unsigned int            pkt_len;
>>          u16                     slave_dev_queue_mapping;
>>          u16                     _pad;
>> -       unsigned char           data[20];
>> +       unsigned char           data[24];
>>   };
>>   
>>   static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
>
>
> IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:
>
>> struct ipoib_cb {
>>          struct qdisc_skb_cb     qdisc_cb;
>>          u8                      hwaddr[INFINIBAND_ALEN];
>> };
> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
> 52 bytes.
>
> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
> field, which contains a pointer. By the time we get into
> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer
> is garbage, and we oops.
>
> Obviously, cb[] could be increased to 56 bytes to accommodate struct
> ipoib_cb. I tried this, and it is effective in preventing the oops on
> one of my systems.
>
> But I suspect there is an historical reason I’m not aware of that it
> has remained 48 bytes for years.

Hi Chuck, thanks for bisecting this. Indeed, since kernel 3.2 commit
936d7de ("IPoIB: Stop lying about hard_header_len and use skb->cb to
stash LL addresses"), we have been using the skb->cb field to work
properly under GRO and to avoid another historical quirk we had there,
so I think we can definitely consider commit e0f31d849 to have
introduced a severe regression. Govindarajulu, Dave, what's your
thinking here? Any quick idea on how to fix it?

Also, I thought we already had a mechanism in the kernel, e.g. commit
a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit."), to
catch such overflows?

Or.




* Re: 3.17-rc1 oops during network interface configuration
  2014-09-10  7:42     ` Or Gerlitz
@ 2014-09-10 14:24       ` Eric Dumazet
  2014-09-10 20:04       ` David Miller
  2014-09-29 18:52       ` Chuck Lever
  2 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2014-09-10 14:24 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Chuck Lever, Govindarajulu Varadarajan, David S. Miller, netdev,
	LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, Roland Dreier

On Wed, 2014-09-10 at 10:42 +0300, Or Gerlitz wrote:

> Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 
> commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb 
> to stash LL addresses" we are using the skb->cb field to enable proper 
> work under GRO and avoid another historical quirk we had there... so I 
> think we can definitely consider commit e0f31d849 to introduce a severe
> regression... Govindarajulu, Dave - what's your thinking here? any quick 
> idea on how to fix?

I mentioned the IB stuff when the patch was posted, and David replied:

"I think this is fine, IPOIB's control block will need still just 44
bytes after these changes, so there will still be 4 bytes to spare."

http://patchwork.ozlabs.org/patch/357584/

My suggestion was to reduce sch_choke usage and not store the whole
flow_keys.
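
One possible reading of this suggestion, sketched in userspace C: keep a small digest of the flow key in the control block instead of the full structure. Everything here is an assumption for illustration. `flow_keys_mock`, `flow_digest`, and the FNV-1a mixing are hypothetical and are not claimed to be what sch_choke or the eventual fix does; only the n_proto and ip_proto field names come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock flow key; fields other than n_proto and ip_proto are assumed. */
struct flow_keys_mock {
	uint32_t src;
	uint32_t dst;
	uint32_t ports;
	uint16_t n_proto;
	uint8_t  ip_proto;
};

/* Hypothetical compact form that would be stored in place of the key. */
struct flow_digest {
	uint32_t hash;
};

static_assert(sizeof(struct flow_digest) == 4, "digest should stay small");

/* FNV-1a over a byte range, continuing from a previous hash value. */
static uint32_t fnv1a_mix(uint32_t h, const void *p, size_t n)
{
	const uint8_t *b = p;

	while (n--) {
		h ^= *b++;
		h *= 16777619u;
	}
	return h;
}

/* Hash field by field so that struct padding never enters the digest. */
struct flow_digest flow_digest_make(const struct flow_keys_mock *k)
{
	uint32_t h = 2166136261u;

	h = fnv1a_mix(h, &k->src, sizeof(k->src));
	h = fnv1a_mix(h, &k->dst, sizeof(k->dst));
	h = fnv1a_mix(h, &k->ports, sizeof(k->ports));
	h = fnv1a_mix(h, &k->n_proto, sizeof(k->n_proto));
	h = fnv1a_mix(h, &k->ip_proto, sizeof(k->ip_proto));
	return (struct flow_digest){ .hash = h };
}

/* Equal keys always compare equal; unequal keys may collide. */
int flow_digest_equal(struct flow_digest a, struct flow_digest b)
{
	return a.hash == b.hash;
}
```

The trade-off is precision: two different flows can hash to the same digest, so this only suits a user that tolerates an occasional false match.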





* Re: 3.17-rc1 oops during network interface configuration
  2014-09-10  7:42     ` Or Gerlitz
  2014-09-10 14:24       ` Eric Dumazet
@ 2014-09-10 20:04       ` David Miller
  2014-09-29 18:52       ` Chuck Lever
  2 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2014-09-10 20:04 UTC (permalink / raw)
  To: ogerlitz
  Cc: chuck.lever, _govind, netdev, linux-kernel, linux-rdma,
	bvanassche, saeedm, talal, yevgenyp, roland

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 10 Sep 2014 10:42:41 +0300

> Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2
> commit 936d7de "IPoIB: Stop lying about hard_header_len and use
> skb->cb to stash LL addresses" we are using the skb->cb field to
> enable proper work under GRO and avoid another historical quirk we had
> there... so I think we can definitely consider commit e0f31d849 to
> introduce a severe regression... Govindarajulu, Dave - what's your
> thinking here? any quick idea on how to fix?

Eric mentioned that we could reduce the amount of flow state stored
in the qdisc cb in order to handle this better.

Making skb->cb[] larger is basically out of the question as far as
I'm concerned.

> Also, I was thinking we have the mechanics in the kernel, e.g commit
> a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to
> catch such over-flows?

Yes, we should have added a build-time check so that we would discover
this issue more quickly.
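
A build-time check of the kind described here could look roughly like the following userspace sketch. It uses C11 static_assert as a stand-in for the kernel's BUILD_BUG_ON; `CB_SIZE_CHECK`, `SKB_CB_SIZE`, `QDISC_DATA_LEN`, and the `*_checked` structs are hypothetical names chosen for illustration, not existing kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INFINIBAND_ALEN 20
#define SKB_CB_SIZE     48	/* size of sk_buff::cb[] quoted in the thread */
#define QDISC_DATA_LEN  20	/* pre-e0f31d84 value; 24 makes the build fail */

struct qdisc_skb_cb_checked {
	uint32_t pkt_len;
	uint16_t slave_dev_queue_mapping;
	uint16_t _pad;
	uint8_t  data[QDISC_DATA_LEN];
};

struct ipoib_cb_checked {
	struct qdisc_skb_cb_checked qdisc_cb;
	uint8_t                     hwaddr[INFINIBAND_ALEN];
};

/* Hypothetical macro: refuse to compile if a control block outgrows cb[]. */
#define CB_SIZE_CHECK(type) \
	static_assert(sizeof(type) <= SKB_CB_SIZE, #type " does not fit in skb->cb[]")

CB_SIZE_CHECK(struct ipoib_cb_checked);

/* Bytes left over in cb[] after the IPoIB control block. */
size_t ipoib_cb_spare(void)
{
	return SKB_CB_SIZE - sizeof(struct ipoib_cb_checked);
}
```

Changing QDISC_DATA_LEN to 24 in this sketch turns the overrun into a compile error instead of a runtime oops.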


* Re: 3.17-rc1 oops during network interface configuration
  2014-09-10  7:42     ` Or Gerlitz
  2014-09-10 14:24       ` Eric Dumazet
  2014-09-10 20:04       ` David Miller
@ 2014-09-29 18:52       ` Chuck Lever
  2014-09-29 19:00         ` Eric Dumazet
  2 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2014-09-29 18:52 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Govindarajulu Varadarajan, David S. Miller, netdev, LKML Kernel,
	linux-rdma, Bart Van Assche, Saeed Mahameed, Tal Alon,
	Yevgeny Petrilin, Roland Dreier


On Sep 10, 2014, at 3:42 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:

> 
> On 9/9/2014 10:30 PM, Chuck Lever wrote:
>> This crash happens when booting v3.17-rcN on any of my IB-enabled
>> systems. I have both ConnectX-2 and mthca systems, all are affected.
>> 
>> I bisected this to:
>> 
>> commit e0f31d8498676fda36289603a054d0d490aa2679
>> Author:     Govindarajulu Varadarajan <_govind@gmx.com>
>> AuthorDate: Mon Jun 23 16:07:58 2014 +0530
>> Commit:     David S. Miller <davem@davemloft.net>
>> CommitDate: Mon Jun 23 14:32:19 2014 -0700
>> 
>>     flow_keys: Record IP layer protocol in skb_flow_dissect()
>> 
>>     skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
>>     give any information about IPv4 or IPv6.
>>     This patch adds new member, n_proto, to struct flow_keys. Which records the
>>     IP layer type. i.e IPv4 or IPv6.
>>     This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>>     Adding new member to flow_keys increases the struct size by around 4 bytes.
>>     This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
>>     qdisc_cb_private_validate()
>>     So increase data size by 4
>> 
>>     Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>> 
>> 
>> This commit includes a hunk that increases the size of struct qdisc_skb_cb
>> by at least 4 bytes:
>> 
>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>> index 624f985..a3cfb8e 100644
>>> --- a/include/net/sch_generic.h
>>> +++ b/include/net/sch_generic.h
>>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>>         unsigned int            pkt_len;
>>>         u16                     slave_dev_queue_mapping;
>>>         u16                     _pad;
>>> -       unsigned char           data[20];
>>> +       unsigned char           data[24];
>>>  };
>>>    static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
>> 
>> 
>> IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:
>> 
>>> struct ipoib_cb {
>>>         struct qdisc_skb_cb     qdisc_cb;
>>>         u8                      hwaddr[INFINIBAND_ALEN];
>>> };
>> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
>> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
>> 52 bytes.
>> 
>> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
>> field, which contains a pointer. By the time we get into
>> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer
>> is garbage, and we oops.
>> 
>> Obviously, cb[] could be increased to 56 bytes to accommodate struct
>> ipoib_cb. I tried this, and it is effective in preventing the oops on
>> one of my systems.
>> 
>> But I suspect there is an historical reason I’m not aware of that it
>> has remained 48 bytes for years.
> 
> Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses" we are using the skb->cb field to enable proper work under GRO and avoid another historical quirk we had there... so I think we can definitely consider commit e0f31d849 to introduce a severe regression... Govindarajulu, Dave - what's your thinking here? any quick idea on how to fix?

Hi Or-

Is there a bugzilla report filed for this issue? Has there been any progress
towards a fix?


> Also, I was thinking we have the mechanics in the kernel, e.g commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to catch such over-flows?
> 
> Or.
> 
> 

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: 3.17-rc1 oops during network interface configuration
  2014-09-29 18:52       ` Chuck Lever
@ 2014-09-29 19:00         ` Eric Dumazet
  2014-09-30 14:56           ` Chuck Lever
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-09-29 19:00 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Or Gerlitz, Govindarajulu Varadarajan, David S. Miller, netdev,
	LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, Roland Dreier

On Mon, 2014-09-29 at 14:52 -0400, Chuck Lever wrote:

> Is there a bugzilla report filed for this issue? Has there been any progress
> towards a fix?

This is fixed in Linus' tree.

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=257117862634d89de33fec74858b1a0ba5ab444b

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=b49fe36208b45f76dfbcfcd3afd952a33fa9f5ce




* Re: 3.17-rc1 oops during network interface configuration
  2014-09-29 19:00         ` Eric Dumazet
@ 2014-09-30 14:56           ` Chuck Lever
  0 siblings, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2014-09-30 14:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, Govindarajulu Varadarajan, David S. Miller, netdev,
	LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, Roland Dreier


On Sep 29, 2014, at 3:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Mon, 2014-09-29 at 14:52 -0400, Chuck Lever wrote:
> 
>> Is there a bugzilla report filed for this issue? Has there been any progress
>> towards a fix?
> 
> This is fixed in Linus' tree.
> 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=257117862634d89de33fec74858b1a0ba5ab444b
> 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=b49fe36208b45f76dfbcfcd3afd952a33fa9f5ce

Tested 3.17-rc7, fix confirmed. Many thanks, Eric!

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




