Netdev List
* Re: bonding and SR-IOV -- do we need arp_validation for loadbalancing too?
From: Jay Vosburgh @ 2012-07-24 20:49 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Jiri Pirko, netdev, andy
In-Reply-To: <500F032D.3070104@genband.com>

Chris Friesen <chris.friesen@genband.com> wrote:

>On 07/24/2012 12:13 PM, Jay Vosburgh wrote:
>> Jiri Pirko<jiri@resnulli.us>  wrote:
>>
>>> Tue, Jul 24, 2012 at 05:57:03PM CEST, chris.friesen@genband.com wrote:
>>>> Hi all,
>>>>
>>>> We've been starting to look at bonding VFs from separate physical
>>>> devices in a guest, but we've run into a problem.
>>>>
>>>> The host is bonding the corresponding PFs, and it uses arp
>>>> monitoring.  What we have found is that any broadcast traffic from
>>>> the guest (if they enable arp monitoring, for example) will be seen
>>>> by the internal L2 switch of the NIC and sent up into the host, where
>>>> the bonding driver will count it as incoming packets and use it to
>>>> mark the link as good.
>>>>
>>>> The only solutions I've been able to come up with are:
>>>> 1) add arp validation for load balancing modes as well as active-backup.
>>> This is my favourite.... No reason not to turn arp validation on.
>>> TEAM device (teamd arpping linkwatch) does arp or NSNA validation
>>> always.
>> 	How does that operate for a load balancing mode?
>>
>> 	For arp validate to function (as it's implemented in bonding),
>> the arp requests (broadcasts) or the arp replies (unicasts) must be seen
>> by each slave at regular intervals.  Most load balance systems
>> (etherchannel or 802.3ad, for example) don't flood the broadcast
>> requests to all members of a channel group, and the unicast replies only
>> go to one member.
>>
>> 	This generally results in either only one slave staying up, or
>> slaves going up and down at odd intervals.  The arp monitor for the load
>> balance modes is already dependent upon there being a steady stream of
>> traffic to all slaves, and can be unreliable in low traffic conditions
>> (because not all slaves receive traffic with sufficient frequency).
>
>In loadbalance mode wouldn't it just work similar to active-backup?  If
>it's a reply then verify that it came from the arp target, if it's a
>request then check to see if it came from one of the other slaves.

	The problem isn't verifying the requests or replies, it's that
the ARP packets are not distributed across all slaves (because the
switch ports are in a channel group / aggregator), so some slaves do not
receive any ARPs.

	The bond sends the ARP request as a broadcast.  For
active-backup, this ends up at the inactive slaves because the switch
sends the broadcast to all ports.  For a loadbalance mode, the switch
won't send the broadcast ARP to the other slaves, because all the slaves
are in a channel group or lacp aggregator, which is treated by the
switch as effectively a single switch port for this case.

	Similarly, the ARP replies are unicast, and the switch will send
those unicast replies to only one member of the channel group or
aggregator.  The choice there is usually a hash of some kind, so
generally only one slave will receive the replies.
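
To illustrate why the replies tend to land on a single slave, here is a
minimal sketch of hash-based member selection in a channel group.  The
XOR-of-last-MAC-byte hash and the names pick_member and n_members are
assumptions for illustration only; real switches use their own,
vendor-specific hash inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch-side hash: XOR the last byte of the source and
 * destination MAC addresses, then reduce modulo the number of ports
 * in the channel group. */
static unsigned int pick_member(const uint8_t src[6], const uint8_t dst[6],
				unsigned int n_members)
{
	assert(n_members > 0);
	return (unsigned int)(src[5] ^ dst[5]) % n_members;
}
```

Because the ARP target and the bond keep the same MAC addresses from one
reply to the next, a hash of this kind returns the same member index
every time, so the other members never see a reply.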

>In our case we have control over the L2 switches involved so we ensure
>that the broadcast arp request is sent to all the other slaves, while the
>reply comes back to the sender.  I think we still have a window where you
>could have a device with a faulty tx but functional rx and never detect
>the problem in the monitor.

	You can set up -xor or -rr mode against a switch without setting
up a channel group on the switch, but that has the down side that any
incoming broadcast or multicast packet may be received multiple times
(one copy per slave).  Some switches will also disable ports (due to MAC
flapping) or complain about seeing the same MAC address on multiple
ports for this case.  This also will not load balance incoming traffic
to the bond very well.

>On 07/24/2012 02:18 PM, Chris Friesen wrote:
>> A more general solution might be to have the device driver also track
>> the time of the last incoming packet that came from the external network
>> (rather than a VF) and having the bond driver ignore those packets for
>> the purpose of link health.  Doing this efficiently would likely require
>> some kind of hardware support though--as an example the 82599 seems to
>> support this with the "LB" bit in the rx descriptor.
>
>That should of course be reversed.  We want the bond driver to only use
>the packets from the external network for the purpose of link health.
>
>Does anyone other than bonding actually care about dev->last_rx?  If not
>then we could just change the drivers to only set it for external packets.

	I believe bonding is the main user of last_rx (a search shows a
couple of drivers using it internally).  For bonding use, in current
mainline last_rx is set by bonding itself, not in the network device
driver.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [net-next PATCH 1/1] bnx2x: Correct EEE statistics gathering
From: David Miller @ 2012-07-24 20:56 UTC (permalink / raw)
  To: yuvalmin; +Cc: netdev, eilong
In-Reply-To: <1343114166-30834-1-git-send-email-yuvalmin@broadcom.com>

From: "Yuval Mintz" <yuvalmin@broadcom.com>
Date: Tue, 24 Jul 2012 10:16:06 +0300

> In boards with 4-ports, Tx LPI statistics were gathered incorrectly.
> This patch guarantees that each pmf will only query its own port for
> these statistics.
> 
> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied.


* Re: [PATCH v2 net-next] tcp: early_demux fixes
From: David Miller @ 2012-07-24 20:56 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1343128771.2626.11059.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 24 Jul 2012 13:19:31 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> 1) Remove an unneeded pskb_may_pull() in tcp_v4_early_demux()
>    and fix a potential bug if skb->head was reallocated
>    (iph & th pointers were not reloaded)
> 
> TCP stack will pull/check headers anyway.
> 
> 2) must reload iph in ip_rcv_finish() after early_demux()
>  call since skb->head might have changed.
> 
> 3) skb->dev->ifindex can now be replaced by skb->skb_iif
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
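
The pointer-reload issue described in points 1) and 2) can be shown with
a small userspace sketch.  This is not the kernel code: struct pkt,
pkt_grow() and first_byte_after_grow() are hypothetical stand-ins, with
realloc() playing the role of an operation that may move skb->head.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a heap buffer that may move. */
struct pkt {
	unsigned char *head;
	size_t len;
};

/* Grow the buffer.  As with an skb whose head is reallocated, p->head
 * may change.  Returns 0 on success, -1 on allocation failure. */
static int pkt_grow(struct pkt *p, size_t new_len)
{
	unsigned char *n;

	assert(new_len >= p->len);
	n = realloc(p->head, new_len);
	if (!n)
		return -1;
	memset(n + p->len, 0, new_len - p->len);
	p->head = n;
	p->len = new_len;
	return 0;
}

/* Read the first header byte after a possible reallocation. */
static int first_byte_after_grow(struct pkt *p, size_t new_len)
{
	const unsigned char *hdr = p->head;	/* cached pointer */

	if (pkt_grow(p, new_len) < 0)
		return -1;
	hdr = p->head;		/* reload: the cached value may be stale */
	return hdr[0];
}
```

Without the reload, hdr could point into memory that realloc() has
already released, which is the same class of bug as using a stale iph
or th pointer.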


* Re: [patch net-next] team: init error value to 0 in team_netpoll_setup()
From: David Miller @ 2012-07-24 20:56 UTC (permalink / raw)
  To: jiri; +Cc: netdev, edumazet
In-Reply-To: <1343128848-1284-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 24 Jul 2012 13:20:48 +0200

> This will ensure correct value is returned in case the port list is empty.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Applied.


* Re: [PATCH] caif: fix NULL pointer check
From: David Miller @ 2012-07-24 20:57 UTC (permalink / raw)
  To: alan; +Cc: sjur.brandeland, netdev
In-Reply-To: <20120724124207.4444.49789.stgit@localhost.localdomain>

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Tue, 24 Jul 2012 13:42:14 +0100

> From: Alan Cox <alan@linux.intel.com>
> 
> Reported-by: <rucsoftsec@gmail.com>
> Resolves-bug: http://bugzilla.kernel.org/show_bug?44441
> Signed-off-by: Alan Cox <alan@linux.intel.com>

Applied and queued up for -stable.

> +	if (dev == NULL)

I adjusted this to be "if (!dev)"


* Re: [PATCH] wanmain: comparing array with NULL
From: David Miller @ 2012-07-24 20:57 UTC (permalink / raw)
  To: alan; +Cc: netdev
In-Reply-To: <20120724181622.27921.80598.stgit@localhost.localdomain>

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Tue, 24 Jul 2012 19:16:25 +0100

> From: Alan Cox <alan@linux.intel.com>
> 
> gcc really should warn about these !
> 
> Signed-off-by: Alan Cox <alan@linux.intel.com>

Applied and queued up for -stable.

> +		printk(KERN_INFO "%s: registering interface %s...\n",
>  				wanrouter_modname, dev->name);

I adjusted the indentation of the second line, as needed.
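
The patch body is not quoted in this message, so the following is only
a general illustration of the class of bug named in the subject line.
struct wan_dev, NAME_LEN and wan_dev_has_name() are hypothetical names,
not taken from the wanrouter code.

```c
#include <assert.h>
#include <stddef.h>

#define NAME_LEN 16	/* hypothetical size, for illustration only */

struct wan_dev {
	char name[NAME_LEN];	/* embedded array, not a pointer */
};

/* An array member has a fixed address inside its struct, so a test such
 * as "dev->name != NULL" is true for every valid dev and checks nothing.
 * Testing the first character is the meaningful check for an empty name. */
static int wan_dev_has_name(const struct wan_dev *dev)
{
	assert(dev != NULL);
	return dev->name[0] != '\0';
}
```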


* Re: [PATCH] cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN
From: David Miller @ 2012-07-24 20:57 UTC (permalink / raw)
  To: dcbw-H+wXaHxf7aLQT0dZR+AlfA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw
In-Reply-To: <1343155402.29196.7.camel-wKZy7rqYPVb5EHUCmHmTqw@public.gmane.org>

From: Dan Williams <dcbw-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date: Tue, 24 Jul 2012 13:43:22 -0500

> Tag Ericsson NCM devices as WWAN modems, since they almost certainly all
> are.  This way userspace clients know that the device requires further
> setup on the AT-capable serial ports before connectivity is available.
> 
> Signed-off-by: Dan Williams <dcbw-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH RESEND net-next V2] IB/ipoib: break linkage to neighbouring system
From: David Miller @ 2012-07-24 21:00 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, netdev, eric.dumazet, cl, shlomop
In-Reply-To: <1343150651-12568-1-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Tue, 24 Jul 2012 20:24:11 +0300

> Dave Miller <davem@davemloft.net> provided a detailed description of why the
> way IPoIB is using neighbours for its own ipoib_neigh struct is buggy:
 ...
> Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

Acked-by: David S. Miller <davem@davemloft.net>

Looks good, Roland you got this?


* net-next closed
From: David Miller @ 2012-07-24 21:02 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, linux-wireless


Linus has pulled all of the pending merge window work from net-next
into his tree.

Therefore, net-next is closed until after the end of the merge
window.

'net' is open for bug fixes


* Re: [PATCH] mlx4: Add support for EEH error recovery
From: David Miller @ 2012-07-24 21:03 UTC (permalink / raw)
  To: klebers; +Cc: netdev, jackm, yevgenyp, ogerlitz, cascardo, brking
In-Reply-To: <1342814143-5744-1-git-send-email-klebers@linux.vnet.ibm.com>

From: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Date: Fri, 20 Jul 2012 16:55:43 -0300

> Currently the mlx4 drivers don't have the necessary callbacks to
> implement EEH errors detection and recovery, so the PCI layer uses the
> probe and remove callbacks to try to recover the device after an error on
> the bus. However, these callbacks have race conditions with the internal
> catastrophic error recovery functions, which will also detect the error
> and this can cause the system to crash if both EEH and catas functions
> try to reset the device.
> 
> This patch adds the necessary error recovery callbacks and makes sure
> that the internal catastrophic error functions will not try to reset the
> device in such scenarios. It also adds some calls to
> pci_channel_offline() to suppress reads/writes on the bus when the slot
> cannot accept I/O operations so we prevent unnecessary accesses to the
> bus and speed up the device removal.
> 
> Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>

Or, you promised an ACK today, I still haven't seen it.

There is no reason Kleber should be penalized and have his work
miss the merge window just because you guys can't be bothered
to approve this patch in a reasonable amount of time.

Therefore I'm just going to apply it later today, and don't do this
with someone's submission ever again, it impedes progress and
frustrates contributors.


* Re: bonding and SR-IOV -- do we need arp_validation for loadbalancing too?
From: Nicolas de Pesloüan @ 2012-07-24 21:15 UTC (permalink / raw)
  To: Jay Vosburgh, Jiri Pirko; +Cc: Chris Friesen, netdev, andy
In-Reply-To: <24104.1343162975@death.nxdomain>

Le 24/07/2012 22:49, Jay Vosburgh a écrit :
[...]
>> In loadbalance mode wouldn't it just work similar to active-backup?  If
>> it's a reply then verify that it came from the arp target, if it's a
>> request then check to see if it came from one of the other slaves.
>
> 	The problem isn't verifying the requests or replies, it's that
> the ARP packets are not distributed across all slaves (because the
> switch ports are in a channel group / aggregator), so some slaves do not
> receive any ARPs.
>
> 	The bond sends the ARP request as a broadcast.  For
> active-backup, this ends up at the inactive slaves because the switch
> sends the broadcast to all ports.  For a loadbalance mode, the switch
> won't send the broadcast ARP to the other slaves, because all the slaves
> are in a channel group or lacp aggregator, which is treated by the
> switch as effectively a single switch port for this case.
>
> 	Similarly, the ARP replies are unicast, and the switch will send
> those unicast replies to only one member of the channel group or
> aggregator.  The choice there is usually a hash of some kind, so
> generally only one slave will receive the replies.

I assume team should suffer the exact same problem, because most of this is on the switch side and 
out of the control of the host. Jiri, can you confirm?

[...]

> 	I believe bonding is the main user of last_rx (a search shows a
> couple of drivers using it internally).  For bonding use, in current
> mainline last_rx is set by bonding itself, not in the network device
> driver.

If last_rx is set and used internally by bonding and mostly unused elsewhere, can't we remove it 
from net_device and move it into private data for the slaves in bonding?

A comment in netdevice.h even recommends not setting it in drivers:

         unsigned long           last_rx;        /* Time of last Rx
                                                  * This should not be set in
                                                  * drivers, unless really needed,
                                                  * because network stack (bonding)
                                                  * use it if/when necessary, to
                                                  * avoid dirtying this cache line.
                                                  */

	Nicolas.


* Re: bonding and SR-IOV -- do we need arp_validation for loadbalancing too?
From: Chris Friesen @ 2012-07-24 21:38 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Jiri Pirko, netdev, andy
In-Reply-To: <24104.1343162975@death.nxdomain>

On 07/24/2012 02:49 PM, Jay Vosburgh wrote:
> Chris Friesen<chris.friesen@genband.com>  wrote:
>
>> In loadbalance mode wouldn't it just work similar to active-backup?  If
>> it's a reply then verify that it came from the arp target, if it's a
>> request then check to see if it came from one of the other slaves.
>
> 	The problem isn't verifying the requests or replies, it's that
> the ARP packets are not distributed across all slaves (because the
> switch ports are in a channel group / aggregator), so some slaves do not
> receive any ARPs.

Yeah, okay.  And if we turn on arp validation then we ignore all the 
other packets and so the slaves look dead.  Got it.

In our environment (ATCA shelf) the switches have been customized to 
handle some of this stuff so arpmon does work reliably with xor.

In the general case it sounds like the "PF bonding ignores packets from 
VFs" is a better bet then.


>> On 07/24/2012 02:18 PM, Chris Friesen wrote:
>>> A more general solution might be to have the device driver also track
>>> the time of the last incoming packet that came from the external network
>>> (rather than a VF) and having the bond driver ignore those packets for
>>> the purpose of link health.  Doing this efficiently would likely require
>>> some kind of hardware support though--as an example the 82599 seems to
>>> support this with the "LB" bit in the rx descriptor.
>>
>> That should of course be reversed.  We want the bond driver to only use
>> the packets from the external network for the purpose of link health.
>>
>> Does anyone other than bonding actually care about dev->last_rx?  If not
>> then we could just change the drivers to only set it for external packets.
>>
> 	I believe bonding is the main user of last_rx (a search shows a
> couple of drivers using it internally).  For bonding use, in current
> mainline last_rx is set by bonding itself, not in the network device
> driver.

Right, I was looking at older code.  In that case presumably the driver 
could set an skb flag (external vs VF loopback) that the bonding code 
could check?
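
A minimal sketch of that idea, under stated assumptions: the types and
names below (rx_pkt, from_vf_loopback, slave_state, slave_note_rx,
slave_link_ok) are hypothetical and simplified, not the kernel's sk_buff
or bonding structures, and the real arp monitor applies more conditions
than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-packet metadata filled in by the NIC driver. */
struct rx_pkt {
	bool from_vf_loopback;	/* e.g. derived from an rx descriptor bit */
};

/* Hypothetical per-slave state kept by the bonding side. */
struct slave_state {
	unsigned long last_rx;	/* time of last packet counted for health */
};

/* Record the receive time only for traffic that came from the wire. */
static void slave_note_rx(struct slave_state *s, const struct rx_pkt *p,
			  unsigned long now)
{
	assert(s && p);
	if (!p->from_vf_loopback)
		s->last_rx = now;
}

/* Treat the link as good if an external packet arrived recently. */
static bool slave_link_ok(const struct slave_state *s, unsigned long now,
			  unsigned long interval)
{
	return now - s->last_rx <= interval;
}
```

With this shape, broadcasts looped back from a guest VF no longer
refresh last_rx, so they cannot keep a dead external link marked up.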

Chris


* Re: [PATCH 2/3] net: Make skb->skb_iif always track skb->dev
From: Nicolas de Pesloüan @ 2012-07-24 21:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ja, Jiri Pirko
In-Reply-To: <20120723.164542.1890653324956444632.davem@davemloft.net>

Le 24/07/2012 01:45, David Miller a écrit :
>
> Make it follow device decapsulation, from things such as VLAN and
> bonding.
>
> The stuff that actually cares about pre-demuxed device pointers, is
> handled by the "orig_dev" variable in __netif_receive_skb().  And
> the only consumer of that is the po->origdev feature of AF_PACKET
> sockets.
>
> Signed-off-by: David S. Miller<davem@davemloft.net>

Jiri tried to remove this orig_dev usage in af_packet in March 2011, without success, by using the 
value of skb_iif instead :-)

In case my opinion might be relevant:

Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>

	Nicolas.


* Crash in e1000e, 3.3.8+ (tainted)
From: Ben Greear @ 2012-07-24 21:46 UTC (permalink / raw)
  To: e1000-devel list, netdev

We have a somewhat reproducible crash using a 6-port NIC
with 3.3.8+ kernel.  This kernel is tainted with a proprietary
module, but the module is not in use.

The rx-all and related patches that were later accepted
upstream have been applied to this kernel.

It seems that buffer_info is NULL in the code below?


(gdb) list e1000_alloc_rx_buffers+0x5b
Junk at end of line specification.
(gdb) list *(e1000_alloc_rx_buffers+0x5b)
0x15822 is in e1000_alloc_rx_buffers (/home/greearb/git/linux-3.3.dev.y/drivers/net/ethernet/intel/e1000e/netdev.c:611).
606	
607		i = rx_ring->next_to_use;
608		buffer_info = &rx_ring->buffer_info[i];
609	
610		while (cleaned_count--) {
611			skb = buffer_info->skb;
612			if (skb) {
613				skb_trim(skb, 0);
614				goto map_skb;
615			}
(gdb)
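
As a rough cross-check of the NULL buffer_info theory against the fault
address in the oops that follows (CR2 of 0x8): if the ring's buffer_info
array pointer is NULL and the index is 0, a load of ->skb touches the
offset of the skb field.  The layout below is an assumption for
illustration (a 64-bit DMA address followed by the skb pointer); the
real struct e1000_buffer has more members and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed, simplified layout; not the driver's actual definition. */
struct rx_buffer {
	uint64_t dma;
	void *skb;
};

/* Address that reading ->skb would touch for a ring base and index. */
static uintptr_t skb_field_addr(uintptr_t ring_base, unsigned int i)
{
	return ring_base + (uintptr_t)i * sizeof(struct rx_buffer)
	       + offsetof(struct rx_buffer, skb);
}
```

Under that assumed layout, skb_field_addr(0, 0) evaluates to 8, which
matches the reported fault address.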



ADDRCONF(NETDEV_UP): rddVR1-p: link is not ready
ADDRCONF(NETDEV_UP): eth16: link is not ready
8021q: adding VLAN 0 to HW filter on device eth16
e1000e: eth17 NIC Link is Down
e1000e 0000:04:00.1: eth17: Reset adapter
------------[ cut here ]------------
WARNING: at /home/greearb/git/linux-3.3.dev.y/drivers/net/ethernet/intel/e1000e/netdev.c:3937 e1000_close+0x38/0x134 [e1000e]()
Hardware name: To be filled by O.E.M.
Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput snd_hda_codec_realtek 
snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e snd mei(C) microcode 
cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc iTCO_wdt iTCO_vendor_support parport_pc parport i915 drm_kms_helper drm i2c_algo_bit i2c_core 
video [last unloaded: scsi_wait_scan]
Pid: 2360, comm: ip Tainted: P         C O 3.3.8+ #51
Call Trace:
  [<ffffffff81055bd1>] warn_slowpath_common+0x80/0x98
  [<ffffffff81055bfe>] warn_slowpath_null+0x15/0x17
  [<ffffffffa0199f49>] e1000_close+0x38/0x134 [e1000e]
  [<ffffffff8141239f>] __dev_close_many+0x88/0xb9
  [<ffffffff81412401>] __dev_close+0x31/0x42
  [<ffffffff8140fd39>] __dev_change_flags+0xb9/0x13c
  [<ffffffff81412d48>] dev_change_flags+0x1c/0x52
  [<ffffffff8141dfac>] do_setlink+0x2b8/0x7ca
  [<ffffffff8141cfd7>] ? rtnl_fill_ifinfo+0x9f1/0xab1
  [<ffffffff8141e7f3>] rtnl_newlink+0x266/0x4b7
  [<ffffffff8141e630>] ? rtnl_newlink+0xa3/0x4b7
  [<ffffffff8141db55>] ? rtnl_dump_ifinfo+0x134/0x15d
  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
  [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
  [<ffffffff811d7328>] ? security_capable+0x13/0x15
  [<ffffffff8141d78b>] rtnetlink_rcv_msg+0x21e/0x23b
  [<ffffffff8141d56d>] ? rtnetlink_rcv+0x28/0x28
  [<ffffffff8142fbb6>] netlink_rcv_skb+0x3e/0x8f
  [<ffffffff8141d566>] rtnetlink_rcv+0x21/0x28
  [<ffffffff8142f991>] netlink_unicast+0xe9/0x152
  [<ffffffff814300ea>] netlink_sendmsg+0x1f8/0x216
  [<ffffffff813fed37>] __sock_sendmsg_nosec+0x5f/0x6a
  [<ffffffff813fed7f>] __sock_sendmsg+0x3d/0x48
  [<ffffffff813ff61f>] sock_sendmsg+0xa3/0xbc
  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
  [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
  [<ffffffff814c623b>] ? _raw_spin_unlock+0x28/0x33
  [<ffffffff810e73ae>] ? do_wp_page+0x548/0x5af
  [<ffffffff813fe77d>] ? copy_from_user+0x9/0xb
  [<ffffffff813ff2c7>] ? move_addr_to_kernel+0x2b/0x65
  [<ffffffff814099b1>] ? copy_from_user+0x9/0xb
  [<ffffffff81409cfe>] ? verify_iovec+0x4f/0xa3
  [<ffffffff813ffd81>] __sys_sendmsg+0x20f/0x29c
  [<ffffffff810e8241>] ? handle_mm_fault+0x1ac/0x1c4
  [<ffffffff814c9195>] ? do_page_fault+0x2de/0x350
  [<ffffffff810ebdd3>] ? do_brk+0x2b8/0x31a
  [<ffffffff813fff6b>] sys_sendmsg+0x3d/0x5b
  [<ffffffff814cb0f9>] system_call_fastpath+0x16/0x1b
---[ end trace 059af067cdc81b69 ]---
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa019a7fe>] e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
CPU 2
Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput snd_hda_codec_realtek 
snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e snd mei(C) microcode 
cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc iTCO_wdt iTCO_vendor_support parport_pc parport i915 drm_kms_helper drm i2c_algo_bit i2c_core 
video [last unloaded: scsi_wait_scan]

Pid: 140, comm: kworker/2:1 Tainted: P        WC O 3.3.8+ #51 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffffa019a7fe>]  [<ffffffffa019a7fe>] e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
RSP: 0018:ffff88021e185cc0  EFLAGS: 00010206
RAX: ffff8802203ae090 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000000000d0 RSI: 00000000000000ff RDI: ffff88021e8a4800
RBP: ffff88021e185d20 R08: ffff88021e184000 R09: ffffffff81a8f658
R10: ffff88021e185be0 R11: ffff88021e185fd8 R12: ffff88021e8a4800
R13: 0000000000000000 R14: ffff88021dda2360 R15: 00000000000000ff
FS:  0000000000000000(0000) GS:ffff88022bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000001a05000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/2:1 (pid: 140, threadinfo ffff88021e184000, task ffff88021fc0dd00)
Stack:
  0000000000000000 ffffffffa0194ea7 000000d01e185d00 ffff88021e8a4000
  000005f21dda2360 ffff8802203ae090 ffff88021e185d00 ffff88021e8a4800
  ffff88021dda2360 0000000000001000 0000000004008002 ffff88021dda2960
Call Trace:
  [<ffffffffa0194ea7>] ? e1000e_set_rx_mode+0xbc/0x260 [e1000e]
  [<ffffffffa0195a6d>] e1000_configure+0x51c/0x525 [e1000e]
  [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
  [<ffffffffa0195a87>] e1000e_up+0x11/0xbc [e1000e]
  [<ffffffffa01992b1>] e1000e_reinit_locked+0x3f/0x4c [e1000e]
  [<ffffffffa0199a29>] e1000_reset_task+0x6dd/0x6ec [e1000e]
  [<ffffffff81069df7>] ? schedule_work+0x13/0x15
  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
  [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
  [<ffffffff8106837e>] process_one_work+0x1a6/0x278
  [<ffffffff8106a3d1>] worker_thread+0x136/0x255
  [<ffffffff8106a29b>] ? manage_workers+0x190/0x190
  [<ffffffff8106da7d>] kthread+0x84/0x8c
  [<ffffffff814cc4a4>] kernel_thread_helper+0x4/0x10
  [<ffffffff8106d9f9>] ? __init_kthread_worker+0x37/0x37
  [<ffffffff814cc4a0>] ? gs_change+0x13/0x13
Code: 00 00 89 45 c4 41 0f b7 5e 18 48 8b 87 a8 04 00 00 41 89 dd 48 05 90 00 00 00 4d 6b ed 28 4d 03 6e 20 48 89 45 c8 e9 ea 00 00 00 <49> 8b 45 08 48 85 c0 74 
14 48 89 c7 31 f6 48 89 45 a8 e8 76 b1
RIP  [<ffffffffa019a7fe>] e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
  RSP <ffff88021e185cc0>
CR2: 0000000000000008
---[ end trace 059af067cdc81b6a ]---
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff8106d618>] kthread_data+0xb/0x11
PGD 1a07067 PUD 1a08067 PMD 0
Oops: 0000 [#2] PREEMPT SMP
CPU 2
Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput snd_hda_codec_realtek 
snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e snd mei(C) microcode 
cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc iTCO_wdt iTCO_vendor_support parport_pc parport i915 drm_kms_helper drm i2c_algo_bit i2c_core 
video [last unloaded: scsi_wait_scan]

Pid: 140, comm: kworker/2:1 Tainted: P      D WC O 3.3.8+ #51 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff8106d618>]  [<ffffffff8106d618>] kthread_data+0xb/0x11
RSP: 0018:ffff88021e1858b8  EFLAGS: 00010092
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000002
RDX: ffffffff81bee730 RSI: 0000000000000002 RDI: ffff88021fc0dd00
RBP: ffff88021e1858b8 R08: 0000000000000400 R09: ffff88021fc0e0b8
R10: ffff88021e185978 R11: 0000000000000000 R12: ffff88021fc0e0b8
R13: ffff88021e1859b8 R14: 0000000000000002 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88022bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: fffffffffffffff8 CR3: 0000000001a05000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/2:1 (pid: 140, threadinfo ffff88021e184000, task ffff88021fc0dd00)
Stack:
  ffff88021e1858d8 ffffffff81069e8f ffff88021e1858d8 ffff88022bd12340
  ffff88021e185978 ffffffff814c5041 ffff88021e185918 0000000000000246
  ffff88021e184010 ffff88021fc0dd00 ffff88021e185fd8 0000000000012340
Call Trace:
  [<ffffffff81069e8f>] wq_worker_sleeping+0x10/0x8a
  [<ffffffff814c5041>] __schedule+0x17f/0x562
  [<ffffffff814c54c9>] schedule+0x55/0x57
  [<ffffffff81059b09>] do_exit+0x73e/0x742
  [<ffffffff814c73c7>] oops_end+0xba/0xc2
  [<ffffffff8102df05>] no_context+0x25a/0x269
  [<ffffffff8107cee0>] ? load_balance+0x98/0x6b0
  [<ffffffff8102e0db>] __bad_area_nosemaphore+0x1c7/0x1e7
  [<ffffffff8102e109>] bad_area_nosemaphore+0xe/0x10
  [<ffffffff814c902d>] do_page_fault+0x176/0x350
  [<ffffffff81009785>] ? __switch_to+0x1cd/0x37c
  [<ffffffff814c62bc>] ? _raw_spin_unlock_irq+0x2f/0x3a
  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
  [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
  [<ffffffff814c6925>] page_fault+0x25/0x30
  [<ffffffffa019a7fe>] ? e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
  [<ffffffffa0194ea7>] ? e1000e_set_rx_mode+0xbc/0x260 [e1000e]
  [<ffffffffa0195a6d>] e1000_configure+0x51c/0x525 [e1000e]
  [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
  [<ffffffffa0195a87>] e1000e_up+0x11/0xbc [e1000e]
  [<ffffffffa01992b1>] e1000e_reinit_locked+0x3f/0x4c [e1000e]
  [<ffffffffa0199a29>] e1000_reset_task+0x6dd/0x6ec [e1000e]
  [<ffffffff81069df7>] ? schedule_work+0x13/0x15
  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
  [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
  [<ffffffff8106837e>] process_one_work+0x1a6/0x278
  [<ffffffff8106a3d1>] worker_thread+0x136/0x255
  [<ffffffff8106a29b>] ? manage_workers+0x190/0x190
  [<ffffffff8106da7d>] kthread+0x84/0x8c
  [<ffffffff814cc4a4>] kernel_thread_helper+0x4/0x10
  [<ffffffff8106d9f9>] ? __init_kthread_worker+0x37/0x37
  [<ffffffff814cc4a0>] ? gs_change+0x13/0x13
Code: ea ff ff ff eb 9d 90 55 65 48 8b 04 25 00 c7 00 00 48 8b 80 60 03 00 00 48 89 e5 8b 40 f0 c9 c3 48 8b 87 60 03 00 00 55 48 89 e5 <48> 8b 40 f8 c9 c3 48 3b 
3d 7b 10 b8 00 55 48 89 e5 75 09 0f bf
RIP  [<ffffffff8106d618>] kthread_data+0xb/0x11
  RSP <ffff88021e1858b8>
CR2: fffffffffffffff8
---[ end trace 059af067cdc81b6b ]---
Fixing recursive fault but reboot is needed!




-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* RE: Crash in e1000e, 3.3.8+ (tainted)
From: Allan, Bruce W @ 2012-07-24 22:02 UTC (permalink / raw)
  To: Ben Greear, e1000-devel list, netdev
In-Reply-To: <500F17A0.30906@candelatech.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Ben Greear
> Sent: Tuesday, July 24, 2012 2:46 PM
> To: e1000-devel list; netdev
> Subject: Crash in e1000e, 3.3.8+ (tainted)
> 
> We have a somewhat reproducible crash using a 6-port NIC
> with 3.3.8+ kernel.  This kernel is tainted with a proprietary
> module, but the module is not in use.
> 
> The rx-all and related patches that were later accepted
> upstream have been applied to this kernel.
> 
> It seems that buffer_info is NULL in the code below?
> 
> 
> (gdb) list e1000_alloc_rx_buffers+0x5b
> Junk at end of line specification.
> (gdb) list *(e1000_alloc_rx_buffers+0x5b)
> 0x15822 is in e1000_alloc_rx_buffers (/home/greearb/git/linux-
> 3.3.dev.y/drivers/net/ethernet/intel/e1000e/netdev.c:611).
> 606
> 607		i = rx_ring->next_to_use;
> 608		buffer_info = &rx_ring->buffer_info[i];
> 609
> 610		while (cleaned_count--) {
> 611			skb = buffer_info->skb;
> 612			if (skb) {
> 613				skb_trim(skb, 0);
> 614				goto map_skb;
> 615			}
> (gdb)
> 
> 
> 
> ADDRCONF(NETDEV_UP): rddVR1-p: link is not ready
> ADDRCONF(NETDEV_UP): eth16: link is not ready
> 8021q: adding VLAN 0 to HW filter on device eth16
> e1000e: eth17 NIC Link is Down
> e1000e 0000:04:00.1: eth17: Reset adapter
> ------------[ cut here ]------------
> WARNING: at /home/greearb/git/linux-
> 3.3.dev.y/drivers/net/ethernet/intel/e1000e/netdev.c:3937
> e1000_close+0x38/0x134 [e1000e]()
> Hardware name: To be filled by O.E.M.
> Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen
> sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput
> snd_hda_codec_realtek
> snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq
> ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e
> snd mei(C) microcode
> cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc
> iTCO_wdt iTCO_vendor_support parport_pc parport i915 drm_kms_helper
> drm i2c_algo_bit i2c_core
> video [last unloaded: scsi_wait_scan]
> Pid: 2360, comm: ip Tainted: P         C O 3.3.8+ #51
> Call Trace:
>   [<ffffffff81055bd1>] warn_slowpath_common+0x80/0x98
>   [<ffffffff81055bfe>] warn_slowpath_null+0x15/0x17
>   [<ffffffffa0199f49>] e1000_close+0x38/0x134 [e1000e]
>   [<ffffffff8141239f>] __dev_close_many+0x88/0xb9
>   [<ffffffff81412401>] __dev_close+0x31/0x42
>   [<ffffffff8140fd39>] __dev_change_flags+0xb9/0x13c
>   [<ffffffff81412d48>] dev_change_flags+0x1c/0x52
>   [<ffffffff8141dfac>] do_setlink+0x2b8/0x7ca
>   [<ffffffff8141cfd7>] ? rtnl_fill_ifinfo+0x9f1/0xab1
>   [<ffffffff8141e7f3>] rtnl_newlink+0x266/0x4b7
>   [<ffffffff8141e630>] ? rtnl_newlink+0xa3/0x4b7
>   [<ffffffff8141db55>] ? rtnl_dump_ifinfo+0x134/0x15d
>   [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>   [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
>   [<ffffffff811d7328>] ? security_capable+0x13/0x15
>   [<ffffffff8141d78b>] rtnetlink_rcv_msg+0x21e/0x23b
>   [<ffffffff8141d56d>] ? rtnetlink_rcv+0x28/0x28
>   [<ffffffff8142fbb6>] netlink_rcv_skb+0x3e/0x8f
>   [<ffffffff8141d566>] rtnetlink_rcv+0x21/0x28
>   [<ffffffff8142f991>] netlink_unicast+0xe9/0x152
>   [<ffffffff814300ea>] netlink_sendmsg+0x1f8/0x216
>   [<ffffffff813fed37>] __sock_sendmsg_nosec+0x5f/0x6a
>   [<ffffffff813fed7f>] __sock_sendmsg+0x3d/0x48
>   [<ffffffff813ff61f>] sock_sendmsg+0xa3/0xbc
>   [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>   [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
>   [<ffffffff814c623b>] ? _raw_spin_unlock+0x28/0x33
>   [<ffffffff810e73ae>] ? do_wp_page+0x548/0x5af
>   [<ffffffff813fe77d>] ? copy_from_user+0x9/0xb
>   [<ffffffff813ff2c7>] ? move_addr_to_kernel+0x2b/0x65
>   [<ffffffff814099b1>] ? copy_from_user+0x9/0xb
>   [<ffffffff81409cfe>] ? verify_iovec+0x4f/0xa3
>   [<ffffffff813ffd81>] __sys_sendmsg+0x20f/0x29c
>   [<ffffffff810e8241>] ? handle_mm_fault+0x1ac/0x1c4
>   [<ffffffff814c9195>] ? do_page_fault+0x2de/0x350
>   [<ffffffff810ebdd3>] ? do_brk+0x2b8/0x31a
>   [<ffffffff813fff6b>] sys_sendmsg+0x3d/0x5b
>   [<ffffffff814cb0f9>] system_call_fastpath+0x16/0x1b
> ---[ end trace 059af067cdc81b69 ]---
> BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000008
> IP: [<ffffffffa019a7fe>] e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
> PGD 0
> Oops: 0000 [#1] PREEMPT SMP
> CPU 2
> Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen
> sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput
> snd_hda_codec_realtek
> snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq
> ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e
> snd mei(C) microcode
> cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc
> iTCO_wdt iTCO_vendor_support parport_pc parport i915 drm_kms_helper
> drm i2c_algo_bit i2c_core
> video [last unloaded: scsi_wait_scan]
> 
> Pid: 140, comm: kworker/2:1 Tainted: P        WC O 3.3.8+ #51 To be filled by
> O.E.M. To be filled by O.E.M./To be filled by O.E.M.
> RIP: 0010:[<ffffffffa019a7fe>]  [<ffffffffa019a7fe>]
> e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
> RSP: 0018:ffff88021e185cc0  EFLAGS: 00010206
> RAX: ffff8802203ae090 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 00000000000000d0 RSI: 00000000000000ff RDI: ffff88021e8a4800
> RBP: ffff88021e185d20 R08: ffff88021e184000 R09: ffffffff81a8f658
> R10: ffff88021e185be0 R11: ffff88021e185fd8 R12: ffff88021e8a4800
> R13: 0000000000000000 R14: ffff88021dda2360 R15: 00000000000000ff
> FS:  0000000000000000(0000) GS:ffff88022bd00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 0000000001a05000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kworker/2:1 (pid: 140, threadinfo ffff88021e184000, task
> ffff88021fc0dd00)
> Stack:
>   0000000000000000 ffffffffa0194ea7 000000d01e185d00 ffff88021e8a4000
>   000005f21dda2360 ffff8802203ae090 ffff88021e185d00 ffff88021e8a4800
>   ffff88021dda2360 0000000000001000 0000000004008002 ffff88021dda2960
> Call Trace:
>   [<ffffffffa0194ea7>] ? e1000e_set_rx_mode+0xbc/0x260 [e1000e]
>   [<ffffffffa0195a6d>] e1000_configure+0x51c/0x525 [e1000e]
>   [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
>   [<ffffffffa0195a87>] e1000e_up+0x11/0xbc [e1000e]
>   [<ffffffffa01992b1>] e1000e_reinit_locked+0x3f/0x4c [e1000e]
>   [<ffffffffa0199a29>] e1000_reset_task+0x6dd/0x6ec [e1000e]
>   [<ffffffff81069df7>] ? schedule_work+0x13/0x15
>   [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>   [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>   [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
>   [<ffffffff8106837e>] process_one_work+0x1a6/0x278
>   [<ffffffff8106a3d1>] worker_thread+0x136/0x255
>   [<ffffffff8106a29b>] ? manage_workers+0x190/0x190
>   [<ffffffff8106da7d>] kthread+0x84/0x8c
>   [<ffffffff814cc4a4>] kernel_thread_helper+0x4/0x10
>   [<ffffffff8106d9f9>] ? __init_kthread_worker+0x37/0x37
>   [<ffffffff814cc4a0>] ? gs_change+0x13/0x13
> Code: 00 00 89 45 c4 41 0f b7 5e 18 48 8b 87 a8 04 00 00 41 89 dd 48 05 90 00 00
> 00 4d 6b ed 28 4d 03 6e 20 48 89 45 c8 e9 ea 00 00 00 <49> 8b 45 08 48 85 c0 74
> 14 48 89 c7 31 f6 48 89 45 a8 e8 76 b1
> RIP  [<ffffffffa019a7fe>] e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
>   RSP <ffff88021e185cc0>
> CR2: 0000000000000008
> ---[ end trace 059af067cdc81b6a ]---
> BUG: unable to handle kernel paging request at fffffffffffffff8
> IP: [<ffffffff8106d618>] kthread_data+0xb/0x11
> PGD 1a07067 PUD 1a08067 PMD 0
> Oops: 0000 [#2] PREEMPT SMP
> CPU 2
> Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen
> sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput
> snd_hda_codec_realtek
> snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq
> ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e
> snd mei(C) microcode
> cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc
> iTCO_wdt iTCO_vendor_support parport_pc parport i915 drm_kms_helper
> drm i2c_algo_bit i2c_core
> video [last unloaded: scsi_wait_scan]
> 
> Pid: 140, comm: kworker/2:1 Tainted: P      D WC O 3.3.8+ #51 To be filled by
> O.E.M. To be filled by O.E.M./To be filled by O.E.M.
> RIP: 0010:[<ffffffff8106d618>]  [<ffffffff8106d618>] kthread_data+0xb/0x11
> RSP: 0018:ffff88021e1858b8  EFLAGS: 00010092
> RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000002
> RDX: ffffffff81bee730 RSI: 0000000000000002 RDI: ffff88021fc0dd00
> RBP: ffff88021e1858b8 R08: 0000000000000400 R09: ffff88021fc0e0b8
> R10: ffff88021e185978 R11: 0000000000000000 R12: ffff88021fc0e0b8
> R13: ffff88021e1859b8 R14: 0000000000000002 R15: 0000000000000001
> FS:  0000000000000000(0000) GS:ffff88022bd00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: fffffffffffffff8 CR3: 0000000001a05000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kworker/2:1 (pid: 140, threadinfo ffff88021e184000, task
> ffff88021fc0dd00)
> Stack:
>   ffff88021e1858d8 ffffffff81069e8f ffff88021e1858d8 ffff88022bd12340
>   ffff88021e185978 ffffffff814c5041 ffff88021e185918 0000000000000246
>   ffff88021e184010 ffff88021fc0dd00 ffff88021e185fd8 0000000000012340
> Call Trace:
>   [<ffffffff81069e8f>] wq_worker_sleeping+0x10/0x8a
>   [<ffffffff814c5041>] __schedule+0x17f/0x562
>   [<ffffffff814c54c9>] schedule+0x55/0x57
>   [<ffffffff81059b09>] do_exit+0x73e/0x742
>   [<ffffffff814c73c7>] oops_end+0xba/0xc2
>   [<ffffffff8102df05>] no_context+0x25a/0x269
>   [<ffffffff8107cee0>] ? load_balance+0x98/0x6b0
>   [<ffffffff8102e0db>] __bad_area_nosemaphore+0x1c7/0x1e7
>   [<ffffffff8102e109>] bad_area_nosemaphore+0xe/0x10
>   [<ffffffff814c902d>] do_page_fault+0x176/0x350
>   [<ffffffff81009785>] ? __switch_to+0x1cd/0x37c
>   [<ffffffff814c62bc>] ? _raw_spin_unlock_irq+0x2f/0x3a
>   [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>   [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>   [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
>   [<ffffffff814c6925>] page_fault+0x25/0x30
>   [<ffffffffa019a7fe>] ? e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
>   [<ffffffffa0194ea7>] ? e1000e_set_rx_mode+0xbc/0x260 [e1000e]
>   [<ffffffffa0195a6d>] e1000_configure+0x51c/0x525 [e1000e]
>   [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
>   [<ffffffffa0195a87>] e1000e_up+0x11/0xbc [e1000e]
>   [<ffffffffa01992b1>] e1000e_reinit_locked+0x3f/0x4c [e1000e]
>   [<ffffffffa0199a29>] e1000_reset_task+0x6dd/0x6ec [e1000e]
>   [<ffffffff81069df7>] ? schedule_work+0x13/0x15
>   [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>   [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>   [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
>   [<ffffffff8106837e>] process_one_work+0x1a6/0x278
>   [<ffffffff8106a3d1>] worker_thread+0x136/0x255
>   [<ffffffff8106a29b>] ? manage_workers+0x190/0x190
>   [<ffffffff8106da7d>] kthread+0x84/0x8c
>   [<ffffffff814cc4a4>] kernel_thread_helper+0x4/0x10
>   [<ffffffff8106d9f9>] ? __init_kthread_worker+0x37/0x37
>   [<ffffffff814cc4a0>] ? gs_change+0x13/0x13
> Code: ea ff ff ff eb 9d 90 55 65 48 8b 04 25 00 c7 00 00 48 8b 80 60 03 00 00 48
> 89 e5 8b 40 f0 c9 c3 48 8b 87 60 03 00 00 55 48 89 e5 <48> 8b 40 f8 c9 c3 48 3b
> 3d 7b 10 b8 00 55 48 89 e5 75 09 0f bf
> RIP  [<ffffffff8106d618>] kthread_data+0xb/0x11
>   RSP <ffff88021e1858b8>
> CR2: fffffffffffffff8
> ---[ end trace 059af067cdc81b6b ]---
> Fixing recursive fault but reboot is needed!

I believe this has already been fixed in 3.4 via commit bb9e44d0.  Please try patching
your kernel with that and let us know so we can have it back-ported to stable.
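
For anyone reading the quoted oops: CR2 is 0x8 and the faulting source line is
"skb = buffer_info->skb". That would be consistent with buffer_info itself
being NULL, provided the skb pointer sits 8 bytes into the structure. A minimal
sketch of that arithmetic follows. The struct is a hypothetical stand-in with
an assumed layout, not the driver's real definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-buffer bookkeeping struct;
 * only the assumed position of the skb pointer matters here. */
struct buffer_info_model {
	uint64_t dma;	/* assumed 8-byte DMA address field */
	void *skb;	/* pointer the crashing line tries to load */
};

/* Address that a load of ->skb would touch for a given base pointer value. */
static uintptr_t skb_load_address(uintptr_t base)
{
	return base + offsetof(struct buffer_info_model, skb);
}
```

Under that assumed layout, skb_load_address(0) evaluates to 8, which matches
the faulting address reported in CR2.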

Thanks,
Bruce.

^ permalink raw reply

* Re: [PATCH 2/2] ipv4: Change rt->rt_iif encoding.
From: Nicolas de Pesloüan @ 2012-07-24 22:13 UTC (permalink / raw)
  To: David Miller; +Cc: ja, netdev
In-Reply-To: <20120723.161446.36265037346365173.davem@davemloft.net>

Le 24/07/2012 01:14, David Miller a écrit :
[...]
> I wonder if we should just get rid of all of that orig_dev logic and
> simply update skb->skb_iif every time we hit the code starting at
> label "another_round"

It clearly depends on the exact meaning of orig_dev.

When we studied the usage of orig_dev before removing it from bonding, it was clear that two 
different meanings existed:

- From the bonding point of view, it was "the device one level below current device" (the slave, 
from the master's point of view).

- From the af_packet point of view, it was "the real original device that received the packet".

As bonding doesn't use orig_dev anymore, the remaining meaning should logically be "the real original 
device that received the packet". But as __netif_receive_skb() is recursively called in many cases, 
setting orig_dev to something new every time, this meaning is probably mostly inconsistent. As such, 
it sounds appropriate to remove orig_dev and use skb_iif instead.
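
To illustrate the point, here is a toy model (not kernel code) of a receive function that is 
re-entered after skb->dev changes. All struct and function names are hypothetical, and the 
assumption that skb_iif is recorded only on first entry is a simplification made for this sketch.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures; every name here is hypothetical. */
struct toy_dev {
	int ifindex;
	struct toy_dev *upper;	/* device stacked on top (e.g. a bond), or NULL */
};

struct toy_skb {
	struct toy_dev *dev;	/* device the packet is currently attributed to */
	int skb_iif;		/* recorded once, on first entry (assumption) */
};

#define MAX_ROUNDS 8

struct toy_trace {
	int rounds;
	int orig_dev_ifindex[MAX_ROUNDS];
};

/* Each entry re-derives orig_dev from skb->dev, so it changes per level. */
static void toy_receive(struct toy_skb *skb, struct toy_trace *t)
{
	struct toy_dev *orig_dev = skb->dev;

	if (!skb->skb_iif)
		skb->skb_iif = skb->dev->ifindex;

	if (t->rounds < MAX_ROUNDS)
		t->orig_dev_ifindex[t->rounds++] = orig_dev->ifindex;

	/* Hand the packet to the stacked device and receive it again. */
	if (skb->dev->upper) {
		skb->dev = skb->dev->upper;
		toy_receive(skb, t);
	}
}
```

With a slave (ifindex 2) stacked under a master (ifindex 3), the trace records orig_dev as 2 on the 
first pass and 3 on the second, while skb_iif stays at 2 throughout.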

	Nicolas.

^ permalink raw reply

* RE: [PATCH 10/17] Tools: hv: Gather ipv[4,6] gateway information
From: KY Srinivasan @ 2012-07-24 22:13 UTC (permalink / raw)
  To: Dan Williams, Stephen Hemminger
  Cc: Olaf Hering, gregkh@linuxfoundation.org,
	linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	virtualization@lists.osdl.org, apw@canonical.com,
	netdev@vger.kernel.org, ben@decadent.org.uk
In-Reply-To: <1343155010.29196.1.camel@dcbw.foobar.com>



> -----Original Message-----
> From: Dan Williams [mailto:dcbw@redhat.com]
> Sent: Tuesday, July 24, 2012 2:37 PM
> To: Stephen Hemminger
> Cc: Olaf Hering; KY Srinivasan; gregkh@linuxfoundation.org; linux-
> kernel@vger.kernel.org; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; apw@canonical.com; netdev@vger.kernel.org;
> ben@decadent.org.uk
> Subject: Re: [PATCH 10/17] Tools: hv: Gather ipv[4,6] gateway information
> 
> On Tue, 2012-07-24 at 09:56 -0700, Stephen Hemminger wrote:
> > On Tue, 24 Jul 2012 18:53:59 +0200
> > Olaf Hering <olaf@aepfle.de> wrote:
> >
> > > On Tue, Jul 24, Stephen Hemminger wrote:
> > >
> > > > On Tue, 24 Jul 2012 09:01:34 -0700
> > > > "K. Y. Srinivasan" <kys@microsoft.com> wrote:
> > > >
> > > > > +	memset(cmd, 0, sizeof(cmd));
> > > > > +	strcat(cmd, "/sbin/ip -f inet  route | grep -w ");
> > > > > +	strcat(cmd, if_name);
> > > > > +	strcat(cmd, " | awk '/default/ {print $3 }'");
> > > >
> > > >
> > > > Much simpler method:
> > > >
> > > > ip route show match 0/0
> > >
> > > This also has the benefit that ip is not called with absolute path, now
> > > that distros move binaries around.
> > >
> > > Olaf
> >
> > It is also not hard to do the same thing with a little function
> > using libmnl
> 
> Yeah seriously, netlink anyone?  You'll even get nicer error reporting
> that way.

While I will be the first to admit that using a C API is always better (in C code),
in this particular instance I am not so sure. All I am doing is retrieving information
on default gateways. If there is an error, that is ok and this won't be reported
back to the host. Using the ip command significantly simplifies the code here.
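
For comparison, the extraction that the grep/awk stages perform could also be done in C on
each line of the command's output. This is only a rough sketch: parse_default_gw() is a
hypothetical helper, not part of the patch, and it assumes lines of the form
"default via <gateway> dev <ifname> ...".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pull the gateway out of one line of route output.
 * Returns 0 on success, -1 if the line is not a default route for if_name
 * or the gateway does not fit in the caller's buffer. */
static int parse_default_gw(const char *line, const char *if_name,
			    char *gw, size_t gw_len)
{
	char addr[64];
	char dev[32];

	/* Both fields must match; field widths keep sscanf within bounds. */
	if (sscanf(line, "default via %63s dev %31s", addr, dev) != 2)
		return -1;
	if (strcmp(dev, if_name) != 0)
		return -1;
	if (strlen(addr) >= gw_len)
		return -1;
	strcpy(gw, addr);
	return 0;
}
```

A parse failure simply returns -1, which fits the case above where an error is acceptable
and is not reported back to the host.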

Regards,

K. Y  


^ permalink raw reply

* Re: [PATCH 2/2] ipv4: Change rt->rt_iif encoding.
From: David Miller @ 2012-07-24 22:18 UTC (permalink / raw)
  To: nicolas.2p.debian; +Cc: ja, netdev
In-Reply-To: <500F1E23.7090803@gmail.com>

From: Nicolas de Pesloüan <nicolas.2p.debian@gmail.com>
Date: Wed, 25 Jul 2012 00:13:55 +0200

> - From the af_packet point of view, it was "the real original device
>   that received the packet".
> 
> As bonding doesn't use orig_dev anymore, the remaining meaning should
> logically be "the real original device that received the packet". But
> as __netif_receive_skb() is recursively called in many cases, setting
> orig_dev to something new every time, this meaning is probably mostly
> inconsistent. As such, it sounds appropriate to remove orig_dev and
> use skb_iif instead.

I don't think we can, otherwise people who set po->origdev will no
longer get what they expect.

For the simpler cases of bonding and VLANs, it does currently behave
as expected.

That's why I left it alone.

^ permalink raw reply

* Re: [PATCH 2/2] ipv4: Change rt->rt_iif encoding.
From: Nicolas de Pesloüan @ 2012-07-24 22:25 UTC (permalink / raw)
  To: David Miller; +Cc: ja, netdev
In-Reply-To: <20120724.151801.1576915988616906722.davem@davemloft.net>

Le 25/07/2012 00:18, David Miller a écrit :
> From: Nicolas de Pesloüan<nicolas.2p.debian@gmail.com>
> Date: Wed, 25 Jul 2012 00:13:55 +0200
>
>> - From the af_packet point of view, it was "the real original device
>>   that received the packet".
>>
>> As bonding doesn't use orig_dev anymore, the remaining meaning should
>> logically be "the real original device that received the packet". But
>> as __netif_receive_skb() is recursively called in many cases, setting
>> orig_dev to something new every time, this meaning is probably mostly
>> inconsistent. As such, it sounds appropriate to remove orig_dev and
>> use skb_iif instead.
>
> I don't think we can, otherwise people who set po->origdev will no
> longer get what they expect.
>
> For the simpler cases of bonding and VLANs, it does currently behave
> as expected.
>
> That's why I left it alone.

Do they get what they expect when stacking interfaces?

__netif_receive_skb starts with orig_dev = skb->dev. So when calling __netif_receive_skb 
recursively, after changing skb->dev, they get the packet several times, with a different orig_dev 
value?

Anyway, both look good to me.

	Nicolas.

^ permalink raw reply

* Re: [PATCH] mlx4: Add support for EEH error recovery
From: Or Gerlitz @ 2012-07-24 22:30 UTC (permalink / raw)
  To: David Miller
  Cc: klebers, netdev, jackm, yevgenyp, ogerlitz, cascardo, brking,
	Shlomo Pongratz
In-Reply-To: <20120724.140353.1432900101600410863.davem@davemloft.net>

On Wed, Jul 25, 2012 at 12:03 AM, David Miller <davem@davemloft.net> wrote:

> Or, you promised an ACK today, I still haven't seen it.

It turned out that we did react, but not with an ACK.

Again, code-review-wise, we intended to ACK it, but Shlomo has set up a
testing environment, under which he had some issues with the patch. As
such, he preferred not to ACK it but rather to bring up the issues on the
list and sort them out first. I thought it would be wrong to overrule
this preference of his, and this way is fair enough to the author
and your guidelines. Maybe I should have been more aggressive with ACKing
this, given that the merge window is about to close. So tomorrow.

Or.

> There is no reason Kleber should be penalized and have his work
> miss the merge window just because you guys can't be bothered
> to approve this patch in a reasonable amount of time.
>
> Therefore I'm just going to apply it later today, and don't do this
> with someone's submission ever again, it impedes progress and
> frustrates contributors.

^ permalink raw reply

* RE: [E1000-devel] Crash in e1000e, 3.3.8+ (tainted)
From: Dave, Tushar N @ 2012-07-24 23:13 UTC (permalink / raw)
  To: Ben Greear, e1000-devel list, netdev
In-Reply-To: <500F17A0.30906@candelatech.com>

>-----Original Message-----
>From: Ben Greear [mailto:greearb@candelatech.com]
>Sent: Tuesday, July 24, 2012 2:46 PM
>To: e1000-devel list; netdev
>Subject: [E1000-devel] Crash in e1000e, 3.3.8+ (tainted)
>
>We have a somewhat reproducible crash using a 6-port NIC with 3.3.8+
>kernel.  This kernel is tainted with a proprietary module, but the module
>is not in use.
>
>The rx-all and related patches that were later accepted upstream have been
>applied to this kernel.
>
>It seems that buffer_info is NULL in the code below?
>
>
>(gdb) list e1000_alloc_rx_buffers+0x5b
>Junk at end of line specification.
>(gdb) list *(e1000_alloc_rx_buffers+0x5b)
>0x15822 is in e1000_alloc_rx_buffers (/home/greearb/git/linux-
>3.3.dev.y/drivers/net/ethernet/intel/e1000e/netdev.c:611).
>606
>607		i = rx_ring->next_to_use;
>608		buffer_info = &rx_ring->buffer_info[i];
>609
>610		while (cleaned_count--) {
>611			skb = buffer_info->skb;
>612			if (skb) {
>613				skb_trim(skb, 0);
>614				goto map_skb;
>615			}
>(gdb)
>
>
Ben,

This looks familiar to me; I believe it is due to a race between adapter reset and e1000_close.
Let me check whether we have a fix upstream.
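
In the quoted log, the warning from e1000_close (triggered by "ip") comes first, and the oops then
fires in the e1000_reset_task worker. A toy, single-threaded model of that ordering is sketched
below. All names are hypothetical, and the guard shown is a generic pattern rather than the actual
upstream change. Real driver code would also need proper locking or atomic state bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of an adapter; names are hypothetical, not the driver's. */
struct toy_adapter {
	bool down;		/* set once the interface has been closed */
	int *rx_buffers;	/* ring bookkeeping, freed on close */
	size_t rx_count;
};

/* Close path: mark the adapter down, then release the ring. */
static void toy_close(struct toy_adapter *a)
{
	a->down = true;
	free(a->rx_buffers);
	a->rx_buffers = NULL;
}

/* Deferred reset: returns false and does nothing if close already ran,
 * otherwise touches the ring and returns true. */
static bool toy_reset_task(struct toy_adapter *a)
{
	if (a->down)
		return false;
	for (size_t i = 0; i < a->rx_count; i++)
		a->rx_buffers[i] = 0;
	return true;
}
```

Without the check on "down", a reset that runs after toy_close() would write through the NULL
rx_buffers pointer, which is the same shape of failure as the oops above.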


-Tushar
>
>ADDRCONF(NETDEV_UP): rddVR1-p: link is not ready
>ADDRCONF(NETDEV_UP): eth16: link is not ready
>8021q: adding VLAN 0 to HW filter on device eth16
>e1000e: eth17 NIC Link is Down
>e1000e 0000:04:00.1: eth17: Reset adapter ------------[ cut here ]--------
>----
>WARNING: at /home/greearb/git/linux-
>3.3.dev.y/drivers/net/ethernet/intel/e1000e/netdev.c:3937
>e1000_close+0x38/0x134 [e1000e]() Hardware name: To be filled by O.E.M.
>Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen
>sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput snd_hda_codec_realtek
>snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq
>ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e snd
>mei(C) microcode
>cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc iTCO_wdt
>iTCO_vendor_support parport_pc parport i915 drm_kms_helper drm
>i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
>Pid: 2360, comm: ip Tainted: P         C O 3.3.8+ #51
>Call Trace:
>  [<ffffffff81055bd1>] warn_slowpath_common+0x80/0x98
>  [<ffffffff81055bfe>] warn_slowpath_null+0x15/0x17
>  [<ffffffffa0199f49>] e1000_close+0x38/0x134 [e1000e]
>  [<ffffffff8141239f>] __dev_close_many+0x88/0xb9
>  [<ffffffff81412401>] __dev_close+0x31/0x42
>  [<ffffffff8140fd39>] __dev_change_flags+0xb9/0x13c
>  [<ffffffff81412d48>] dev_change_flags+0x1c/0x52
>  [<ffffffff8141dfac>] do_setlink+0x2b8/0x7ca
>  [<ffffffff8141cfd7>] ? rtnl_fill_ifinfo+0x9f1/0xab1
>  [<ffffffff8141e7f3>] rtnl_newlink+0x266/0x4b7
>  [<ffffffff8141e630>] ? rtnl_newlink+0xa3/0x4b7
>  [<ffffffff8141db55>] ? rtnl_dump_ifinfo+0x134/0x15d
>  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>  [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
>  [<ffffffff811d7328>] ? security_capable+0x13/0x15
>  [<ffffffff8141d78b>] rtnetlink_rcv_msg+0x21e/0x23b
>  [<ffffffff8141d56d>] ? rtnetlink_rcv+0x28/0x28
>  [<ffffffff8142fbb6>] netlink_rcv_skb+0x3e/0x8f
>  [<ffffffff8141d566>] rtnetlink_rcv+0x21/0x28
>  [<ffffffff8142f991>] netlink_unicast+0xe9/0x152
>  [<ffffffff814300ea>] netlink_sendmsg+0x1f8/0x216
>  [<ffffffff813fed37>] __sock_sendmsg_nosec+0x5f/0x6a
>  [<ffffffff813fed7f>] __sock_sendmsg+0x3d/0x48
>  [<ffffffff813ff61f>] sock_sendmsg+0xa3/0xbc
>  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>  [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
>  [<ffffffff814c623b>] ? _raw_spin_unlock+0x28/0x33
>  [<ffffffff810e73ae>] ? do_wp_page+0x548/0x5af
>  [<ffffffff813fe77d>] ? copy_from_user+0x9/0xb
>  [<ffffffff813ff2c7>] ? move_addr_to_kernel+0x2b/0x65
>  [<ffffffff814099b1>] ? copy_from_user+0x9/0xb
>  [<ffffffff81409cfe>] ? verify_iovec+0x4f/0xa3
>  [<ffffffff813ffd81>] __sys_sendmsg+0x20f/0x29c
>  [<ffffffff810e8241>] ? handle_mm_fault+0x1ac/0x1c4
>  [<ffffffff814c9195>] ? do_page_fault+0x2de/0x350
>  [<ffffffff810ebdd3>] ? do_brk+0x2b8/0x31a
>  [<ffffffff813fff6b>] sys_sendmsg+0x3d/0x5b
>  [<ffffffff814cb0f9>] system_call_fastpath+0x16/0x1b ---[ end trace
>059af067cdc81b69 ]---
>BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>IP: [<ffffffffa019a7fe>] e1000_alloc_rx_buffers+0x5b/0x162 [e1000e] PGD 0
>Oops: 0000 [#1] PREEMPT SMP
>CPU 2
>Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen
>sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput snd_hda_codec_realtek
>snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq
>ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e snd
>mei(C) microcode
>cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc iTCO_wdt
>iTCO_vendor_support parport_pc parport i915 drm_kms_helper drm
>i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
>
>Pid: 140, comm: kworker/2:1 Tainted: P        WC O 3.3.8+ #51 To be filled
>by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
>RIP: 0010:[<ffffffffa019a7fe>]  [<ffffffffa019a7fe>]
>e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
>RSP: 0018:ffff88021e185cc0  EFLAGS: 00010206
>RAX: ffff8802203ae090 RBX: 0000000000000000 RCX: 0000000000000000
>RDX: 00000000000000d0 RSI: 00000000000000ff RDI: ffff88021e8a4800
>RBP: ffff88021e185d20 R08: ffff88021e184000 R09: ffffffff81a8f658
>R10: ffff88021e185be0 R11: ffff88021e185fd8 R12: ffff88021e8a4800
>R13: 0000000000000000 R14: ffff88021dda2360 R15: 00000000000000ff
>FS:  0000000000000000(0000) GS:ffff88022bd00000(0000)
>knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 0000000000000008 CR3: 0000000001a05000 CR4: 00000000000006e0
>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process
>kworker/2:1 (pid: 140, threadinfo ffff88021e184000, task ffff88021fc0dd00)
>Stack:
>  0000000000000000 ffffffffa0194ea7 000000d01e185d00 ffff88021e8a4000
>  000005f21dda2360 ffff8802203ae090 ffff88021e185d00 ffff88021e8a4800
>  ffff88021dda2360 0000000000001000 0000000004008002 ffff88021dda2960 Call
>Trace:
>  [<ffffffffa0194ea7>] ? e1000e_set_rx_mode+0xbc/0x260 [e1000e]
>  [<ffffffffa0195a6d>] e1000_configure+0x51c/0x525 [e1000e]
>  [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
>  [<ffffffffa0195a87>] e1000e_up+0x11/0xbc [e1000e]
>  [<ffffffffa01992b1>] e1000e_reinit_locked+0x3f/0x4c [e1000e]
>  [<ffffffffa0199a29>] e1000_reset_task+0x6dd/0x6ec [e1000e]
>  [<ffffffff81069df7>] ? schedule_work+0x13/0x15
>  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>  [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
>  [<ffffffff8106837e>] process_one_work+0x1a6/0x278
>  [<ffffffff8106a3d1>] worker_thread+0x136/0x255
>  [<ffffffff8106a29b>] ? manage_workers+0x190/0x190
>  [<ffffffff8106da7d>] kthread+0x84/0x8c
>  [<ffffffff814cc4a4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8106d9f9>] ? __init_kthread_worker+0x37/0x37
>  [<ffffffff814cc4a0>] ? gs_change+0x13/0x13
>Code: 00 00 89 45 c4 41 0f b7 5e 18 48 8b 87 a8 04 00 00 41 89 dd 48 05 90
>00 00 00 4d 6b ed 28 4d 03 6e 20 48 89 45 c8 e9 ea 00 00 00 <49> 8b 45 08
>48 85 c0 74
>14 48 89 c7 31 f6 48 89 45 a8 e8 76 b1
>RIP  [<ffffffffa019a7fe>] e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
>  RSP <ffff88021e185cc0>
>CR2: 0000000000000008
>---[ end trace 059af067cdc81b6a ]---
>BUG: unable to handle kernel paging request at fffffffffffffff8
>IP: [<ffffffff8106d618>] kthread_data+0xb/0x11 PGD 1a07067 PUD 1a08067 PMD
>0
>Oops: 0000 [#2] PREEMPT SMP
>CPU 2
>Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(PO) pktgen
>sbs sbshc f71882fg coretemp hwmon sunrpc ipv6 uinput snd_hda_codec_realtek
>snd_hda_intel ath9k snd_hda_codec mac80211 joydev snd_hwdep snd_seq
>ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e snd
>mei(C) microcode
>cfg80211 ppdev i2c_i801 soundcore serio_raw pcspkr snd_page_alloc iTCO_wdt
>iTCO_vendor_support parport_pc parport i915 drm_kms_helper drm
>i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
>
>Pid: 140, comm: kworker/2:1 Tainted: P      D WC O 3.3.8+ #51 To be filled
>by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
>RIP: 0010:[<ffffffff8106d618>]  [<ffffffff8106d618>] kthread_data+0xb/0x11
>RSP: 0018:ffff88021e1858b8  EFLAGS: 00010092
>RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000002
>RDX: ffffffff81bee730 RSI: 0000000000000002 RDI: ffff88021fc0dd00
>RBP: ffff88021e1858b8 R08: 0000000000000400 R09: ffff88021fc0e0b8
>R10: ffff88021e185978 R11: 0000000000000000 R12: ffff88021fc0e0b8
>R13: ffff88021e1859b8 R14: 0000000000000002 R15: 0000000000000001
>FS:  0000000000000000(0000) GS:ffff88022bd00000(0000)
>knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: fffffffffffffff8 CR3: 0000000001a05000 CR4: 00000000000006e0
>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process
>kworker/2:1 (pid: 140, threadinfo ffff88021e184000, task ffff88021fc0dd00)
>Stack:
>  ffff88021e1858d8 ffffffff81069e8f ffff88021e1858d8 ffff88022bd12340
>  ffff88021e185978 ffffffff814c5041 ffff88021e185918 0000000000000246
>  ffff88021e184010 ffff88021fc0dd00 ffff88021e185fd8 0000000000012340 Call
>Trace:
>  [<ffffffff81069e8f>] wq_worker_sleeping+0x10/0x8a
>  [<ffffffff814c5041>] __schedule+0x17f/0x562
>  [<ffffffff814c54c9>] schedule+0x55/0x57
>  [<ffffffff81059b09>] do_exit+0x73e/0x742
>  [<ffffffff814c73c7>] oops_end+0xba/0xc2
>  [<ffffffff8102df05>] no_context+0x25a/0x269
>  [<ffffffff8107cee0>] ? load_balance+0x98/0x6b0
>  [<ffffffff8102e0db>] __bad_area_nosemaphore+0x1c7/0x1e7
>  [<ffffffff8102e109>] bad_area_nosemaphore+0xe/0x10
>  [<ffffffff814c902d>] do_page_fault+0x176/0x350
>  [<ffffffff81009785>] ? __switch_to+0x1cd/0x37c
>  [<ffffffff814c62bc>] ? _raw_spin_unlock_irq+0x2f/0x3a
>  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>  [<ffffffff814c9382>] ? sub_preempt_count+0x92/0xa5
>  [<ffffffff814c6925>] page_fault+0x25/0x30
>  [<ffffffffa019a7fe>] ? e1000_alloc_rx_buffers+0x5b/0x162 [e1000e]
>  [<ffffffffa0194ea7>] ? e1000e_set_rx_mode+0xbc/0x260 [e1000e]
>  [<ffffffffa0195a6d>] e1000_configure+0x51c/0x525 [e1000e]
>  [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
>  [<ffffffffa0195a87>] e1000e_up+0x11/0xbc [e1000e]
>  [<ffffffffa01992b1>] e1000e_reinit_locked+0x3f/0x4c [e1000e]
>  [<ffffffffa0199a29>] e1000_reset_task+0x6dd/0x6ec [e1000e]
>  [<ffffffff81069df7>] ? schedule_work+0x13/0x15
>  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>  [<ffffffff81077243>] ? get_parent_ip+0x11/0x42
>  [<ffffffffa019934c>] ? e1000_set_features+0x8e/0x8e [e1000e]
>  [<ffffffff8106837e>] process_one_work+0x1a6/0x278
>  [<ffffffff8106a3d1>] worker_thread+0x136/0x255
>  [<ffffffff8106a29b>] ? manage_workers+0x190/0x190
>  [<ffffffff8106da7d>] kthread+0x84/0x8c
>  [<ffffffff814cc4a4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8106d9f9>] ? __init_kthread_worker+0x37/0x37
>  [<ffffffff814cc4a0>] ? gs_change+0x13/0x13
>Code: ea ff ff ff eb 9d 90 55 65 48 8b 04 25 00 c7 00 00 48 8b 80 60 03 00
>00 48 89 e5 8b 40 f0 c9 c3 48 8b 87 60 03 00 00 55 48 89 e5 <48> 8b 40 f8
>c9 c3 48 3b 3d 7b 10 b8 00 55 48 89 e5 75 09 0f bf RIP
>[<ffffffff8106d618>] kthread_data+0xb/0x11
>  RSP <ffff88021e1858b8>
>CR2: fffffffffffffff8
>---[ end trace 059af067cdc81b6b ]---
>Fixing recursive fault but reboot is needed!
>
>
>
>
>--
>Ben Greear <greearb@candelatech.com>
>Candela Technologies Inc  http://www.candelatech.com
>
>
>
>--------------------------------------------------------------------------
>----
>Live Security Virtual Conference
>Exclusive live event will cover all the ways today's security and
>threat landscape has changed and how IT managers can respond. Discussions
>will include endpoint security, mobile security and the latest in malware
>threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>_______________________________________________
>E1000-devel mailing list
>E1000-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/e1000-devel
>To learn more about Intel® Ethernet, visit
>http://communities.intel.com/community/wired

^ permalink raw reply

* Re: Crash in e1000e, 3.3.8+ (tainted)
From: Ben Greear @ 2012-07-24 23:20 UTC (permalink / raw)
  To: Dave, Tushar N; +Cc: e1000-devel list, netdev, bruce.w.allan
In-Reply-To: <061C8A8601E8EE4CA8D8FD6990CEA89130D50549@ORSMSX102.amr.corp.intel.com>

On 07/24/2012 04:13 PM, Dave, Tushar N wrote:
>> -----Original Message-----
>> From: Ben Greear [mailto:greearb@candelatech.com]
>> Sent: Tuesday, July 24, 2012 2:46 PM
>> To: e1000-devel list; netdev
>> Subject: [E1000-devel] Crash in e1000e, 3.3.8+ (tainted)
>>
>> We have a somewhat reproducible crash using a 6-port NIC with 3.3.8+
>> kernel.  This kernel is tainted with a proprietary module, but the module
>> is not in use.
>>
>> The rx-all and related patches that were later accepted upstream have been
>> applied to this kernel.
>>
>> It seems that buffer_info is NULL in the code below?
>>
>>
>> (gdb) list e1000_alloc_rx_buffers+0x5b
>> Junk at end of line specification.
>> (gdb) list *(e1000_alloc_rx_buffers+0x5b)
>> 0x15822 is in e1000_alloc_rx_buffers (/home/greearb/git/linux-
>> 3.3.dev.y/drivers/net/ethernet/intel/e1000e/netdev.c:611).
>> 606
>> 607		i = rx_ring->next_to_use;
>> 608		buffer_info = &rx_ring->buffer_info[i];
>> 609
>> 610		while (cleaned_count--) {
>> 611			skb = buffer_info->skb;
>> 612			if (skb) {
>> 613				skb_trim(skb, 0);
>> 614				goto map_skb;
>> 615			}
>> (gdb)
>>
>>
> Ben,
>
> This looks familiar to me; I believe it is due to a race between adapter reset and e1000_close.
> Let me check whether we have a fix upstream.

I'm testing Bruce Allan's suggestion now:  bb9e44d0 (from 3.4).

It applies with fuzz to my 3.3.8+ tree.

So far, so good...but need to do some more reboots to be sure.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com





^ permalink raw reply

* RE: [E1000-devel] Crash in e1000e, 3.3.8+ (tainted)
From: Dave, Tushar N @ 2012-07-24 23:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: e1000-devel list, netdev, Allan, Bruce W
In-Reply-To: <500F2DD8.8030300@candelatech.com>

>-----Original Message-----
>From: Ben Greear [mailto:greearb@candelatech.com]
>Sent: Tuesday, July 24, 2012 4:21 PM
>To: Dave, Tushar N
>Cc: e1000-devel list; netdev; Allan, Bruce W
>Subject: Re: [E1000-devel] Crash in e1000e, 3.3.8+ (tainted)
>
>On 07/24/2012 04:13 PM, Dave, Tushar N wrote:
>>> -----Original Message-----
>>> From: Ben Greear [mailto:greearb@candelatech.com]
>>> Sent: Tuesday, July 24, 2012 2:46 PM
>>> To: e1000-devel list; netdev
>>> Subject: [E1000-devel] Crash in e1000e, 3.3.8+ (tainted)
>>>
>>> We have a somewhat reproducible crash using a 6-port NIC with 3.3.8+
>>> kernel.  This kernel is tainted with a proprietary module, but the
>>> module is not in use.
>>>
>>> The rx-all and related patches that were later accepted upstream have
>>> been applied to this kernel.
>>>
>>> It seems that buffer_info is NULL in the code below?
>>>
>>>
>>> (gdb) list e1000_alloc_rx_buffers+0x5b
>>> Junk at end of line specification.
>>> (gdb) list *(e1000_alloc_rx_buffers+0x5b)
>>> 0x15822 is in e1000_alloc_rx_buffers (/home/greearb/git/linux-3.3.dev.y/drivers/net/ethernet/intel/e1000e/netdev.c:611).
>>> 606
>>> 607		i = rx_ring->next_to_use;
>>> 608		buffer_info = &rx_ring->buffer_info[i];
>>> 609
>>> 610		while (cleaned_count--) {
>>> 611			skb = buffer_info->skb;
>>> 612			if (skb) {
>>> 613				skb_trim(skb, 0);
>>> 614				goto map_skb;
>>> 615			}
>>> (gdb)
>>>
>>>
>> Ben,
>>
>> This looks familiar to me; I believe this is due to a race between adapter reset and e1000_close.
>> Let me check whether we have a fix upstream.
>
>I'm testing Bruce Allan's suggestion now:  bb9e44d0 (from 3.4).

Yep, commit bb9e44d0 is the one.
>
>It applies with fuzz to my 3.3.8+ tree.
>
>So far, so good...but need to do some more reboots to be sure.
>
>Thanks,
>Ben
>
>--
>Ben Greear <greearb@candelatech.com>
>Candela Technologies Inc  http://www.candelatech.com
>
>

^ permalink raw reply

* Re: [PATCH 11/17] Tools: hv: Gather DNS information
From: Ben Hutchings @ 2012-07-24 23:38 UTC (permalink / raw)
  To: K. Y. Srinivasan
  Cc: gregkh, linux-kernel, devel, virtualization, olaf, apw, netdev
In-Reply-To: <1343145701-3691-11-git-send-email-kys@microsoft.com>

On Tue, Jul 24, 2012 at 09:01:35AM -0700, K. Y. Srinivasan wrote:
> Now gather DNS information. This information cannot be gathered in
> a distro independent fashion. Invoke an external script (that can be
> distro dependent) to gather the DNS information.
[...]
> +	memset(cmd, 0, sizeof(cmd));
> +	strcat(cmd, "/sbin/hv_get_dns_info ");
> +	strcat(cmd, if_name);
[...]

This is a weird way to build a string; why are you not using
snprintf()?  Not to mention that interface names can contain several
characters that are special to the shell - in fact the only disallowed
characters are / and whitespace.
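
A minimal sketch of what that could look like, assuming the daemon is
free to reject names it cannot pass safely to a shell.  The helper names
if_name_is_shell_safe() and build_dns_cmd() are hypothetical and not part
of the patch, and the whitelist is deliberately stricter than what the
kernel accepts for an interface name:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: accept only characters that need no shell quoting.
 * This is an assumption for illustration; the kernel itself allows more.
 */
int if_name_is_shell_safe(const char *name)
{
	size_t i, len = strlen(name);

	/* IFNAMSIZ is 16, including the terminating NUL. */
	if (len == 0 || len >= 16)
		return 0;
	/* A leading '-' could be parsed as an option by the script. */
	if (name[0] == '-')
		return 0;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (!isalnum(c) && c != '_' && c != '-' && c != '.')
			return 0;
	}
	return 1;
}

/*
 * Build the command line in one bounded call.
 * Returns 0 on success, -1 if the name is rejected or the buffer is
 * too small (snprintf() reports the length it wanted to write).
 */
int build_dns_cmd(char *cmd, size_t len, const char *if_name)
{
	int n;

	if (!if_name_is_shell_safe(if_name))
		return -1;
	n = snprintf(cmd, len, "/sbin/hv_get_dns_info %s", if_name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With this, build_dns_cmd(cmd, sizeof(cmd), "eth0") fills in the command,
while a name such as "eth0;reboot" is refused before it reaches the
shell.  Skipping the shell entirely (fork() plus execv() with the name
as a separate argument) would avoid the quoting problem altogether.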
 
Also, the external script will not be useful to anything other than
hv_kvp_daemon, so it probably belongs somewhere under /usr/share.
 
Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply

* [PATCH] be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC
From: Anton Blanchard @ 2012-07-25  1:05 UTC (permalink / raw)
  To: sathya.perla, subbu.seetharaman, ajit.khaparde, netdev


We are seeing an oops in be_get_fw_log_level on ppc64 where we walk
off the end of memory.

commit 941a77d582c8 (be2net: Fix to allow get/set of debug levels in
the firmware.) requires byteswapping of num_modes and num_modules.

Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Anton Blanchard <anton@samba.org>
---

diff --git a/drivers/net/ethernet/emulex/benet/be_ethtool.c b/drivers/net/ethernet/emulex/benet/be_ethtool.c
index 63e51d4..59ee51a 100644
--- a/drivers/net/ethernet/emulex/benet/be_ethtool.c
+++ b/drivers/net/ethernet/emulex/benet/be_ethtool.c
@@ -910,8 +910,9 @@ static void be_set_fw_log_level(struct be_adapter *adapter, u32 level)
 	if (!status) {
 		cfgs = (struct be_fat_conf_params *)(extfat_cmd.va +
 					sizeof(struct be_cmd_resp_hdr));
-		for (i = 0; i < cfgs->num_modules; i++) {
-			for (j = 0; j < cfgs->module[i].num_modes; j++) {
+		for (i = 0; i < le32_to_cpu(cfgs->num_modules); i++) {
+			u32 num_modes = le32_to_cpu(cfgs->module[i].num_modes);
+			for (j = 0; j < num_modes; j++) {
 				if (cfgs->module[i].trace_lvl[j].mode ==
 								MODE_UART)
 					cfgs->module[i].trace_lvl[j].dbg_lvl =
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 501dfa9..bd5cf7e 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3479,7 +3479,7 @@ u32 be_get_fw_log_level(struct be_adapter *adapter)
 	if (!status) {
 		cfgs = (struct be_fat_conf_params *)(extfat_cmd.va +
 						sizeof(struct be_cmd_resp_hdr));
-		for (j = 0; j < cfgs->module[0].num_modes; j++) {
+		for (j = 0; j < le32_to_cpu(cfgs->module[0].num_modes); j++) {
 			if (cfgs->module[0].trace_lvl[j].mode == MODE_UART)
 				level = cfgs->module[0].trace_lvl[j].dbg_lvl;
 		}
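
For readers unfamiliar with the failure mode, here is a small userspace
sketch of why the missing conversion walks off the end of memory on a
big-endian host.  It is an illustration only: le32_to_host() is a
stand-in for the kernel's le32_to_cpu(), struct fw_resp is a made-up
layout rather than the real be_fat_conf_params, and the clamp in
usable_modes() is an extra safeguard that the patch above does not add.

```c
#include <stdint.h>
#include <string.h>

/*
 * Userspace stand-in for le32_to_cpu(): interpret a 32-bit value whose
 * bytes in memory are little-endian, regardless of host byte order.
 */
uint32_t le32_to_host(uint32_t raw)
{
	uint8_t b[4];

	memcpy(b, &raw, sizeof(b));
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Hypothetical response layout; the field name is illustrative only. */
struct fw_resp {
	uint32_t num_modes;	/* stored little-endian by the firmware */
};

/*
 * Convert the count once, then clamp it to the size of the table the
 * caller is about to index (the clamp is an assumption, see above).
 */
uint32_t usable_modes(const struct fw_resp *resp, uint32_t table_size)
{
	uint32_t n = le32_to_host(resp->num_modes);

	return n < table_size ? n : table_size;
}
```

A count of 3 arrives as the bytes 03 00 00 00.  Read without conversion
on ppc64 that is 0x03000000, so the loop bound becomes 50331648 instead
of 3.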

^ permalink raw reply related

