Netdev List
* [PATCH] Documentation: clarify phys_port_id
From: Dan Williams @ 2014-12-17 16:59 UTC (permalink / raw)
  To: netdev; +Cc: Joshua Watt, jpirko, Florian Fainelli
In-Reply-To: <1418834826.1160.35.camel@dcbw.local>

Signed-off-by: Dan Williams <dcbw@redhat.com>
---
 Documentation/ABI/testing/sysfs-class-net | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index e1b2e78..7fe823a 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -186,7 +186,12 @@ KernelVersion:	3.12
 Contact:	netdev@vger.kernel.org
 Description:
 		Indicates the interface unique physical port identifier within
-		the NIC, as a string.
+		the NIC, as a string.  If two net_device objects share physical
+		hardware or other resources, and/or do not operate
+		independently, both net_device objects should be assigned the
+		same phys_port_id.  phys_port_id should be as globally unique
+		as possible to prevent conflicts between different drivers and
+		vendors, e.g. by using MAC addresses or hardware GUIDs.
 
 What:		/sys/class/net/<iface>/speed
 Date:		October 2009
-- 
1.9.3


* Re: [PATCH net 2/2] geneve: Fix races between socket add and release.
From: Thomas Graf @ 2014-12-17 16:54 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev, Andy Zhou
In-Reply-To: <1418783132-99230-2-git-send-email-jesse@nicira.com>

On 12/16/14 at 06:25pm, Jesse Gross wrote:
> Currently, searching for a socket to add a reference to is not
> synchronized with deletion of sockets. This can result in use
> after free if there is another operation that is removing a
> socket at the same time. Solving this requires both holding the
> appropriate lock and checking the refcount to ensure that it
> has not already hit zero.
> 
> Inspired by a related (but not exactly the same) issue in the
> VXLAN driver.
> 
> Fixes: 0b5e8b8e ("net: Add Geneve tunneling protocol driver")
> CC: Andy Zhou <azhou@nicira.com>
> Signed-off-by: Jesse Gross <jesse@nicira.com>
> ---
>  net/ipv4/geneve.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/net/ipv4/geneve.c b/net/ipv4/geneve.c
> index 5a47188..95e47c9 100644
> --- a/net/ipv4/geneve.c
> +++ b/net/ipv4/geneve.c
> @@ -296,6 +296,7 @@ struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
>  				    geneve_rcv_t *rcv, void *data,
>  				    bool no_share, bool ipv6)
>  {
> +	struct geneve_net *gn = net_generic(net, geneve_net_id);
>  	struct geneve_sock *gs;
>  
>  	gs = geneve_socket_create(net, port, rcv, data, ipv6);
> @@ -305,15 +306,15 @@ struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
>  	if (no_share)	/* Return error if sharing is not allowed. */
>  		return ERR_PTR(-EINVAL);
>  
> +	spin_lock(&gn->sock_lock);
>  	gs = geneve_find_sock(net, port);

Perhaps drop the _rcu variant of the iterator in geneve_find_sock()?
Also, kfree_rcu() no longer seems to be needed, as all read accesses
are protected by the spinlock.

> -	if (gs) {
> -		if (gs->rcv == rcv)
> -			atomic_inc(&gs->refcnt);
> -		else
> +	if (gs && ((gs->rcv != rcv) ||
> +		   !atomic_add_unless(&gs->refcnt, 1, 0)))
>  			gs = ERR_PTR(-EBUSY);

Since you are taking gn->sock_lock in geneve_sock_release()
anyway, all accesses to refcnt could eventually be converted
to non-atomic ops.
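
A minimal userspace sketch of that idea, in Python rather than kernel C:
lookup, reference acquisition and release all happen under one lock, so
the counter can be a plain integer. The names Sock, SockTable, add and
release are hypothetical and only model the pattern; this is not the
geneve code, and socket creation is simplified away.

```python
import threading


class Sock:
    """Stand-in for a shared tunnel socket (hypothetical, not geneve_sock)."""

    def __init__(self, port, rcv):
        self.port = port
        self.rcv = rcv
        self.refcnt = 1  # plain int: every access happens under the table lock


class SockTable:
    def __init__(self):
        self.lock = threading.Lock()  # plays the role of gn->sock_lock
        self.socks = {}

    def add(self, port, rcv):
        """Return a socket for port, sharing an existing one if allowed."""
        with self.lock:
            gs = self.socks.get(port)
            if gs is None:
                gs = Sock(port, rcv)
                self.socks[port] = gs
                return gs
            # Refuse to share with a different receiver, or to revive a
            # socket whose count has already reached zero.
            if gs.rcv is not rcv or gs.refcnt == 0:
                raise OSError("EBUSY")
            gs.refcnt += 1
            return gs

    def release(self, gs):
        """Drop one reference; unlink the socket when the last one goes."""
        with self.lock:
            gs.refcnt -= 1
            if gs.refcnt == 0:
                del self.socks[gs.port]
```

Because release() unlinks the socket in the same critical section that
drops the count to zero, add() should never observe a zero count in this
model; the check is kept only to mirror the "unless zero" test in the
patch.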


* Re: Question about phys_port_id
From: Dan Williams @ 2014-12-17 16:47 UTC (permalink / raw)
  To: Joshua Watt; +Cc: netdev
In-Reply-To: <CAEPrYjTZ8SU1y4TwKHYtZju+4_-O5mazAjaFKaKmR2cYuAKHnA@mail.gmail.com>

On Wed, 2014-12-17 at 10:09 -0600, Joshua Watt wrote:
> Hello,
> 
> I had a question regarding the phys_port_id attribute of net_device.
> Is that identifier supposed to be globally unique or just unique among
> devices that share a common device? For example, we have a single
> device that creates two net_devices (one for each of its MACs). Would

Do the two net_devices share hardware or firmware resources?  Can they
be used independently at maximum capability, or when both are in use do
they have degraded capability?

> it be sufficient for this device to return a phys_port_id of 0 for the
> first net_device and 1 for the second? I noticed that the other

If the two netdevs share resources, then they should have the *same*
phys_port_id.  If they do not have the same physical hardware or shared
resources and are completely independent from each other at all levels,
then you can skip phys_port_id altogether.

One good use for this (and why it was originally added) was to indicate
to userspace that it was pointless to bond two interfaces with the same
underlying hardware or resources, because that totally defeats the
purpose of both failover and aggregation.
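
A rough sketch of how userspace might apply that check, assuming the
attribute is exposed as /sys/class/net/<iface>/phys_port_id and that
reading it fails for drivers that do not implement it. The helper names
read_phys_port_id and shared_port_groups are made up for illustration.

```python
import os
from collections import defaultdict


def read_phys_port_id(iface, sysfs_root="/sys/class/net"):
    """Return the phys_port_id string for iface, or None if unavailable."""
    path = os.path.join(sysfs_root, iface, "phys_port_id")
    try:
        with open(path) as f:
            value = f.read().strip()
    except OSError:
        # Missing attribute or a driver that does not support it.
        return None
    return value or None


def shared_port_groups(ifaces, sysfs_root="/sys/class/net"):
    """Map each phys_port_id to the interfaces sharing it (two or more)."""
    groups = defaultdict(list)
    for iface in ifaces:
        port_id = read_phys_port_id(iface, sysfs_root)
        if port_id is not None:
            groups[port_id].append(iface)
    return {pid: names for pid, names in groups.items() if len(names) > 1}
```

Any group returned by shared_port_groups() would be a set of interfaces
that it makes little sense to bond with each other.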

> implementations that use phys_port_id copy their mac address into the
> phys_port_id, but I'm not sure if that is just because that is an easy
> way to get a unique number or if it is because the ID needs to be
> globally unique.

Say you have two netdevs that share the same hardware or resources.  You
assign them both a phys_port_id of "1" to indicate this.  What if
there's a second cpsw device on the system, do both of its netdevs also
get "1", or "2", or?  Or how about a card from another vendor, how do
you ensure that your device's phys_port_id won't conflict with that
vendor's device/driver?

That's why most drivers currently use the MAC address or a GUID.

Dan

> If you're wondering, the driver in question is the TI cpsw driver
> (drivers/net/ethernet/ti/cpsw.c). We are running the device in
> dual-emac mode and need to uniquely identify which emac is which in
> userspace (specifically, udev rules). The physical port identifier
> seems to be the logical choice to me, but I'm not sure if I'm missing
> something.
> 
> Thanks,
> Joshua Watt
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH net 1/2] geneve: Remove socket and offload handlers at destruction.
From: Thomas Graf @ 2014-12-17 16:41 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev, Andy Zhou
In-Reply-To: <1418783132-99230-1-git-send-email-jesse@nicira.com>

On 12/16/14 at 06:25pm, Jesse Gross wrote:
> Sockets aren't currently removed from the global list when
> they are destroyed. In addition, offload handlers need to be cleaned
> up as well.
> 
> Fixes: 0b5e8b8e ("net: Add Geneve tunneling protocol driver")
> CC: Andy Zhou <azhou@nicira.com>
> Signed-off-by: Jesse Gross <jesse@nicira.com>

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: Netlink mmap tx security?
From: Thomas Graf @ 2014-12-17 16:27 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Miller, luto, torvalds, kaber, netdev
In-Reply-To: <5490C73C.8010405@redhat.com>

On 12/17/14 at 12:58am, Daniel Borkmann wrote:
> Fixes: 5fd96123ee19 ("netlink: implement memory mapped sendmsg()")
> Acked-by: Daniel Borkmann <dborkman@redhat.com>

Nothing to add to Daniel's excellent feedback.

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: Netlink mmap tx security?
From: Thomas Graf @ 2014-12-17 16:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, dborkman, luto, torvalds, kaber, netdev
In-Reply-To: <1418774579.9773.69.camel@edumazet-glaptop2.roam.corp.google.com>

On 12/16/14 at 04:02pm, Eric Dumazet wrote:
> On Tue, 2014-12-16 at 17:58 -0500, David Miller wrote:
> 
> > +		__skb_put(skb, nm_len);
> > +		memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, nm_len);
> > +		netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
> >  
> 
> Not related to this patch, but it looks like the netlink_set_status()
> barrier is wrong?
> 
> It seems we need smp_wmb() after the memcpy() and before the
> hdr->nm_status = status; 

Yes, definitely wrong as-is. I'll send a patch. For the particular
case we'd need a smp_rmb() after the memcpy() to complete the loads.
The skb destructor needs a smp_wmb() after setting nm_len. We could
get away with a smp_wmb() first thing in netlink_set_status() with
the code as-is but smp_mb() might be the less fragile thing to do.
Objections?


* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Hubert Sokolowski @ 2014-12-17 16:18 UTC (permalink / raw)
  To: vyasevic
  Cc: John Fastabend, Jamal Hadi Salim, Roopa Prabhu,
	netdev@vger.kernel.org
In-Reply-To: <5491A3B5.9070601@redhat.com>

>
> I don't think we have to dump uc/mc lists unconditionally.  What we
> want is the lower driver's view of any fdb entries it thinks are appropriate.
> For simple cards, this becomes equivalent to uc/mc lists.  For smarter cards
> that override the default dumper, it makes sense for them to provide the info.
>

I am very glad to hear that :).

>
> Well, the bridge would have dumped any fdbs that point at the bridge
> device (port is NULL) as their egress port.  Not sure about the vxlan.
> For stacked situations of complex devices this makes sense.  For
> instance if you stack vxlan on top of bridge, then from the vxlan
> perspective, we want to see what macs the bridge will forward to the
> vxlan.  Here, the bridge is actually slightly broken as it wouldn't
> actually dump all the pertinent info, but that's more of a bridge problem.

I have just prepared a patch where I dump uc/mc for bridge devices
by looking at (dev->priv_flags & IFF_EBRIDGE), so I get the same results
as without my changes. This should satisfy Jamal and Roopa.
I could send it as v3 of my patch along with the results if you are
interested.

>
> -vlad
>


thanks,
Hubert

--
Hubert Sokolowski    Intel Corporation


* Question about phys_port_id
From: Joshua Watt @ 2014-12-17 16:09 UTC (permalink / raw)
  To: netdev

Hello,

I had a question regarding the phys_port_id attribute of net_device.
Is that identifier supposed to be globally unique or just unique among
devices that share a common device? For example, we have a single
device that creates two net_devices (one for each of its MACs). Would
it be sufficient for this device to return a phys_port_id of 0 for the
first net_device and 1 for the second? I noticed that the other
implementations that use phys_port_id copy their mac address into the
phys_port_id, but I'm not sure if that is just because that is an easy
way to get a unique number or if it is because the ID needs to be
globally unique.

If you're wondering, the driver in question is the TI cpsw driver
(drivers/net/ethernet/ti/cpsw.c). We are running the device in
dual-emac mode and need to uniquely identify which emac is which in
userspace (specifically, udev rules). The physical port identifier
seems to be the logical choice to me, but I'm not sure if I'm missing
something.

Thanks,
Joshua Watt


* Re: [RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer
From: David Ahern @ 2014-12-17 15:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Martin KaFai Lau
  Cc: netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team
In-Reply-To: <20141217153349.GG11607@kernel.org>

On 12/17/14 8:33 AM, Arnaldo Carvalho de Melo wrote:
>
> On a random RHEL7 kernel I had lying around on a test machine:
>
> [root@ssdandy ~]# perf probe -L tcp_sacktag_write_queue | head -20
> <tcp_sacktag_write_queue@/usr/src/debug/kernel-3.10.0-123.el7/linux-3.10.0-123.el7.x86_64/net/ipv4/tcp_input.c:0>
>        0  tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb,
>           			u32 prior_snd_una)
>        2  {
>           	struct tcp_sock *tp = tcp_sk(sk);
>        4  	const unsigned char *ptr = (skb_transport_header(ack_skb) +
>           				    TCP_SKB_CB(ack_skb)->sacked);
>           	struct tcp_sack_block_wire *sp_wire = (struct tcp_sack_block_wire *)(ptr+2);
>           	struct tcp_sack_block sp[TCP_NUM_SACKS];
>           	struct tcp_sack_block *cache;
>           	struct tcp_sacktag_state state;
>           	struct sk_buff *skb;
>       11  	int num_sacks = min(TCP_NUM_SACKS, (ptr[1] - TCPOLEN_SACK_BASE) >> 3);
>           	int used_sacks;
>           	bool found_dup_sack = false;
>           	int i, j;
>           	int first_sack_index;
>
>       17  	state.flag = 0;
>       18  	state.reord = tp->packets_out;

But there are limitations/hassles with this approach. For starters, I
believe it requires vmlinux on the box. The products I work on do not
have vmlinux available in the runtime environment. I recall someone
(Masami?) suggesting the ability to write the probe data to a file
(i.e., create the probe definition off box) and load the file to create
the probe, so yes, a solvable problem.

But with this approach it could very well be that the function name and
variable names differ between kernel versions, which makes it hard, if
not impossible, to create a stable set of analysis commands.

David


* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Vlad Yasevich @ 2014-12-17 15:39 UTC (permalink / raw)
  To: John Fastabend, Jamal Hadi Salim
  Cc: Hubert Sokolowski, Roopa Prabhu, netdev@vger.kernel.org
In-Reply-To: <548F80B2.80408@gmail.com>

On 12/15/2014 07:45 PM, John Fastabend wrote:
> On 12/15/2014 06:29 AM, Jamal Hadi Salim wrote:
>> On 12/12/14 15:05, John Fastabend wrote:
>>> On 12/12/2014 06:35 AM, Jamal Hadi Salim wrote:
>>
>>
>>> I'll wake up ;)
>>
>>
>> Vlad made me go over those patches in a few iterations to make
>> sure that the use cases covered in the test case work. It is
>> holiday season, so he may be offline.
>>
> 
> Yep.

Sorry,  had HW/network issues as well as holiday season...  Been trying to
catch up.

> 
>>> First quick grep of code finds some strange uses of ndo_fdb_dump like
>>> this in macvlan,
>>>
>>>    ./drivers/net/macvlan.c
>>>          .ndo_fdb_dump           = ndo_dflt_fdb_dump,
>>>
>>> I'll be sending a patch once net-next opens up again to resolve it. It's
>>> harmless though so not really a fix for net.
>>>
>>> There seem to be a few places that have the potential to return
>>> different values than the uc/mc lists.
>>>
>>>      ./drivers/net/vxlan.c
>>>      ./drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
>>>      ./drivers/net/ethernet/rocker/rocker.c
>>>
>>>      ./net/bridge/br_device.c
>>>
>>
>> Yes, that's my observation as well.
>> The question is: Are multi/unicast address unconditionally dumped?
> 
> hmm good question. When I implemented this on the host nics with SR-IOV,
> VMDQ, etc. The multi/unicast addresses were propagated into the FDB by
> the driver. My logic was if some netdev ethx has a set of MAC addresses
> above it well then any virtual function or virtual device also behind
> the hardware shouldn't be sending those addresses out the egress switch
> facing port. Otherwise the switch will see packets it knows are behind
> that port and drop them. Or flood them if it hasn't learned the address
> yet. Either way they will never get to the right netdev.
> 
> Admittedly I wasn't thinking about switches with many ports at the time.

Looking at the old code, we've always asked HW to dump its state, and if HW
didn't support the dumper, the default dumper of MC/UC lists was used.
This makes sense for most devices.
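
The behaviour described above can be modelled in a few lines. This is a
Python sketch, not the rtnetlink code: Device, fdb_dump, default_fdb_dump
and dump_fdb are hypothetical stand-ins for a net_device, its optional
ndo_fdb_dump hook, ndo_dflt_fdb_dump and the caller that picks between
them.

```python
class Device:
    """Toy model of a net_device with uc/mc address lists."""

    def __init__(self, name, uc=(), mc=(), fdb_dump=None):
        self.name = name
        self.uc = list(uc)
        self.mc = list(mc)
        self.fdb_dump = fdb_dump  # optional driver-specific dumper


def default_fdb_dump(dev):
    """Fallback: report the device's unicast and multicast lists."""
    return dev.uc + dev.mc


def dump_fdb(dev):
    """Ask the driver for its view; use the default only if it has none."""
    if dev.fdb_dump is not None:
        return dev.fdb_dump(dev)
    return default_fdb_dump(dev)
```

With this shape, a simple NIC that leaves fdb_dump unset reports its
uc/mc lists, while a smarter device that supplies its own dumper decides
for itself what to report.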

> 
>> Some of these drivers may be just doing the LinuxWay(aka cutnpaste what
>> the other driver did).
> 
> My original thinking here was... if it didn't implement fdb_add, fdb_del
> and fdb_dump then if you wanted to think of it as having forwarding
> database that was fine but it was really just a two port mac relay. In
> which case just dump all the mac addresses it knows about. In this case
> if it was something more fancy it could do its own dump like vxlan or
> macvlan.
> 
>> If you go over the original thread exchange with Vlad, you'll notice
>> i was kind of unsure why dumping of unicast/multicast had anything to
>> do with fdb dumping.
>> It is still my view that we shouldn't be treating these addresses as if
>> they were fdb entries. But: The problem is once you allow an API to
>> user space you can't take it back even if people are depending on bugs.
>>
> 
> For a host nic ucast/multicast and fdb are the same, I think? The
> code we had was just short-hand to allow the common case a host nic
> to work. Notice vxlan and bridge drivers didn't dump their addr lists
> from fdb_dump until your patch.

Right.  That patch added additional filtering code, but I guess we missed
the change that forced it to dump MC/UC lists from the master devices.  It
did dump those lists from the slave devices.

> 
> Perhaps my implementation of macvlan fdb_{add|del|dump} is buggy. And
> I shouldn't overload the addr lists.
> 
>>
>>> So I guess we can walk through the list and analyse them a bit.
>>>
>>> vxlan:
>>>
>>> Try stacking devices on top of the vxlan device this will call a uc_add
>>> routine if you then change the mac addr on the vlan. This would get
>>> reported by the dflt fdb dump handlers but not the drivers fdb dump
>>> handlers. So removing the dflt dump handler from this patch at least
>>> changes things. We should either explain why this is OK or accept that
>>> the driver needs to be fixed. Or I guess that the patch is just wrong.
>>> My guess is one of the latter options.
>>>
>>> Also Jamal, your original patch seems like it might have changed this
>>> and Hubert's patch is reverting back to its original case. Was this
>>> specific part of your patch intentional?
>>>
>>
>> Yes.
>> This is based on the view that unicast/multicast must be dumped
>> *unconditionally*. If the view is that uni/mcast addresses are
>> dumped conditionally based on what the driver thinks, then Hubert's
>> one liner is good. But i really would like Vlad to comment. 80%
>> of the effort on my part if you look at the thread was the refactoring
>> of the code to meet the use case.

I don't think we have to dump uc/mc lists unconditionally.  What we
want is the lower driver's view of any fdb entries it thinks are appropriate.
For simple cards, this becomes equivalent to uc/mc lists.  For smarter cards
that override the default dumper, it makes sense for them to provide the info.

> 
> I'm interested to see what Vlad says as well. But the current situation
> is previously some drivers dumped their addr lists others didn't.
> Specifically, the more switch like devices (bridge, vxlan) didn't. Now
> every device will dump the addr lists. I'm not entirely convinced that
> is correct.
>

Well, the bridge would have dumped any fdbs that point at the bridge
device (port is NULL) as their egress port.  Not sure about the vxlan.
For stacked situations of complex devices this makes sense.  For
instance if you stack vxlan on top of bridge, then from the vxlan
perspective, we want to see what macs the bridge will forward to the
vxlan.  Here, the bridge is actually slightly broken as it wouldn't
actually dump all the pertinent info, but that's more of a bridge problem.

-vlad

>>
>> I thought the abstraction which requires that your own MAC addresses
>> are treated as fdb entries was broken - but it is too late to change
>> that.
>>
> 
> It works OK for host nics (NICS that can't forward between ports) and
> seems at best confusing for real switch asics. On a related question do
> you expect the switch asic to trap any packets with MAC addresses in
> the multi/unicast address lists and send them to the correct netdev? Or
> will the switch forward them using normal FDB tables?
> 
> Also I don't think it's too late to fix it though. Maybe we had some
> buggy drivers is all.
> 
>> cheers,
>> jamal
> 
> 


* Re: [RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer
From: Arnaldo Carvalho de Melo @ 2014-12-17 15:33 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team
In-Reply-To: <1418608606-1569264-4-git-send-email-kafai@fb.com>

Em Sun, Dec 14, 2014 at 05:56:44PM -0800, Martin KaFai Lau escreveu:
> The tcp tracer, which will be added in the later patch, depends
> on them to collect statistics.

> --- a/include/trace/events/tcp.h
<SNIP>
> +DECLARE_TRACE(tcp_sacks_rcv,
> +	     TP_PROTO(struct sock *sk, int num_sacks),
> +	     TP_ARGS(sk, num_sacks)
> +);

<SNIP>

> +++ b/net/ipv4/tcp_input.c
> @@ -1650,6 +1650,8 @@ tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb,
>  	int i, j;
>  	int first_sack_index;
>  
> +	trace_tcp_sacks_rcv(sk, num_sacks);
> +

In another message someone pointed out that we want some tracepoints, but
that others would imply ABI, a drag on upstream to keep tons of set-in-stone
tracepoints. What I was suggesting is shown below, where one of the
tracepoints proposed above is implemented as a "wannabe tracepoint", i.e. a
dynamic probe, which will be as optimized as the kprobes tracer can make it,
sometimes even using, IIRC, the ftrace mechanisms if placed at a suitable
location (function entry, etc., IIRC, Steven?).

On a random RHEL7 kernel I had lying around on a test machine:

[root@ssdandy ~]# perf probe -L tcp_sacktag_write_queue | head -20
<tcp_sacktag_write_queue@/usr/src/debug/kernel-3.10.0-123.el7/linux-3.10.0-123.el7.x86_64/net/ipv4/tcp_input.c:0>
      0  tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb,
         			u32 prior_snd_una)
      2  {
         	struct tcp_sock *tp = tcp_sk(sk);
      4  	const unsigned char *ptr = (skb_transport_header(ack_skb) +
         				    TCP_SKB_CB(ack_skb)->sacked);
         	struct tcp_sack_block_wire *sp_wire = (struct tcp_sack_block_wire *)(ptr+2);
         	struct tcp_sack_block sp[TCP_NUM_SACKS];
         	struct tcp_sack_block *cache;
         	struct tcp_sacktag_state state;
         	struct sk_buff *skb;
     11  	int num_sacks = min(TCP_NUM_SACKS, (ptr[1] - TCPOLEN_SACK_BASE) >> 3);
         	int used_sacks;
         	bool found_dup_sack = false;
         	int i, j;
         	int first_sack_index;
         
     17  	state.flag = 0;
     18  	state.reord = tp->packets_out;
[root@ssdandy ~]#

Available variables at tcp_sacktag_write_queue:17
        @<tcp_sacktag_write_queue+77>
                int     num_sacks
                struct sk_buff* ack_skb
                struct sock*    sk
                struct tcp_sack_block_wire*     sp_wire
                u32     prior_snd_una
                unsigned char*  ptr
[root@ssdandy ~]#

Ok, so we can insert a probe at that point and also we can collect the values of the
sk and num_sacks variables, so:

[root@ssdandy ~]# perf probe 'tcp_sacks_rcv=tcp_sacktag_write_queue:17 sk num_sacks'
Added new event:
  probe:tcp_sacks_rcv  (on tcp_sacktag_write_queue:17 with sk num_sacks)

You can now use it in all perf tools, such as:

	perf record -e probe:tcp_sacks_rcv -aR sleep 1

[root@ssdandy ~]#

There you go, you have your wannabe tracepoint, dynamic:

[root@ssdandy ~]# perf record -a -g -e probe:tcp_sacks_rcv
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.362 MB perf.data (~15799 samples) ]

[root@ssdandy ~]# perf script
swapper 0 [000] 184175.932790: probe:tcp_sacks_rcv: (ffffffff8151e59d) sk=ffff880425761e00 num_sacks=0
              71e59e tcp_sacktag_write_queue (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              72216e tcp_ack (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              723cf8 tcp_rcv_state_process (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              72d158 tcp_v4_do_rcv (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              72f5c7 tcp_v4_rcv (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              709584 ip_local_deliver_finish (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              709858 ip_local_deliver (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              7091fd ip_rcv_finish (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              709ac4 ip_rcv (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              6cfdb6 __netif_receive_skb_core (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              6cffc8 __netif_receive_skb (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              6d0050 netif_receive_skb (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              6d0aa8 napi_gro_receive (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
               1b5bf e1000_receive_skb (/lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko)
               1cbda e1000_clean_rx_irq (/lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko)
               247dc e1000e_poll (/lib/modules/3.10.0-123.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko)
              6d041a net_rx_action (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              267047 __do_softirq (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              7f3a5c call_softirq (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              214d25 do_softirq (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              2673e5 irq_exit (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              7f4358 do_IRQ (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              7e94ad ret_from_intr (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              7c3927 rest_init (/usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux)
              e06fa7 start_kernel ([kernel.vmlinux].init.text)
              e065ee x86_64_start_reservations ([kernel.vmlinux].init.text)
              e06742 x86_64_start_kernel ([kernel.vmlinux].init.text)
<SNIP>

[root@ssdandy ~]# perf script -g python
generated Python script: perf-script.py
[root@ssdandy ~]# vim perf-script.py # Edit it to remove callchain printing, simplify some stuff
[root@ssdandy ~]# mv perf-script.py tcp_sack_rcv.py
[root@ssdandy ~]# cat tcp_sack_rcv.py 
import os, sys

sys.path.append(os.environ['PERF_EXEC_PATH'] + \
	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')

from perf_trace_context import *
from Core import *

def probe__tcp_sacks_rcv(event_name, context, common_cpu,
	common_secs, common_nsecs, common_pid, common_comm,
	common_callchain, __probe_ip, sk, num_sacks):
		print_header(event_name, common_cpu, common_secs, common_nsecs,
			common_pid, common_comm)

		print "__probe_ip=%#x, sk=%#x, num_sacks=%d" % \
		(__probe_ip, sk, num_sacks)

def print_header(event_name, cpu, secs, nsecs, pid, comm):
	print "%-18s %3u %05u.%09u %1u %-8s " % \
	(event_name, cpu, secs, nsecs, pid, comm),
[root@ssdandy ~]#

[root@ssdandy ~]# perf script -s tcp_sack_rcv.py  | head -10
Failed to open 64/libfreebl3.so, continuing without symbols
probe__tcp_sacks_rcv   0 184175.932790461 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff880425761e00, num_sacks=0
probe__tcp_sacks_rcv   0 184177.487455369 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804047e0780, num_sacks=0
probe__tcp_sacks_rcv   0 184177.588593040 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804256af800, num_sacks=0
probe__tcp_sacks_rcv   0 184178.741298627 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804256acb00, num_sacks=0
probe__tcp_sacks_rcv   0 184179.902089365 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804256ad280, num_sacks=0
probe__tcp_sacks_rcv   0 184180.802761942 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804256acb00, num_sacks=0
probe__tcp_sacks_rcv   0 184180.961373503 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804256af800, num_sacks=0
probe__tcp_sacks_rcv   0 184182.123660739 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804256ad280, num_sacks=0
probe__tcp_sacks_rcv   0 184182.387640636 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804256acb00, num_sacks=0
probe__tcp_sacks_rcv   0 184182.859420892 0 swapper   __probe_ip=0xffffffff8151e59d, sk=0xffff8804256af800, num_sacks=0
[root@ssdandy ~]#
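
As a possible next step, the text output above could be summarised per
socket. The following is a small Python 3 sketch that assumes lines in
exactly the format printed by the script above; the function name
events_per_socket and the regular expression are illustrative only and
not part of perf.

```python
import re
from collections import Counter

# Matches the "sk=0x..., num_sacks=N" tail printed by the handler above.
LINE_RE = re.compile(r"sk=(0x[0-9a-f]+), num_sacks=(\d+)")


def events_per_socket(lines):
    """Count how many probe hits were recorded for each sk pointer."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Lines that do not match, such as the libfreebl3.so warning, are simply
skipped.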


>  	state.flag = 0;
>  	state.reord = tp->packets_out;
>  	state.rtt_us = -1L;
> @@ -2932,6 +2934,9 @@ static inline bool tcp_ack_update_rtt(struct sock *sk, const int flag,
>  
>  	/* RFC6298: only reset backoff on valid RTT measurement. */
>  	inet_csk(sk)->icsk_backoff = 0;
> +
> +	trace_tcp_rtt_sample(sk, seq_rtt_us);
> +
>  	return true;
>  }
>  
> @@ -4232,6 +4237,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
>  	SOCK_DEBUG(sk, "out of order segment: rcv_next %X seq %X - %X\n",
>  		   tp->rcv_nxt, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq);
> +	trace_tcp_ooo_rcv(sk);
>  
>  	skb1 = skb_peek_tail(&tp->out_of_order_queue);
>  	if (!skb1) {
> -- 
> 1.8.1
> 

* Re: [RFC PATCH net-next 1/5] tcp: Add TCP TRACE_EVENTs
From: David Ahern @ 2014-12-17 15:08 UTC (permalink / raw)
  To: Martin KaFai Lau, netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team
In-Reply-To: <1418608606-1569264-2-git-send-email-kafai@fb.com>

On 12/14/14 6:56 PM, Martin KaFai Lau wrote:
> +DECLARE_EVENT_CLASS(tcp,
> +	TP_PROTO(struct sock *sk),
> +	TP_ARGS(sk),
> +	TP_STRUCT__entry(
> +		__field(u8, ipv6)
> +		__array(u8, laddr, 16)
> +		__array(u8, raddr, 16)

You could store the addresses as
union {
     struct sockaddr_in      v4;
     struct sockaddr_in6     v6;
} sa;


and then use:

> +	TP_printk("local=%s:%d remote=%s:%d snd_cwnd=%u mss_cache=%u "
> +		  "ssthresh=%u srtt_us=%llu rto_ms=%u",

%pIS to print the addresses in a more readable format than what 
__print_hex will show.

I have a patch to perf (and by extension it applies to trace-cmd) to 
handle pI4, pI6 and pI6c. It readily extends to pIS.
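
To illustrate the readability difference only, here is a userspace
Python sketch. It does not reproduce the kernel's %pIS or __print_hex
implementations, and it assumes an IPv4 address sits in the first four
bytes of the 16-byte array, which the patch may or may not do. The
helper names hex_dump and readable are hypothetical.

```python
import ipaddress


def hex_dump(raw):
    """Render raw address bytes as space-separated hex."""
    return " ".join(f"{b:02x}" for b in raw)


def readable(raw, is_ipv6):
    """Render a 16-byte address buffer as a conventional address string."""
    if is_ipv6:
        return str(ipaddress.IPv6Address(bytes(raw)))
    # Assumption: IPv4 addresses occupy the first four bytes of the buffer.
    return str(ipaddress.IPv4Address(bytes(raw[:4])))
```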

David


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Arnaldo Carvalho de Melo @ 2014-12-17 15:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Josef Bacik, Kernel Team
In-Reply-To: <CAADnVQJ+8mtB8LD=U7XbxOC2hxhDChxOELhZ3NEYeoTk1G3LYg@mail.gmail.com>

Em Sun, Dec 14, 2014 at 10:55:55PM -0800, Alexei Starovoitov escreveu:
> On Sun, Dec 14, 2014 at 5:56 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> > Hi,
> >
> > We have been using the kernel ftrace infra to collect TCP per-flow statistics.
> > The following patch set is a first slim-down version of our
> > existing implementation. We would like to get some early feedback
> > and make it useful for others.
> >
> > [RFC PATCH net-next 1/5] tcp: Add TCP TRACE_EVENTs:
> > Defines some basic tracepoints (by TRACE_EVENT).
> >
> > [RFC PATCH net-next 2/5] tcp: A perf script for TCP tracepoints:
> > A sample perf script with simple ip/port filtering and summary output.
> >
> > [RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer:
> > Declares a few more tracepoints (by DECLARE_TRACE) which are
> > used by the tcp_tracer.  The tcp_tracer is in the patch 5/5.
> >
> > [RFC PATCH net-next 4/5] tcp: Introduce tcp_sk_trace and related structs:
> > Defines a few tcp_trace structs which are used to collect statistics
> > on each tcp_sock.
> >
> > [RFC PATCH net-next 5/5] tcp: Add TCP tracer:
> > It introduces a tcp_tracer which hooks onto the tracepoints defined in the
> > patch 1/5 and 3/5.  It collects data defined in patch 4/5. We currently
> > use this tracer to collect per-flow statistics.  The commit log has
> > some more details.
> 
> I think patches 1 and 3 are good additions, since they establish
> few permanent points of instrumentation in tcp stack.
> Patches 4-5 look more like use cases of tracepoints established
> before. They may feel like simple additions and, no doubt,
> they are useful, but since they expose things via tracing
> infra they become part of api and cannot be changed later,
> when more stats would be needed.
> I think systemtap like scripting on top of patches 1 and 3
> should solve your use case ?

I guess even just using 'perf probe' to set those wannabe tracepoints
should be enough, no? Then he can refer to those in his perf record
call, etc and process it just like with the real tracepoints.

> Also, have you looked at recent eBPF work?
> Though it's not completely ready yet, soon it should
> be able to do the same stats collection as you have
> in 4/5 without adding permanent pieces to the kernel.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Fw: [Bug 82471] New: net/core/dev.c skb_war_bad_offload
From: Richard Laager @ 2014-12-17 14:52 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev
In-Reply-To: <20141217095552.GB27966@unicorn.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 479 bytes --]

On Wed, 2014-12-17 at 10:55 +0100, Michal Kubecek wrote:
> Would it be possible that the kernel you are using has
> 
>   da08143b8520 ("vlan: more careful checksum features handling")
> 
> (and possibly also a9b3ace44c7d and 3625920b62c3) but not
> 
>   db115037bb57 ("net: fix checksum features handling in netif_skb_features()")

Ubuntu's 3.13.0 has none of these changes.
Ubuntu's 3.16.0 has all four changes.

The problem occurs on both kernels.

-- 
Richard

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: net: integer overflow in ip_idents_reserve
From: Eric Dumazet @ 2014-12-17 14:11 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Hannes Frederic Sowa, David S. Miller, LKML, netdev,
	Andrey Ryabinin, Dave Jones
In-Reply-To: <5490D920.5000104@oracle.com>

On Tue, 2014-12-16 at 20:15 -0500, Sasha Levin wrote:

> I reported this one because there's usually some code to handle overflow
> in code that expects that and here there was none (I could see).

IP IDs are best effort.

When sending one million IPv4 frames per second to a particular
destination, the 16bit ID space is recycled so fast that really their
precise values do not matter anymore.
You pray that IP fragments won't be needed at all.

(One of the ideas I had was to detect this kind of stress and fall back to
random generation, reducing false sharing, but this seemed a micro
optimization targeting synthetic benchmarks.)
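
To put a number on how fast the ID space recycles, here is a rough
user-space sketch of the arithmetic (not kernel code; the helper name
ip_id_wrap_usecs is made up for illustration). It assumes one ID is
consumed per frame sent to that destination:

```c
#include <stdint.h>

/*
 * Microseconds until a 16-bit IP ID space (65536 values) has been used
 * up once, assuming one ID per packet at a steady packet rate.
 * Hypothetical helper, for illustration only. pps must be non-zero.
 */
static uint64_t ip_id_wrap_usecs(uint64_t pps)
{
	return (65536ULL * 1000000ULL) / pps;
}
```

At one million frames per second this gives 65536 us, so every ID value
is reused roughly every 65 ms.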

Thanks

^ permalink raw reply

* Re: [Xen-devel] xen-netback: make feature-rx-notify mandatory -- Breaks stubdoms
From: David Vrabel @ 2014-12-17 14:00 UTC (permalink / raw)
  To: David Vrabel, John
  Cc: netdev@vger.kernel.org, Wei Liu, Ian Campbell,
	Xen-devel@lists.xen.org
In-Reply-To: <548854C3.7060008@citrix.com>

On 10/12/14 14:12, David Vrabel wrote:
> On 10/12/14 13:42, John wrote:
>> David,
>>
>> This patch you put into 3.18.0 appears to break the latest version of
>> stubdomains. I found this out today when I tried to update a machine to
>> 3.18.0 and all of the domUs crashed on start with the dmesg output like
>> this:
> 
> Cc'ing the lists and relevant netback maintainers.
> 
> I guess the stubdoms are using minios's netfront?  This is something I
> forgot about when deciding if it was ok to make this feature mandatory.
> 
> The patch cannot be reverted as it's a prerequisite for a critical
> (security) bug fix.  I am also unconvinced that the no-feature-rx-notify
> support worked correctly anyway.
> 
> This can be resolved by:
> 
> - Fixing minios's netfront to support feature-rx-notify. This should be
> easy but wouldn't help existing Xen deployments.
> 
> Or:
> 
> - Reimplement feature-rx-notify support.  I think the easiest way is to
> queue packets on the guest Rx internal queue with a short expiry time.

This patch works for me.  I tested it with a hacked Linux frontend that
disabled feature-rx-notify, but not with a stubdom.

Can you give it a try, please?

David

8<--------------------------------------------------------------
xen-netback: support frontends without feature-rx-notify again

Commit bc96f648df1bbc2729abbb84513cf4f64273a1f1 (xen-netback: make
feature-rx-notify mandatory) incorrectly assumed that there were no
frontends in use that did not support this feature.  But the frontend
driver in MiniOS does not and since this is used by (qemu) stubdoms,
these stopped working.

Netback sort of works as-is in this mode except:

- If there are no Rx requests and the internal Rx queue fills, only the
  drain timeout will wake the thread.  The default drain timeout of 10 s
  would give unacceptable pauses.

- If an Rx stall was detected and the internal Rx queue is drained, then
  the Rx thread would never wake.

Handle these two cases (when feature-rx-notify is disabled) by:

- Reducing the drain timeout to 30 ms.

- Disabling Rx stall detection.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/common.h    |    4 +++-
 drivers/net/xen-netback/interface.c |    4 +++-
 drivers/net/xen-netback/netback.c   |   27 ++++++++++++++-------------
 drivers/net/xen-netback/xenbus.c    |   12 +++++++++---
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 083ecc9..5f1fda4 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -230,6 +230,8 @@ struct xenvif {
 	 */
 	bool disabled;
 	unsigned long status;
+	unsigned long drain_timeout;
+	unsigned long stall_timeout;
 
 	/* Queues */
 	struct xenvif_queue *queues;
@@ -328,7 +330,7 @@ irqreturn_t xenvif_interrupt(int irq, void *dev_id);
 extern bool separate_tx_rx_irq;
 
 extern unsigned int rx_drain_timeout_msecs;
-extern unsigned int rx_drain_timeout_jiffies;
+extern unsigned int rx_stall_timeout_msecs;
 extern unsigned int xenvif_max_queues;
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index a6a32d3..9259a73 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -166,7 +166,7 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 
 	cb = XENVIF_RX_CB(skb);
-	cb->expires = jiffies + rx_drain_timeout_jiffies;
+	cb->expires = jiffies + vif->drain_timeout;
 
 	xenvif_rx_queue_tail(queue, skb);
 	xenvif_kick_thread(queue);
@@ -414,6 +414,8 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->ip_csum = 1;
 	vif->dev = dev;
 	vif->disabled = false;
+	vif->drain_timeout = msecs_to_jiffies(rx_drain_timeout_msecs);
+	vif->stall_timeout = msecs_to_jiffies(rx_stall_timeout_msecs);
 
 	/* Start out with no queues. */
 	vif->queues = NULL;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4a509f7..b0292e4 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -60,14 +60,12 @@ module_param(separate_tx_rx_irq, bool, 0644);
  */
 unsigned int rx_drain_timeout_msecs = 10000;
 module_param(rx_drain_timeout_msecs, uint, 0444);
-unsigned int rx_drain_timeout_jiffies;
 
 /* The length of time before the frontend is considered unresponsive
  * because it isn't providing Rx slots.
  */
-static unsigned int rx_stall_timeout_msecs = 60000;
+unsigned int rx_stall_timeout_msecs = 60000;
 module_param(rx_stall_timeout_msecs, uint, 0444);
-static unsigned int rx_stall_timeout_jiffies;
 
 unsigned int xenvif_max_queues;
 module_param_named(max_queues, xenvif_max_queues, uint, 0644);
@@ -2020,7 +2018,7 @@ static bool xenvif_rx_queue_stalled(struct xenvif_queue *queue)
 	return !queue->stalled
 		&& prod - cons < XEN_NETBK_RX_SLOTS_MAX
 		&& time_after(jiffies,
-			      queue->last_rx_time + rx_stall_timeout_jiffies);
+			      queue->last_rx_time + queue->vif->stall_timeout);
 }
 
 static bool xenvif_rx_queue_ready(struct xenvif_queue *queue)
@@ -2038,8 +2036,9 @@ static bool xenvif_have_rx_work(struct xenvif_queue *queue)
 {
 	return (!skb_queue_empty(&queue->rx_queue)
 		&& xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX))
-		|| xenvif_rx_queue_stalled(queue)
-		|| xenvif_rx_queue_ready(queue)
+		|| (queue->vif->stall_timeout &&
+		    (xenvif_rx_queue_stalled(queue)
+		     || xenvif_rx_queue_ready(queue)))
 		|| kthread_should_stop()
 		|| queue->vif->disabled;
 }
@@ -2092,6 +2091,9 @@ int xenvif_kthread_guest_rx(void *data)
 	struct xenvif_queue *queue = data;
 	struct xenvif *vif = queue->vif;
 
+	if (!vif->stall_timeout)
+		xenvif_queue_carrier_on(queue);
+
 	for (;;) {
 		xenvif_wait_for_rx_work(queue);
 
@@ -2118,10 +2120,12 @@ int xenvif_kthread_guest_rx(void *data)
 		 * while it's probably not responsive, drop the
 		 * carrier so packets are dropped earlier.
 		 */
-		if (xenvif_rx_queue_stalled(queue))
-			xenvif_queue_carrier_off(queue);
-		else if (xenvif_rx_queue_ready(queue))
-			xenvif_queue_carrier_on(queue);
+		if (queue->vif->stall_timeout) {
+			if (xenvif_rx_queue_stalled(queue))
+				xenvif_queue_carrier_off(queue);
+			else if (xenvif_rx_queue_ready(queue))
+				xenvif_queue_carrier_on(queue);
+		}
 
 		/* Queued packets may have foreign pages from other
 		 * domains.  These cannot be queued indefinitely as
@@ -2192,9 +2196,6 @@ static int __init netback_init(void)
 	if (rc)
 		goto failed_init;
 
-	rx_drain_timeout_jiffies = msecs_to_jiffies(rx_drain_timeout_msecs);
-	rx_stall_timeout_jiffies = msecs_to_jiffies(rx_stall_timeout_msecs);
-
 #ifdef CONFIG_DEBUG_FS
 	xen_netback_dbg_root = debugfs_create_dir("xen-netback", NULL);
 	if (IS_ERR_OR_NULL(xen_netback_dbg_root))
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index d44cd19..efbaf2a 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -887,9 +887,15 @@ static int read_xenbus_vif_flags(struct backend_info *be)
 		return -EOPNOTSUPP;
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend,
-			 "feature-rx-notify", "%d", &val) < 0 || val == 0) {
-		xenbus_dev_fatal(dev, -EINVAL, "feature-rx-notify is mandatory");
-		return -EINVAL;
+			 "feature-rx-notify", "%d", &val) < 0)
+		val = 0;
+	if (!val) {
+		/* - Reduce drain timeout to poll more frequently for
+		 *   Rx requests.
+		 * - Disable Rx stall detection.
+		 */
+		be->vif->drain_timeout = msecs_to_jiffies(30);
+		be->vif->stall_timeout = 0;
 	}
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg",
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net] cxgb4: Fix decoding QSA module for ethtool get settings
From: Hariprasad Shenai @ 2014-12-17 12:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, Hariprasad Shenai

The QSA module was being decoded as a QSFP module in ethtool get settings;
this patch fixes it.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c    |    2 +-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 28d0415..c132d90 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -2376,7 +2376,7 @@ const char *t4_get_port_type_description(enum fw_port_type port_type)
 		"KR/KX",
 		"KR/KX/KX4",
 		"R QSFP_10G",
-		"",
+		"R QSA",
 		"R QSFP",
 		"R BP40_BA",
 	};
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 291b6f2..7c0aec8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -2470,8 +2470,8 @@ enum fw_port_type {
 	FW_PORT_TYPE_BP_AP,
 	FW_PORT_TYPE_BP4_AP,
 	FW_PORT_TYPE_QSFP_10G,
-	FW_PORT_TYPE_QSFP,
 	FW_PORT_TYPE_QSA,
+	FW_PORT_TYPE_QSFP,
 	FW_PORT_TYPE_BP40_BA,
 
 	FW_PORT_TYPE_NONE = FW_PORT_CMD_PTYPE_M
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH net-next 1/3] Implementation of RFC 4898 Extended TCP Statistics (Web10G)
From: Bjørn Mork @ 2014-12-17 11:01 UTC (permalink / raw)
  To: rapier; +Cc: netdev
In-Reply-To: <549070CF.1010506@psc.edu>

rapier <rapier@psc.edu> writes:

> + * The Web10Gig project.  See http://www.web10gig.org

URL is already outdated?


Bjørn

^ permalink raw reply

* Re: Fw: [Bug 82471] New: net/core/dev.c skb_war_bad_offload
From: Michal Kubecek @ 2014-12-17  9:55 UTC (permalink / raw)
  To: Richard Laager; +Cc: netdev
In-Reply-To: <1418805852.5277.25.camel@watermelon.coderich.net>

On Wed, Dec 17, 2014 at 02:44:12AM -0600, Richard Laager wrote:
> Previous history of this thread:
> http://thread.gmane.org/gmane.linux.network/326672
> 
> On 2014-11-04 22:57:19, Tom Herbert wrote:
> > Using vlan and bonding? vlan_dev_hard_start_xmit called. A possible
> > cause is that bonding interface is out of sync with slave interface
> > w.r.t. GSO features. Do we know if this worked in 3.14, 3.15?
> 
> I'm seeing the same sort of crash/warning (skb_warn_bad_offload). It's
> happening on Intel 10 Gig NICs using the ixgbe driver. I'm using bridges
> (for virtual machines) on top of VLANs on top of 802.3ad bonding. I'm
> using an MTU of 9000 on the bond0 interface, but 1500 everywhere else.
> 
> I'm always bonding two ports: on one system, I'm bonding two ports on
> identical one-port NICs; on another system, I'm bonding two ports on a
> single two-port NIC. Both systems exhibit the same behavior.
> 
> Everything has worked fine for a couple years on Ubuntu 12.04 Precise
> (Linux 3.2.0). It immediately broke when I upgraded to Ubuntu 14.04
> Trusty (Linux 3.13.0). I can also reproduce this using the packaged
> version of Linux 3.16.0 on Trusty.

Would it be possible that the kernel you are using has

  da08143b8520 ("vlan: more careful checksum features handling")

(and possibly also a9b3ace44c7d and 3625920b62c3) but not

  db115037bb57 ("net: fix checksum features handling in netif_skb_features()")

?

Michal Kubecek

^ permalink raw reply

* Re: [Question]The benefit of weight_p in __qdisc_run
From: Dennis Chen @ 2014-12-17  9:32 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev
In-Reply-To: <54914AEC.7070600@redhat.com>

On Wed, Dec 17, 2014 at 5:20 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 12/17/2014 09:54 AM, Dennis Chen wrote:
>>
>> weight_p is used as the burst xmit packet quota in the while loop of
>> the __qdisc_run function,
>> can anybody elaborate on the benefit of the weight_p introduced
>> here? What's the consequence without it?
>
>
> It acts as a quota to introduce fairness among qdiscs. See also slide 7
> onwards for experiments with/without it:
>
>   http://vger.kernel.org/netconf2011_slides/jamal_netconf2011.pdf

Dan, I really do appreciate your answer :)

-- 
Den

^ permalink raw reply

* Re: [Question]The benefit of weight_p in __qdisc_run
From: Daniel Borkmann @ 2014-12-17  9:20 UTC (permalink / raw)
  To: Dennis Chen; +Cc: netdev
In-Reply-To: <CA+U0gVhk+4-T=XuCumnpRnRyQ=OqVfegGxF3ZZQy07++PDOW3g@mail.gmail.com>

On 12/17/2014 09:54 AM, Dennis Chen wrote:
> weight_p is used as the burst xmit packet quota in the while loop of
> the __qdisc_run function,
> can anybody elaborate on the benefit of the weight_p introduced
> here? What's the consequence without it?

It acts as a quota to introduce fairness among qdiscs. See also slide 7
onwards for experiments with/without it:

   http://vger.kernel.org/netconf2011_slides/jamal_netconf2011.pdf
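
As a rough illustration of the quota idea, here is a simplified
user-space model (not the kernel's __qdisc_run; struct model_queue and
model_qdisc_run are hypothetical names). Each run sends at most "quota"
packets from one queue and then returns, so a queue with a large backlog
cannot keep the transmit path to itself:

```c
#include <stddef.h>

/* Hypothetical model of a queue; not the kernel's struct Qdisc. */
struct model_queue {
	size_t backlog;		/* packets waiting to be sent */
};

/*
 * Send packets from q until it is empty or the quota is used up.
 * Returns the number of packets sent in this run. Any remaining
 * backlog is left for a later run, after other queues had a turn.
 */
static size_t model_qdisc_run(struct model_queue *q, size_t quota)
{
	size_t sent = 0;

	while (q->backlog > 0 && sent < quota) {
		q->backlog--;
		sent++;
	}
	return sent;
}
```

With a backlog of 100 packets and a quota of 64 (an example value), the
first run sends 64 packets and the second sends the remaining 36. Without
the quota, a single run would drain all 100 before any other queue was
serviced.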

^ permalink raw reply

* Re: Fw: [Bug 82471] New: net/core/dev.c skb_war_bad_offload
From: Richard Laager @ 2014-12-17  8:44 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 24114 bytes --]

Previous history of this thread:
http://thread.gmane.org/gmane.linux.network/326672

On 2014-11-04 22:57:19, Tom Herbert wrote:
> Using vlan and bonding? vlan_dev_hard_start_xmit called. A possible
> cause is that bonding interface is out of sync with slave interface
> w.r.t. GSO features. Do we know if this worked in 3.14, 3.15?

I'm seeing the same sort of crash/warning (skb_warn_bad_offload). It's
happening on Intel 10 Gig NICs using the ixgbe driver. I'm using bridges
(for virtual machines) on top of VLANs on top of 802.3ad bonding. I'm
using an MTU of 9000 on the bond0 interface, but 1500 everywhere else.

I'm always bonding two ports: on one system, I'm bonding two ports on
identical one-port NICs; on another system, I'm bonding two ports on a
single two-port NIC. Both systems exhibit the same behavior.

Everything has worked fine for a couple years on Ubuntu 12.04 Precise
(Linux 3.2.0). It immediately broke when I upgraded to Ubuntu 14.04
Trusty (Linux 3.13.0). I can also reproduce this using the packaged
version of Linux 3.16.0 on Trusty.

In contrast to other reports of this bug, disabling scatter gather on
the physical interfaces (e.g. eth0) does *not* stop the crashes
(assuming I disabled it correctly).

I currently have two systems (one with Precise, one with Trusty)
available to do any testing that you'd find helpful.

Here's a first pass at getting some debugging data.

The broken system (Ubuntu 14.04 Trusty):

rlaager@BROKEN:~$ uname -a
Linux BROKEN 3.13.0-43-generic #72-Ubuntu SMP Mon Dec 8 19:35:06 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux

rlaager@BROKEN:~$ ethtool -k p6p1
Features for p6p1:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: on
	tx-checksum-fcoe-crc: on [fixed]
	tx-checksum-sctp: on
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off

rlaager@BROKEN:~$ ethtool -k bond0
Features for bond0:
rx-checksumming: off [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off [requested on]
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]

rlaager@BROKEN:~$ ethtool -k br7
Features for br7:
rx-checksumming: off [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [requested on]
tx-fcoe-segmentation: off [requested on]
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
tx-mpls-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off [requested on]
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]

rlaager@BROKEN:~$ lspci
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
00:09.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
00:0d.0 Host bridge: Intel Corporation Device 343a (rev 22)
00:0d.1 Host bridge: Intel Corporation Device 343b (rev 22)
00:0d.2 Host bridge: Intel Corporation Device 343c (rev 22)
00:0d.3 Host bridge: Intel Corporation Device 343d (rev 22)
00:0d.4 Host bridge: Intel Corporation 7500/5520/5500/X58 Physical Layer Port 0 (rev 22)
00:0d.5 Host bridge: Intel Corporation 7500/5520/5500 Physical Layer Port 1 (rev 22)
00:0d.6 Host bridge: Intel Corporation Device 341a (rev 22)
00:0e.0 Host bridge: Intel Corporation Device 341c (rev 22)
00:0e.1 Host bridge: Intel Corporation Device 341d (rev 22)
00:0e.2 Host bridge: Intel Corporation Device 341e (rev 22)
00:0e.4 Host bridge: Intel Corporation Device 3439 (rev 22)
00:13.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 22)
00:14.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers (rev 22)
00:14.1 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Throttle Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:1a.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
01:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
05:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
fe:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture Generic Non-core Registers (rev 02)
fe:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture System Address Decoder (rev 02)
fe:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
fe:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0 (rev 02)
fe:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 0 (rev 02)
fe:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 1 (rev 02)
fe:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
fe:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1 (rev 02)
fe:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Registers (rev 02)
fe:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Target Address Decoder (rev 02)
fe:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller RAS Registers (rev 02)
fe:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Test Registers (rev 02)
fe:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Control (rev 02)
fe:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Address (rev 02)
fe:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Rank (rev 02)
fe:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Thermal Control (rev 02)
fe:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Control (rev 02)
fe:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Address (rev 02)
fe:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Rank (rev 02)
fe:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Thermal Control (rev 02)
fe:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Control (rev 02)
fe:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Address (rev 02)
fe:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Rank (rev 02)
fe:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Thermal Control (rev 02)
ff:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture Generic Non-core Registers (rev 02)
ff:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture System Address Decoder (rev 02)
ff:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
ff:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0 (rev 02)
ff:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 0 (rev 02)
ff:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 1 (rev 02)
ff:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
ff:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1 (rev 02)
ff:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Registers (rev 02)
ff:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Target Address Decoder (rev 02)
ff:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller RAS Registers (rev 02)
ff:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Test Registers (rev 02)
ff:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Control (rev 02)
ff:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Address (rev 02)
ff:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Rank (rev 02)
ff:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Thermal Control (rev 02)
ff:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Control (rev 02)
ff:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Address (rev 02)
ff:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Rank (rev 02)
ff:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Thermal Control (rev 02)
ff:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Control (rev 02)
ff:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Address (rev 02)
ff:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Rank (rev 02)
ff:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Thermal Control (rev 02)


The working system (Ubuntu 12.04 Precise):

rlaager@WORKING:~$ uname -a
Linux WORKING 3.2.0-74-generic #109-Ubuntu SMP Tue Dec 9 16:45:49 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux

rlaager@WORKING:~$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on

rlaager@WORKING:~$ ethtool -k bond0
Offload parameters for bond0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

rlaager@WORKING:~$ ethtool -k br7
Offload parameters for br7:
rx-checksumming: on
tx-checksumming: on
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: on
ntuple-filters: off




A stack trace from 3.13.0 (the default kernel in Ubuntu Trusty):

[ 1161.275007] WARNING: CPU: 7 PID: 0 at /build/buildd/linux-3.13.0/net/core/dev.c:2224 skb_warn_bad_offload+0xcd/0xda()
[ 1161.275011] : caps=(0x00000022000048c1, 0x0000000000000000) len=1514 data_len=1460 gso_size=1460 gso_type=1 ip_summed=1
[ 1161.275012] Modules linked in: nfsv3 ipmi_devintf ipmi_si vhost_net vhost macvtap macvlan bridge ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_comment xt_mul
 mrp xt_addrtype llc bonding nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_
ch intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw joydev i7core_eda
id nfs_acl lp parport nfs lockd sunrpc fscache ses enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ixgbe raid6_pq dca hid_generic raid1 ptp mpt2sas
smouse hid libahci scsi_transport_sas mdio linear
[ 1161.275077] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W     3.13.0-43-generic #72-Ubuntu
[ 1161.275079] Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0a    09/14/2010
[ 1161.275080]  0000000000000009 ffff880c3fc239d8 ffffffff81720bf6 ffff880c3fc23a20
[ 1161.275085]  ffff880c3fc23a10 ffffffff810677cd ffff880c1d3b9600 ffff880618e08000
[ 1161.275089]  0000000000000001 0000000000000001 ffff880c1d3b9600 ffff880c3fc23a70
[ 1161.275092] Call Trace:
[ 1161.275094]  <IRQ>  [<ffffffff81720bf6>] dump_stack+0x45/0x56
[ 1161.275101]  [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
[ 1161.275105]  [<ffffffff8106783c>] warn_slowpath_fmt+0x4c/0x50
[ 1161.275109]  [<ffffffff8136a0a3>] ? ___ratelimit+0x93/0x100
[ 1161.275113]  [<ffffffff81723afe>] skb_warn_bad_offload+0xcd/0xda
[ 1161.275118]  [<ffffffff81626489>] __skb_gso_segment+0x79/0xb0
[ 1161.275122]  [<ffffffff8162677a>] dev_hard_start_xmit+0x18a/0x560
[ 1161.275126]  [<ffffffff81098209>] ? ttwu_do_wakeup+0x19/0xc0
[ 1161.275129]  [<ffffffff8164594e>] sch_direct_xmit+0xee/0x1c0
[ 1161.275133]  [<ffffffff81626d80>] __dev_queue_xmit+0x230/0x500
[ 1161.275137]  [<ffffffff81627060>] dev_queue_xmit+0x10/0x20
[ 1161.275143]  [<ffffffffa04ab31b>] br_dev_queue_push_xmit+0x7b/0xc0 [bridge]
[ 1161.275149]  [<ffffffffa04ab532>] br_forward_finish+0x22/0x60 [bridge]
[ 1161.275155]  [<ffffffffa04ab710>] __br_forward+0x80/0xf0 [bridge]
[ 1161.275161]  [<ffffffffa04ab9bb>] br_forward+0x8b/0xa0 [bridge]
[ 1161.275167]  [<ffffffffa04ac6d9>] br_handle_frame_finish+0x149/0x3d0 [bridge]
[ 1161.275173]  [<ffffffffa04acad5>] br_handle_frame+0x175/0x250 [bridge]
[ 1161.275177]  [<ffffffff81624ac2>] __netif_receive_skb_core+0x262/0x840
[ 1161.275181]  [<ffffffff8101b700>] ? check_tsc_unstable+0x10/0x10
[ 1161.275184]  [<ffffffff816250b8>] __netif_receive_skb+0x18/0x60
[ 1161.275188]  [<ffffffff81625123>] netif_receive_skb+0x23/0x90
[ 1161.275192]  [<ffffffff81625b70>] napi_gro_receive+0x80/0xb0
[ 1161.275202]  [<ffffffffa014009c>] ixgbe_clean_rx_irq+0x7ac/0xb10 [ixgbe]
[ 1161.275211]  [<ffffffffa0141140>] ixgbe_poll+0x460/0x800 [ixgbe]
[ 1161.275216]  [<ffffffff816254a2>] net_rx_action+0x152/0x250
[ 1161.275220]  [<ffffffff8106cc1c>] __do_softirq+0xec/0x2c0
[ 1161.275223]  [<ffffffff8106d165>] irq_exit+0x105/0x110
[ 1161.275227]  [<ffffffff817339e6>] do_IRQ+0x56/0xc0
[ 1161.275231]  [<ffffffff817290ed>] common_interrupt+0x6d/0x6d
[ 1161.275232]  <EOI>  [<ffffffff815d361f>] ? cpuidle_enter_state+0x4f/0xc0
[ 1161.275240]  [<ffffffff815d3749>] cpuidle_idle_call+0xb9/0x1f0
[ 1161.275244]  [<ffffffff8101d35e>] arch_cpu_idle+0xe/0x30
[ 1161.275247]  [<ffffffff810bef35>] cpu_startup_entry+0xc5/0x290
[ 1161.275251]  [<ffffffff810413ed>] start_secondary+0x21d/0x2d0


A stack trace from 3.16.0 (still on Ubuntu Trusty):

[  120.376026] WARNING: CPU: 6 PID: 0 at /build/buildd/linux-lts-utopic-3.16.0/net/core/dev.c:2246 skb_warn_bad_offload+0xcd/0xda()
[  120.376029] : caps=(0x00000080000048c1, 0x0000000000000000) len=1514 data_len=1460 gso_size=1460 gso_type=1 ip_summed=1
[  120.376030] Modules linked in: nfsv3 ipmi_devintf ipmi_si ipmi_msghandler vhost_net vhost macvtap macvlan bridge 8021q garp stp mrp llc bonding ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_comment xt_multiport xt_recent xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_powerclamp coretemp kvm_intel gpio_ich kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw lpc_ich joydev i7core_edac ioatdma edac_core nfsd auth_rpcgss mac_hid nfs_acl lp parport nfs lockd sunrpc fscache ses enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic raid6_pq ixgbe usbhid raid1 mpt2sas dca ahci raid0 ptp raid_class pps_core scsi_transport_sas multipath hid mdio libahci linear
[  120.376085] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 3.16.0-28-generic #37-Ubuntu
[  120.376086] Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0a    09/14/2010
[  120.376088]  0000000000000009 ffff880c3fc039b8 ffffffff81762220 ffff880c3fc03a00
[  120.376090]  ffff880c3fc039f0 ffffffff8106dd2d ffff880c1ac99a00 ffff88061c2fc000
[  120.376092]  0000000000000001 0000000000000001 ffff880c1ac99a00 ffff880c3fc03a50
[  120.376094] Call Trace:
[  120.376096]  <IRQ>  [<ffffffff81762220>] dump_stack+0x45/0x56
[  120.376105]  [<ffffffff8106dd2d>] warn_slowpath_common+0x7d/0xa0
[  120.376107]  [<ffffffff8106dd9c>] warn_slowpath_fmt+0x4c/0x50
[  120.376111]  [<ffffffff8138b153>] ? ___ratelimit+0x93/0x100
[  120.376114]  [<ffffffff817654da>] skb_warn_bad_offload+0xcd/0xda
[  120.376119]  [<ffffffff81661d29>] __skb_gso_segment+0x79/0xb0
[  120.376122]  [<ffffffff81662052>] dev_hard_start_xmit+0x182/0x5c0
[  120.376125]  [<ffffffff8168337e>] sch_direct_xmit+0xee/0x1c0
[  120.376127]  [<ffffffff81662690>] __dev_queue_xmit+0x200/0x4d0
[  120.376129]  [<ffffffff81662970>] dev_queue_xmit+0x10/0x20
[  120.376135]  [<ffffffffc0796ac8>] br_dev_queue_push_xmit+0x68/0xa0 [bridge]
[  120.376138]  [<ffffffffc0796cd2>] br_forward_finish+0x22/0x60 [bridge]
[  120.376142]  [<ffffffffc0796e90>] __br_forward+0x80/0xf0 [bridge]
[  120.376145]  [<ffffffffc079713b>] br_forward+0x8b/0xa0 [bridge]
[  120.376149]  [<ffffffffc0797fb9>] br_handle_frame_finish+0x139/0x3c0 [bridge]
[  120.376153]  [<ffffffffc079838e>] br_handle_frame+0x14e/0x240 [bridge]
[  120.376155]  [<ffffffff81660102>] __netif_receive_skb_core+0x1b2/0x790
[  120.376158]  [<ffffffff8101bcd9>] ? read_tsc+0x9/0x20
[  120.376161]  [<ffffffff816606f8>] __netif_receive_skb+0x18/0x60
[  120.376163]  [<ffffffff81660763>] netif_receive_skb_internal+0x23/0x90
[  120.376165]  [<ffffffff816612c0>] napi_gro_receive+0xc0/0xf0
[  120.376174]  [<ffffffffc03007ac>] ixgbe_clean_rx_irq+0x7bc/0xb40 [ixgbe]
[  120.376180]  [<ffffffffc03018a2>] ixgbe_poll+0x482/0x850 [ixgbe]
[  120.376183]  [<ffffffff8109e9e9>] ? ttwu_do_wakeup+0x19/0xc0
[  120.376186]  [<ffffffff81660b52>] net_rx_action+0x152/0x250
[  120.376189]  [<ffffffff81073055>] __do_softirq+0xf5/0x2e0
[  120.376191]  [<ffffffff81073515>] irq_exit+0x105/0x110
[  120.376194]  [<ffffffff8176d748>] do_IRQ+0x58/0xf0
[  120.376198]  [<ffffffff8176b5ed>] common_interrupt+0x6d/0x6d
[  120.376199]  <EOI>  [<ffffffff815fb83f>] ? cpuidle_enter_state+0x4f/0xc0
[  120.376204]  [<ffffffff815fb838>] ? cpuidle_enter_state+0x48/0xc0
[  120.376206]  [<ffffffff815fb967>] cpuidle_enter+0x17/0x20
[  120.376209]  [<ffffffff810b527d>] cpu_startup_entry+0x31d/0x450
[  120.376213]  [<ffffffff810e028d>] ? tick_check_new_device+0xdd/0xf0
[  120.376216]  [<ffffffff8104520d>] start_secondary+0x21d/0x2e0
[  120.376217] ---[ end trace 90d53a2c9c47f360 ]---

-- 
Richard

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]


* [Question]The benefit of weight_p in __qdisc_run
From: Dennis Chen @ 2014-12-17  8:54 UTC (permalink / raw)
  To: netdev

weight_p is used as the burst transmit packet quota in the while loop of
the __qdisc_run function.
Can anybody elaborate on the benefit of introducing weight_p
here? What is the consequence without it?
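
For readers following the question, here is a minimal user-space sketch of
the quota pattern being asked about. It is not the kernel code: toy_qdisc,
toy_restart and toy_qdisc_run are hypothetical names, the queue is reduced
to a packet counter, and the need_resched() check that the kernel loop also
performs is left out. The quota of 64 used in the note below is an assumed
value for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a qdisc: only a count of queued packets. */
struct toy_qdisc {
	int backlog;      /* packets still waiting to be sent */
	int rescheduled;  /* times a run stopped early and deferred itself */
};

/* Stand-in for qdisc_restart(): send one packet, report whether more remain. */
static bool toy_restart(struct toy_qdisc *q)
{
	if (q->backlog == 0)
		return false;
	q->backlog--;
	return q->backlog > 0;
}

/* Send at most `quota` packets in one call; returns the number sent. */
static int toy_qdisc_run(struct toy_qdisc *q, int quota)
{
	int sent = 0;

	assert(quota > 0);
	while (q->backlog > 0) {
		bool more = toy_restart(q);

		sent++;
		if (!more)
			break;
		if (--quota <= 0) {
			/* The kernel would call __netif_schedule() here so the
			 * rest of the queue is handled in a later pass. */
			q->rescheduled++;
			break;
		}
	}
	return sent;
}
```

With a backlog of 100 packets and a quota of 64, the first call sends 64
packets and defers the remaining 36 to a second call. In this sketch,
without the quota a single call would keep draining the queue until it is
empty, however long that takes, so the quota appears to bound the time
spent in one transmit burst.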

Thanks!

-- 
Den


* Re: [PATCH netfilter-next] xt_osf: Use continue to reduce indentation
From: Evgeniy Polyakov @ 2014-12-17  8:51 UTC (permalink / raw)
  To: Joe Perches
  Cc: Pablo Neira Ayuso, Patrick McHardy, Jozsef Kadlecsik,
	netfilter-devel, netdev, LKML
In-Reply-To: <1418761033.14140.5.camel@perches.com>

Hi everyone

16.12.2014, 23:17, "Joe Perches" <joe@perches.com>:
> Invert logic in test to use continue.
>
> This routine already uses continue, use it a bit more to
> minimize > 80 column long lines and unnecessary indentation.
>
> No change in compiled object file.

Looks good. Thank you.
Which tree should this patch go through? Please pull it in.

Acked-by: Evgeniy Polyakov <zbr@ioremap.net>


