Netdev List
* [PATCH net 2/2] geneve: Fix races between socket add and release.
From: Jesse Gross @ 2014-12-17  2:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Andy Zhou
In-Reply-To: <1418783132-99230-1-git-send-email-jesse@nicira.com>

Currently, searching for a socket to add a reference to is not
synchronized with deletion of sockets. This can result in use
after free if there is another operation that is removing a
socket at the same time. Solving this requires both holding the
appropriate lock and checking the refcount to ensure that it
has not already hit zero.

Inspired by a related (but not exactly the same) issue in the
VXLAN driver.
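
For readers following along, the "take a reference only if the count has not
already hit zero" half of the fix can be sketched in userspace with C11
atomics. This is only an illustration: struct obj, obj_get_unless_zero() and
obj_put() are hypothetical names, not part of the geneve code, and the sketch
does not model the sock_lock that still has to protect the list lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct geneve_sock; only the refcount matters. */
struct obj {
	atomic_int refcnt;
};

/*
 * Userspace analogue of atomic_add_unless(&refcnt, 1, 0): take a reference
 * only if the count has not already dropped to zero. Returns true on success.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* dropping a reference that was never held is a bug */
	return old == 1;
}
```

Once obj_put() has returned true, a later obj_get_unless_zero() fails instead
of resurrecting an object that is already queued for destruction, which is the
behaviour the patch relies on.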

Fixes: 0b5e8b8e ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 net/ipv4/geneve.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/geneve.c b/net/ipv4/geneve.c
index 5a47188..95e47c9 100644
--- a/net/ipv4/geneve.c
+++ b/net/ipv4/geneve.c
@@ -296,6 +296,7 @@ struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
 				    geneve_rcv_t *rcv, void *data,
 				    bool no_share, bool ipv6)
 {
+	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_sock *gs;
 
 	gs = geneve_socket_create(net, port, rcv, data, ipv6);
@@ -305,15 +306,15 @@ struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
 	if (no_share)	/* Return error if sharing is not allowed. */
 		return ERR_PTR(-EINVAL);
 
+	spin_lock(&gn->sock_lock);
 	gs = geneve_find_sock(net, port);
-	if (gs) {
-		if (gs->rcv == rcv)
-			atomic_inc(&gs->refcnt);
-		else
+	if (gs && ((gs->rcv != rcv) ||
+		   !atomic_add_unless(&gs->refcnt, 1, 0)))
 			gs = ERR_PTR(-EBUSY);
-	} else {
+	spin_unlock(&gn->sock_lock);
+
+	if (!gs)
 		gs = ERR_PTR(-EINVAL);
-	}
 
 	return gs;
 }
-- 
1.9.1

^ permalink raw reply related

* [PATCH net 1/2] geneve: Remove socket and offload handlers at destruction.
From: Jesse Gross @ 2014-12-17  2:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Andy Zhou

Sockets aren't currently removed from the global list when
they are destroyed. In addition, offload handlers need to be cleaned
up as well.

Fixes: 0b5e8b8e ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 net/ipv4/geneve.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/ipv4/geneve.c b/net/ipv4/geneve.c
index a457232..5a47188 100644
--- a/net/ipv4/geneve.c
+++ b/net/ipv4/geneve.c
@@ -159,6 +159,15 @@ static void geneve_notify_add_rx_port(struct geneve_sock *gs)
 	}
 }
 
+static void geneve_notify_del_rx_port(struct geneve_sock *gs)
+{
+	struct sock *sk = gs->sock->sk;
+	sa_family_t sa_family = sk->sk_family;
+
+	if (sa_family == AF_INET)
+		udp_del_offload(&gs->udp_offloads);
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
@@ -312,9 +321,17 @@ EXPORT_SYMBOL_GPL(geneve_sock_add);
 
 void geneve_sock_release(struct geneve_sock *gs)
 {
+	struct net *net = sock_net(gs->sock->sk);
+	struct geneve_net *gn = net_generic(net, geneve_net_id);
+
 	if (!atomic_dec_and_test(&gs->refcnt))
 		return;
 
+	spin_lock(&gn->sock_lock);
+	hlist_del_rcu(&gs->hlist);
+	geneve_notify_del_rx_port(gs);
+	spin_unlock(&gn->sock_lock);
+
 	queue_work(geneve_wq, &gs->del_work);
 }
 EXPORT_SYMBOL_GPL(geneve_sock_release);
-- 
1.9.1

^ permalink raw reply related

* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Alexei Starovoitov @ 2014-12-17  3:06 UTC (permalink / raw)
  To: Martin Lau
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev@vger.kernel.org, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

On Tue, Dec 16, 2014 at 5:30 PM, Martin Lau <kafai@fb.com> wrote:
>> >> >> I think systemtap like scripting on top of patches 1 and 3
>> >> >> should solve your use case ?
>> > We have quite a few different versions running in the production.  It may not
>> > be operationally easy.
>>
>> different versions of kernel or different versions of tcp_tracer ?
> Former and we are releasing new kernel pretty often.

I see. So for a dynamic tracer to be useful in such an environment,
the scripts should be compatible across different kernel versions
without recompilation. That all makes sense.

> How does the current TRACE_EVENT do it when it wants to printf more data?

Tracepoints, like any other user interface, shouldn't
break compatibility. With printf that is practically impossible to guarantee.
Some subsystems may be breaking this rule, arguing that
tracepoints are a debug facility, but networking tracepoints don't change.

>> It feels that for stats collection only, tracepoints+tcp_trace
>> do not add much additional value vs extending tcp_info
>> and using ss.
> I think we are on the same page. Once 'this should cost nothing if not
> activated' proposition was cleared out.  It was what I meant that doing the
> collection part in the TCP itself (instead of tracepoints) would be nice.

agree.

> I think going forward, as others have suggested, it may be better to come
> together and reach a common ground on what to collect first before I re-work
> patch 1 to 3 and repost.

I think as a minimum it will be discussed at netdev01 in Feb,
but I suspect not everyone on this list can (or wants to) go to Ottawa,
so it would be nice to have a meetup for bay area folks to
discuss this sooner, with a public g+ hangout.
Thoughts?

^ permalink raw reply

* Re: [PATCH 0/5] tun/macvtap: TUNSETIFF fixes
From: Jason Wang @ 2014-12-17  3:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, David Miller, netdev, Dan Carpenter
In-Reply-To: <1418732988-3535-1-git-send-email-mst@redhat.com>



On Tue, Dec 16, 2014 at 9:04 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> Dan Carpenter reported the following:
> 	static checker warning:
> 
> 		drivers/net/tun.c:1694 tun_set_iff()
> 		warn: 0x17100 is larger than 16 bits
> 
> 	drivers/net/tun.c
> 	  1692
> 	  1693          tun->flags = (tun->flags & ~TUN_FEATURES) |
> 	  1694                  (ifr->ifr_flags & TUN_FEATURES);
> 	  1695
> 
> 	It's complaining because the "ifr->ifr_flags" variable is a short
> 	(should it be unsigned?).  The new define:
> 
> 	#define IFF_VNET_LE    0x10000
> 
> 	doesn't fit in two bytes.  Other suspect looking code could be:
> 
> 		return __virtio16_to_cpu(q->flags & IFF_VNET_LE, val);
> 
> And that's true: we have run out of IFF flags in tun.

I have no objections to this series.
Just note that we still have several bits available.
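
To make the checker warning concrete, here is a minimal sketch of why a flag
at bit 16 cannot live in a 16-bit field while any flag in bits 0..15 can. The
helper fits_in_16_bits() and the macro name are made up for illustration, and
the sketch assumes the flags field is 16 bits wide, as the report says. It
does not say which of those bits are actually free.

```c
#include <assert.h>
#include <stdint.h>

/* Value of the flag as quoted in the report above. */
#define IFF_VNET_LE_QUOTED 0x10000

/* Hypothetical helper: does 'flag' survive storage in a 16-bit field? */
static int fits_in_16_bits(unsigned int flag)
{
	uint16_t stored = (uint16_t)flag;	/* same width as a short ifr_flags */

	return stored == flag;
}

/* Sanity checks for the two cases discussed in the thread. */
static void check_flags(void)
{
	assert(!fits_in_16_bits(IFF_VNET_LE_QUOTED));	/* bit 16 is truncated */
	assert(fits_in_16_bits(0x8000));		/* bit 15 still fits */
}
```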
> 
> So let's not try to add more: add simple GET/SET ioctls
> instead. Easy to test, leads to clear semantics.
> 
> Alternatively we'll have to revert the whole thing for 3.19,
> but that seems more work as this has dependencies
> in other places.
> 
> While here, I noticed that macvtap was actually reading
> ifreq flags as a 32 bit field.
> Fix that up as well.
> 
> Michael S. Tsirkin (5):
>   macvtap: fix uninitialized access on TUNSETIFF
>   if_tun: add TUNSETVNETLE/TUNGETVNETLE
>   tun: drop broken IFF_VNET_LE
>   macvtap: drop broken IFF_VNET_LE
>   if_tun: drop broken IFF_VNET_LE
> 
>  include/uapi/linux/if_tun.h |  3 ++-
>  drivers/net/macvtap.c       | 30 ++++++++++++++++++++++++------
>  drivers/net/tun.c           | 26 +++++++++++++++++++++++---
>  3 files changed, 49 insertions(+), 10 deletions(-)
> 
> -- 
> MST
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 3/5] tun: drop broken IFF_VNET_LE
From: Jason Wang @ 2014-12-17  3:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, David Miller, netdev, Dan Carpenter, Herbert Xu,
	Tom Herbert, Ben Hutchings, Xi Wang, Masatake YAMATO
In-Reply-To: <1418732988-3535-4-git-send-email-mst@redhat.com>



On Tue, Dec 16, 2014 at 9:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> Use TUNSETVNETLE/TUNGETVNETLE instead.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/tun.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index c052bd6b..e3e8a0e 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -109,9 +109,11 @@ do {								\
>   * overload it to mean fasync when stored there.
>   */
>  #define TUN_FASYNC	IFF_ATTACH_QUEUE
> +/* High bits in flags field are unused. */
> +#define TUN_VNET_LE     0x80000000
>  
>  #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
> -		      IFF_VNET_LE | IFF_MULTI_QUEUE)
> +		      IFF_MULTI_QUEUE)
>  #define GOODCOPY_LEN 128
>  
>  #define FLT_EXACT_COUNT 8
> @@ -207,12 +209,12 @@ struct tun_struct {
>  
>  static inline u16 tun16_to_cpu(struct tun_struct *tun, __virtio16 val)
>  {
> -	return __virtio16_to_cpu(tun->flags & IFF_VNET_LE, val);
> +	return __virtio16_to_cpu(tun->flags & TUN_VNET_LE, val);
>  }
>  
>  static inline __virtio16 cpu_to_tun16(struct tun_struct *tun, u16 val)
>  {
> -	return __cpu_to_virtio16(tun->flags & IFF_VNET_LE, val);
> +	return __cpu_to_virtio16(tun->flags & TUN_VNET_LE, val);
>  }
>  
>  static inline u32 tun_hashfn(u32 rxhash)
> @@ -1853,6 +1855,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
>  	int sndbuf;
>  	int vnet_hdr_sz;
>  	unsigned int ifindex;
> +	int le;
>  	int ret;
>  
>  	if (cmd == TUNSETIFF || cmd == TUNSETQUEUE || _IOC_TYPE(cmd) == 0x89) {
> @@ -2052,6 +2055,23 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
>  		tun->vnet_hdr_sz = vnet_hdr_sz;
>  		break;
>  
> +	case TUNGETVNETLE:
> +		le = !!(tun->flags & TUN_VNET_LE);
> +		if (put_user(le, (int __user *)argp))
> +			ret = -EFAULT;
> +		break;
> +
> +	case TUNSETVNETLE:
> +		if (get_user(le, (int __user *)argp)) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +		if (le)
> +			tun->flags |= TUN_VNET_LE;
> +		else
> +			tun->flags &= ~TUN_VNET_LE;
> +		break;
> +

This is a little different from how persistent devices are handled:

- TUNSETPERSIST checks argp directly rather than reading a value through it
- Userspace can check the persist flag through TUNGETIFF

This patch will probably need more modifications in userspace.
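
A rough userspace sketch of the difference in calling convention, assuming
kernel headers new enough to define TUNSETVNETLE and TUNGETVNETLE in
<linux/if_tun.h>. The tun_*() wrapper names are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* TUNSETPERSIST takes its argument by value: the kernel tests argp itself. */
static int tun_set_persist(int fd, int on)
{
	return ioctl(fd, TUNSETPERSIST, on);
}

/*
 * TUNSETVNETLE, as proposed in this series, takes a pointer to an int,
 * which the kernel reads with get_user().
 */
static int tun_set_vnet_le(int fd, int le)
{
	return ioctl(fd, TUNSETVNETLE, &le);
}

/* TUNGETVNETLE writes 0 or 1 through the pointer with put_user(). */
static int tun_get_vnet_le(int fd, int *le)
{
	assert(le != NULL);
	return ioctl(fd, TUNGETVNETLE, le);
}
```

So userspace that wants to know the current setting would call something like
tun_get_vnet_le() instead of looking for a flag in the TUNGETIFF result.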
> 
>  	case TUNATTACHFILTER:
>  		/* Can be set only for TAPs */
>  		ret = -EINVAL;
> -- 
> MST

^ permalink raw reply

* Re: [PATCH net-next 2/3] Implementation of RFC 4898 Extended TCP Statistics (Web10G)
From: Andi Kleen @ 2014-12-17  3:44 UTC (permalink / raw)
  To: rapier; +Cc: netdev
In-Reply-To: <549070D3.5050808@psc.edu>

rapier <rapier@psc.edu> writes:
> +
> +void tcp_estats_update_rtt(struct sock *sk, unsigned long rtt_sample)
> +{
> +	struct tcp_estats *stats = tcp_sk(sk)->tcp_stats;
> +	struct tcp_estats_path_table *path_table = stats->tables.path_table;
> +	unsigned long rtt_sample_msec = rtt_sample/1000;
> +	u32 rto;
> +
> +	if (path_table == NULL)
> +		return;
> +
> +	path_table->SampleRTT = rtt_sample_msec;
> +
> +	if (rtt_sample_msec > path_table->MaxRTT)
> +		path_table->MaxRTT = rtt_sample_msec;
> +	if (rtt_sample_msec < path_table->MinRTT)
> +		path_table->MinRTT = rtt_sample_msec;
> +
> +	path_table->CountRTT++;
> +	path_table->SumRTT += rtt_sample_msec;
> +
> +	rto = jiffies_to_msecs(inet_csk(sk)->icsk_rto);
> +	if (rto > path_table->MaxRTO)
> +		path_table->MaxRTO = rto;
> +	if (rto < path_table->MinRTO)
> +		path_table->MinRTO = rto;

Looking through your hooks, it seems that many basically do simple
value profiling in a very open-coded way.

Perhaps you could simplify things a lot by just having a couple of trace
points for these values (e.g. trace_change_rtt). Then have a library
of different data profiling types.

Then you could register a new value-oriented trace point type with
a different backend for whatever you currently need from the value, such as
min/max/avg, a full histogram, or even reservoir sampling or EWMA.

I guess such a generic infrastructure would be useful elsewhere too.

One challenge would be how to associate such value profiles with
sockets, but I'm sure this could be done in some nice generic
way too.
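
As a rough userspace sketch of what one such profiling type could look like
(min/max/avg plus an EWMA): struct value_profile and its helpers are invented
for this illustration and do not exist in the kernel or in the patch under
review.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic value profile; all names are illustrative. */
struct value_profile {
	uint64_t count;
	uint64_t sum;
	uint64_t min;
	uint64_t max;
	uint64_t ewma;		/* exponentially weighted moving average */
	unsigned int shift;	/* each new sample has weight 1 / 2^shift */
};

static void value_profile_init(struct value_profile *vp, unsigned int shift)
{
	assert(shift < 64);
	vp->count = 0;
	vp->sum = 0;
	vp->min = UINT64_MAX;
	vp->max = 0;
	vp->ewma = 0;
	vp->shift = shift;
}

/* Feed one sample; this is what a value trace point would call. */
static void value_profile_update(struct value_profile *vp, uint64_t sample)
{
	if (sample < vp->min)
		vp->min = sample;
	if (sample > vp->max)
		vp->max = sample;
	vp->sum += sample;

	if (vp->count++ == 0)
		vp->ewma = sample;	/* seed with the first sample */
	else	/* ewma += (sample - ewma) / 2^shift, kept unsigned */
		vp->ewma = vp->ewma - (vp->ewma >> vp->shift) +
			   (sample >> vp->shift);
}

static uint64_t value_profile_avg(const struct value_profile *vp)
{
	return vp->count ? vp->sum / vp->count : 0;
}
```

With something like this, the RTT and RTO blocks quoted above would each
reduce to a single value_profile_update() call on their own profile.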

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply

* panic related to b44_poll
From: Shivaram Lingamneni @ 2014-12-17  5:20 UTC (permalink / raw)
  To: netdev; +Cc: zambrano

I'm experiencing a kernel panic, which I believe is caused by the b44
driver and triggered by the b44_poll call. Here are some pictures of
the panic on the 3.18 mainline kernel:

http://i.imgur.com/v4YPMei.jpg
http://i.imgur.com/8b6Sttw.jpg

Some additional information, including `lshw` output and a theory
about the conditions that cause the panic, is on the kernel bugzilla
issue here:

https://bugzilla.kernel.org/show_bug.cgi?id=89611

I have some additional photos with varying call traces on the RH
bugzilla issue here:

https://bugzilla.redhat.com/show_bug.cgi?id=1147321

however, the traces linked there are against the Fedora kernel, not
against the mainline kernel.

I'm not subscribed to the netdev list, so please cc me on responses.
Thanks very much for your time!

^ permalink raw reply

* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Roopa Prabhu @ 2014-12-17  5:51 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: John Fastabend, Hubert Sokolowski, netdev@vger.kernel.org,
	Vlad Yasevich
In-Reply-To: <54902E5E.2070405@mojatatu.com>

On 12/16/14, 5:06 AM, Jamal Hadi Salim wrote:
> On 12/15/14 19:45, John Fastabend wrote:
>> On 12/15/2014 06:29 AM, Jamal Hadi Salim wrote:
>
>>
>> hmm good question. When I implemented this on the host nics with SR-IOV,
>> VMDQ, etc. The multi/unicast addresses were propagated into the FDB by
>> the driver.
>
> So if i understand correctly, this is a NIC with an FDB. And there is no
> concept of a bridge to which it is attached. To the point of
> classical uni/multicast addresses on a netdev abstraction; these
> are typically stored in *much simpler tables* (used to be IO
> registers back in the day)
> Do these NICs not have such a concept?
> An fdb entry has an egress port column; I have seen cases where the
> port is labeled as "Cpu port" which would mean it belongs to the host;
> but in this case it just seems there is no such concept and as Or
> brought up in another email - what does "VLANid" mean in such a case?
> If we go with a CPU port concept,
> We could then use the concept of a vlan filter on a port basis
> but then what happens when you dont have an fdb (majority of cases)?
>
>> My logic was if some netdev ethx has a set of MAC addresses
>> above it well then any virtual function or virtual device also behind
>> the hardware shouldn't be sending those addresses out the egress switch
>> facing port. Otherwise the switch will see packets it knows are behind
>> that port and drop them. Or flood them if it hasn't learned the address
>> yet. Either way they will never get to the right netdev.
>>
>> Admittedly I wasn't thinking about switches with many ports at the time.
>>
>
> I often struggle with trying to "box" SRIOV into some concept of a
> switch abstraction and sometimes i am puzzled.
> Would exposing the SRIOV underlay as a switch not have solved this
> problem? Then the virtual ports essentially are bridge ports.
> Maybe what we need is a concept of a "edge relay" extended netdev?
> These things would have an fdb as well down and uplink relay ports that
> can be attached to them.
>
>
>>> Some of these drivers may be just doing the LinuxWay(aka cutnpaste what
>>> the other driver did).
>>
>> My original thinking here was... if it didn't implement fdb_add, fdb_del
>> and fdb_dump then if you wanted to think of it as having forwarding
>> database that was fine but it was really just a two port mac relay. In
>> which case just dump all the mac addresses it knows about. In this case
>> if it was something more fancy it could do its own dump like vxlan or
>> macvlan.
>>
>
> The challenge here is lack of separation between a NICs uni/multicast
> ports which it owns - which is a traditional operation regardless of
> what capabilities the NIC has; vs an fdb which has may have many
> other capabilities. Probably all NICs capable of many MACs implement
> fdbs?
>
>> For a host nic ucast/multicast and fdb are the same, I think? The
>> code we had was just short-hand to allow the common case a host nic
>> to work. Notice vxlan and bridge drivers didn't dump there addr lists
>> from fdb_dump until your patch.
>>
>> Perhaps my implementation of macvlan fdb_{add|del|dump} is buggy. And
>> I shouldn't overload the addr lists.
>>
>
> Not just those - I am wondering about the general utility of what
> Hubert was trying to do if all the driver does is call the default
> dumper based on some flags presence and the default dumper
> does a dump of uni/multicast host entries. Those are not really fdb
> entries in the traditional sense.
> Is there no way to get the unicast/multicast mac addresses for such
> a driver?
> I think that would help bring clarity to my confusion.
>
>
>>
>> I'm interested to see what Vlad says as well. But the current situation
>> is previously some drivers dumped their addr lists others didn't.
>> Specifically, the more switch like devices (bridge, vxlan) didn't. Now
>> every device will dump the addr lists. I'm not entirely convinced that
>> is correct.
>>
>
> I am glad this happened ;-> Otherwise we wouldnt be having this
> discussion. When Vlad was asking me I was in a rush to get the patch
> out and didnt question because i thought this was something some crazy
> virtualization people needed.
> If Vlad's use case goes away, then Hubert's little restoration is fine.
>
>
>> It works OK for host nics (NICS that can't forward between ports) and
>> seems at best confusing for real switch asics.
>
> So if these NICs have fdb entries and i programmed it (meaning setting
> which port a given MAC should be sent to), would it not work?
>
>> On a related question do
>> you expect the switch asic to trap any packets with MAC addresses in
>> the multi/unicast address lists and send them to the correct netdev? Or
>> will the switch forward them using normal FDB tables?
>>
>
> I think there would be a separate table for that. Roopa, can you check
> with the ASICs you guys work on? 
Jamal, yes, AFAICS, we do have a separate table where we add some static
entries indicating send-to-CPU (for example, IPv4 and IPv6 link-local
multicast), and such packets are sent to the correct netdev.

> The point i was trying to make above
> is today there is a uni/multicast list or table of sorts that all NICs
> expose.
> There's always the hack of a "cpu port". I have also seen the "cpu port"
> being conceptualized in L3 tables to imply "next hop is cpu" where you
> have an IP address owned by the host; so maybe we need a concept of a
> cpu port or again the revival of TheThing class device.
>
> cheers,
> jamal
>

^ permalink raw reply

* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Roopa Prabhu @ 2014-12-17  5:54 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: John Fastabend, Jamal Hadi Salim, Hubert Sokolowski,
	netdev@vger.kernel.org, Vlad Yasevich
In-Reply-To: <549091DB.6050600@intel.com>

On 12/16/14, 12:11 PM, Samudrala, Sridhar wrote:
>
> On 12/16/2014 11:30 AM, Roopa Prabhu wrote:
>> On 12/16/14, 9:21 AM, Samudrala, Sridhar wrote:
>>>
>>> On 12/16/2014 8:35 AM, John Fastabend wrote:
>>>>
>>>>> Is there no way to get the unicast/multicast mac addresses for such
>>>>> a driver?
>>>>
>>>> You can almost infer it from ip link by looking at all the stacked
>>>> drivers and figuring out how the address are propagated down. Then
>>>> look at the routes and figure out multicast address. But other than
>>>> the fdb dump mechanism I don't think there is anything.
>>>
>>> It looks like we can get the device specific unicast/multicast mac 
>>> addresses via 'ip maddr' too.
>> if i remember correctly, 'ip maddr' was only for multicast list. And 
>> there was no way to dump the unicast list until bridge self was 
>> introduced.
>> the only way to dump unicast addresses today is by using the `bridge 
>> fdb show self`
> Yes. 'ip maddr show' only lists the multicast macs as the name 
> suggests. I stand corrected.
> May be we need 'ip uaddr show' to list unicast macs instead of 
> overloading 'bridge fdb show' to show unicast lists.
>
Maybe too late for that: 'bridge fdb show self' has been out for
some time now and is in use.

^ permalink raw reply

* Re: [PATCH] dm9000: Add regulator and reset support to dm9000
From: Sascha Hauer @ 2014-12-17  6:19 UTC (permalink / raw)
  To: Zubair Lutfullah Kakakhel
  Cc: davem, devicetree, linux-kernel, netdev, paul.burton
In-Reply-To: <1418747624-2682-1-git-send-email-Zubair.Kakakhel@imgtec.com>

Hi Zubair,

Several comments inline.

On Tue, Dec 16, 2014 at 04:33:44PM +0000, Zubair Lutfullah Kakakhel wrote:
> In boards, the dm9000 chip's power and reset can be controlled by gpio.
> 
> It makes sense to add them to the dm9000 driver and let dt be used to
> enable power and reset the phy.
> 
> Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> ---
>  .../devicetree/bindings/net/davicom-dm9000.txt     |  4 +++
>  drivers/net/ethernet/davicom/dm9000.c              | 33 ++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/davicom-dm9000.txt b/Documentation/devicetree/bindings/net/davicom-dm9000.txt
> index 28767ed..dba19a2 100644
> --- a/Documentation/devicetree/bindings/net/davicom-dm9000.txt
> +++ b/Documentation/devicetree/bindings/net/davicom-dm9000.txt
> @@ -11,6 +11,8 @@ Required properties:
>  Optional properties:
>  - davicom,no-eeprom : Configuration EEPROM is not available
>  - davicom,ext-phy : Use external PHY
> +- reset-gpio : phandle of gpio that will be used to reset chip during probe
> +- vcc-supply : phandle of regulator that will be used to enable power to chip
>  
>  Example:
>  
> @@ -21,4 +23,6 @@ Example:
>  		interrupts = <7 4>;
>  		local-mac-address = [00 00 de ad be ef];
>  		davicom,no-eeprom;
> +		reset-gpio = <&gpf 12 GPIO_ACTIVE_LOW>;
> +		vcc-supply = <&eth0_power>;
>  	};
> diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
> index ef0bb58..7333b8d 100644
> --- a/drivers/net/ethernet/davicom/dm9000.c
> +++ b/drivers/net/ethernet/davicom/dm9000.c
> @@ -36,6 +36,9 @@
>  #include <linux/platform_device.h>
>  #include <linux/irq.h>
>  #include <linux/slab.h>
> +#include <linux/regulator/consumer.h>
> +#include <linux/gpio.h>
> +#include <linux/of_gpio.h>
>  
>  #include <asm/delay.h>
>  #include <asm/irq.h>
> @@ -1426,11 +1429,41 @@ dm9000_probe(struct platform_device *pdev)
>  	struct dm9000_plat_data *pdata = dev_get_platdata(&pdev->dev);
>  	struct board_info *db;	/* Point a board information structure */
>  	struct net_device *ndev;
> +	struct device *dev = &pdev->dev;
>  	const unsigned char *mac_src;
>  	int ret = 0;
>  	int iosize;
>  	int i;
>  	u32 id_val;
> +	int reset_gpio;
> +	enum of_gpio_flags flags;
> +	struct regulator *power;
> +
> +	power = devm_regulator_get(dev, "vcc");
> +	if (IS_ERR(power)) {
> +		dev_dbg(dev, "no regulator provided\n");

You have to check for errors here. The return value can be -EPROBE_DEFER
in which case you have to return -EPROBE_DEFER from the driver and try
again later.

> +	} else if (!regulator_is_enabled(power)) {

You must enable the regulator unconditionally to increase the reference
counter. If the regulator is only on because another consumer enabled it,
that other consumer can turn it off again and the dm9000 stops
working.
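
A toy userspace model of the problem, in case it helps. This is not the kernel
regulator API: struct toy_regulator and the toy_*() helpers are invented here
and only mimic the reference counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a shared, reference-counted regulator. Not the kernel API. */
struct toy_regulator {
	int use_count;
};

static bool toy_is_enabled(const struct toy_regulator *r)
{
	return r->use_count > 0;
}

static void toy_enable(struct toy_regulator *r)
{
	r->use_count++;
}

static void toy_disable(struct toy_regulator *r)
{
	assert(r->use_count > 0);	/* unbalanced disable is a bug */
	r->use_count--;
}

/* The pattern from the patch: only enable if nobody else already has. */
static bool consumer_enable_if_off(struct toy_regulator *r)
{
	if (toy_is_enabled(r))
		return false;	/* no reference taken by this consumer */
	toy_enable(r);
	return true;
}
```

If another consumer enabled the supply first, consumer_enable_if_off() takes
no reference, so the count drops to zero as soon as that consumer calls
toy_disable(). Calling toy_enable() unconditionally keeps the count above zero.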

> +		ret = regulator_enable(power);
> +		dev_dbg(dev, "regulator enabled\n");
> +	}
> +
> +	reset_gpio = of_get_named_gpio_flags(dev->of_node, "reset-gpio", 0,
> +					     &flags);

Should be reset-gpios (plural). For some reason this is the established
binding.

> +	if (gpio_is_valid(reset_gpio)) {
> +		ret = devm_gpio_request_one(dev, reset_gpio, flags,
> +					    "dm9000_reset");
> +		if (ret) {
> +			dev_err(dev, "failed to request reset gpio %d: %d\n",
> +				reset_gpio, ret);

I think this is fatal. When a GPIO is not registered for this device,
that is fine, but when one is registered and you can't get it, then
it's fatal.

> +		} else {
> +			gpio_direction_output(reset_gpio, 0);
> +			/* According to manual PWRST# Low Period Min 1ms */
> +			msleep(2);
> +			gpio_direction_output(reset_gpio, 1);

No need to set the direction again. gpio_set_value() should suffice
here.

Sascha

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* [PATCH iproute2] ip link: use addattr_nest()/addattr_nest_end()
From: Duan Jiong @ 2014-12-17  7:28 UTC (permalink / raw)
  To: stephen hemminger; +Cc: netdev


Use addattr_nest() and addattr_nest_end() to simplify the code.
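
For reference, the pattern being replaced (remember the tail, add an empty
attribute, patch its length afterwards) can be sketched on a toy buffer. The
toy_*() names and struct layouts below are made up for illustration and only
loosely follow struct rtattr; this is not the libnetlink code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ALIGN(len) (((len) + 3u) & ~3u)

/* Toy attribute header, loosely modelled on struct rtattr. */
struct toy_attr {
	uint16_t len;	/* header plus payload, including nested attributes */
	uint16_t type;
};

/* Toy message: bytes used so far, followed by attribute storage. */
struct toy_msg {
	uint32_t used;			/* always a multiple of 4 */
	struct toy_attr buf[64];
};

static struct toy_attr *toy_tail(struct toy_msg *m)
{
	return (struct toy_attr *)((char *)m->buf + m->used);
}

/* Append one attribute; returns 0 on success, -1 if it does not fit. */
static int toy_add(struct toy_msg *m, uint16_t type, const void *data,
		   uint16_t dlen)
{
	uint32_t len = sizeof(struct toy_attr) + dlen;
	struct toy_attr *a;

	if (m->used + TOY_ALIGN(len) > sizeof(m->buf))
		return -1;
	a = toy_tail(m);
	a->len = (uint16_t)len;
	a->type = type;
	if (dlen)
		memcpy(a + 1, data, dlen);
	m->used += TOY_ALIGN(len);
	return 0;
}

/* Open a nest: remember where the header goes, then add it with no payload. */
static struct toy_attr *toy_nest(struct toy_msg *m, uint16_t type)
{
	struct toy_attr *nest = toy_tail(m);

	if (toy_add(m, type, NULL, 0) < 0)
		return NULL;
	return nest;
}

/* Close a nest: patch its length to cover everything added since toy_nest(). */
static void toy_nest_end(struct toy_msg *m, struct toy_attr *nest)
{
	assert(nest != NULL);
	nest->len = (uint16_t)((char *)toy_tail(m) - (char *)nest);
}
```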

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 ip/iplink.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index ce6eb3e..3ce5e39 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -706,11 +706,11 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 	}
 
 	if (type) {
-		struct rtattr *linkinfo = NLMSG_TAIL(&req.n);
+		struct rtattr *linkinfo;
 		char slavebuf[128], *ulinep = strchr(type, '_');
 		int iflatype;
 
-		addattr_l(&req.n, sizeof(req), IFLA_LINKINFO, NULL, 0);
+		linkinfo = addattr_nest(&req.n, sizeof(req), IFLA_LINKINFO);
 		addattr_l(&req.n, sizeof(req), IFLA_INFO_KIND, type,
 			 strlen(type));
 
@@ -728,14 +728,13 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 			iflatype = IFLA_INFO_DATA;
 		}
 		if (lu && argc) {
-			struct rtattr * data = NLMSG_TAIL(&req.n);
-			addattr_l(&req.n, sizeof(req), iflatype, NULL, 0);
+			struct rtattr *data = addattr_nest(&req.n, sizeof(req), iflatype);
 
 			if (lu->parse_opt &&
 			    lu->parse_opt(lu, argc, argv, &req.n))
 				return -1;
 
-			data->rta_len = (void *)NLMSG_TAIL(&req.n) - (void *)data;
+			addattr_nest_end(&req.n, data);
 		} else if (argc) {
 			if (matches(*argv, "help") == 0)
 				usage();
@@ -743,7 +742,7 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 					"Try \"ip link help\".\n", *argv);
 			return -1;
 		}
-		linkinfo->rta_len = (void *)NLMSG_TAIL(&req.n) - (void *)linkinfo;
+		addattr_nest_end(&req.n, linkinfo);
 	} else if (flags & NLM_F_CREATE) {
 		fprintf(stderr, "Not enough information: \"type\" argument "
 				"is required\n");
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH netfilter-next] xt_osf: Use continue to reduce indentation
From: Evgeniy Polyakov @ 2014-12-17  8:51 UTC (permalink / raw)
  To: Joe Perches
  Cc: Pablo Neira Ayuso, Patrick McHardy, Jozsef Kadlecsik,
	netfilter-devel, netdev, LKML
In-Reply-To: <1418761033.14140.5.camel@perches.com>

Hi everyone

16.12.2014, 23:17, "Joe Perches" <joe@perches.com>:
> Invert logic in test to use continue.
>
> This routine already uses continue, use it a bit more to
> minimize > 80 column long lines and unnecessary indentation.
>
> No change in compiled object file.

Looks good. Thank you.
Which tree should this patch go through? Please pull it in.

Acked-by: Evgeniy Polyakov <zbr@ioremap.net>

^ permalink raw reply

* [Question]The benefit of weight_p in __qdisc_run
From: Dennis Chen @ 2014-12-17  8:54 UTC (permalink / raw)
  To: netdev

weight_p is used as the burst xmit packet quota in the while loop of
the __qdisc_run function.
Can anybody elaborate on the benefit of introducing weight_p
here? What would the consequence be without it?

Thanks!

-- 
Den

^ permalink raw reply

* Re: Fw: [Bug 82471] New: net/core/dev.c skb_war_bad_offload
From: Richard Laager @ 2014-12-17  8:44 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 24114 bytes --]

Previous history of this thread:
http://thread.gmane.org/gmane.linux.network/326672

On 2014-11-04 22:57:19, Tom Herbert wrote:
> Using vlan and bonding? vlan_dev_hard_start_xmit called. A possible
> cause is that bonding interface is out of sync with slave interface
> w.r.t. GSO features. Do we know if this worked in 3.14, 3.15?

I'm seeing the same sort of crash/warning (skb_warn_bad_offload). It's
happening on Intel 10 Gig NICs using the ixgbe driver. I'm using bridges
(for virtual machines) on top of VLANs on top of 802.3ad bonding. I'm
using an MTU of 9000 on the bond0 interface, but 1500 everywhere else.

I'm always bonding two ports: on one system, I'm bonding two ports on
identical one-port NICs; on another system, I'm bonding two ports on a
single two-port NIC. Both systems exhibit the same behavior.

Everything has worked fine for a couple years on Ubuntu 12.04 Precise
(Linux 3.2.0). It immediately broke when I upgraded to Ubuntu 14.04
Trusty (Linux 3.13.0). I can also reproduce this using the packaged
version of Linux 3.16.0 on Trusty.

In contrast to other reports of this bug, disabling scatter gather on
the physical interfaces (e.g. eth0) does *not* stop the crashes
(assuming I disabled it correctly).

I currently have two systems (one with Precise, one with Trusty)
available to do any testing that you'd find helpful.

Here's a first pass at getting some debugging data.

The broken system (Ubuntu 14.04 Trusty):

rlaager@BROKEN:~$ uname -a
Linux BROKEN 3.13.0-43-generic #72-Ubuntu SMP Mon Dec 8 19:35:06 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux

rlaager@BROKEN:~$ ethtool -k p6p1
Features for p6p1:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: on
	tx-checksum-fcoe-crc: on [fixed]
	tx-checksum-sctp: on
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off

rlaager@BROKEN:~$ ethtool -k bond0
Features for bond0:
rx-checksumming: off [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off [requested on]
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]

rlaager@BROKEN:~$ ethtool -k br7
Features for br7:
rx-checksumming: off [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [requested on]
tx-fcoe-segmentation: off [requested on]
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
tx-mpls-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off [requested on]
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]

rlaager@BROKEN:~$ lspci
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
00:09.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
00:0d.0 Host bridge: Intel Corporation Device 343a (rev 22)
00:0d.1 Host bridge: Intel Corporation Device 343b (rev 22)
00:0d.2 Host bridge: Intel Corporation Device 343c (rev 22)
00:0d.3 Host bridge: Intel Corporation Device 343d (rev 22)
00:0d.4 Host bridge: Intel Corporation 7500/5520/5500/X58 Physical Layer Port 0 (rev 22)
00:0d.5 Host bridge: Intel Corporation 7500/5520/5500 Physical Layer Port 1 (rev 22)
00:0d.6 Host bridge: Intel Corporation Device 341a (rev 22)
00:0e.0 Host bridge: Intel Corporation Device 341c (rev 22)
00:0e.1 Host bridge: Intel Corporation Device 341d (rev 22)
00:0e.2 Host bridge: Intel Corporation Device 341e (rev 22)
00:0e.4 Host bridge: Intel Corporation Device 3439 (rev 22)
00:13.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 22)
00:14.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers (rev 22)
00:14.1 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Throttle Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:1a.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
01:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
05:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
fe:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture Generic Non-core Registers (rev 02)
fe:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture System Address Decoder (rev 02)
fe:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
fe:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0 (rev 02)
fe:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 0 (rev 02)
fe:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 1 (rev 02)
fe:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
fe:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1 (rev 02)
fe:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Registers (rev 02)
fe:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Target Address Decoder (rev 02)
fe:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller RAS Registers (rev 02)
fe:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Test Registers (rev 02)
fe:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Control (rev 02)
fe:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Address (rev 02)
fe:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Rank (rev 02)
fe:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Thermal Control (rev 02)
fe:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Control (rev 02)
fe:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Address (rev 02)
fe:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Rank (rev 02)
fe:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Thermal Control (rev 02)
fe:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Control (rev 02)
fe:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Address (rev 02)
fe:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Rank (rev 02)
fe:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Thermal Control (rev 02)
ff:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture Generic Non-core Registers (rev 02)
ff:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture System Address Decoder (rev 02)
ff:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
ff:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0 (rev 02)
ff:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 0 (rev 02)
ff:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 1 (rev 02)
ff:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
ff:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1 (rev 02)
ff:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Registers (rev 02)
ff:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Target Address Decoder (rev 02)
ff:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller RAS Registers (rev 02)
ff:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Test Registers (rev 02)
ff:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Control (rev 02)
ff:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Address (rev 02)
ff:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Rank (rev 02)
ff:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Thermal Control (rev 02)
ff:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Control (rev 02)
ff:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Address (rev 02)
ff:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Rank (rev 02)
ff:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Thermal Control (rev 02)
ff:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Control (rev 02)
ff:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Address (rev 02)
ff:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Rank (rev 02)
ff:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Thermal Control (rev 02)


The working system (Ubuntu 12.04 Precise):

rlaager@WORKING:~$ uname -a
Linux WORKING 3.2.0-74-generic #109-Ubuntu SMP Tue Dec 9 16:45:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

rlaager@WORKING:~$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on

rlaager@WORKING:~$ ethtool -k bond0
Offload parameters for bond0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

rlaager@WORKING:~$ ethtool -k br7
Offload parameters for br7:
rx-checksumming: on
tx-checksumming: on
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: on
ntuple-filters: off




A stack trace from 3.13.0 (the default kernel in Ubuntu Trusty):

[ 1161.275007] WARNING: CPU: 7 PID: 0 at /build/buildd/linux-3.13.0/net/core/dev.c:2224 skb_warn_bad_offload+0xcd/0xda()
[ 1161.275011] : caps=(0x00000022000048c1, 0x0000000000000000) len=1514 data_len=1460 gso_size=1460 gso_type=1 ip_summed=1
[ 1161.275012] Modules linked in: nfsv3 ipmi_devintf ipmi_si vhost_net vhost macvtap macvlan bridge ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_comment xt_mul
 mrp xt_addrtype llc bonding nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_
ch intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw joydev i7core_eda
id nfs_acl lp parport nfs lockd sunrpc fscache ses enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ixgbe raid6_pq dca hid_generic raid1 ptp mpt2sas
smouse hid libahci scsi_transport_sas mdio linear
[ 1161.275077] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W     3.13.0-43-generic #72-Ubuntu
[ 1161.275079] Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0a    09/14/2010
[ 1161.275080]  0000000000000009 ffff880c3fc239d8 ffffffff81720bf6 ffff880c3fc23a20
[ 1161.275085]  ffff880c3fc23a10 ffffffff810677cd ffff880c1d3b9600 ffff880618e08000
[ 1161.275089]  0000000000000001 0000000000000001 ffff880c1d3b9600 ffff880c3fc23a70
[ 1161.275092] Call Trace:
[ 1161.275094]  <IRQ>  [<ffffffff81720bf6>] dump_stack+0x45/0x56
[ 1161.275101]  [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
[ 1161.275105]  [<ffffffff8106783c>] warn_slowpath_fmt+0x4c/0x50
[ 1161.275109]  [<ffffffff8136a0a3>] ? ___ratelimit+0x93/0x100
[ 1161.275113]  [<ffffffff81723afe>] skb_warn_bad_offload+0xcd/0xda
[ 1161.275118]  [<ffffffff81626489>] __skb_gso_segment+0x79/0xb0
[ 1161.275122]  [<ffffffff8162677a>] dev_hard_start_xmit+0x18a/0x560
[ 1161.275126]  [<ffffffff81098209>] ? ttwu_do_wakeup+0x19/0xc0
[ 1161.275129]  [<ffffffff8164594e>] sch_direct_xmit+0xee/0x1c0
[ 1161.275133]  [<ffffffff81626d80>] __dev_queue_xmit+0x230/0x500
[ 1161.275137]  [<ffffffff81627060>] dev_queue_xmit+0x10/0x20
[ 1161.275143]  [<ffffffffa04ab31b>] br_dev_queue_push_xmit+0x7b/0xc0 [bridge]
[ 1161.275149]  [<ffffffffa04ab532>] br_forward_finish+0x22/0x60 [bridge]
[ 1161.275155]  [<ffffffffa04ab710>] __br_forward+0x80/0xf0 [bridge]
[ 1161.275161]  [<ffffffffa04ab9bb>] br_forward+0x8b/0xa0 [bridge]
[ 1161.275167]  [<ffffffffa04ac6d9>] br_handle_frame_finish+0x149/0x3d0 [bridge]
[ 1161.275173]  [<ffffffffa04acad5>] br_handle_frame+0x175/0x250 [bridge]
[ 1161.275177]  [<ffffffff81624ac2>] __netif_receive_skb_core+0x262/0x840
[ 1161.275181]  [<ffffffff8101b700>] ? check_tsc_unstable+0x10/0x10
[ 1161.275184]  [<ffffffff816250b8>] __netif_receive_skb+0x18/0x60
[ 1161.275188]  [<ffffffff81625123>] netif_receive_skb+0x23/0x90
[ 1161.275192]  [<ffffffff81625b70>] napi_gro_receive+0x80/0xb0
[ 1161.275202]  [<ffffffffa014009c>] ixgbe_clean_rx_irq+0x7ac/0xb10 [ixgbe]
[ 1161.275211]  [<ffffffffa0141140>] ixgbe_poll+0x460/0x800 [ixgbe]
[ 1161.275216]  [<ffffffff816254a2>] net_rx_action+0x152/0x250
[ 1161.275220]  [<ffffffff8106cc1c>] __do_softirq+0xec/0x2c0
[ 1161.275223]  [<ffffffff8106d165>] irq_exit+0x105/0x110
[ 1161.275227]  [<ffffffff817339e6>] do_IRQ+0x56/0xc0
[ 1161.275231]  [<ffffffff817290ed>] common_interrupt+0x6d/0x6d
[ 1161.275232]  <EOI>  [<ffffffff815d361f>] ? cpuidle_enter_state+0x4f/0xc0
[ 1161.275240]  [<ffffffff815d3749>] cpuidle_idle_call+0xb9/0x1f0
[ 1161.275244]  [<ffffffff8101d35e>] arch_cpu_idle+0xe/0x30
[ 1161.275247]  [<ffffffff810bef35>] cpu_startup_entry+0xc5/0x290
[ 1161.275251]  [<ffffffff810413ed>] start_secondary+0x21d/0x2d0


A stack trace from 3.16.0 (still on Ubuntu Trusty):

[  120.376026] WARNING: CPU: 6 PID: 0 at /build/buildd/linux-lts-utopic-3.16.0/net/core/dev.c:2246 skb_warn_bad_offload+0xcd/0xda()
[  120.376029] : caps=(0x00000080000048c1, 0x0000000000000000) len=1514 data_len=1460 gso_size=1460 gso_type=1 ip_summed=1
[  120.376030] Modules linked in: nfsv3 ipmi_devintf ipmi_si ipmi_msghandler vhost_net vhost macvtap macvlan bridge 8021q garp stp mrp llc bonding ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_comment xt_multiport xt_recent xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_powerclamp coretemp kvm_intel gpio_ich kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw lpc_ich joydev i7core_edac ioatdma edac_core nfsd auth_rpcgss mac_hid nfs_acl lp parport nfs lockd sunrpc fscache ses enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic raid6_pq ixgbe usbhid raid1 mpt2sas dca ahci raid0 ptp raid_class pps_core scsi_transport_sas multipath hid mdio libahci linear
[  120.376085] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 3.16.0-28-generic #37-Ubuntu
[  120.376086] Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0a    09/14/2010
[  120.376088]  0000000000000009 ffff880c3fc039b8 ffffffff81762220 ffff880c3fc03a00
[  120.376090]  ffff880c3fc039f0 ffffffff8106dd2d ffff880c1ac99a00 ffff88061c2fc000
[  120.376092]  0000000000000001 0000000000000001 ffff880c1ac99a00 ffff880c3fc03a50
[  120.376094] Call Trace:
[  120.376096]  <IRQ>  [<ffffffff81762220>] dump_stack+0x45/0x56
[  120.376105]  [<ffffffff8106dd2d>] warn_slowpath_common+0x7d/0xa0
[  120.376107]  [<ffffffff8106dd9c>] warn_slowpath_fmt+0x4c/0x50
[  120.376111]  [<ffffffff8138b153>] ? ___ratelimit+0x93/0x100
[  120.376114]  [<ffffffff817654da>] skb_warn_bad_offload+0xcd/0xda
[  120.376119]  [<ffffffff81661d29>] __skb_gso_segment+0x79/0xb0
[  120.376122]  [<ffffffff81662052>] dev_hard_start_xmit+0x182/0x5c0
[  120.376125]  [<ffffffff8168337e>] sch_direct_xmit+0xee/0x1c0
[  120.376127]  [<ffffffff81662690>] __dev_queue_xmit+0x200/0x4d0
[  120.376129]  [<ffffffff81662970>] dev_queue_xmit+0x10/0x20
[  120.376135]  [<ffffffffc0796ac8>] br_dev_queue_push_xmit+0x68/0xa0 [bridge]
[  120.376138]  [<ffffffffc0796cd2>] br_forward_finish+0x22/0x60 [bridge]
[  120.376142]  [<ffffffffc0796e90>] __br_forward+0x80/0xf0 [bridge]
[  120.376145]  [<ffffffffc079713b>] br_forward+0x8b/0xa0 [bridge]
[  120.376149]  [<ffffffffc0797fb9>] br_handle_frame_finish+0x139/0x3c0 [bridge]
[  120.376153]  [<ffffffffc079838e>] br_handle_frame+0x14e/0x240 [bridge]
[  120.376155]  [<ffffffff81660102>] __netif_receive_skb_core+0x1b2/0x790
[  120.376158]  [<ffffffff8101bcd9>] ? read_tsc+0x9/0x20
[  120.376161]  [<ffffffff816606f8>] __netif_receive_skb+0x18/0x60
[  120.376163]  [<ffffffff81660763>] netif_receive_skb_internal+0x23/0x90
[  120.376165]  [<ffffffff816612c0>] napi_gro_receive+0xc0/0xf0
[  120.376174]  [<ffffffffc03007ac>] ixgbe_clean_rx_irq+0x7bc/0xb40 [ixgbe]
[  120.376180]  [<ffffffffc03018a2>] ixgbe_poll+0x482/0x850 [ixgbe]
[  120.376183]  [<ffffffff8109e9e9>] ? ttwu_do_wakeup+0x19/0xc0
[  120.376186]  [<ffffffff81660b52>] net_rx_action+0x152/0x250
[  120.376189]  [<ffffffff81073055>] __do_softirq+0xf5/0x2e0
[  120.376191]  [<ffffffff81073515>] irq_exit+0x105/0x110
[  120.376194]  [<ffffffff8176d748>] do_IRQ+0x58/0xf0
[  120.376198]  [<ffffffff8176b5ed>] common_interrupt+0x6d/0x6d
[  120.376199]  <EOI>  [<ffffffff815fb83f>] ? cpuidle_enter_state+0x4f/0xc0
[  120.376204]  [<ffffffff815fb838>] ? cpuidle_enter_state+0x48/0xc0
[  120.376206]  [<ffffffff815fb967>] cpuidle_enter+0x17/0x20
[  120.376209]  [<ffffffff810b527d>] cpu_startup_entry+0x31d/0x450
[  120.376213]  [<ffffffff810e028d>] ? tick_check_new_device+0xdd/0xf0
[  120.376216]  [<ffffffff8104520d>] start_secondary+0x21d/0x2e0
[  120.376217] ---[ end trace 90d53a2c9c47f360 ]---

-- 
Richard

^ permalink raw reply

* Re: [Question]The benefit of weight_p in __qdisc_run
From: Daniel Borkmann @ 2014-12-17  9:20 UTC (permalink / raw)
  To: Dennis Chen; +Cc: netdev
In-Reply-To: <CA+U0gVhk+4-T=XuCumnpRnRyQ=OqVfegGxF3ZZQy07++PDOW3g@mail.gmail.com>

On 12/17/2014 09:54 AM, Dennis Chen wrote:
> weight_p is used as the burst xmit packet quota in the while loop of
> the __qdisc_run function.
> Can anybody elaborate on the benefit of introducing weight_p here?
> What's the consequence without it?

It acts as a quota to introduce fairness among qdiscs. See also slide 7
onwards for experiments with/without it:

   http://vger.kernel.org/netconf2011_slides/jamal_netconf2011.pdf
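
To make the fairness argument concrete, here is a minimal userspace sketch
of a quota-bounded send loop, loosely modeled on the loop in __qdisc_run.
The names toy_queue, toy_queue_run and toy_schedule are hypothetical and
only stand in for a qdisc and its scheduler; this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a qdisc: just a count of queued packets. */
struct toy_queue {
	unsigned int backlog;	/* packets waiting to be sent */
	unsigned int sent;	/* packets sent so far */
};

/*
 * Send from one queue until it is empty or the quota is used up.
 * Returns 1 if backlog remains and the queue must be rescheduled.
 */
static int toy_queue_run(struct toy_queue *q, int quota)
{
	assert(quota > 0);

	while (q->backlog > 0) {
		q->backlog--;
		q->sent++;
		if (--quota <= 0)
			return q->backlog > 0;
	}
	return 0;
}

/* Visit every queue in turn, giving each at most `quota` packets per round. */
static void toy_schedule(struct toy_queue *qs, size_t n, int quota, int rounds)
{
	for (int r = 0; r < rounds; r++)
		for (size_t i = 0; i < n; i++)
			toy_queue_run(&qs[i], quota);
}
```

With one queue holding 1000 packets and another holding 10, a quota of 64
lets the small queue drain completely in the first round, right after the
busy queue has sent its 64 packets. Without the quota, the busy queue
would send all 1000 packets before the other one got a turn.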

^ permalink raw reply

* Re: [Question]The benefit of weight_p in __qdisc_run
From: Dennis Chen @ 2014-12-17  9:32 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev
In-Reply-To: <54914AEC.7070600@redhat.com>

On Wed, Dec 17, 2014 at 5:20 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 12/17/2014 09:54 AM, Dennis Chen wrote:
>>
>> weight_p is used as the burst xmit packet quota in the while loop of
>> the __qdisc_run function.
>> Can anybody elaborate on the benefit of introducing weight_p here?
>> What's the consequence without it?
>
>
> It acts as a quota to introduce fairness among qdiscs. See also slide 7
> onwards for experiments with/without it:
>
>   http://vger.kernel.org/netconf2011_slides/jamal_netconf2011.pdf

Dan, I really appreciate your answer :)

-- 
Den

^ permalink raw reply

* Re: Fw: [Bug 82471] New: net/core/dev.c skb_war_bad_offload
From: Michal Kubecek @ 2014-12-17  9:55 UTC (permalink / raw)
  To: Richard Laager; +Cc: netdev
In-Reply-To: <1418805852.5277.25.camel@watermelon.coderich.net>

On Wed, Dec 17, 2014 at 02:44:12AM -0600, Richard Laager wrote:
> Previous history of this thread:
> http://thread.gmane.org/gmane.linux.network/326672
> 
> On 2014-11-04 22:57:19, Tom Herbert wrote:
> > Using vlan and bonding? vlan_dev_hard_start_xmit called. A possible
> > cause is that bonding interface is out of sync with slave interface
> > w.r.t. GSO features. Do we know if this worked in 3.14, 3.15?
> 
> I'm seeing the same sort of crash/warning (skb_warn_bad_offload). It's
> happening on Intel 10 Gig NICs using the ixgbe driver. I'm using bridges
> (for virtual machines) on top of VLANs on top of 802.3ad bonding. I'm
> using an MTU of 9000 on the bond0 interface, but 1500 everywhere else.
> 
> I'm always bonding two ports: on one system, I'm bonding two ports on
> identical one-port NICs; on another system, I'm bonding two ports on a
> single two-port NIC. Both systems exhibit the same behavior.
> 
> Everything has worked fine for a couple years on Ubuntu 12.04 Precise
> (Linux 3.2.0). It immediately broke when I upgraded to Ubuntu 14.04
> Trusty (Linux 3.13.0). I can also reproduce this using the packaged
> version of Linux 3.16.0 on Trusty.

Would it be possible that the kernel you are using has

  da08143b8520 ("vlan: more careful checksum features handling")

(and possibly also a9b3ace44c7d and 3625920b62c3) but not

  db115037bb57 ("net: fix checksum features handling in netif_skb_features()")

?

Michal Kubecek

^ permalink raw reply

* Re: [PATCH net-next 1/3] Implementation of RFC 4898 Extended TCP Statistics (Web10G)
From: Bjørn Mork @ 2014-12-17 11:01 UTC (permalink / raw)
  To: rapier; +Cc: netdev
In-Reply-To: <549070CF.1010506@psc.edu>

rapier <rapier@psc.edu> writes:

> + * The Web10Gig project.  See http://www.web10gig.org

URL is already outdated?


Bjørn

^ permalink raw reply

* [PATCH net] cxgb4: Fix decoding QSA module for ethtool get settings
From: Hariprasad Shenai @ 2014-12-17 12:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, Hariprasad Shenai

The QSA module was being decoded as a QSFP module in ethtool get settings.
This patch fixes it.
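
For illustration only (not part of this patch): the description table is
indexed by the port-type enum, so the enum order and the string order have
to agree with the firmware's numbering. A minimal sketch of one way to keep
them in step, using designated initializers. The toy_* names and the
shortened enum are hypothetical and are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, shortened port-type enum; order mirrors the fixed driver enum. */
enum toy_port_type {
	TOY_PORT_QSFP_10G,
	TOY_PORT_QSA,
	TOY_PORT_QSFP,
	TOY_PORT_COUNT
};

/*
 * Designated initializers tie each string to its enum value, so
 * reordering the enum cannot silently shift the strings.
 */
static const char *const toy_port_desc[TOY_PORT_COUNT] = {
	[TOY_PORT_QSFP_10G] = "R QSFP_10G",
	[TOY_PORT_QSA]      = "R QSA",
	[TOY_PORT_QSFP]     = "R QSFP",
};

/* Bounds-checked lookup; unknown values get a fixed fallback string. */
static const char *toy_port_type_description(unsigned int type)
{
	if (type < TOY_PORT_COUNT && toy_port_desc[type])
		return toy_port_desc[type];
	return "UNKNOWN";
}

/* Debug helper: every enum value below TOY_PORT_COUNT must have a string. */
static void toy_port_table_check(void)
{
	for (size_t i = 0; i < TOY_PORT_COUNT; i++)
		assert(toy_port_desc[i] != NULL);
}
```

The real table in t4_hw.c uses positional initializers, which is why the
enum and the strings had to be corrected together here.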

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c    |    2 +-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 28d0415..c132d90 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -2376,7 +2376,7 @@ const char *t4_get_port_type_description(enum fw_port_type port_type)
 		"KR/KX",
 		"KR/KX/KX4",
 		"R QSFP_10G",
-		"",
+		"R QSA",
 		"R QSFP",
 		"R BP40_BA",
 	};
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 291b6f2..7c0aec8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -2470,8 +2470,8 @@ enum fw_port_type {
 	FW_PORT_TYPE_BP_AP,
 	FW_PORT_TYPE_BP4_AP,
 	FW_PORT_TYPE_QSFP_10G,
-	FW_PORT_TYPE_QSFP,
 	FW_PORT_TYPE_QSA,
+	FW_PORT_TYPE_QSFP,
 	FW_PORT_TYPE_BP40_BA,
 
 	FW_PORT_TYPE_NONE = FW_PORT_CMD_PTYPE_M
-- 
1.7.1

^ permalink raw reply related

* Re: [Xen-devel] xen-netback: make feature-rx-notify mandatory -- Breaks stubdoms
From: David Vrabel @ 2014-12-17 14:00 UTC (permalink / raw)
  To: David Vrabel, John
  Cc: netdev@vger.kernel.org, Wei Liu, Ian Campbell,
	Xen-devel@lists.xen.org
In-Reply-To: <548854C3.7060008@citrix.com>

On 10/12/14 14:12, David Vrabel wrote:
> On 10/12/14 13:42, John wrote:
>> David,
>>
>> This patch you put into 3.18.0 appears to break the latest version of
>> stubdomains. I found this out today when I tried to update a machine to
>> 3.18.0 and all of the domUs crashed on start with the dmesg output like
>> this:
> 
> Cc'ing the lists and relevant netback maintainers.
> 
> I guess the stubdoms are using minios's netfront?  This is something I
> forgot about when deciding if it was ok to make this feature mandatory.
> 
> The patch cannot be reverted as it's a prerequisite for a critical
> (security) bug fix.  I am also unconvinced that the no-feature-rx-notify
> support worked correctly anyway.
> 
> This can be resolved by:
> 
> - Fixing minios's netfront to support feature-rx-notify. This should be
> easy but wouldn't help existing Xen deployments.
> 
> Or:
> 
> - Reimplement feature-rx-notify support.  I think the easiest way is to
> queue packets on the guest Rx internal queue with a short expiry time.

This patch works for me.  I tested it with a hacked Linux frontend that
disabled feature-rx-notify, but not with a stubdom.

Can you give it a try, please?

David

8<--------------------------------------------------------------
xen-netback: support frontends without feature-rx-notify again

Commit bc96f648df1bbc2729abbb84513cf4f64273a1f1 (xen-netback: make
feature-rx-notify mandatory) incorrectly assumed that there were no
frontends in use that did not support this feature.  But the frontend
driver in MiniOS does not and since this is used by (qemu) stubdoms,
these stopped working.

Netback sort of works as-is in this mode except:

- If there are no Rx requests and the internal Rx queue fills, only the
  drain timeout will wake the thread.  The default drain timeout of 10 s
  would give unacceptable pauses.

- If an Rx stall was detected and the internal Rx queue is drained, then
  the Rx thread would never wake.

Handle these two cases (when feature-rx-notify is disabled) by:

- Reducing the drain timeout to 30 ms.

- Disabling Rx stall detection.
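
A rough userspace sketch of the intended behaviour, for illustration only.
The toy_* names are hypothetical and times are kept in milliseconds rather
than jiffies. The sketch assumes that a stall timeout of zero means stall
detection is off. The default values are the module parameter defaults.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the real fields live in struct xenvif. */
struct toy_vif {
	unsigned long drain_timeout_ms;	/* how long a queued packet may wait */
	unsigned long stall_timeout_ms;	/* 0 means stall detection is off */
};

#define TOY_DRAIN_DEFAULT_MS	10000UL	/* rx_drain_timeout_msecs default */
#define TOY_STALL_DEFAULT_MS	60000UL	/* rx_stall_timeout_msecs default */
#define TOY_DRAIN_NO_NOTIFY_MS	30UL	/* reduced drain timeout */

/* Pick per-interface timeouts based on whether the frontend notifies. */
static void toy_vif_set_timeouts(struct toy_vif *vif, bool rx_notify)
{
	assert(vif);

	if (rx_notify) {
		vif->drain_timeout_ms = TOY_DRAIN_DEFAULT_MS;
		vif->stall_timeout_ms = TOY_STALL_DEFAULT_MS;
	} else {
		vif->drain_timeout_ms = TOY_DRAIN_NO_NOTIFY_MS;
		vif->stall_timeout_ms = 0;
	}
}

/* Stall check: never reports a stall when detection is disabled. */
static bool toy_rx_stalled(const struct toy_vif *vif,
			   unsigned long now_ms, unsigned long last_rx_ms)
{
	if (!vif->stall_timeout_ms)
		return false;
	/* Unsigned subtraction stays correct across counter wraparound. */
	return now_ms - last_rx_ms > vif->stall_timeout_ms;
}
```

Keeping the timeouts per interface rather than global lets frontends with
and without feature-rx-notify coexist on the same backend.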

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/common.h    |    4 +++-
 drivers/net/xen-netback/interface.c |    4 +++-
 drivers/net/xen-netback/netback.c   |   27 ++++++++++++++-------------
 drivers/net/xen-netback/xenbus.c    |   12 +++++++++---
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 083ecc9..5f1fda4 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -230,6 +230,8 @@ struct xenvif {
 	 */
 	bool disabled;
 	unsigned long status;
+	unsigned long drain_timeout;
+	unsigned long stall_timeout;
 
 	/* Queues */
 	struct xenvif_queue *queues;
@@ -328,7 +330,7 @@ irqreturn_t xenvif_interrupt(int irq, void *dev_id);
 extern bool separate_tx_rx_irq;
 
 extern unsigned int rx_drain_timeout_msecs;
-extern unsigned int rx_drain_timeout_jiffies;
+extern unsigned int rx_stall_timeout_msecs;
 extern unsigned int xenvif_max_queues;
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index a6a32d3..9259a73 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -166,7 +166,7 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 
 	cb = XENVIF_RX_CB(skb);
-	cb->expires = jiffies + rx_drain_timeout_jiffies;
+	cb->expires = jiffies + vif->drain_timeout;
 
 	xenvif_rx_queue_tail(queue, skb);
 	xenvif_kick_thread(queue);
@@ -414,6 +414,8 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->ip_csum = 1;
 	vif->dev = dev;
 	vif->disabled = false;
+	vif->drain_timeout = msecs_to_jiffies(rx_drain_timeout_msecs);
+	vif->stall_timeout = msecs_to_jiffies(rx_stall_timeout_msecs);
 
 	/* Start out with no queues. */
 	vif->queues = NULL;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4a509f7..b0292e4 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -60,14 +60,12 @@ module_param(separate_tx_rx_irq, bool, 0644);
  */
 unsigned int rx_drain_timeout_msecs = 10000;
 module_param(rx_drain_timeout_msecs, uint, 0444);
-unsigned int rx_drain_timeout_jiffies;
 
 /* The length of time before the frontend is considered unresponsive
  * because it isn't providing Rx slots.
  */
-static unsigned int rx_stall_timeout_msecs = 60000;
+unsigned int rx_stall_timeout_msecs = 60000;
 module_param(rx_stall_timeout_msecs, uint, 0444);
-static unsigned int rx_stall_timeout_jiffies;
 
 unsigned int xenvif_max_queues;
 module_param_named(max_queues, xenvif_max_queues, uint, 0644);
@@ -2020,7 +2018,7 @@ static bool xenvif_rx_queue_stalled(struct xenvif_queue *queue)
 	return !queue->stalled
 		&& prod - cons < XEN_NETBK_RX_SLOTS_MAX
 		&& time_after(jiffies,
-			      queue->last_rx_time + rx_stall_timeout_jiffies);
+			      queue->last_rx_time + queue->vif->stall_timeout);
 }
 
 static bool xenvif_rx_queue_ready(struct xenvif_queue *queue)
@@ -2038,8 +2036,9 @@ static bool xenvif_have_rx_work(struct xenvif_queue *queue)
 {
 	return (!skb_queue_empty(&queue->rx_queue)
 		&& xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX))
-		|| xenvif_rx_queue_stalled(queue)
-		|| xenvif_rx_queue_ready(queue)
+		|| (queue->vif->stall_timeout &&
+		    (xenvif_rx_queue_stalled(queue)
+		     || xenvif_rx_queue_ready(queue)))
 		|| kthread_should_stop()
 		|| queue->vif->disabled;
 }
@@ -2092,6 +2091,9 @@ int xenvif_kthread_guest_rx(void *data)
 	struct xenvif_queue *queue = data;
 	struct xenvif *vif = queue->vif;
 
+	if (!vif->stall_timeout)
+		xenvif_queue_carrier_on(queue);
+
 	for (;;) {
 		xenvif_wait_for_rx_work(queue);
 
@@ -2118,10 +2120,12 @@ int xenvif_kthread_guest_rx(void *data)
 		 * while it's probably not responsive, drop the
 		 * carrier so packets are dropped earlier.
 		 */
-		if (xenvif_rx_queue_stalled(queue))
-			xenvif_queue_carrier_off(queue);
-		else if (xenvif_rx_queue_ready(queue))
-			xenvif_queue_carrier_on(queue);
+		if (queue->vif->stall_timeout) {
+			if (xenvif_rx_queue_stalled(queue))
+				xenvif_queue_carrier_off(queue);
+			else if (xenvif_rx_queue_ready(queue))
+				xenvif_queue_carrier_on(queue);
+		}
 
 		/* Queued packets may have foreign pages from other
 		 * domains.  These cannot be queued indefinitely as
@@ -2192,9 +2196,6 @@ static int __init netback_init(void)
 	if (rc)
 		goto failed_init;
 
-	rx_drain_timeout_jiffies = msecs_to_jiffies(rx_drain_timeout_msecs);
-	rx_stall_timeout_jiffies = msecs_to_jiffies(rx_stall_timeout_msecs);
-
 #ifdef CONFIG_DEBUG_FS
 	xen_netback_dbg_root = debugfs_create_dir("xen-netback", NULL);
 	if (IS_ERR_OR_NULL(xen_netback_dbg_root))
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index d44cd19..efbaf2a 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -887,9 +887,15 @@ static int read_xenbus_vif_flags(struct backend_info *be)
 		return -EOPNOTSUPP;
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend,
-			 "feature-rx-notify", "%d", &val) < 0 || val == 0) {
-		xenbus_dev_fatal(dev, -EINVAL, "feature-rx-notify is mandatory");
-		return -EINVAL;
+			 "feature-rx-notify", "%d", &val) < 0)
+		val = 0;
+	if (!val) {
+		/* - Reduce drain timeout to poll more frequently for
+		 *   Rx requests.
+		 * - Disable Rx stall detection.
+		 */
+		be->vif->drain_timeout = msecs_to_jiffies(30);
+		be->vif->stall_timeout = 0;
 	}
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg",
-- 
1.7.10.4

^ permalink raw reply related

* Re: net: integer overflow in ip_idents_reserve
From: Eric Dumazet @ 2014-12-17 14:11 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Hannes Frederic Sowa, David S. Miller, LKML, netdev,
	Andrey Ryabinin, Dave Jones
In-Reply-To: <5490D920.5000104@oracle.com>

On Tue, 2014-12-16 at 20:15 -0500, Sasha Levin wrote:

> I reported this one because there's usually some code to handle overflow
> in code that expects that and here there was none (I could see).

IP IDs are best effort.

When sending one million IPv4 frames per second to a particular
destination, the 16-bit ID space is recycled so fast that the
precise values really do not matter anymore.
You pray that IP fragments won't be needed at all.

(One idea I had was to detect this kind of stress and fall back to
random generation, reducing false sharing, but this seemed like a
micro-optimization targeting synthetic benchmarks.)

Thanks

^ permalink raw reply

* Re: Fw: [Bug 82471] New: net/core/dev.c skb_war_bad_offload
From: Richard Laager @ 2014-12-17 14:52 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev
In-Reply-To: <20141217095552.GB27966@unicorn.suse.cz>

On Wed, 2014-12-17 at 10:55 +0100, Michal Kubecek wrote:
> Would it be possible that the kernel you are using has
> 
>   da08143b8520 ("vlan: more careful checksum features handling")
> 
> (and possibly also a9b3ace44c7d and 3625920b62c3) but not
> 
>   db115037bb57 ("net: fix checksum features handling in netif_skb_features()")

Ubuntu's 3.13.0 has none of these changes.
Ubuntu's 3.16.0 has all four changes.

The problem occurs on both kernels.

-- 
Richard

^ permalink raw reply

* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Arnaldo Carvalho de Melo @ 2014-12-17 15:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Josef Bacik, Kernel Team
In-Reply-To: <CAADnVQJ+8mtB8LD=U7XbxOC2hxhDChxOELhZ3NEYeoTk1G3LYg@mail.gmail.com>

On Sun, Dec 14, 2014 at 10:55:55PM -0800, Alexei Starovoitov wrote:
> On Sun, Dec 14, 2014 at 5:56 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> > Hi,
> >
> > We have been using the kernel ftrace infra to collect TCP per-flow statistics.
> > The following patch set is a first slim-down version of our
> > existing implementation. We would like to get some early feedback
> > and make it useful for others.
> >
> > [RFC PATCH net-next 1/5] tcp: Add TCP TRACE_EVENTs:
> > Defines some basic tracepoints (by TRACE_EVENT).
> >
> > [RFC PATCH net-next 2/5] tcp: A perf script for TCP tracepoints:
> > A sample perf script with simple ip/port filtering and summary output.
> >
> > [RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer:
> > Declares a few more tracepoints (by DECLARE_TRACE) which are
> > used by the tcp_tracer.  The tcp_tracer is in the patch 5/5.
> >
> > [RFC PATCH net-next 4/5] tcp: Introduce tcp_sk_trace and related structs:
> > Defines a few tcp_trace structs which are used to collect statistics
> > on each tcp_sock.
> >
> > [RFC PATCH net-next 5/5] tcp: Add TCP tracer:
> > It introduces a tcp_tracer which hooks onto the tracepoints defined in the
> > patch 1/5 and 3/5.  It collects data defined in patch 4/5. We currently
> > use this tracer to collect per-flow statistics.  The commit log has
> > some more details.
> 
> I think patches 1 and 3 are good additions, since they establish
> a few permanent points of instrumentation in the tcp stack.
> Patches 4-5 look more like use cases of the tracepoints established
> before. They may feel like simple additions and, no doubt,
> they are useful, but since they expose things via the tracing
> infra they become part of the API and cannot be changed later,
> when more stats are needed.
> I think systemtap-like scripting on top of patches 1 and 3
> should solve your use case?

I guess even just using 'perf probe' to set those wannabe tracepoints
should be enough, no? Then he can refer to those in his perf record
call, etc., and process them just like the real tracepoints.

> Also, have you looked at recent eBPF work?
> Though it's not completely ready yet, soon it should
> be able to do the same stats collection as you have
> in 4/5 without adding permanent pieces to the kernel.

^ permalink raw reply

