Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 tip/core/rcu 07/13] ipv6/ip6_tunnel: Apply rcu_access_pointer() to avoid sparse false positive
From: Josh Triplett @ 2013-10-09 22:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, dhowells, edumazet,
	darren, fweisbec, sbw, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1381359077.4971.37.camel@edumazet-glaptop.roam.corp.google.com>

On Wed, Oct 09, 2013 at 03:51:17PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-09 at 15:36 -0700, Paul E. McKenney wrote:
> 
> > That would work, though it would probably give sparse complaints.
> 
> No sparse error here, as I said types are correct and SPARSE_RCU ready :
> 
> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index 583b77e..28f8495 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> @@ -245,7 +245,7 @@ ip6_tnl_unlink(struct ip6_tnl_net *ip6n, struct ip6_tnl *t)
>              (iter = rtnl_dereference(*tp)) != NULL;
>              tp = &iter->next) {
>                 if (t == iter) {
> -                       rcu_assign_pointer(*tp, t->next);
> +                       ACCESS_ONCE(*tp) = t->next;
>                         break;
>                 }
>         }

I'd be really hesitant to introduce that type of direct assignment to an
__rcu pointer without wrapping it in some appropriately named macro, or
at the very least adding a comment.

- Josh Triplett

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 05/13] decnet: Apply rcu_access_pointer() to avoid sparse false positive
From: Josh Triplett @ 2013-10-09 22:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, David S. Miller, Thomas Graf, Gao feng, Stephen Hemminger,
	linux-decnet-user, netdev
In-Reply-To: <20131009224604.GF5790@linux.vnet.ibm.com>

On Wed, Oct 09, 2013 at 03:46:04PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 09, 2013 at 03:28:42PM -0700, Josh Triplett wrote:
> > On Wed, Oct 09, 2013 at 02:29:38PM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > 
> > > The sparse checking for rcu_assign_pointer() was recently upgraded
> > > to reject non-__kernel address spaces.  This also rejects __rcu,
> > > which is almost always the right thing to do.  However, the use in
> > > dn_insert_route() is legitimate: It is assigning a pointer to an element
> > 
> > Nit: "uses ... are", not "use ... is". :)
> 
> Don't I already have "use ... is"?

I was suggesting that it needed to change from "use ... is" to "uses ...
are", not the other way around.

- Josh Triplett

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 05/13] decnet: Apply rcu_access_pointer() to avoid sparse false positive
From: Dhaval Giani @ 2013-10-09 22:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, mingo, laijs, Dipankar, Andrew Morton, Mathieu Desnoyers,
	josh, Nivedita Singhvi, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, David Howells, edumazet, Darren Hart,
	Frederic Weisbecker, sbw, David S. Miller, Thomas Graf, Gao feng,
	Stephen Hemminger, linux-decnet-user, netdev
In-Reply-To: <1381354186-16285-5-git-send-email-paulmck@linux.vnet.ibm.com>

On Wed, Oct 9, 2013 at 5:29 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>
> The sparse checking for rcu_assign_pointer() was recently upgraded
> to reject non-__kernel address spaces.  This also rejects __rcu,
> which is almost always the right thing to do.  However, the use in
> dn_insert_route() is legitimate: It is assigning a pointer to an element
> from an RCU-protected list, and all elements of this list are already
> visible to caller.
>
> This commit therefore silences this false positive by laundering the
> pointer using rcu_access_pointer() as suggested by Josh Triplett.
>
> Reported-by: kbuild test robot <fengguang.wu@intel.com>


I did not realize that we were allowed to rename people :-)

Thanks!
Dhaval

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 07/13] ipv6/ip6_tunnel: Apply rcu_access_pointer() to avoid sparse false positive
From: Eric Dumazet @ 2013-10-09 23:17 UTC (permalink / raw)
  To: Josh Triplett
  Cc: paulmck, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, dhowells, edumazet,
	darren, fweisbec, sbw, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <20131009225617.GH11709@jtriplet-mobl1>

On Wed, 2013-10-09 at 15:56 -0700, Josh Triplett wrote:

> I'd be really hesitant to introduce that type of direct assignment to an
> __rcu pointer without wrapping it in some appropriately named macro, or
> at the very least adding a comment.

Well, there is no special magic here, in this specific case :

- deleting an item in an rcu list

Check list_del_rcu(), and you'll notice there is no _barrier_

Adding correct barriers is good, but please do not add them when not
needed.

It makes code hard to understand.


ACCESS(*ptr) = value;

is clear and autodocumented, because it highlights the potential
problem, that is *ptr can be read without any barrier from another cpu.
So we ask the compiler to not write temporary garbage in it.

rcu_assign_pointer(*ptr, rcu_access_pointer(value)) 

is very confusing, because it hides the _real_ problem and add defensive
programming tricks.

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 07/13] ipv6/ip6_tunnel: Apply rcu_access_pointer() to avoid sparse false positive
From: Josh Triplett @ 2013-10-09 23:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, dhowells, edumazet,
	darren, fweisbec, sbw, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1381360675.4971.45.camel@edumazet-glaptop.roam.corp.google.com>

On Wed, Oct 09, 2013 at 04:17:55PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-09 at 15:56 -0700, Josh Triplett wrote:
> 
> > I'd be really hesitant to introduce that type of direct assignment to an
> > __rcu pointer without wrapping it in some appropriately named macro, or
> > at the very least adding a comment.
> 
> Well, there is no special magic here, in this specific case :
> 
> - deleting an item in an rcu list
> 
> Check list_del_rcu(), and you'll notice there is no _barrier_
> 
> Adding correct barriers is good, but please do not add them when not
> needed.
> 
> It makes code hard to understand.

I'm not arguing for the inclusion of an unnecessary barrier.  I'm
arguing for something more self-documenting than:

> ACCESS(*ptr) = value;

that.  Constructs like list_del_rcu are much clearer, and not
open-coded.  Open-coding synchronization code is almost always a Bad
Idea.

- Josh Triplett

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 05/13] decnet: Apply rcu_access_pointer() to avoid sparse false positive
From: Paul E. McKenney @ 2013-10-09 23:54 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: LKML, mingo, laijs, Dipankar, Andrew Morton, Mathieu Desnoyers,
	josh, Nivedita Singhvi, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, David Howells, edumazet, Darren Hart,
	Frederic Weisbecker, sbw, David S. Miller, Thomas Graf, Gao feng,
	Stephen Hemminger, linux-decnet-user, netdev
In-Reply-To: <CAPhKKr8kLDNd7CdksHriyMzsqcj0oSf7mAfLceviMjobYGy7oA@mail.gmail.com>

On Wed, Oct 09, 2013 at 06:58:47PM -0400, Dhaval Giani wrote:
> On Wed, Oct 9, 2013 at 5:29 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >
> > The sparse checking for rcu_assign_pointer() was recently upgraded
> > to reject non-__kernel address spaces.  This also rejects __rcu,
> > which is almost always the right thing to do.  However, the use in
> > dn_insert_route() is legitimate: It is assigning a pointer to an element
> > from an RCU-protected list, and all elements of this list are already
> > visible to caller.
> >
> > This commit therefore silences this false positive by laundering the
> > pointer using rcu_access_pointer() as suggested by Josh Triplett.
> >
> > Reported-by: kbuild test robot <fengguang.wu@intel.com>
> 
> I did not realize that we were allowed to rename people :-)

Copied and pasted directly from the email I received.  Perhaps strange,
but true!  ;-)

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 05/13] decnet: Apply rcu_access_pointer() to avoid sparse false positive
From: Paul E. McKenney @ 2013-10-09 23:57 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, David S. Miller, Thomas Graf, Gao feng, Stephen Hemminger,
	linux-decnet-user, netdev
In-Reply-To: <20131009225716.GI11709@jtriplet-mobl1>

On Wed, Oct 09, 2013 at 03:57:16PM -0700, Josh Triplett wrote:
> On Wed, Oct 09, 2013 at 03:46:04PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 09, 2013 at 03:28:42PM -0700, Josh Triplett wrote:
> > > On Wed, Oct 09, 2013 at 02:29:38PM -0700, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > The sparse checking for rcu_assign_pointer() was recently upgraded
> > > > to reject non-__kernel address spaces.  This also rejects __rcu,
> > > > which is almost always the right thing to do.  However, the use in
> > > > dn_insert_route() is legitimate: It is assigning a pointer to an element
> > > 
> > > Nit: "uses ... are", not "use ... is". :)
> > 
> > Don't I already have "use ... is"?
> 
> I was suggesting that it needed to change from "use ... is" to "uses ...
> are", not the other way around.

I guess there really are two uses rather than just one.  ;-)

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 07/13] ipv6/ip6_tunnel: Apply rcu_access_pointer() to avoid sparse false positive
From: Eric Dumazet @ 2013-10-10  0:12 UTC (permalink / raw)
  To: Josh Triplett
  Cc: paulmck, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, dhowells, edumazet,
	darren, fweisbec, sbw, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <20131009234040.GB14055@jtriplet-mobl1>

On Wed, 2013-10-09 at 16:40 -0700, Josh Triplett wrote:

> that.  Constructs like list_del_rcu are much clearer, and not
> open-coded.  Open-coding synchronization code is almost always a Bad
> Idea.

OK, so you think there is synchronization code.

I will shut up then, no need to waste time.

^ permalink raw reply

* [PATCH net-next] tcp: use ACCESS_ONCE() in tcp_update_pacing_rate()
From: Eric Dumazet @ 2013-10-10  0:14 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

sk_pacing_rate is read by sch_fq packet scheduler at any time,
with no synchronization, so make sure we update it in a
sensible way. ACCESS_ONCE() is how we instruct compiler
to not do stupid things, like using the memory location
as a temporary variable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 47b8ab7..b441fe6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -755,7 +755,12 @@ static void tcp_update_pacing_rate(struct sock *sk)
 	if (tp->srtt > 8 + 2)
 		do_div(rate, tp->srtt);
 
-	sk->sk_pacing_rate = min_t(u64, rate, sk->sk_max_pacing_rate);
+	/* ACCESS_ONCE() is needed because sch_fq fetches sk_pacing_rate
+	 * without any lock. We want to make sure compiler wont store
+	 * intermediate values in this location.
+	 */
+	ACCESS_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
+						sk->sk_max_pacing_rate);
 }
 
 /* Calculate rto without backoff.  This is the second half of Van Jacobson's

^ permalink raw reply related

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
From: Eric W. Biederman @ 2013-10-10  0:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, yamato, netdev
In-Reply-To: <20131009165254.2e1c8332@nehalam.linuxnetplumber.net>

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Tue, 8 Oct 2013 14:13:37 -0700
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
>> On Tue, 08 Oct 2013 15:23:49 -0400 (EDT)
>> David Miller <davem@davemloft.net> wrote:
>> 
>> > From: Masatake YAMATO <yamato@redhat.com>
>> > Date: Fri,  4 Oct 2013 11:34:21 +0900
>> > 
>> > > ip link has ability to show extra information of net work device if
>> > > kernel provides sunh information. With this patch veth driver can
>> > > provide its peer ifindex information to ip command via netlink
>> > > interface.
>> > > 
>> > > Signed-off-by: Masatake YAMATO <yamato@redhat.com>
>> > 
>> > Applied to net-next, thank you.
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe netdev" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> Please revert this. It is incorrect.
>> The info returned by any netlink message should be equal to the message
>> for setting.
>> 
>> I think the correct patch would be something like this (compile tested only).
>> 
>> --- a/drivers/net/veth.c	2013-10-06 14:48:23.806461177 -0700
>> +++ b/drivers/net/veth.c	2013-10-08 14:11:42.434074690 -0700
>> @@ -434,6 +434,35 @@ static const struct nla_policy veth_poli
>>  	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
>>  };
>>  
>> +static size_t veth_get_size(const struct net_device *dev)
>> +{
>> +	return nla_total_size(sizeof(struct ifinfomsg)) + /* VETH_INFO_PEER */
>> +		0;
>> +}
>> +
>> +static int veth_fill_info(struct sk_buff *skb, const struct net_device *dev)
>> +{
>> +	struct veth_priv *priv = netdev_priv(dev);
>> +	struct net_device *peer = rtnl_dereference(priv->peer);
>> +
>> +	if (peer) {
>> +		struct ifinfomsg ifi = {
>> +			.ifi_family = AF_UNSPEC,
>> +			.ifi_type = peer->type,
>> +			.ifi_index = peer->ifindex,
>> +			.ifi_flags = dev_get_flags(peer),
>> +		};
>> +
>> +		if (nla_put(skb, VETH_INFO_PEER, sizeof(ifi), &ifi))
>> +			goto nla_put_failure;
>> +	}
>> +
>> +	return 0;
>> +
>> +nla_put_failure:
>> +	return -EMSGSIZE;
>> +}
>> +
>>  static struct rtnl_link_ops veth_link_ops = {
>>  	.kind		= DRV_NAME,
>>  	.priv_size	= sizeof(struct veth_priv),
>> @@ -443,6 +472,8 @@ static struct rtnl_link_ops veth_link_op
>>  	.dellink	= veth_dellink,
>>  	.policy		= veth_policy,
>>  	.maxtype	= VETH_INFO_MAX,
>> +	.get_size	= veth_get_size,
>> +	.fill_info	= veth_fill_info,
>>  };
>>  
>>  /*
>> 
>> 
>
> This patch is ok as RFC starting point but the full implementation needs to
> add on IFLA_NAME and other attributes such that the full peer can be reconstructed.
>
> Ideally, the output of 'ip link' command can be in a format that can be used
> to recreate the same veth pair.
>
> One issue is that veth has the ability to make a peer in a different namespace
> and the network namespace code does not appear to have the ability to be invertable.
> I.e it is not possible to construct IFLA_NET_NS_PID or IFLA_NET_NS_FD attributes
> from an existing network device namespace.

Right.

IFLA_NET_NS_PID is not invertible as there may be no processes running
in a pid namespace.

IFLA_NET_NS_FD is in principle invertible.  We just need to add a file
descriptor to the callers fd table.  I don't see IFLA_NET_NS_FD being
invertible for broadcast messages, but for unicast it looks like a bit
of a pain but there are no fundamental problems.

I don't know if we care enough yet to write the code for the
IFLA_NET_NS_FD attribute but it is doable.

Eric

^ permalink raw reply

* Re: [PATCH net-next] net: Separate the close_list and the unreg_list
From: Francesco Ruggeri @ 2013-10-10  0:22 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev
In-Reply-To: <CA+HUmGg2LCX0zZAXgNhWwafBgi1HuePTLmxDu5VTUm7tX4TYMA@mail.gmail.com>

On Sun, Oct 6, 2013 at 10:13 AM, Francesco Ruggeri
<fruggeri@aristanetworks.com> wrote:
>>
>> Can you verify this incremental change fixes it for you?
>>
>
> Sure, I will.
> Thanks,
>
> Francesco

Hi Eric,
I have been running this patch for a few days and I have not had any problems.
Thanks,
Francesco

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 07/13] ipv6/ip6_tunnel: Apply rcu_access_pointer() to avoid sparse false positive
From: Paul E. McKenney @ 2013-10-10  0:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, dhowells, edumazet,
	darren, fweisbec, sbw, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1381363960.4971.55.camel@edumazet-glaptop.roam.corp.google.com>

On Wed, Oct 09, 2013 at 05:12:40PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-09 at 16:40 -0700, Josh Triplett wrote:
> 
> > that.  Constructs like list_del_rcu are much clearer, and not
> > open-coded.  Open-coding synchronization code is almost always a Bad
> > Idea.
> 
> OK, so you think there is synchronization code.
> 
> I will shut up then, no need to waste time.

As you said earlier, we should at least get rid of the memory barrier
as long as we are changing the code.

Josh, what would you suggest as the best way to avoid the memory barrier,
keep sparse happy, and not be too ugly?

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH v2.43 0/5] MPLS actions and matches
From: Simon Horman @ 2013-10-10  0:31 UTC (permalink / raw)
  To: dev, netdev, Jesse Gross, Ben Pfaff
  Cc: Pravin B Shelar, Ravi K, Isaku Yamahata, Joe Stringer
In-Reply-To: <1381132847-12589-1-git-send-email-horms@verge.net.au>

On Mon, Oct 07, 2013 at 05:00:42PM +0900, Simon Horman wrote:
> Hi,
> 
> This series implements MPLS actions and matches based on work by
> Ravi K, Leo Alterman, Yamahata-san and Joe Stringer.
> 
> This series provides two changes
> 
> * Patches 1 - 3
> 
>   Provide user-space support for the VLAN/MPLS tag insertion order
>   up to and including OpenFlow 1.2, and the different ordering
>   specified from OpenFlow 1.3. In a nutshell the datapath always
>   uses the OpenFlow 1.3 ordering, which is to always insert tags
>   immediately after the L2 header, regardless of the presence of other
>   tags. And ovs-vswtichd provides compatibility for the behaviour up
>   to OpenFlow 1.2, which is that MPLS tags should follow VLAN tags
>   if present.
> 
>   These patches have been updated since v2.42.
> 
>   Ben, these are for you to review.
> 
> * Patches 4 and 5
> 
>   Adding basic MPLS action and match support to the kernel datapath
> 
>   These patches were last updated in v2.41.
> 
>   Jesse, these are for you to review.

Hi Jesse, Pravin and Ben,

I believe this series addresses the feedback that each of you
gave with regards to recent previous postings of this patch-set.
I'm wondering if you could find some time to review it.

> 
> 
> Differences between v2.43 and v2.42:
> 
> * As suggested by Ben Pfaff
>   Move vlan state into struct xlate_ctx
>   1. Add final_vlan_tci member to struct xlate_ctx instead of vlan_tci member
>      struct xlate_xin.  This seems to be a better palace for it as it does
>      not need to be accessible from the caller.
>   2. Move local vlan_tci variable of do_xlate_actions() to be the
>      next_vlan_tci member of strict xlate_ctx.  This allows for it to work
>      across resubmit actions and goto table instructions.
> * Code contributed by Ben Pfaff
>   + Use enum for to control order of MPLS LSE insertion
>     This makes the logic somewhat clearer
> * Add a helper push_mpls_from_openflow() to consolidate
>   the same logic that appears in three locations.
> 
> 
> Differences between v2.42 and v2.41:
> 
> * Rebase for:
>   + 0585f7a ("datapath: Simplify mega-flow APIs.")
>   + a097c0b ("datapath: Restructure datapath.c and flow.c")
> * As suggested by Jesse Gross
>   + Take into account that push_mpls() will have freed the skb on error
>   + Remove dubious !eth_p_mpls(skb->protocol) condition from push_mpls
>     The !eth_p_mpls(skb->protocol) condition on setting inner_protocol
>     has no effect. Its motivation was to ensure that inner_protocol was
>     only set the first time that mpls_push occured. However this is already
>     ensured by the !ovs_skb_get_inner_protocol(skb) condition.
>   + Return -EINVAL instead of -ENOMEM from pop_mpls() if the skb is too short
>   + Do not add @inner_protocol to kernel doc for struct ovs_skb_cb.
>     The patch no longer adds an inner_protocol member to struct ovs_skb_cb
>   + Do not add and set otherwise unsued inner_protocol variable in
>     rpl_dev_queue_xmit()
> * As suggested by Pravin Shelar
>   + Implement compatibility code in existing rpl_skb_gso_segment
>     rather than introducing to use rpl___skb_gso_segment
> 
> 
> Differences between v2.41 and v2.40:
> 
> * As suggested by Ben Pfaff
>   + Expand struct ofpact_reg_load to include a mpls_before_vlan field
>     rather than using the compat field of the ofpact field of
>     struct ofpact_reg_load.
>   + Pass version to  ofpacts_pull_openflow11_actions and
>     ofpacts_pull_openflow11_instructions.  This removes the invalid
>     assumption that that these functions are passed a full message and are
>     thus able to deduce the OpenFlow version.
> 
> 
> Differences between v2.40 and v2.39:
> 
> * Rebase for:
>   + New dev_queue_xmit compat code
>   + Updated put_vlan()
>   + Removal of mpls_depth field from struct flow
> * As suggested by Jesse Gross
>   + Remove bogus mac_len update from push_mpls()
>   + Slightly simplify push_mpls() by using eth_hdr()
>   + Remove dubious condition !eth_p_mpls(inner_protocol) on
>     an skb being considered to be MPLS in netdev_send()
>   + Only use compatibility code for MPLS GSO segmentation on kernels
>     older than 3.11
>   + Revamp setting of inner_protocol
>     1. Do not unconditionally set inner_protocol to the value of
>        skb->protocol in ovs_execute_actions().
>     2. Initialise inner_protocol it to zero only if compatibility code is in
>        use. In the case where compatibility code is not in use it will either
>        be zero due since the allocation of the skb or some other value set
>        by some other user.
>     3. Conditionally set the inner_protocol in push_mpls() to the value of
>        skb->protocol when entering push_mpls(). The condition is that
>        inner_protocol is zero and the value of skb->protocol is not an MPLS
>        ethernet type.
>     - This new scheme:
>       + Pushes logic to set inner_protocol closer to the case where it is
> 	needed.
>       + Avoids over-writing values set by other users.
> * As suggested by Pravin Shelar
>   + Only set and restore skb->protocol in rpl___skb_gso_segment() in the
>     case of MPLS
>   + Add inner_protocol field to struct ovs_gso_cb instead of ovs_skb_cb.
>     This moves compatibility code closer to where it is used
>     and creates fewer differences with mainline.
> * Update comment on mac_len updates in datapath/actions.c
> * Remove HAVE_INNER_PROCOTOL and instead just check
>   against kernel version 3.11 directly.
>   HAVE_INNER_PROCOTOL is a hang-over from work done prior
>   to the merge of inner_protocol into the kernel.
> * Remove dubious condition !eth_p_mpls(inner_protocol) on
>   using inner_protocol as the type in rpl_skb_network_protocol()
> * Do not update type of features in rpl_dev_queue_xmit.
>   Though arguably correct this is not an inherent part of
>   the changes made by this patch.
> * Use skb_cow_head() in push_mpls()
>   + Call skb_cow_head(skb, MPLS_HLEN) instead of
>     make_writable(skb, skb->mac_len) to ensure that there is enough head
>     room to push an MPLS LSE regardless of whether the skb is cloned or not.
>   + This is consistent with the behaviour of rpl__vlan_put_tag().
>   + This is a fix for crashes reported when performing mpls_push
>     with headroom less than 4. This problem was introduced in v3.36.
> * Skip popping in mpls_pop if the skb is too short to contain an MPLS LSE
> 
> 
> Differences between v2.39 and v2.38:
> 
> * Rebase for removal of vlan, checksum and skb->mark compat code
>   - This includes adding adding a new patch,
>     "[PATCH v2.39 6/7] datapath: Break out deacceleration portion of
>     vlan_push" to allow re-use of some existing code.
> 
> 
> Differences between v2.38 and v2.37:
> 
> * Rebase for SCTP support
> * Refactor validate_tp_port() to iterate over eth_types rather
>   than open-coding the loop. With the addition of SCTP this logic
>   is now used three times.
> 
> 
> Differences between v2.37 and v2.36:
> 
> * Rebase
> 
> 
> Differences between v2.36 and v2.35:
> 
> * Rebase
> 
> * Do not add set_ethertype() to datapath/actions.c.
>   As this patch has evolved this function had devolved into
>   to sets of functionality wrapped into a single function with
>   only one line of common code. Refactor things to simply
>   open-code setting the ether type in the two locations where
>   set_ethertype() was previously used. The aim here is to improve
>   readability.
> 
> * Update setting skb->ethertype after mpls push and pop.
>   - In the case of push_mpls it should be set unconditionally
>     as in v2.35 the behaviour of this function to always push
>     an MPLS LSE before any VLAN tags.
>   - In the case of mpls_pop eth_p_mpls(skb->protocol) is a better
>     test than skb->protocol != htons(ETH_P_8021Q) as it will give the
>     correct behaviour in the presence of other VLAN ethernet types,
>     for example 0x88a8 which is used by 802.1ad. Moreover, it seems
>     correct to update the ethernet type if it was previously set
>     according to the top-most MPLS LSE.
> 
> * Deaccelerate VLANs when pushing MPLS tags the
>   - Since v2.35 MPLS push will insert an MPLS LSE before any VLAN tags.
>     This means that if an accelerated tag is present it should be
>     deaccelerated to ensure it ends up in the correct position.
> 
> * Update skb->mac_len in push_mpls() so that it will be correct
>   when used by a subsequent call to pop_mpls().
> 
>   As things stand I do not believe this is strictly necessary as
>   ovs-vswitchd will not send a pop MPLS action after a push MPLS action.
>   However, I have added this in order to code more defensively as I believe
>   that if such a sequence did occur it would be rather unobvious why
>   it didn't work.
> 
> * Do not add skb_cow_head() call in push_mpls().
>   It is unnecessary as there is a make_writable() call.
>   This change was also made in v2.30 but some how the
>   code regressed between then and v2.35.
> 
> 
> Differences between v2.35 and v2.34:
> 
> * Add support for the tag ordering specified up until OpenFlow 1.2 and
>   the ordering specified from OpenFlow 1.3.
> 
> * Correct error in datapath patch's handling of GSO in the presence
>   of MPLS and absence of VLANs.
> 
> 
> To aid review this series is available in git at:
> 
> git://github.com/horms/openvswitch.git devel/mpls-v2.43
> 
> 
> Patch list and overall diffstat:
> 
> Joe Stringer (3):
>   odp: Allow VLAN actions after MPLS actions
>   ofp-actions: Add separate OpenFlow 1.3 action parser
>   lib: Support pushing of MPLS LSE before or after VLAN tag
> 
> Simon Horman (2):
>   datapath: Break out deacceleration portion of vlan_push
>   datapath: Add basic MPLS support to kernel
> 
>  datapath/Modules.mk                             |   1 +
>  datapath/actions.c                              | 158 ++++++++-
>  datapath/datapath.c                             |   4 +-
>  datapath/flow.c                                 |  29 ++
>  datapath/flow.h                                 |  17 +-
>  datapath/flow_netlink.c                         | 286 +++++++++++++--
>  datapath/flow_netlink.h                         |   2 +-
>  datapath/linux/compat/gso.c                     |  70 +++-
>  datapath/linux/compat/gso.h                     |  41 +++
>  datapath/linux/compat/include/linux/netdevice.h |   6 +-
>  datapath/linux/compat/netdevice.c               |  10 +-
>  datapath/mpls.h                                 |  15 +
>  include/linux/openvswitch.h                     |   7 +-
>  lib/flow.c                                      |   2 +-
>  lib/odp-util.c                                  |   9 +-
>  lib/odp-util.h                                  |   2 +-
>  lib/ofp-actions.c                               |  91 ++++-
>  lib/ofp-actions.h                               |  16 +
>  lib/ofp-print.c                                 |   2 +-
>  lib/ofp-util.c                                  |  24 +-
>  lib/ofp-util.h                                  |   2 +-
>  lib/packets.c                                   |  10 +-
>  lib/packets.h                                   |   2 +-
>  ofproto/ofproto-dpif-xlate.c                    | 132 +++++--
>  tests/ofproto-dpif.at                           | 446 ++++++++++++++++++++++++
>  utilities/ovs-ofctl.c                           |   8 +-
>  26 files changed, 1272 insertions(+), 120 deletions(-)
>  create mode 100644 datapath/mpls.h
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: fix vport-netdev unregister
From: Jesse Gross @ 2013-10-10  1:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Pravin B Shelar, Jiri Pirko, dev@openvswitch.org,
	netdev
In-Reply-To: <1381288051-3607-1-git-send-email-ast@plumgrid.com>

On Tue, Oct 8, 2013 at 8:07 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> The combination of two commits
>
> commit 8e4e1713e4
> ("openvswitch: Simplify datapath locking.")
>
> and
>
> commit 2537b4dd0a
> ("openvswitch:: link upper device for port devices")
>
> introduced a bug where upper_dev wasn't unlinked upon
> netdev_unregister notification
>
> The following steps:
>
>   modprobe openvswitch
>   ovs-dpctl add-dp test
>   ip tuntap add dev tap1 mode tap
>   ovs-dpctl add-if test tap1
>   ip tuntap del dev tap1 mode tap
>
> are causing multiple warnings:

[...]

Hmm, you are right. Pravin mentioned some concerns about layering with
this patch so I believe he's trying to think of a way to minimize
that.

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 07/13] ipv6/ip6_tunnel: Apply rcu_access_pointer() to avoid sparse false positive
From: Hannes Frederic Sowa @ 2013-10-10  2:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Eric Dumazet, Josh Triplett, linux-kernel, mingo, laijs, dipankar,
	akpm, mathieu.desnoyers, niv, tglx, peterz, rostedt, dhowells,
	edumazet, darren, fweisbec, sbw, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev
In-Reply-To: <20131010002833.GJ5790@linux.vnet.ibm.com>

On Wed, Oct 09, 2013 at 05:28:33PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 09, 2013 at 05:12:40PM -0700, Eric Dumazet wrote:
> > On Wed, 2013-10-09 at 16:40 -0700, Josh Triplett wrote:
> > 
> > > that.  Constructs like list_del_rcu are much clearer, and not
> > > open-coded.  Open-coding synchronization code is almost always a Bad
> > > Idea.
> > 
> > OK, so you think there is synchronization code.
> > 
> > I will shut up then, no need to waste time.
> 
> As you said earlier, we should at least get rid of the memory barrier
> as long as we are changing the code.

Interesting thread!

Sorry to chime in and asking a question:

Why do we need an ACCESS_ONCE here if rcu_assign_pointer can do without one?
In other words I wonder why rcu_assign_pointer is not a static inline function
to use the sequence point in argument evaluation (if I remember correctly this
also holds for inline functions) to not allow something like this:

E.g. we want to publish which lock to take first to prevent an ABBA problem
(extreme example):

rcu_assign_pointer(lockptr, min(lptr1, lptr2));

Couldn't a compiler spill the lockptr memory location as a temporary buffer
if the compiler is under register pressure? (yes, this seems unlikely if we
flushed out most registers to memory because of the barrier, but still... ;) )

This seems to be also the case if we publish a multi-dereferencing pointers
e.g. ptr->ptr->ptr.

Thanks,

  Hannes

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: annie li @ 2013-10-10  2:19 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD012FE96@AMSPEX01CL01.citrite.net>


On 2013-10-9 18:26, Paul Durrant wrote:
>> -----Original Message-----
>> From: annie li [mailto:annie.li@oracle.com]
>> Sent: 09 October 2013 05:42
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
>> Ian Campbell
>> Subject: Re: [Xen-devel] [PATCH net-next v2 5/5] xen-netback: enable IPv6
>> TCP GSO to the guest
>>
>>
>> On 2013-10-8 18:58, Paul Durrant wrote:
>>> This patch adds code to handle SKB_GSO_TCPV6 skbs and construct
>> appropriate
>>> extra or prefix segments to pass the large packet to the frontend. New
>>> xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are
>> sampled
>>> to determine if the frontend is capable of handling such packets.
>>>
>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>> Cc: Wei Liu <wei.liu2@citrix.com>
>>> Cc: David Vrabel <david.vrabel@citrix.com>
>>> Cc: Ian Campbell <ian.campbell@citrix.com>
>>> ---
>>>    drivers/net/xen-netback/common.h    |    6 +++--
>>>    drivers/net/xen-netback/interface.c |    8 ++++--
>>>    drivers/net/xen-netback/netback.c   |   47
>> +++++++++++++++++++++++++++--------
>>>    drivers/net/xen-netback/xenbus.c    |   29 +++++++++++++++++++--
>>>    4 files changed, 74 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-
>> netback/common.h
>>> index b4a9a3c..720b1ca 100644
>>> --- a/drivers/net/xen-netback/common.h
>>> +++ b/drivers/net/xen-netback/common.h
>>> @@ -87,6 +87,7 @@ struct pending_tx_info {
>>>    struct xenvif_rx_meta {
>>>    	int id;
>>>    	int size;
>>> +	int gso_type;
>>>    	int gso_size;
>>>    };
>>>
>>> @@ -150,9 +151,10 @@ struct xenvif {
>>>    	u8               fe_dev_addr[6];
>>>
>>>    	/* Frontend feature information. */
>>> +	int gso_mask;
>>> +	int gso_prefix_mask;
>> I assume it is a flag instead of mask here, right? If it is mask, then 1
>> means disabling the gso.
> I don't understand what you're saying here. I'm just swapping from bit flags to a couple of masks. Masks without either of the requisite bits for v4 or v6 gso mean it is disabled.

It is just about semantics,  my understanding is masks WITH bits for v4 
or v6 means disabling.

>
>>> +
>>>    	u8 can_sg:1;
>>> -	u8 gso:1;
>>> -	u8 gso_prefix:1;
>>>    	u8 ip_csum:1;
>>>    	u8 ipv6_csum:1;
>>>
>>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
>> netback/interface.c
>>> index cb0d8ea..3d11387 100644
>>> --- a/drivers/net/xen-netback/interface.c
>>> +++ b/drivers/net/xen-netback/interface.c
>>> @@ -214,8 +214,12 @@ static netdev_features_t
>> xenvif_fix_features(struct net_device *dev,
>>>    	if (!vif->can_sg)
>>>    		features &= ~NETIF_F_SG;
>>> -	if (!vif->gso && !vif->gso_prefix)
>>> +	if (~(vif->gso_mask | vif->gso_prefix_mask) &
>>> +	    (1 << XEN_NETIF_GSO_TYPE_TCPV4))
>> Is it better to use XEN_NETIF_GSO_TYPE_TCPV4 directly and setting
>> gso_mask(gso_prefix_mask) with "|= XEN_NETIF_GSO_TYPE_TCPV4" or "|=
>> XEN_NETIF_GSO_TYPE_TCPV6" instead of "1 <<"?
>>
> I thought about it but decided it was best to leave XEN_NETIF_GSO_TYPE_xxx as a list of types rather than bits in a mask as there's no intrinsic reason why you'd ever want to OR them together (unlike the tx or rx flags). That fact I use them as bit shifts in netback is purely for convenience of coding - I guess I could define macros to make it a little tidier though.

Macros would be fine.

Thanks
Annie

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: fix vport-netdev unregister
From: Pravin Shelar @ 2013-10-10  3:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Jesse Gross, Jiri Pirko, dev@openvswitch.org,
	netdev
In-Reply-To: <1381288051-3607-1-git-send-email-ast@plumgrid.com>

On Tue, Oct 8, 2013 at 8:07 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> The combination of two commits
>
> commit 8e4e1713e4
> ("openvswitch: Simplify datapath locking.")
>
> and
>
> commit 2537b4dd0a
> ("openvswitch:: link upper device for port devices")
>
> introduced a bug where upper_dev wasn't unlinked upon
> netdev_unregister notification
>
> The following steps:
>
>   modprobe openvswitch
>   ovs-dpctl add-dp test
>   ip tuntap add dev tap1 mode tap
>   ovs-dpctl add-if test tap1
>   ip tuntap del dev tap1 mode tap
>
> are causing multiple warnings:
>
> [   62.747557] gre: GRE over IPv4 demultiplexor driver
> [   62.749579] openvswitch: Open vSwitch switching datapath
> [   62.755087] device test entered promiscuous mode
> [   62.765911] device tap1 entered promiscuous mode
> [   62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
> [   62.769017] ------------[ cut here ]------------
> [   62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240()
> [   62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
> [   62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60
> [   62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
> [   62.769053]  0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006
> [   62.769055]  0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58
> [   62.769057]  ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8
> [   62.769059] Call Trace:
> [   62.769062]  [<ffffffff8175e575>] dump_stack+0x55/0x76
> [   62.769065]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
> [   62.769067]  [<ffffffff8105319a>] warn_slowpath_null+0x1a/0x20
> [   62.769069]  [<ffffffff8162a04f>] rollback_registered_many+0x20f/0x240
> [   62.769071]  [<ffffffff8162a101>] rollback_registered+0x31/0x40
> [   62.769073]  [<ffffffff8162a488>] unregister_netdevice_queue+0x58/0x90
> [   62.769075]  [<ffffffff8154f900>] __tun_detach+0x140/0x340
> [   62.769077]  [<ffffffff8154fb36>] tun_chr_close+0x36/0x60
> [   62.769080]  [<ffffffff811bddaf>] __fput+0xff/0x260
> [   62.769082]  [<ffffffff811bdf5e>] ____fput+0xe/0x10
> [   62.769084]  [<ffffffff8107b515>] task_work_run+0xb5/0xe0
> [   62.769087]  [<ffffffff810029b9>] do_notify_resume+0x59/0x80
> [   62.769089]  [<ffffffff813a41fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [   62.769091]  [<ffffffff81770f5a>] int_signal+0x12/0x17
> [   62.769093] ---[ end trace 838756c62e156ffb ]---
> [   62.769481] ------------[ cut here ]------------
> [   62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
> [   62.769486] sysfs: can not remove 'master', no directory
> [   62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
> [   62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
> [   62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
> [   62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch]
> [   62.769519]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
> [   62.769521]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28
> [   62.769523]  0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500
> [   62.769525] Call Trace:
> [   62.769528]  [<ffffffff8175e575>] dump_stack+0x55/0x76
> [   62.769529]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
> [   62.769531]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
> [   62.769533]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
> [   62.769535]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
> [   62.769538]  [<ffffffff81631ef7>] __netdev_adjacent_dev_remove+0xf7/0x150
> [   62.769540]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
> [   62.769542]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
> [   62.769544]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
> [   62.769548]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
> [   62.769550]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
> [   62.769552]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
> [   62.769555]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
> [   62.769557]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
> [   62.769559]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
> [   62.769562]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
> [   62.769564]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
> [   62.769566]  [<ffffffff8107f44a>] kthread+0xea/0xf0
> [   62.769568]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
> [   62.769570]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
> [   62.769572]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
> [   62.769573] ---[ end trace 838756c62e156ffc ]---
> [   62.769574] ------------[ cut here ]------------
> [   62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
> [   62.769577] sysfs: can not remove 'upper_test', no directory
> [   62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
> [   62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
> [   62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
> [   62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch]
> [   62.769607]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
> [   62.769609]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58
> [   62.769611]  0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500
> [   62.769613] Call Trace:
> [   62.769615]  [<ffffffff8175e575>] dump_stack+0x55/0x76
> [   62.769617]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
> [   62.769619]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
> [   62.769621]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
> [   62.769622]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
> [   62.769624]  [<ffffffff81631f22>] __netdev_adjacent_dev_remove+0x122/0x150
> [   62.769627]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
> [   62.769629]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
> [   62.769631]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
> [   62.769633]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
> [   62.769636]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
> [   62.769638]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
> [   62.769640]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
> [   62.769642]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
> [   62.769644]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
> [   62.769646]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
> [   62.769648]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
> [   62.769650]  [<ffffffff8107f44a>] kthread+0xea/0xf0
> [   62.769652]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
> [   62.769654]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
> [   62.769656]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
> [   62.769657] ---[ end trace 838756c62e156ffd ]---
> [   62.769724] device tap1 left promiscuous mode
>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
>  net/openvswitch/dp_notify.c    |    5 +++++
>  net/openvswitch/vport-netdev.c |   16 +++++++++++++---
>  net/openvswitch/vport-netdev.h |    1 +
>  3 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
> index c323567..e9380bd 100644
> --- a/net/openvswitch/dp_notify.c
> +++ b/net/openvswitch/dp_notify.c
> @@ -88,6 +88,11 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
>                 return NOTIFY_DONE;
>
>         if (event == NETDEV_UNREGISTER) {
> +               /* rx_handler_unregister and upper_dev_unlink immediately */
> +               if (dev->reg_state == NETREG_UNREGISTERING)
> +                       ovs_netdev_unlink_dev(vport);
> +

Rather than doing vport destroy here, we can just unlink upper device
and let workq do rest of work.

Thanks.

> +               /* schedule vport destroy, dev_put and genl notification */
>                 ovs_net = net_generic(dev_net(dev), ovs_net_id);
>                 queue_work(system_wq, &ovs_net->dp_notify_work);
>         }
> diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> index 09d93c1..cce933a 100644
> --- a/net/openvswitch/vport-netdev.c
> +++ b/net/openvswitch/vport-netdev.c
> @@ -150,15 +150,25 @@ static void free_port_rcu(struct rcu_head *rcu)
>         ovs_vport_free(vport_from_priv(netdev_vport));
>  }
>
> -static void netdev_destroy(struct vport *vport)
> +void ovs_netdev_unlink_dev(struct vport *vport)
>  {
>         struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
>
> -       rtnl_lock();
> +       ASSERT_RTNL();
>         netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
>         netdev_rx_handler_unregister(netdev_vport->dev);
> -       netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
> +       netdev_upper_dev_unlink(netdev_vport->dev,
> +                               netdev_master_upper_dev_get(netdev_vport->dev));
>         dev_set_promiscuity(netdev_vport->dev, -1);
> +}
> +
> +static void netdev_destroy(struct vport *vport)
> +{
> +       struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
> +
> +       rtnl_lock();
> +       if (netdev_vport->dev->reg_state != NETREG_UNREGISTERING)
> +               ovs_netdev_unlink_dev(vport);
>         rtnl_unlock();
>
>         call_rcu(&netdev_vport->rcu, free_port_rcu);
> diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
> index dd298b5..21e3770 100644
> --- a/net/openvswitch/vport-netdev.h
> +++ b/net/openvswitch/vport-netdev.h
> @@ -39,5 +39,6 @@ netdev_vport_priv(const struct vport *vport)
>  }
>
>  const char *ovs_netdev_get_name(const struct vport *);
> +void ovs_netdev_unlink_dev(struct vport *);
>
>  #endif /* vport_netdev.h */
> --
> 1.7.9.5
>

^ permalink raw reply

* Re: pull-request: can 2013-10-09
From: David Miller @ 2013-10-10  3:45 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can, kernel
In-Reply-To: <1381353118-25111-1-git-send-email-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed,  9 Oct 2013 23:11:55 +0200

> here are three fixes for the v3.12 release cycle. The first fixes a regression
> in the flexcan driver which was introduced in a previous v3.12 patch. The
> second one fixes the mx28 detection in the flexcan driver. The third one
> targets the at91_can driver and fixes the driver data mapping for platform
> devices.

Pulled, thanks Marc.

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: fix vport-netdev unregister
From: Alexei Starovoitov @ 2013-10-10  4:11 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: David S. Miller, Jesse Gross, Jiri Pirko, dev@openvswitch.org,
	netdev
In-Reply-To: <CALnjE+p3MC7D9oYZ4VhH=fdOWSYcV6h9twnaZBocYnRdAEcpBg@mail.gmail.com>

On Wed, Oct 9, 2013 at 8:02 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Tue, Oct 8, 2013 at 8:07 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> The combination of two commits
>>
>> commit 8e4e1713e4
>> ("openvswitch: Simplify datapath locking.")
>>
>> and
>>
>> commit 2537b4dd0a
>> ("openvswitch:: link upper device for port devices")
>>
>> introduced a bug where upper_dev wasn't unlinked upon
>> netdev_unregister notification
>>
>> The following steps:
>>
>>   modprobe openvswitch
>>   ovs-dpctl add-dp test
>>   ip tuntap add dev tap1 mode tap
>>   ovs-dpctl add-if test tap1
>>   ip tuntap del dev tap1 mode tap
>>
>> are causing multiple warnings:
>>
>> [   62.747557] gre: GRE over IPv4 demultiplexor driver
>> [   62.749579] openvswitch: Open vSwitch switching datapath
>> [   62.755087] device test entered promiscuous mode
>> [   62.765911] device tap1 entered promiscuous mode
>> [   62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
>> [   62.769017] ------------[ cut here ]------------
>> [   62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240()
>> [   62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
>> [   62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60
>> [   62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
>> [   62.769053]  0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006
>> [   62.769055]  0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58
>> [   62.769057]  ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8
>> [   62.769059] Call Trace:
>> [   62.769062]  [<ffffffff8175e575>] dump_stack+0x55/0x76
>> [   62.769065]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
>> [   62.769067]  [<ffffffff8105319a>] warn_slowpath_null+0x1a/0x20
>> [   62.769069]  [<ffffffff8162a04f>] rollback_registered_many+0x20f/0x240
>> [   62.769071]  [<ffffffff8162a101>] rollback_registered+0x31/0x40
>> [   62.769073]  [<ffffffff8162a488>] unregister_netdevice_queue+0x58/0x90
>> [   62.769075]  [<ffffffff8154f900>] __tun_detach+0x140/0x340
>> [   62.769077]  [<ffffffff8154fb36>] tun_chr_close+0x36/0x60
>> [   62.769080]  [<ffffffff811bddaf>] __fput+0xff/0x260
>> [   62.769082]  [<ffffffff811bdf5e>] ____fput+0xe/0x10
>> [   62.769084]  [<ffffffff8107b515>] task_work_run+0xb5/0xe0
>> [   62.769087]  [<ffffffff810029b9>] do_notify_resume+0x59/0x80
>> [   62.769089]  [<ffffffff813a41fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> [   62.769091]  [<ffffffff81770f5a>] int_signal+0x12/0x17
>> [   62.769093] ---[ end trace 838756c62e156ffb ]---
>> [   62.769481] ------------[ cut here ]------------
>> [   62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
>> [   62.769486] sysfs: can not remove 'master', no directory
>> [   62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
>> [   62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
>> [   62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
>> [   62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch]
>> [   62.769519]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
>> [   62.769521]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28
>> [   62.769523]  0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500
>> [   62.769525] Call Trace:
>> [   62.769528]  [<ffffffff8175e575>] dump_stack+0x55/0x76
>> [   62.769529]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
>> [   62.769531]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
>> [   62.769533]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
>> [   62.769535]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
>> [   62.769538]  [<ffffffff81631ef7>] __netdev_adjacent_dev_remove+0xf7/0x150
>> [   62.769540]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
>> [   62.769542]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
>> [   62.769544]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
>> [   62.769548]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
>> [   62.769550]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
>> [   62.769552]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
>> [   62.769555]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
>> [   62.769557]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
>> [   62.769559]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
>> [   62.769562]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
>> [   62.769564]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
>> [   62.769566]  [<ffffffff8107f44a>] kthread+0xea/0xf0
>> [   62.769568]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
>> [   62.769570]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
>> [   62.769572]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
>> [   62.769573] ---[ end trace 838756c62e156ffc ]---
>> [   62.769574] ------------[ cut here ]------------
>> [   62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
>> [   62.769577] sysfs: can not remove 'upper_test', no directory
>> [   62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
>> [   62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
>> [   62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
>> [   62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch]
>> [   62.769607]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
>> [   62.769609]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58
>> [   62.769611]  0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500
>> [   62.769613] Call Trace:
>> [   62.769615]  [<ffffffff8175e575>] dump_stack+0x55/0x76
>> [   62.769617]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
>> [   62.769619]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
>> [   62.769621]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
>> [   62.769622]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
>> [   62.769624]  [<ffffffff81631f22>] __netdev_adjacent_dev_remove+0x122/0x150
>> [   62.769627]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
>> [   62.769629]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
>> [   62.769631]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
>> [   62.769633]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
>> [   62.769636]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
>> [   62.769638]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
>> [   62.769640]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
>> [   62.769642]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
>> [   62.769644]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
>> [   62.769646]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
>> [   62.769648]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
>> [   62.769650]  [<ffffffff8107f44a>] kthread+0xea/0xf0
>> [   62.769652]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
>> [   62.769654]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
>> [   62.769656]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
>> [   62.769657] ---[ end trace 838756c62e156ffd ]---
>> [   62.769724] device tap1 left promiscuous mode
>>
>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>> ---
>>  net/openvswitch/dp_notify.c    |    5 +++++
>>  net/openvswitch/vport-netdev.c |   16 +++++++++++++---
>>  net/openvswitch/vport-netdev.h |    1 +
>>  3 files changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
>> index c323567..e9380bd 100644
>> --- a/net/openvswitch/dp_notify.c
>> +++ b/net/openvswitch/dp_notify.c
>> @@ -88,6 +88,11 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
>>                 return NOTIFY_DONE;
>>
>>         if (event == NETDEV_UNREGISTER) {
>> +               /* rx_handler_unregister and upper_dev_unlink immediately */
>> +               if (dev->reg_state == NETREG_UNREGISTERING)
>> +                       ovs_netdev_unlink_dev(vport);
>> +
>
> Rather than doing vport destroy here, we can just unlink upper device
> and let workq do rest of work.

isn't it what it's doing?
It does unregister and unlink immediately and schedules workq to do destroy.
Obviously destroy cannot happen here, since we cannot grab ovs_lock here,
since it would be in the wrong order vs the rest of the code.

>
> Thanks.
>
>> +               /* schedule vport destroy, dev_put and genl notification */
>>                 ovs_net = net_generic(dev_net(dev), ovs_net_id);
>>                 queue_work(system_wq, &ovs_net->dp_notify_work);
>>         }
>> diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
>> index 09d93c1..cce933a 100644
>> --- a/net/openvswitch/vport-netdev.c
>> +++ b/net/openvswitch/vport-netdev.c
>> @@ -150,15 +150,25 @@ static void free_port_rcu(struct rcu_head *rcu)
>>         ovs_vport_free(vport_from_priv(netdev_vport));
>>  }
>>
>> -static void netdev_destroy(struct vport *vport)
>> +void ovs_netdev_unlink_dev(struct vport *vport)
>>  {
>>         struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
>>
>> -       rtnl_lock();
>> +       ASSERT_RTNL();
>>         netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
>>         netdev_rx_handler_unregister(netdev_vport->dev);
>> -       netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
>> +       netdev_upper_dev_unlink(netdev_vport->dev,
>> +                               netdev_master_upper_dev_get(netdev_vport->dev));
>>         dev_set_promiscuity(netdev_vport->dev, -1);
>> +}
>> +
>> +static void netdev_destroy(struct vport *vport)
>> +{
>> +       struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
>> +
>> +       rtnl_lock();
>> +       if (netdev_vport->dev->reg_state != NETREG_UNREGISTERING)
>> +               ovs_netdev_unlink_dev(vport);
>>         rtnl_unlock();
>>
>>         call_rcu(&netdev_vport->rcu, free_port_rcu);
>> diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
>> index dd298b5..21e3770 100644
>> --- a/net/openvswitch/vport-netdev.h
>> +++ b/net/openvswitch/vport-netdev.h
>> @@ -39,5 +39,6 @@ netdev_vport_priv(const struct vport *vport)
>>  }
>>
>>  const char *ovs_netdev_get_name(const struct vport *);
>> +void ovs_netdev_unlink_dev(struct vport *);
>>
>>  #endif /* vport_netdev.h */
>> --
>> 1.7.9.5
>>

^ permalink raw reply

* Re: [PATCH] veth: allow to setup multicast address for veth device
From: David Miller @ 2013-10-10  4:16 UTC (permalink / raw)
  To: gaofeng; +Cc: netdev, pablo, edumazet, kaber, hannes
In-Reply-To: <1380876744-606-1-git-send-email-gaofeng@cn.fujitsu.com>

From: Gao feng <gaofeng@cn.fujitsu.com>
Date: Fri, 4 Oct 2013 16:52:24 +0800

> We can only setup multicast address for network device when
> net_device_ops->ndo_set_rx_mode is not null.
> 
> Some configurations need to add multicast address for net
> device, such as netfilter cluster match module.
> 
> Add a fake ndo_set_rx_mode function to allow this operation.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] fib_trie: only calc for the un-first node
From: David Miller @ 2013-10-10  4:17 UTC (permalink / raw)
  To: baker.kernel; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <1381203411-12047-1-git-send-email-baker.kernel@gmail.com>

From: baker.kernel@gmail.com
Date: Tue,  8 Oct 2013 11:36:51 +0800

> From: "baker.zhang" <baker.kernel@gmail.com>
> 
> This is a enhancement.
> 
> for the first node in fib_trie, newpos is 0, bit is 1.
> Only for the leaf or node with unmatched key need calc pos.
> 
> Signed-off-by: baker.zhang <baker.kernel@gmail.com>

The compiler is probably doing this transformation all by itself,
but applied anyways, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: gro: allow to build full sized skb
From: David Miller @ 2013-10-10  4:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1381248143.12191.53.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 08 Oct 2013 09:02:23 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
> typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.
> 
> It's relatively easy to extend the skb using frag_list to allow
> more frags to be appended into the last sk_buff.
> 
> This still builds very efficient skbs, and allows reaching 45 MSS per
> skb.
> 
> (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional
> sk_buff)
> 
> High speed TCP flows benefit from this extension by lowering TCP stack
> cpu usage (less packets stored in receive queue, less ACK packets
> processed)
> 
> Forwarding setups could be hurt, as such skbs will need to be
> linearized, although its not a new problem, as GRO could already
> provide skbs with a frag_list.
> 
> We could make the 65536 bytes threshold a tunable to mitigate this.
> 
> (First time we need to linearize skb in skb_needs_linearize(), we could
> lower the tunable to ~16*1460 so that following skb_gro_receive() calls
> build smaller skbs)
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next] inet: includes a sock_common in request_sock
From: David Miller @ 2013-10-10  4:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1381357289.4971.35.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 09 Oct 2013 15:21:29 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> TCP listener refactoring, part 5 :
> 
> We want to be able to insert request sockets (SYN_RECV) into main
> ehash table instead of the per listener hash table to allow RCU
> lookups and remove listener lock contention.
> 
> This patch includes the needed struct sock_common in front
> of struct request_sock
> 
> This means there is no more inet6_request_sock IPv6 specific
> structure.
> 
> Following inet_request_sock fields were renamed as they became
> macros to reference fields from struct sock_common.
> Prefix ir_ was chosen to avoid name collisions.
> 
> loc_port   -> ir_loc_port
> loc_addr   -> ir_loc_addr
> rmt_addr   -> ir_rmt_addr
> rmt_port   -> ir_rmt_port
> iif        -> ir_iif
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Looks good, applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next] tcp: use ACCESS_ONCE() in tcp_update_pacing_rate()
From: David Miller @ 2013-10-10  4:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1381364092.4971.57.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 09 Oct 2013 17:14:52 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> sk_pacing_rate is read by sch_fq packet scheduler at any time,
> with no synchronization, so make sure we update it in a
> sensible way. ACCESS_ONCE() is how we instruct compiler
> to not do stupid things, like using the memory location
> as a temporary variable.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Also applied, thanks a lot.

^ permalink raw reply

* [PATCH v2] Stmmac: fix a bug when the clk_csr is euqal to 0x0
From: Wan ZongShun @ 2013-10-10  4:40 UTC (permalink / raw)
  To: Giuseppe Cavallaro, David Miller, Wan ZongShun; +Cc: netdev

Hi Giuseppe,

According to your advice, I add this new field comments in stmmac.txt.
And I only make one patch for fix this issue and field comments.

Signed-off-by: Wan Zongshun <mcuos.com@gmail.com>

---
 Documentation/networking/stmmac.txt               | 3 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 include/linux/stmmac.h                            | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/stmmac.txt
b/Documentation/networking/stmmac.txt
index 457b8bb..3ed0ea9 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -116,6 +116,7 @@ struct plat_stmmacenet_data {
     struct stmmac_mdio_bus_data *mdio_bus_data;
     struct stmmac_dma_cfg *dma_cfg;
     int clk_csr;
+    unsigned int dynamic_mdc_clk_en;
     int has_gmac;
     int enh_desc;
     int tx_coe;
@@ -148,6 +149,8 @@ Where:
        GMAC also enables the 4xPBL by default.
    o fixed_burst/mixed_burst/burst_len
  o clk_csr: fixed CSR Clock range selection.
+ o dynamic_mdc_clk_en: If it is set to >=1 MDC clk will be selected
dynamically,
+               or else you must set a fixed CSR Clock range to clk_src.
  o has_gmac: uses the GMAC core.
  o enh_desc: if sets the MAC will use the enhanced descriptor structure.
  o tx_coe: core is able to perform the tx csum in HW.
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 8d4ccd3..93fc6bc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2741,7 +2741,7 @@ struct stmmac_priv *stmmac_dvr_probe(struct
device *device,
      * set the MDC clock dynamically according to the csr actual
      * clock input.
      */
-    if (!priv->plat->clk_csr)
+    if (!!priv->plat->dynamic_mdc_clk_en)
         stmmac_clk_csr_set(priv);
     else
         priv->clk_csr = priv->plat->clk_csr;
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index bb5deb0..1b9f0b5 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -101,6 +101,7 @@ struct plat_stmmacenet_data {
     struct stmmac_mdio_bus_data *mdio_bus_data;
     struct stmmac_dma_cfg *dma_cfg;
     int clk_csr;
+    unsigned int dynamic_mdc_clk_en;
     int has_gmac;
     int enh_desc;
     int tx_coe;
-- 
1.8.1.2


-- 
Wan ZongShun.
www.mcuos.com

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox