Netdev List
* Re: [PATCH net] vxlan: synchronously and race-free destruction of vxlan sockets
From: Hannes Frederic Sowa @ 2016-04-08 20:30 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netdev, Jiri Benc
In-Reply-To: <20160408185114.GA1920@localhost.localdomain>

Hi Marcelo,


On 08.04.2016 20:51, Marcelo Ricardo Leitner wrote:
> On Thu, Apr 07, 2016 at 04:57:40PM +0200, Hannes Frederic Sowa wrote:
>> Due to the fact that the udp socket is destructed asynchronously in a
>> work queue, we have some nondeterministic behavior during shutdown of
>> vxlan tunnels and creating new ones. Fix this by keeping the destruction
>> process synchronous in regards to the user space process so IFF_UP can
>> be reliably set.
>>
>> udp_tunnel_sock_release destroys vs->sock->sk if reference counter
>> indicates so. We expect to have the same lifetime of vxlan_sock and
>> vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
>> only destruct the whole socket after we can be sure it cannot be found
>> by searching vxlan_net->sock_list.
>>
>> Cc: Jiri Benc <jbenc@redhat.com>
>> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> ---
>>   drivers/net/vxlan.c | 20 +++-----------------
>>   include/net/vxlan.h |  2 --
>>   2 files changed, 3 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> index 1c0fa364323e28..487e48b7a53090 100644
>> --- a/drivers/net/vxlan.c
>> +++ b/drivers/net/vxlan.c
>> @@ -98,7 +98,6 @@ struct vxlan_fdb {
>>
>>   /* salt for hash table */
>>   static u32 vxlan_salt __read_mostly;
>> -static struct workqueue_struct *vxlan_wq;
>>
>>   static inline bool vxlan_collect_metadata(struct vxlan_sock *vs)
>>   {
>> @@ -1065,7 +1064,9 @@ static void __vxlan_sock_release(struct vxlan_sock *vs)
>>   	vxlan_notify_del_rx_port(vs);
>>   	spin_unlock(&vn->sock_lock);
>>
>> -	queue_work(vxlan_wq, &vs->del_work);
>> +	synchronize_rcu();
>
> __vxlan_sock_release is called by vxlan_sock_release which is called by
> vxlan_open/stop. Do we really want to have synchronize_rcu() while
> holding rtnl?

I thought about that and tried not to use synchronize_rcu, but I don't see
any other way. Anyway, ndo_stop isn't really a fast path; it is used to
shut the interface down. Also, since we have lwtunnels, we don't really
need a lot of interfaces created and torn down.

But I could switch to synchronize_rcu_expedited here.

Also, we have another synchronize_rcu during device dismantling; maybe we
can split ndo_stop into two callbacks, one preparing for the stop and
the other running after the synchronize_rcu, when we can safely free
resources.

I will investigate this, but in the meantime I think this patch
already improves things, as user space can bind the socket again once
the dellink command has returned.

Thanks,
Hannes

^ permalink raw reply

* Re: [PATCH] net: thunderx: Fix broken of_node_put() code.
From: David Miller @ 2016-04-08 20:15 UTC (permalink / raw)
  To: ddaney
  Cc: ddaney.cavm, netdev, linux-kernel, linux-arm-kernel, rric,
	sgoutham, david.daney
In-Reply-To: <5707DF3F.3000508@caviumnetworks.com>

From: David Daney <ddaney@caviumnetworks.com>
Date: Fri, 8 Apr 2016 09:41:35 -0700

> Due to mail server malfunction, this patch was sent twice.  Please
> ignore this duplicate.

This submission had another problem too.

Do not use the date of your commit as the date that gets put into
your email headers.

This makes all of your patch submissions look like they occurred in
the past, and this mixes up the ordering of patches in patchwork.

So please resubmit this properly with a normal, current, date in your
email headers.

Thanks.

^ permalink raw reply

* Re: [PATCH v4 1/2] RDS: memory allocated must be align to 8
From: David Miller @ 2016-04-08 20:10 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: shamir.rabinovitch, rds-devel, netdev
In-Reply-To: <57080A27.8050509@oracle.com>

From: santosh shilimkar <santosh.shilimkar@oracle.com>
Date: Fri, 8 Apr 2016 12:44:39 -0700

> On 4/7/2016 4:57 AM, Shamir Rabinovitch wrote:
>> Fix issue in 'rds_ib_cong_recv' when accessing unaligned memory
>> allocated by 'rds_page_remainder_alloc' using uint64_t pointer.
>>
> Sorry, I still don't follow this change. What exactly is the
> problem?

You can't stop the offset at non-8-byte intervals, because the chunks
being used in these arenas can have 64-bit values in them, which must be
8-byte aligned.

It looks extremely obvious to me.

^ permalink raw reply
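
The alignment point made in the RDS reply above can be sketched in user space. This is only an illustration under stated assumptions: ALIGN_UP is a stand-in for the kernel's ALIGN() macro, and ARENA_SIZE and arena_take() are hypothetical names, not the RDS code. It shows that rounding each chunk length up to 8 keeps every following chunk offset 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a (a must be a power of two).
 * User-space stand-in for the kernel's ALIGN() macro. */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

#define ARENA_SIZE 4096u /* hypothetical arena (page) size */

/* Hypothetical bump allocator: returns the offset of the new chunk and
 * advances *offset by the 8-byte-aligned length. Returns (size_t)-1
 * when the request does not fit in the arena. */
size_t arena_take(size_t *offset, size_t bytes)
{
	size_t start = *offset;

	if (start >= ARENA_SIZE || bytes > ARENA_SIZE - start)
		return (size_t)-1;

	/* Advancing by the aligned length keeps the next chunk aligned. */
	*offset = start + ALIGN_UP(bytes, 8);
	return start;
}
```

With a 13-byte request followed by an 8-byte request, the second chunk starts at offset 16 rather than 13, so a uint64_t stored at its start is naturally aligned.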

* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Jesper Dangaard Brouer @ 2016-04-08 20:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Brenden Blanco, davem, netdev, tom, ogerlitz, daniel,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo, linux-mm, brouer
In-Reply-To: <20160408172651.GA38264@ast-mbp.thefacebook.com>

On Fri, 8 Apr 2016 10:26:53 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Fri, Apr 08, 2016 at 02:33:40PM +0200, Jesper Dangaard Brouer wrote:
> > 
> > On Fri, 8 Apr 2016 12:36:14 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >   
> > > > +/* user return codes for PHYS_DEV prog type */
> > > > +enum bpf_phys_dev_action {
> > > > +	BPF_PHYS_DEV_DROP,
> > > > +	BPF_PHYS_DEV_OK,
> > > > +};    
> > > 
> > > I can imagine these extra return codes:
> > > 
> > >  BPF_PHYS_DEV_MODIFIED,   /* Packet page/payload modified */
> > >  BPF_PHYS_DEV_STOLEN,     /* E.g. forward use-case */
> > >  BPF_PHYS_DEV_SHARED,     /* Queue for async processing, e.g. tcpdump use-case */
> > > 
> > > The "STOLEN" and "SHARED" use-cases require some refcnt manipulations,
> > > which we can look at when we get that far...  
> > 
> > I want to point out something which is quite FUNDAMENTAL, for
> > understanding these return codes (and network stack).
> > 
> > 
> > At driver RX time, the network stack basically has two ways of
> > building an SKB, which is sent up the stack.
> > 
> > Option-A (fastest): The packet page is writable. The SKB can be
> > allocated and skb->data/head can point directly to the page.  And
> > we place/write skb_shared_info in the end/tail-room. (This is done by
> > calling build_skb()).
> > 
> > Option-B (slower): The packet page is read-only.  The SKB cannot point
> > skb->data/head directly to the page, because skb_shared_info need to be
> > written into skb->end (slightly hidden via skb_shinfo() casting).  To
> > get around this, a separate piece of memory is allocated (speedup by
> > __alloc_page_frag) for pointing skb->data/head, so skb_shared_info can
> > be written. (This is done when calling netdev/napi_alloc_skb()).
> >   Drivers then need to copy over packet headers, and assign + adjust
> > skb_shinfo(skb)->frags[0] offset to skip copied headers.
> > 
> > 
> > Unfortunately most drivers use option-B, due to the cost of calling the
> > page allocator.  It is only slightly more expensive to get a larger
> > compound page from the page allocator, which then can be partitioned into
> > page-fragments, thus amortizing the page alloc cost.  Unfortunately the
> > cost is added later, when constructing the SKB.
> >  Another reason for option-B, is that archs with expensive IOMMU
> > requirements (like PowerPC), don't need to dma_unmap on every packet,
> > but only on the compound page level.
> > 
> > Side-note: Most drivers have a "copy-break" optimization.  Especially
> > for option-B, when copying header data anyhow. For small packet, one
> > might as well free (or recycle) the RX page, if header size fits into
> > the newly allocated memory (for skb_shared_info).  
> 
> I think you guys are going into overdesign territory, so
> . nack on read-only pages

Unfortunately you cannot just ignore or nack read-only pages. They are
a fact in the current drivers.

Most drivers today (at least the ones we care about) only deliver
read-only pages.  If you don't accept read-only pages on day 1, then you
first have to rewrite a lot of drivers... and that will stall the
project!  How will you deal with this fact?

The early drop filter use-case in this patchset can ignore read-only
pages.  But ABI-wise we need to deal with the future case where we do
need/require writable pages.  A simple "need-writable-pages" indication
in the API could help us move forward.


> . nack on copy-break approach

Copy-break can be ignored.  It sort of happens at a higher level in the
driver. (Eric likely wants/cares that this happens for local socket delivery).


> . nack on per-ring programs

Hmmm... I don't see it as a lot more complicated to attach the program
to the ring.  But maybe we can extend the API later, and thus postpone that
discussion.

> . nack on modified/stolen/shared return codes
> 
> The whole thing must be dead simple to use. Above is not simple by any means.

Maybe you missed that the above was a description of how the current
network stack handles this, which is not simple... which is the root of
the whole performance issue.


> The programs must see writeable pages only and return codes:
> drop, pass to stack, redirect to xmit.
> If program wishes to modify packets before passing it to stack, it
> shouldn't need to deal with different return values.

> No special things to deal with small or large packets. No header splits.
> Program must not be aware of any such things.

I agree on this.  This layer only deals with packets at the page level,
single packets stored in contiguous memory.


> Drivers can use DMA_BIDIRECTIONAL to allow received page to be
> modified by the program and immediately sent to xmit.

We just have to verify that DMA_BIDIRECTIONAL does not add extra
overhead (DMA-API-HOWTO.txt explicitly states that it likely does,
but I would like to verify this with a micro benchmark).

> No dma map/unmap/sync per packet. If some odd architectures/dma setups
> cannot do it, then XDP will not be applicable there.

I do like the idea of rejecting XDP eBPF programs when the DMA
setup is not compatible, or when the driver does not implement e.g.
writable DMA pages.

Customers wanting this feature will then go buy a NIC which supports
this feature.  There is nothing more motivating for NIC vendors than seeing
customers buy a competitor's hardware. And it only requires a driver
change to get this market...


> We are not going to sacrifice performance for generality.

Agree.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply
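
The three return codes proposed in the thread above (drop, pass to stack, redirect to xmit) suggest a very small dispatch in the driver RX path. The following is a user-space sketch only; every name in it (demo_action, demo_rx, demo_drop_runts, the 60-byte threshold) is hypothetical and none of it is taken from the patchset.

```c
#include <assert.h>

/* Hypothetical action codes mirroring "drop, pass to stack, redirect to xmit". */
enum demo_action {
	DEMO_DROP,
	DEMO_PASS,
	DEMO_TX,
};

/* Counters standing in for what a real driver would do with the packet. */
struct demo_stats {
	unsigned int dropped;
	unsigned int passed;
	unsigned int sent;
};

/* A "program" sees one packet in contiguous memory and returns an action. */
typedef enum demo_action (*demo_prog_t)(const unsigned char *data,
					unsigned int len);

/* Run the program on one packet and act on its verdict. */
void demo_rx(demo_prog_t prog, const unsigned char *data, unsigned int len,
	     struct demo_stats *st)
{
	switch (prog(data, len)) {
	case DEMO_PASS:
		st->passed++;
		break;
	case DEMO_TX:
		st->sent++;
		break;
	case DEMO_DROP:
	default:
		/* Unknown verdicts are treated as drop in this sketch. */
		st->dropped++;
		break;
	}
}

/* Example program: drop frames shorter than 60 bytes, pass the rest. */
enum demo_action demo_drop_runts(const unsigned char *data, unsigned int len)
{
	(void)data;
	return len < 60 ? DEMO_DROP : DEMO_PASS;
}
```

The program never learns how the page was allocated or mapped; it only returns a verdict, which is the simplicity argued for in the quoted text.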

* [PATCH v3] route: do not cache fib route info on local routes with oif
From: Chris Friesen @ 2016-04-08 20:07 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev
In-Reply-To: <alpine.LFD.2.11.1604082207330.2124@ja.home.ssi.bg>

For local routes that require a particular output interface we do not want to
cache the result.  Caching the result causes incorrect behaviour when there are
multiple source addresses on the interface.  The end result is that if the
intended recipient is waiting on that interface for the packet, it won't receive
it, because the packet is delivered on the loopback interface and the IP_PKTINFO
ipi_ifindex is set to the loopback interface as well.

This can be tested by running a program such as "dhcp_release" which attempts
to inject a packet on a particular interface so that it is received by another
program on the same board.  The receiving process should see an IP_PKTINFO
ipi_ifindex value of the source interface (e.g., eth1) instead of the loopback
interface (e.g., lo).  The packet will still appear on the loopback interface
in tcpdump but the important aspect is that the CMSG info is correct.

Sample dhcp_release command line:

   dhcp_release eth1 192.168.204.222 02:11:33:22:44:66

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
---
 net/ipv4/route.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 02c6229..437a377 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2045,6 +2045,18 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		 */
 		if (fi && res->prefixlen < 4)
 			fi = NULL;
+	} else if ((type == RTN_LOCAL) && (orig_oif != 0) &&
+		   (orig_oif != dev_out->ifindex)) {
+		/* For local routes that require a particular output interface
+		 * we do not want to cache the result.  Caching the result
+		 * causes incorrect behaviour when there are multiple source
+		 * addresses on the interface, the end result being that if the
+		 * intended recipient is waiting on that interface for the
+		 * packet he won't receive it because it will be delivered on
+		 * the loopback interface and the IP_PKTINFO ipi_ifindex will
+		 * be set to the loopback interface as well.
+		 */
+		fi = NULL;
 	}
 
 	fnhe = NULL;

^ permalink raw reply related
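
The condition added by the patch above can be read as a standalone predicate. This is a user-space sketch with hypothetical names (demo_route_type, demo_skip_fib_cache); it only restates the three tests from the diff and does not model the routing code around them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's route types. */
enum demo_route_type {
	DEMO_RTN_UNICAST,
	DEMO_RTN_LOCAL,
};

/* Return true when the fib info should not be cached: the route is
 * local, the caller asked for a specific output interface, and that
 * interface differs from the device the route resolves to. */
bool demo_skip_fib_cache(enum demo_route_type type, int orig_oif,
			 int dev_out_ifindex)
{
	return type == DEMO_RTN_LOCAL &&
	       orig_oif != 0 &&
	       orig_oif != dev_out_ifindex;
}
```

For example, a local route requested on ifindex 3 that resolves to the loopback device (ifindex 1 in this illustration) is not cached, while the same lookup with no requested interface (orig_oif == 0) is.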

* Re: [PATCH v2] route: do not cache fib route info on local routes with oif
From: Chris Friesen @ 2016-04-08 20:06 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev
In-Reply-To: <alpine.LFD.2.11.1604082207330.2124@ja.home.ssi.bg>

On 04/08/2016 01:14 PM, Julian Anastasov wrote:

> 	Your patch is corrupted. I was in the same trap
> some time ago but with different client:
>
>  From Documentation/email-clients.txt:
>
> Don't send patches with "format=flowed".  This can cause unexpected
> and unwanted line breaks.
>
> 	Anyways, the change looks good to me and I'll add my
> Reviewed-by tag the next time.


Doh...forgot to turn off word wrapping.  New patch coming.

Chris

^ permalink raw reply

* Re: [PATCH net] tuntap: restore default qdisc
From: David Miller @ 2016-04-08 19:53 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, linux-kernel, mst, phil
In-Reply-To: <1460093208-4364-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Fri,  8 Apr 2016 13:26:48 +0800

> After commit f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using
> alloc_netdev"), default qdisc was changed to noqueue because
> tuntap does not set tx_queue_len during .setup(). This patch restores
> default qdisc by setting tx_queue_len in tun_setup().
> 
> Fixes: f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
> Cc: Phil Sutter <phil@nwl.cc>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied and queued up for -stable, thanks Jason.

^ permalink raw reply

* Re: [PATCH v4 1/2] RDS: memory allocated must be align to 8
From: santosh shilimkar @ 2016-04-08 19:44 UTC (permalink / raw)
  To: Shamir Rabinovitch, rds-devel, netdev; +Cc: davem
In-Reply-To: <1460030256-16791-1-git-send-email-shamir.rabinovitch@oracle.com>

On 4/7/2016 4:57 AM, Shamir Rabinovitch wrote:
> Fix issue in 'rds_ib_cong_recv' when accessing unaligned memory
> allocated by 'rds_page_remainder_alloc' using uint64_t pointer.
>
Sorry, I still don't follow this change. What exactly is the
problem?

> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> ---
>   net/rds/page.c |    4 ++--
>   1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/rds/page.c b/net/rds/page.c
> index 616f21f..e2b5a58 100644
> --- a/net/rds/page.c
> +++ b/net/rds/page.c
> @@ -135,8 +135,8 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
>   			if (rem->r_offset != 0)
>   				rds_stats_inc(s_page_remainder_hit);
>
> -			rem->r_offset += bytes;
> -			if (rem->r_offset == PAGE_SIZE) {
> +			rem->r_offset += ALIGN(bytes, 8);
> +			if (rem->r_offset >= PAGE_SIZE) {
>   				__free_page(rem->r_page);
>   				rem->r_page = NULL;
>   			}
>

^ permalink raw reply

* [PATCH v3 2/2] sctp: delay calls to sk_data_ready() as much as possible
From: Marcelo Ricardo Leitner @ 2016-04-08 19:41 UTC (permalink / raw)
  To: netdev; +Cc: Vlad Yasevich, Neil Horman, linux-sctp, David Laight,
	Jakub Sitnicki
In-Reply-To: <cover.1460144373.git.marcelo.leitner@gmail.com>

Currently, processing multiple chunks in a single SCTP packet leads to
multiple calls to sk_data_ready, causing multiple wake-up signals, which
are costly and don't make it wake up any faster.

With this patch, it notes that the wake-up is pending and does it
before leaving the state machine interpreter, the latest place possible to
do it reliably and cleanly.

Note that sk_data_ready events are not dependent on asocs, unlike waking
up writers.

v2: series re-checked
v3: use local vars to cleanup the code, suggested by Jakub Sitnicki
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/sctp/structs.h | 3 ++-
 net/sctp/sm_sideeffect.c   | 7 +++++++
 net/sctp/ulpqueue.c        | 4 ++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 1a6a626904bba4223b7921bbb4be41c2550271a7..21cb11107e378b4da1e7efde22fab4349496e35a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -217,7 +217,8 @@ struct sctp_sock {
 		v4mapped:1,
 		frag_interleave:1,
 		recvrcvinfo:1,
-		recvnxtinfo:1;
+		recvnxtinfo:1,
+		pending_data_ready:1;
 
 	atomic_t pd_mode;
 	/* Receive to here while partial delivery is in effect. */
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 7fe56d0acabf66cfd8fe29dfdb45f7620b470ac7..d06317de873090be359ce768fe291224ee50658f 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1222,6 +1222,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 				sctp_cmd_seq_t *commands,
 				gfp_t gfp)
 {
+	struct sock *sk = ep->base.sk;
+	struct sctp_sock *sp = sctp_sk(sk);
 	int error = 0;
 	int force;
 	sctp_cmd_t *cmd;
@@ -1742,6 +1744,11 @@ out:
 			error = sctp_outq_uncork(&asoc->outqueue, gfp);
 	} else if (local_cork)
 		error = sctp_outq_uncork(&asoc->outqueue, gfp);
+
+	if (sp->pending_data_ready) {
+		sk->sk_data_ready(sk);
+		sp->pending_data_ready = 0;
+	}
 	return error;
 nomem:
 	error = -ENOMEM;
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index ce469d648ffbe166f9ae1c5650f481256f31a7f8..72e5b3e41cddf9d79371de8ab01484e4601b97b6 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -264,7 +264,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 		sctp_ulpq_clear_pd(ulpq);
 
 	if (queue == &sk->sk_receive_queue)
-		sk->sk_data_ready(sk);
+		sctp_sk(sk)->pending_data_ready = 1;
 	return 1;
 
 out_free:
@@ -1140,5 +1140,5 @@ void sctp_ulpq_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 
 	/* If there is data waiting, send it up the socket now. */
 	if (sctp_ulpq_clear_pd(ulpq) || ev)
-		sk->sk_data_ready(sk);
+		sctp_sk(sk)->pending_data_ready = 1;
 }
-- 
2.5.0

^ permalink raw reply related
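
The deferred wake-up idea in the patch above can be sketched outside the kernel. All names here (demo_sock, demo_queue_chunk, demo_process_packet) are hypothetical stand-ins, not SCTP code; the point is only that queuing sets a flag and a single wake-up is delivered once per packet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct demo_sock {
	bool pending_data_ready; /* set when a wake-up is owed */
	int wakeups;             /* wake-ups actually delivered */
	int queued;              /* chunks placed on the receive queue */
};

/* Stand-in for sk->sk_data_ready(): just count the wake-up. */
static void demo_data_ready(struct demo_sock *sk)
{
	sk->wakeups++;
}

/* Queue one chunk and only note that a wake-up is pending. */
static void demo_queue_chunk(struct demo_sock *sk)
{
	sk->queued++;
	sk->pending_data_ready = true;
}

/* Process all chunks of one packet, then deliver at most one wake-up.
 * Returns the total number of wake-ups delivered so far. */
int demo_process_packet(struct demo_sock *sk, int nchunks)
{
	for (int i = 0; i < nchunks; i++)
		demo_queue_chunk(sk);

	if (sk->pending_data_ready) {
		demo_data_ready(sk);
		sk->pending_data_ready = false;
	}
	return sk->wakeups;
}
```

A packet carrying five chunks results in one wake-up instead of five, and a packet that queues nothing results in none.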

* [PATCH v3 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock
From: Marcelo Ricardo Leitner @ 2016-04-08 19:41 UTC (permalink / raw)
  To: netdev; +Cc: Vlad Yasevich, Neil Horman, linux-sctp, David Laight,
	Jakub Sitnicki
In-Reply-To: <cover.1460144373.git.marcelo.leitner@gmail.com>

Storing each flag in its own byte wastes space and gets worse as we add
new flags, so convert bit-wide flags to a bitfield.

Currently it already saves 4 bytes in sctp_sock, which are left as holes
in it for now. The whole struct needs packing, which should be done in
another patch.

Note that do_auto_asconf cannot be merged, as explained in the comment
before it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/sctp/structs.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 6df1ce7a411c548bda4163840a90578b6e1b4cfe..1a6a626904bba4223b7921bbb4be41c2550271a7 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -210,14 +210,14 @@ struct sctp_sock {
 	int user_frag;
 
 	__u32 autoclose;
-	__u8 nodelay;
-	__u8 disable_fragments;
-	__u8 v4mapped;
-	__u8 frag_interleave;
 	__u32 adaptation_ind;
 	__u32 pd_point;
-	__u8 recvrcvinfo;
-	__u8 recvnxtinfo;
+	__u16	nodelay:1,
+		disable_fragments:1,
+		v4mapped:1,
+		frag_interleave:1,
+		recvrcvinfo:1,
+		recvnxtinfo:1;
 
 	atomic_t pd_mode;
 	/* Receive to here while partial delivery is in effect. */
-- 
2.5.0

^ permalink raw reply related
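
The space argument in the patch above can be checked with two small user-space structs. This is a sketch under assumptions: uint8_t and uint16_t stand in for __u8 and __u16, the struct names are hypothetical, and a uint16_t bit-field is an extension accepted by GCC and Clang rather than something ISO C guarantees. Exact sizes are implementation-defined, so the code only compares them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte per flag, as before the patch (only the flag fields). */
struct demo_flags_bytes {
	uint8_t nodelay;
	uint8_t disable_fragments;
	uint8_t v4mapped;
	uint8_t frag_interleave;
	uint8_t recvrcvinfo;
	uint8_t recvnxtinfo;
};

/* One bit per flag, packed into a single 16-bit storage unit. */
struct demo_flags_bits {
	uint16_t nodelay:1,
		 disable_fragments:1,
		 v4mapped:1,
		 frag_interleave:1,
		 recvrcvinfo:1,
		 recvnxtinfo:1;
};

/* Bytes saved by the bitfield layout on this compiler and ABI. */
size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_flags_bytes) - sizeof(struct demo_flags_bits);
}
```

The bitfield also leaves ten spare bits in the same storage unit, so new flags can be added without growing the struct.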

* [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
From: Marcelo Ricardo Leitner @ 2016-04-08 19:41 UTC (permalink / raw)
  To: netdev; +Cc: Vlad Yasevich, Neil Horman, linux-sctp, David Laight,
	Jakub Sitnicki

1st patch is a preparation for the 2nd. The idea is to not call
->sk_data_ready() for every data chunk processed while processing
packets but only once before releasing the socket.

v2: patchset re-checked, small changelog fixes
v3: on patch 2, make use of local vars to make it more readable

Marcelo Ricardo Leitner (2):
  sctp: compress bit-wide flags to a bitfield on sctp_sock
  sctp: delay calls to sk_data_ready() as much as possible

 include/net/sctp/structs.h | 13 +++++++------
 net/sctp/sm_sideeffect.c   |  7 +++++++
 net/sctp/ulpqueue.c        |  4 ++--
 3 files changed, 16 insertions(+), 8 deletions(-)

-- 
2.5.0

^ permalink raw reply

* Re: [patch net-next] devlink: share user_ptr pointer for both devlink and devlink_port
From: David Miller @ 2016-04-08 19:40 UTC (permalink / raw)
  To: jiri; +Cc: netdev, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460135568-16168-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Fri,  8 Apr 2016 19:12:48 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Ptr to devlink structure can be easily obtained from
> devlink_port->devlink. So share user_ptr[0] pointer for both and leave
> user_ptr[1] free for other users.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>

Applied, thanks again Jiri.

^ permalink raw reply

* Re: [PATCH v4 2/2] RDS: fix congestion map corruption for PAGE_SIZE > 4k
From: santosh shilimkar @ 2016-04-08 19:39 UTC (permalink / raw)
  To: Shamir Rabinovitch, rds-devel, netdev; +Cc: davem
In-Reply-To: <1460030256-16791-2-git-send-email-shamir.rabinovitch@oracle.com>

On 4/7/2016 4:57 AM, Shamir Rabinovitch wrote:
> When PAGE_SIZE > 4k, a single page can contain 2 RDS fragments. If
> 'rds_ib_cong_recv' ignores the RDS fragment offset into the page, it
> then reads the data fragment as a far congestion map update, which leads
> to corruption of the RDS connection's far congestion map.
>
> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [patch net-next 0/6] mlxsw: small driver update + one tiny devlink dependency
From: David Miller @ 2016-04-08 19:39 UTC (permalink / raw)
  To: jiri; +Cc: netdev, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460135485-16095-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Fri,  8 Apr 2016 19:11:19 +0200

> Cosmetics, in preparation to sharedbuffer patchset.
> First patch is here to allow patch number two.

Series applied, thanks Jiri.

^ permalink raw reply

* Re: [PATCH v5 net-next 00/15] MTU/buffer reconfig changes
From: David Miller @ 2016-04-08 19:34 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev
In-Reply-To: <1460054388-471-1-git-send-email-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Thu,  7 Apr 2016 19:39:33 +0100

> I re-discussed MPLS/MTU internally, dropped it from the patch 1,
> re-tested everything, found out I forgot about debugfs pointers,
> fixed that as well.
> 
> v5:
>  - don't reserve space in RX buffers for MPLS label stack
>    (patch 1);
>  - fix debugfs pointers to ring structures (patch 5).
> v4:
>  - cut down on unrelated patches;
>  - don't "close" the device on error path.
> 
> --- v4 cover letter
> 
> Previous series included some not entirely related patches,
> this one is cut down.  Main issue I'm trying to solve here
> is that .ndo_change_mtu() in nfpvf driver is doing full
> close/open to reallocate buffers - which if open fails
> can result in device being basically closed even though
> the interface is started.  As suggested by you I try to move
> towards a paradigm where the resources are allocated first
> and the MTU change is only done once I'm certain (almost)
> nothing can fail.  Almost because I need to communicate 
> with FW and that can always time out.
> 
> Patch 1 fixes small issue.  Next 10 patches reorganize things
> so that I can easily allocate new rings and sets of buffers
> while the device is running.  Patches 13 and 15 reshape the
> .ndo_change_mtu() and ethtool's ring-resize operation into
> desired form.

Looks good, series applied, thanks!

^ permalink raw reply

* [PATCH] mISDN: Fixing missing validation in base_sock_bind()
From: Emrah Demir @ 2016-04-08 19:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, isdn, Emrah Demir

From: Emrah Demir <ed@abdsec.com>

Add a missing addr_len check to base_sock_bind() in mISDN/socket.c,
rejecting addresses shorter than struct sockaddr_mISDN.

Signed-off-by: Emrah Demir <ed@abdsec.com>
---
 drivers/isdn/mISDN/socket.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 0d29b5a..99e5f97 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -715,6 +715,9 @@ base_sock_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	if (!maddr || maddr->family != AF_ISDN)
 		return -EINVAL;
 
+	if (addr_len < sizeof(struct sockaddr_mISDN))
+		return -EINVAL;
+
 	lock_sock(sk);
 
 	if (_pms(sk)->dev) {
-- 
2.8.0.rc3

^ permalink raw reply related
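
The kind of check added by the patch above can be sketched in user space. Everything here is hypothetical (demo_sockaddr, DEMO_FAMILY, demo_bind_check) and is not the mISDN code. Note one deliberate difference: this sketch checks the length before reading any field of the address, whereas the diff adds its check after the existing family test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_FAMILY 34 /* illustrative address family value */

/* Hypothetical address layout; not struct sockaddr_mISDN. */
struct demo_sockaddr {
	unsigned short family;
	unsigned char dev;
	unsigned char channel;
};

/* Validate a caller-supplied address before using it.
 * Returns 0 on success or -EINVAL on any failed check. */
int demo_bind_check(const void *addr, int addr_len)
{
	const struct demo_sockaddr *maddr = addr;

	if (!maddr)
		return -EINVAL;

	/* Reject short buffers before touching any field. */
	if (addr_len < 0 || (size_t)addr_len < sizeof(*maddr))
		return -EINVAL;

	if (maddr->family != DEMO_FAMILY)
		return -EINVAL;

	return 0;
}
```

Without the length test, a caller passing a buffer shorter than the struct could cause later field reads to run past the data it actually supplied.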

* Re: [PATCH v2] route: do not cache fib route info on local routes with oif
From: Julian Anastasov @ 2016-04-08 19:14 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev
In-Reply-To: <5707C950.6060806@windriver.com>


	Hello,

On Fri, 8 Apr 2016, Chris Friesen wrote:

> For local routes that require a particular output interface we do not want to
> cache the result.  Caching the result causes incorrect behaviour when there
> are
> multiple source addresses on the interface.  The end result being that if the
> intended recipient is waiting on that interface for the packet he won't
> receive
> it because it will be delivered on the loopback interface and the IP_PKTINFO
> ipi_ifindex will be set to the loopback interface as well.
> 
> This can be tested by running a program such as "dhcp_release" which attempts
> to inject a packet on a particular interface so that it is received by another
> program on the same board.  The receiving process should see an IP_PKTINFO
> ipi_ifndex value of the source interface (e.g., eth1) instead of the loopback
> interface (e.g., lo).  The packet will still appear on the loopback interface
> in tcpdump but the important aspect is that the CMSG info is correct.
> 
> Sample dhcp_release command line:
> 
>    dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
> 
> Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
> Signed off-by: Chris Friesen <chris.friesen@windriver.com>
> ---
>  net/ipv4/route.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 02c6229..437a377 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2045,6 +2045,18 @@ static struct rtable *__mkroute_output(const struct
> fib_result *res,

	Your patch is corrupted. I was in the same trap
some time ago but with different client:

From Documentation/email-clients.txt:

Don't send patches with "format=flowed".  This can cause unexpected
and unwanted line breaks.

	Anyways, the change looks good to me and I'll add my
Reviewed-by tag the next time.

>  		*/
>  		if (fi && res->prefixlen < 4)
>  			fi = NULL;
> +	} else if ((type == RTN_LOCAL) && (orig_oif != 0) &&
> +		   (orig_oif != dev_out->ifindex)) {
> +		/* For local routes that require a particular output interface
> +		 * we do not want to cache the result.  Caching the result
> +		 * causes incorrect behaviour when there are multiple source
> +		 * addresses on the interface, the end result being that if
> the
> +		 * intended recipient is waiting on that interface for the
> +		 * packet he won't receive it because it will be delivered on
> +		 * the loopback interface and the IP_PKTINFO ipi_ifindex will
> +		 * be set to the loopback interface as well.
> +		 */
> +		fi = NULL;
>  	}
> 
>  	fnhe = NULL;

Regards

^ permalink raw reply

* Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver
From: Timur Tabi @ 2016-04-08 19:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rob Herring, Gilad Avidov, netdev, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-arm-msm, Sagar Dharia, shankerd,
	Greg Kroah-Hartman, vikrams, Christopher Covington
In-Reply-To: <20160408005317.GA28125@lunn.ch>

Andrew Lunn wrote:

> There are two different things here. One is configuring the pin to be
> a GPIO. The second is using the GPIO as a GPIO. In this case,
> bit-banging the MDIO bus.
>
> The firmware could be doing the configuration, setting the pin as a
> GPIO. However, the firmware cannot be doing the MDIO bit-banging to
> make an MDIO bus available. Linux has to do that.
>
> Or it could be we have all completely misunderstood the hardware, and
> we are not doing bit-banging GPIO MDIO. There is a real MDIO
> controller there, we don't use these pins as GPIOs, etc....

Actually, I think there is a misunderstanding.

On the FSM9900 SOC (which uses device-tree), the two pins that connect 
to the external PHY are gpio pins.  However, the driver needs to 
reprogram the pinmux so that those pins are wired to the Emac 
controller.  That's what the gpio code in this driver is doing: it's
just configuring the pins so that they connect directly between the Emac 
and the external PHY.  After that, they are no longer GPIO pins, and you 
cannot use the "GPIO controlled MDIO bus".  There is no MDIO controller 
on the SOC.  The external PHY is controlled directly from the Emac and 
also from the internal PHY.  It is screwy, I know, but that's what Gilad 
was trying to explain.

On the QDF2432 (which uses ACPI), those two wires are now dedicated. 
They are not muxed GPIOs any more -- they are hard wired between Emac
and the external PHY.

In both cases, you need to use Emac registers to communicate with the 
external PHY.  Stuff like link detect and link speed are configured by 
programming the Emac and/or the internal phy.

And the internal phy isn't really an internal phy.  It's an SGMII-like 
device that's connected to the Emac and handles various phy-related 
tasks.  It has its own register block, but you still have to program it 
in concert with the Emac.  You can't really treat it separately.

So I'm beginning to believe that Gilad's driver is actually correct 
as-is.  There are a few minor bug fixes, but in general it's correct.  I 
would like to post a V4 soon that has those minor fixes.

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.


* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Jesper Dangaard Brouer @ 2016-04-08 19:05 UTC (permalink / raw)
  To: Brenden Blanco
  Cc: davem, netdev, tom, alexei.starovoitov, ogerlitz, daniel,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo, brouer
In-Reply-To: <20160408170159.GC28353@gmail.com>

On Fri, 8 Apr 2016 10:02:00 -0700
Brenden Blanco <bblanco@plumgrid.com> wrote:

> On Fri, Apr 08, 2016 at 02:33:40PM +0200, Jesper Dangaard Brouer wrote:
> > 
> > On Fri, 8 Apr 2016 12:36:14 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >   
> > > > +/* user return codes for PHYS_DEV prog type */
> > > > +enum bpf_phys_dev_action {
> > > > +	BPF_PHYS_DEV_DROP,
> > > > +	BPF_PHYS_DEV_OK,
> > > > +};    
> > > 
> > > I can imagine these extra return codes:
> > > 
> > >  BPF_PHYS_DEV_MODIFIED,   /* Packet page/payload modified */
> > >  BPF_PHYS_DEV_STOLEN,     /* E.g. forward use-case */
> > >  BPF_PHYS_DEV_SHARED,     /* Queue for async processing, e.g. tcpdump use-case */
> > > 
> > > The "STOLEN" and "SHARED" use-cases require some refcnt manipulations,
> > > which we can look at when we get that far...  
> > 
> > I want to point out something which is quite FUNDAMENTAL, for
> > understanding these return codes (and network stack).
> > 
> > 
> > At driver RX time, the network stack basically has two ways of
> > building an SKB, which is sent up the stack.
> > 
> > Option-A (fastest): The packet page is writable. The SKB can be
> > allocated and skb->data/head can point directly to the page.  And
> > we place/write skb_shared_info in the end/tail-room. (This is done by
> > calling build_skb()).
> > 
> > Option-B (slower): The packet page is read-only.  The SKB cannot point
> > skb->data/head directly to the page, because skb_shared_info need to be
> > written into skb->end (slightly hidden via skb_shinfo() casting).  To
> > get around this, a separate piece of memory is allocated (speedup by
> > __alloc_page_frag) for pointing skb->data/head, so skb_shared_info can
> > be written. (This is done when calling netdev/napi_alloc_skb()).
> >   Drivers then need to copy over packet headers, and assign + adjust
> > skb_shinfo(skb)->frags[0] offset to skip copied headers.
> > 
> > 
> > Unfortunately most drivers use option-B, due to the cost of calling the
> > page allocator.  It is only slightly more expensive to get a larger
> > compound page from the page allocator, which then can be partitioned into
> > page-fragments, thus amortizing the page alloc cost.  Unfortunately the
> > cost is added later, when constructing the SKB.
> >  Another reason for option-B, is that archs with expensive IOMMU
> > requirements (like PowerPC), don't need to dma_unmap on every packet,
> > but only on the compound page level.
> > 
> > Side-note: Most drivers have a "copy-break" optimization.  Especially
> > for option-B, when copying header data anyhow. For small packet, one
> > might as well free (or recycle) the RX page, if header size fits into
> > the newly allocated memory (for skb_shared_info).
> > 
> > 
> > For the early filter drop (DDoS use-case), it does not matter that the
> > packet-page is read-only.
> > 
> > BUT for the future XDP (eXpress Data Path) use-case it does matter.  If
> > we ever want to see speeds comparable to DPDK, then drivers
> > need to implement option-A, as this allows forwarding at the packet-page
> > level.
> > 
> > I hope, my future page-pool facility can remove/hide the cost calling
> > the page allocator.
> >   
> Can't wait! This will open up a lot of doors.
>

If you talk about the page-pool, then it is just one piece of the
puzzle, not the silver bullet ;-)

> > 
> > Back to the return codes, thus:
> > -------------------------------
> > BPF_PHYS_DEV_SHARED requires driver use option-B, when constructing
> > the SKB, and treat packet data as read-only.
> > 
> > BPF_PHYS_DEV_MODIFIED requires driver to provide a writable packet-page.  
>
> I understand the driver/hw requirement, but the codes themselves I think
> need some tweaking.

I'm very open to changing these return codes. I'm just trying to open
up the discussion.


> For instance, if the packet is both modified and forwarded, should
> the flags be ORed together? 

I didn't see these as bit-flags. I assumed that if you want to forward
the packet, then you need to steal it (BPF_PHYS_DEV_STOLEN) and cannot
return it to the stack.

I'm open to changing this to bit-flags, BUT we just have to take care
not to introduce too many things we need to check, due to performance
issues.
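
A minimal user-space sketch of the two encodings may make the trade-off
easier to see.  DROP and OK come from the RFC patch; MODIFIED and STOLEN
are only the codes proposed in this thread (SHARED is left out because
its semantics are still open).  The ACT_F_* flags and both helper
functions are hypothetical names invented for illustration, and the
assumption that a MODIFIED packet still goes up the stack is mine.

```c
/* Enum encoding: exactly one outcome per packet. */
enum bpf_phys_dev_action {
	BPF_PHYS_DEV_DROP,	/* from the RFC patch */
	BPF_PHYS_DEV_OK,	/* from the RFC patch */
	BPF_PHYS_DEV_MODIFIED,	/* proposed in this thread */
	BPF_PHYS_DEV_STOLEN,	/* proposed in this thread */
};

/* Hypothetical bit-flag encoding: outcomes can be combined. */
#define ACT_F_PASS	0x1u
#define ACT_F_MODIFIED	0x2u
#define ACT_F_STOLEN	0x4u

/* Enum: a single switch decides whether the stack sees the packet. */
static int enum_goes_to_stack(enum bpf_phys_dev_action act)
{
	switch (act) {
	case BPF_PHYS_DEV_OK:
	case BPF_PHYS_DEV_MODIFIED:	/* assumption: modified, then passed up */
		return 1;
	case BPF_PHYS_DEV_STOLEN:	/* stolen packets never return to the stack */
	case BPF_PHYS_DEV_DROP:
	default:
		return 0;
	}
}

/* Flags: every bit that can veto the decision needs its own test. */
static int flags_goes_to_stack(unsigned int flags)
{
	if (flags & ACT_F_STOLEN)	/* stolen wins over any other bit */
		return 0;
	return (flags & ACT_F_PASS) != 0;
}
```

With flags, "modified and forwarded" is simply ACT_F_MODIFIED |
ACT_F_STOLEN, but each extra bit is one more test in the per-packet
path, which is the performance concern raised above.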


> Or is the need for this return code made obsolete if the driver knows
> ahead of time via struct bpf_prog flags that the prog intends to
> modify the packet, and can set up the page accordingly?

Yes, maybe we can drop the modified (BPF_PHYS_DEV_MODIFIED) return code.
I was just thinking this could be used to indicate if the checksum
would need to be recalculated.  If the usual checksum people don't
care, we should drop this indication.

Think about it performance wise... if we know the program _can_ modify
(but don't know if it did so), then we would have to mark the SKB to the
stack as needing its checksum recalculated, always...
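
The difference can be sketched in a few lines of user-space C.  The
struct, its fields and the function names below are all hypothetical,
made up for this illustration; nothing here is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-packet bookkeeping, invented for this sketch. */
struct rx_result {
	bool prog_may_modify;	/* static: known when the program is attached */
	bool prog_did_modify;	/* dynamic: only visible via a MODIFIED return code */
};

/* Only the static flag is known: every packet must be marked. */
static bool csum_recalc_static(const struct rx_result *r)
{
	return r->prog_may_modify;
}

/* A MODIFIED-style return code is available: mark only touched packets. */
static bool csum_recalc_dynamic(const struct rx_result *r)
{
	return r->prog_may_modify && r->prog_did_modify;
}

/* Count how many packets in a batch would need checksum recalculation. */
static size_t count_recalcs(const struct rx_result *batch, size_t n,
			    bool (*policy)(const struct rx_result *))
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (policy(&batch[i]))
			count++;
	return count;
}
```

For a batch where the program may modify every packet but actually
modified only one, the static policy marks all of them while the
dynamic policy marks just that one.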

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* [PATCH] net: ipv6: Do not keep linklocal and loopback addresses
From: David Ahern @ 2016-04-08 19:01 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

f1705ec197e7 added the option to retain user configured addresses on an
admin down. A comment to one of the later revisions suggested using the
IFA_F_PERMANENT flag rather than adding a user_managed boolean to the
ifaddr struct. A side effect of this change is that link local and
loopback addresses are also retained which is not part of the objective
of f1705ec197e7. Add check to drop those addresses.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv6/addrconf.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 27aed1afcf81..2dd8c1ca3287 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3438,6 +3438,12 @@ static void addrconf_type_change(struct net_device *dev, unsigned long event)
 		ipv6_mc_unmap(idev);
 }
 
+static bool addr_is_local(const struct in6_addr *addr)
+{
+	return ipv6_addr_type(addr) &
+		(IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
+}
+
 static int addrconf_ifdown(struct net_device *dev, int how)
 {
 	struct net *net = dev_net(dev);
@@ -3495,7 +3501,8 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 				 * address is retained on a down event
 				 */
 				if (!keep_addr ||
-				    !(ifa->flags & IFA_F_PERMANENT)) {
+				    !(ifa->flags & IFA_F_PERMANENT) ||
+				    addr_is_local(&ifa->addr)) {
 					hlist_del_init_rcu(&ifa->addr_lst);
 					goto restart;
 				}
@@ -3544,7 +3551,8 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 		write_unlock_bh(&idev->lock);
 		spin_lock_bh(&ifa->lock);
 
-		if (keep_addr && (ifa->flags & IFA_F_PERMANENT)) {
+		if (keep_addr && (ifa->flags & IFA_F_PERMANENT) &&
+		    !addr_is_local(&ifa->addr)) {
 			/* set state to skip the notifier below */
 			state = INET6_IFADDR_STATE_DEAD;
 			ifa->state = 0;
-- 
2.1.4
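
The retention rule the patch introduces can be tried out in user space
with the following sketch.  It is only an analogue: it uses the POSIX
IN6_IS_ADDR_LINKLOCAL() and IN6_IS_ADDR_LOOPBACK() macros rather than
the kernel's ipv6_addr_type(), and addr_is_local_user() and
keep_on_ifdown() are hypothetical names that do not exist in the kernel.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* User-space analogue of the patch's addr_is_local(): true for
 * link-local (fe80::/10) and loopback (::1) addresses. */
static bool addr_is_local_user(const struct in6_addr *addr)
{
	return IN6_IS_ADDR_LINKLOCAL(addr) || IN6_IS_ADDR_LOOPBACK(addr);
}

/* Mirrors the condition the patch adds in addrconf_ifdown(): an address
 * survives an admin down only if retention is enabled, the address is
 * permanent, and it is neither link-local nor loopback. */
static bool keep_on_ifdown(bool keep_addr, bool permanent, const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return false;	/* unparsable input: treat as not kept */
	return keep_addr && permanent && !addr_is_local_user(&addr);
}
```

For example, keep_on_ifdown(true, true, "fe80::1") is false, while the
same call with a global address such as "2001:db8::1" is true.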


* Re: [PATCH v2 1/5] net: w5100: move mmiowb into register access callbacks
From: Akinobu Mita @ 2016-04-08 18:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Mike Sinkovsky
In-Reply-To: <20160407.122951.1457426147075873743.davem@davemloft.net>

2016-04-08 1:29 GMT+09:00 David Miller <davem@davemloft.net>:
>
> Where is your "[PATCH v2 0/5] ..." header posting explaining what this series
> is doing, at a high level, how it is doing that, and why it is doing it
> that way?
>
> This is mandatory for patch series submissions.

I see.  I'll be sure to include the explanation in the v3 submission.


* Re: [PATCH net] vxlan: synchronously and race-free destruction of vxlan sockets
From: Marcelo Ricardo Leitner @ 2016-04-08 18:51 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev, Jiri Benc
In-Reply-To: <1460041060-8619-1-git-send-email-hannes@stressinduktion.org>

Hi Hannes,

On Thu, Apr 07, 2016 at 04:57:40PM +0200, Hannes Frederic Sowa wrote:
> Due to the fact that the udp socket is destructed asynchronously in a
> work queue, we have some nondeterministic behavior during shutdown of
> vxlan tunnels and creating new ones. Fix this by keeping the destruction
> process synchronous in regards to the user space process so IFF_UP can
> be reliably set.
> 
> udp_tunnel_sock_release destroys vs->sock->sk if reference counter
> indicates so. We expect to have the same lifetime of vxlan_sock and
> vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
> only destruct the whole socket after we can be sure it cannot be found
> by searching vxlan_net->sock_list.
> 
> Cc: Jiri Benc <jbenc@redhat.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  drivers/net/vxlan.c | 20 +++-----------------
>  include/net/vxlan.h |  2 --
>  2 files changed, 3 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 1c0fa364323e28..487e48b7a53090 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -98,7 +98,6 @@ struct vxlan_fdb {
>  
>  /* salt for hash table */
>  static u32 vxlan_salt __read_mostly;
> -static struct workqueue_struct *vxlan_wq;
>  
>  static inline bool vxlan_collect_metadata(struct vxlan_sock *vs)
>  {
> @@ -1065,7 +1064,9 @@ static void __vxlan_sock_release(struct vxlan_sock *vs)
>  	vxlan_notify_del_rx_port(vs);
>  	spin_unlock(&vn->sock_lock);
>  
> -	queue_work(vxlan_wq, &vs->del_work);
> +	synchronize_rcu();

__vxlan_sock_release is called by vxlan_sock_release which is called by
vxlan_open/stop. Do we really want to have synchronize_rcu() while
holding rtnl?

> +	udp_tunnel_sock_release(vs->sock);
> +	kfree(vs);
>  }
>  
>  static void vxlan_sock_release(struct vxlan_dev *vxlan)
> @@ -2574,13 +2575,6 @@ static const struct ethtool_ops vxlan_ethtool_ops = {
>  	.get_link	= ethtool_op_get_link,
>  };
>  
> -static void vxlan_del_work(struct work_struct *work)
> -{
> -	struct vxlan_sock *vs = container_of(work, struct vxlan_sock, del_work);
> -	udp_tunnel_sock_release(vs->sock);
> -	kfree_rcu(vs, rcu);
> -}
> -
>  static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
>  					__be16 port, u32 flags)
>  {
> @@ -2626,8 +2620,6 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
>  	for (h = 0; h < VNI_HASH_SIZE; ++h)
>  		INIT_HLIST_HEAD(&vs->vni_list[h]);
>  
> -	INIT_WORK(&vs->del_work, vxlan_del_work);
> -
>  	sock = vxlan_create_sock(net, ipv6, port, flags);
>  	if (IS_ERR(sock)) {
>  		pr_info("Cannot bind port %d, err=%ld\n", ntohs(port),
> @@ -3218,10 +3210,6 @@ static int __init vxlan_init_module(void)
>  {
>  	int rc;
>  
> -	vxlan_wq = alloc_workqueue("vxlan", 0, 0);
> -	if (!vxlan_wq)
> -		return -ENOMEM;
> -
>  	get_random_bytes(&vxlan_salt, sizeof(vxlan_salt));
>  
>  	rc = register_pernet_subsys(&vxlan_net_ops);
> @@ -3242,7 +3230,6 @@ out3:
>  out2:
>  	unregister_pernet_subsys(&vxlan_net_ops);
>  out1:
> -	destroy_workqueue(vxlan_wq);
>  	return rc;
>  }
>  late_initcall(vxlan_init_module);
> @@ -3251,7 +3238,6 @@ static void __exit vxlan_cleanup_module(void)
>  {
>  	rtnl_link_unregister(&vxlan_link_ops);
>  	unregister_netdevice_notifier(&vxlan_notifier_block);
> -	destroy_workqueue(vxlan_wq);
>  	unregister_pernet_subsys(&vxlan_net_ops);
>  	/* rcu_barrier() is called by netns */
>  }
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 73ed2e951c020d..2113f808e905a4 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -126,9 +126,7 @@ struct vxlan_metadata {
>  /* per UDP socket information */
>  struct vxlan_sock {
>  	struct hlist_node hlist;
> -	struct work_struct del_work;
>  	struct socket	 *sock;
> -	struct rcu_head	  rcu;
>  	struct hlist_head vni_list[VNI_HASH_SIZE];
>  	atomic_t	  refcnt;
>  	struct udp_offload udp_offloads;
> -- 
> 2.5.5
> 


* [PATCH] drivers/net/ethernet/jme.c: Deinline jme_reset_mac_processor, save 2816 bytes
From: Denys Vlasenko @ 2016-04-08 18:39 UTC (permalink / raw)
  To: David S. Miller; +Cc: Denys Vlasenko, linux-kernel, netdev

This function compiles to 895 bytes of machine code.

Clearly, this isn't a time-critical function.
For one, it has a number of udelay(1) calls.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/jme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
index 3ddf657..711cb19 100644
--- a/drivers/net/ethernet/jme.c
+++ b/drivers/net/ethernet/jme.c
@@ -222,7 +222,7 @@ jme_clear_ghc_reset(struct jme_adapter *jme)
 	jwrite32f(jme, JME_GHC, jme->reg_ghc);
 }
 
-static inline void
+static void
 jme_reset_mac_processor(struct jme_adapter *jme)
 {
 	static const u32 mask[WAKEUP_FRAME_MASK_DWNR] = {0, 0, 0, 0};
-- 
2.1.0


* Re: [PATCH net-next] net: bcmgenet: add BQL support
From: Eric Dumazet @ 2016-04-08 18:23 UTC (permalink / raw)
  To: Petri Gynther
  Cc: Florian Fainelli, netdev, David Miller, opendmb, Jaedon Shin
In-Reply-To: <CAGXr9JED3WLEQr67FrMJoJnUaYJwOcnwRE4JxHTaf+0ha00kig@mail.gmail.com>

On Fri, 2016-04-08 at 09:54 -0700, Petri Gynther wrote:
> On Wed, Apr 6, 2016 at 1:25 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >
> > 2016-04-05 17:50 GMT-07:00 Petri Gynther <pgynther@google.com>:
> > > Add Byte Queue Limits (BQL) support to bcmgenet driver.
> > >
> > > Signed-off-by: Petri Gynther <pgynther@google.com>
> >
> > Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >
> > Thanks!
> > --
> > Florian
> 
> Any further comments?
> 
> Notable difference from some other drivers --
> netdev_tx_reset_queue(txq) is called for all queues in
> bcmgenet_netif_start(), just before netif_tx_start_all_queues(dev).
> This is to ensure that BQL is reset before the interface becomes
> operational.
> 
> I think that is the right place for these calls.
> 
> Some other drivers call it from the "interface down" path.

BQL is ready to go at device setup :

__QUEUE_STATE_STACK_XOFF is not set

dql_reset() was called from dql_init(), called from
netdev_init_one_queue()


