Netdev List
* [PATCHv4 net-next 00/15] BPF hardware offload (cls_bpf for now)
From: Jakub Kicinski @ 2016-09-15 19:12 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, jiri, john.fastabend, kubakici, Jakub Kicinski

Hi!

Dave, this set depends on bitfield.h which is sitting in the
pull request from Kalle.  I'm expecting buildbot to complain
about patch 8, please pull wireless-drivers-next before applying.

v4:
 - rename parser -> analyzer;
 - reorganize the analyzer patches a bit;
 - use bitfield.h directly.

--- merge blurb:
In the last year a lot of progress has been made on offloading
simpler TC classifiers.  There is also growing interest in using
BPF for generic high-speed packet processing in the kernel.
It seems beneficial to tie those two trends together and think
about hardware offloads of BPF programs.  This patch set presents
such an offload for Netronome smart NICs.  cls_bpf is extended with
hardware offload capabilities and the NFP driver gets a JIT translator
which, in the presence of capable firmware, can be used to offload
the BPF program onto the card.

The BPF JIT implementation is not 100% complete (e.g. some instructions
are missing) but it is functional.  Encouragingly, it should be possible
to offload most (if not all) advanced BPF features onto the NIC,
including packet modification, maps, tunnel encap/decap etc.

Example of basic tests I used:
  __section_cls_entry
  int cls_entry(struct __sk_buff *skb)
  {
	if (load_byte(skb, 0) != 0x0)
		return 0;

	if (load_byte(skb, 4) != 0x1)
		return 0;

	skb->mark = 0xcafe;

	if (load_byte(skb, 50) != 0xff)
		return 0;

	return ~0U;
  }

Above code can be compiled with Clang and loaded like this:
# ethtool -K p1p1 hw-tc-offload on
# tc qdisc add dev p1p1 ingress
# tc filter add dev p1p1 parent ffff:  bpf obj prog.o action drop

This set implements the basic transparent offload, the skip_{sw,hw}
flags and reporting statistics for cls_bpf.

Jakub Kicinski (15):
  net: cls_bpf: add hardware offload
  net: cls_bpf: limit hardware offload by software-only flag
  net: cls_bpf: add support for marking filters as hardware-only
  bpf: don't (ab)use instructions to store state
  bpf: expose internal verfier structures
  bpf: enable non-core use of the verfier
  bpf: recognize 64bit immediate loads as consts
  nfp: add BPF to NFP code translator
  nfp: bpf: add hardware bpf offload
  net: cls_bpf: allow offloaded filters to update stats
  net: bpf: allow offloaded filters to update stats
  nfp: bpf: add packet marking support
  net: act_mirred: allow statistic updates from offloaded actions
  nfp: bpf: add support for legacy redirect action
  nfp: bpf: add offload of TC direct action mode

 drivers/net/ethernet/netronome/nfp/Makefile        |    7 +
 drivers/net/ethernet/netronome/nfp/nfp_asm.h       |  233 +++
 drivers/net/ethernet/netronome/nfp/nfp_bpf.h       |  212 +++
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c   | 1816 ++++++++++++++++++++
 .../net/ethernet/netronome/nfp/nfp_bpf_verifier.c  |  160 ++
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |   47 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |  134 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |   51 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |   12 +
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   |  291 ++++
 .../net/ethernet/netronome/nfp/nfp_netvf_main.c    |    2 +-
 include/linux/bpf_verifier.h                       |   89 +
 include/linux/netdevice.h                          |    2 +
 include/net/pkt_cls.h                              |   16 +
 include/uapi/linux/pkt_cls.h                       |    1 +
 kernel/bpf/verifier.c                              |  384 +++--
 net/sched/act_mirred.c                             |    8 +
 net/sched/cls_bpf.c                                |  117 +-
 18 files changed, 3376 insertions(+), 206 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_asm.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf_verifier.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
 create mode 100644 include/linux/bpf_verifier.h

-- 
1.9.1


* MDB offloading of local ipv4 multicast groups
From: John Crispin @ 2016-09-15 18:58 UTC (permalink / raw)
  To: Elad Raz, netdev@vger.kernel.org
  Cc: Ido Schimmel, Jiri Pirko, Nikolay Aleksandrov, David S. Miller

Hi,

While adding MDB support to the qca8k DSA driver I found that IPv4 mcast
groups don't always get propagated to the DSA driver. In my setup there
are 2 clients connected to the switch, both running an mDNS client. The
.port_mdb_add() callback is properly called for 33:33:00:00:00:FB but
01:00:5E:00:00:FB never gets propagated to the DSA driver.

The reason is that the call to ipv4_is_local_multicast() here [1]
returns true, so the notifier is never called. Is this intentional or is
there something missing in the code?

	John

[1]
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_multicast.c?id=refs/tags/v4.8-rc6#n737
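
For reference, the relationship between the two addresses above can be
sketched in plain C. This is a minimal userspace sketch, not the bridge
code: both helper names are hypothetical, addresses are taken in host
byte order, and the /24 test assumes ipv4_is_local_multicast() matches
224.0.0.0/24.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for 224.0.0.0/24 (link-local multicast); addr is in host byte order. */
static bool is_local_multicast(uint32_t addr)
{
	return (addr & 0xffffff00u) == 0xe0000000u;
}

/* Map an IPv4 group (host byte order) to its Ethernet multicast MAC:
 * 01:00:5e followed by the low 23 bits of the group address.
 */
static void ipv4_group_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}
```

For the mDNS group 224.0.0.251 (0xe00000fb) the first helper returns
true and the second yields 01:00:5e:00:00:fb, which is the address that
never reaches .port_mdb_add().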


* Re: XDP user interface confusions
From: Brenden Blanco @ 2016-09-15 18:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, Tom Herbert, Daniel Borkmann,
	netdev@vger.kernel.org, iovisor-dev@lists.iovisor.org
In-Reply-To: <20160915201402.15b65136@redhat.com>

On Thu, Sep 15, 2016 at 08:14:02PM +0200, Jesper Dangaard Brouer wrote:
> Hi Brenden,
> 
> I don't quite understand the semantics of the XDP userspace interface.
> 
> We allow XDP programs to be (unconditionally) exchanged by another
> program; this avoids taking the link down+up and avoids reallocating
> RX ring resources (which is great).
> 
> We have two XDP sample programs in samples/bpf/, xdp1 and xdp2.  Now I
> want to first load xdp1, then load xdp2 (to avoid the link down),
> and afterwards remove/stop program xdp1.
> 
> This does NOT work, because (in samples/bpf/xdp1_user.c) when xdp1
> exits it unconditionally removes the running XDP program (loaded by xdp2)
> via set_link_xdp_fd(ifindex, -1).  The xdp2 user program is still
> running, and is unaware that its xdp/bpf program has been unloaded.
> 
> I find this userspace interface confusing. Was this your intention?
> Perhaps you can explain what the intended semantics or specification is?

In practice, we've used a single agent process to manage bpf programs on
behalf of the user applications. This agent process uses common Linux
functionality to add semantics, while not really relying on the bpf
handles themselves to take care of that. For instance, the process may
put some lockfiles and what-not in /var/run/$PID, and maybe return the
list of running programs through an http: or unix: interface.

So, from a user<->kernel API perspective, the requirements are
minimal...the agent process just overwrites the loaded bpf program when
the application changes, or a new application comes online. There is
nobody to 'notify' when a handle changes.

When translating this into the kernel API that you see now, none of this
exists, because IMHO the kernel API should be unopinionated and generic.
The result appears very "fire-and-forget", which makes it simple yet
safe at the same time; the refcounting is done transparently by the
kernel.

So, in practice, there is no xdp1 or xdp2, just xdp-agent at different
points in time. Or, better yet, no agent, just the programs running in
the kernel, with the handles of the programs residing solely in the
device, which are perhaps pinned to /sys/fs/bpf for semantic management
purposes. I didn't feel like it was appropriate to conflate different
bpf features in the kernel samples, so we don't see (and probably never
will) a sample which combines these features into a whole. That is best
left to userspace tools. It so happens that this is one of the projects
I am currently active on at $DAYJOB, and we fully intend to share the
details of that when it's in a suitable state.
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer


* XDP user interface confusions
From: Jesper Dangaard Brouer via iovisor-dev @ 2016-09-15 18:14 UTC (permalink / raw)
  To: Brenden Blanco
  Cc: Tom Herbert,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Daniel Borkmann

Hi Brenden,

I don't quite understand the semantics of the XDP userspace interface.

We allow XDP programs to be (unconditionally) exchanged by another
program; this avoids taking the link down+up and avoids reallocating
RX ring resources (which is great).

We have two XDP sample programs in samples/bpf/, xdp1 and xdp2.  Now I
want to first load xdp1, then load xdp2 (to avoid the link down),
and afterwards remove/stop program xdp1.

This does NOT work, because (in samples/bpf/xdp1_user.c) when xdp1
exits it unconditionally removes the running XDP program (loaded by xdp2)
via set_link_xdp_fd(ifindex, -1).  The xdp2 user program is still
running, and is unaware that its xdp/bpf program has been unloaded.

I find this userspace interface confusing. Was this your intention?
Perhaps you can explain what the intended semantics or specification is?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* XDP user interface conclusions
From: Jesper Dangaard Brouer via iovisor-dev @ 2016-09-15 18:13 UTC (permalink / raw)
  To: Brenden Blanco
  Cc: Tom Herbert,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Daniel Borkmann

Hi Brenden,

I don't quite understand the semantics of the XDP userspace interface.

We allow XDP programs to be (unconditionally) exchanged by another
program; this avoids taking the link down+up and avoids reallocating
RX ring resources (which is great).

We have two XDP sample programs in samples/bpf/, xdp1 and xdp2.  Now I
want to first load xdp1, then load xdp2 (to avoid the link down),
and afterwards remove/stop program xdp1.

This does NOT work, because (in samples/bpf/xdp1_user.c) when xdp1
exits it unconditionally removes the running XDP program (loaded by xdp2)
via set_link_xdp_fd(ifindex, -1).  The xdp2 user program is still
running, and is unaware that its xdp/bpf program has been unloaded.

I find this userspace interface confusing. Was this your intention?
Perhaps you can explain what the intended semantics or specification is?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* [PATCH next] sctp: make use of WORD_TRUNC macro
From: Marcelo Ricardo Leitner @ 2016-09-15 18:12 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich

No functional change. Just to avoid the usage of '&~3'.
Also break the line to make it easier to read.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/chunk.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index a55e54738b81ff8cf9cd711cf5fc466ac71374c0..adae4a41ca2078cfee387631f76e5cb768c2269c 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -182,9 +182,10 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 	/* This is the biggest possible DATA chunk that can fit into
 	 * the packet
 	 */
-	max_data = (asoc->pathmtu -
-		sctp_sk(asoc->base.sk)->pf->af->net_header_len -
-		sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk)) & ~3;
+	max_data = asoc->pathmtu -
+		   sctp_sk(asoc->base.sk)->pf->af->net_header_len -
+		   sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk);
+	max_data = WORD_TRUNC(max_data);
 
 	max = asoc->frag_point;
 	/* If the the peer requested that we authenticate DATA chunks
-- 
2.7.4
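
The equivalence the commit message relies on can be sketched as below.
This is a minimal userspace sketch: the macro body is an assumed
definition matching the '& ~3' it replaces, and max_data_len() is a
hypothetical helper that takes the header sizes as plain parameters
instead of reading them from the association.

```c
#include <stddef.h>

/* Assumed definition: truncate to the previous multiple of 4. */
#define WORD_TRUNC(s) ((s) & ~3)

/* Hypothetical helper mirroring the max_data computation in the patch:
 * subtract the headers from the path MTU, then truncate to a 4-byte
 * boundary.
 */
static size_t max_data_len(size_t pathmtu, size_t net_hdr_len,
			   size_t sctp_hdr_len, size_t data_chunk_hdr_len)
{
	size_t max_data = pathmtu - net_hdr_len -
			  sctp_hdr_len - data_chunk_hdr_len;

	return WORD_TRUNC(max_data);
}
```

With illustrative values of a 1499-byte path MTU and 20, 12 and 16
bytes of headers, 1451 bytes remain and the result is truncated to 1448.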


* [PATCH net] sctp: fix SSN comparision
From: Marcelo Ricardo Leitner @ 2016-09-15 18:02 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich

This function actually operates on u32 yet its parameters were declared
as u16, causing integer truncation upon calling.

Note in patch context that ADDIP_SERIAL_SIGN_BIT is already 32 bits.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---

This issue has existed since before the git import, so I can't add a
Fixes tag. That said, it is probably not worth queueing for stable.
Thanks

 include/net/sctp/sm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index efc01743b9d641bf6b16a37780ee0df34b4ec698..bafe2a0ab9085f24e17038516c55c00cfddd02f4 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -382,7 +382,7 @@ enum {
 	ADDIP_SERIAL_SIGN_BIT = (1<<31)
 };
 
-static inline int ADDIP_SERIAL_gte(__u16 s, __u16 t)
+static inline int ADDIP_SERIAL_gte(__u32 s, __u32 t)
 {
 	return ((s) == (t)) || (((t) - (s)) & ADDIP_SERIAL_SIGN_BIT);
 }
-- 
2.7.4
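
The truncation described above can be reproduced in userspace. This is
a minimal sketch with hypothetical names: serial_gte_u16() keeps the
old prototype, serial_gte_u32() the fixed one, and SERIAL_SIGN_BIT
stands in for ADDIP_SERIAL_SIGN_BIT.

```c
#include <stdint.h>

#define SERIAL_SIGN_BIT (UINT32_C(1) << 31)

/* Old prototype: callers passing 32-bit serials have them truncated
 * to 16 bits before the comparison runs.
 */
static int serial_gte_u16(uint16_t s, uint16_t t)
{
	return (s == t) || ((uint32_t)(t - s) & SERIAL_SIGN_BIT);
}

/* Fixed prototype: the full 32-bit serials are compared. */
static int serial_gte_u32(uint32_t s, uint32_t t)
{
	return (s == t) || ((t - s) & SERIAL_SIGN_BIT);
}
```

For s = 0x00010000 and t = 0x00000001 the 32-bit version reports
s >= t, while the 16-bit version sees s = 0 and t = 1 and reports the
opposite.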


* Re: [PATCH net-next] tcp: prepare skbs for better sack shifting
From: Yuchung Cheng @ 2016-09-15 17:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1473957182.22679.50.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, Sep 15, 2016 at 9:33 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> With large BDP TCP flows and lossy networks, it is very important
> to keep a low number of skbs in the write queue.
>
> RACK and SACK processing can perform a linear scan of it.
>
> We should avoid putting any payload in skb->head, so that SACK
> shifting can be done if needed.
>
> With this patch, we can pack ~0.5 MB per skb instead of
> the 64KB initially cooked at tcp_sendmsg() time.
>
> This reduces the number of skbs in the write queue by a factor of eight.
> tcp_rack_detect_loss() likes this.
>
> We still allow payload in skb->head for first skb put in the queue,
> to not impact RPC workloads.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>


> ---
>  net/ipv4/tcp.c |   31 ++++++++++++++++++++++++-------
>  1 file changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index a13fcb369f52fe85def7c9d856259bc0509f3453..7dae800092e62cec330544851289d20a68642561 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1020,17 +1020,31 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
>  }
>  EXPORT_SYMBOL(tcp_sendpage);
>
> -static inline int select_size(const struct sock *sk, bool sg)
> +/* Do not bother using a page frag for very small frames.
> + * But use this heuristic only for the first skb in write queue.
> + *
> + * Having no payload in skb->head allows better SACK shifting
> + * in tcp_shift_skb_data(), reducing sack/rack overhead, because
> + * write queue has less skbs.
> + * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
> + * This also speeds up tso_fragment(), since it wont fallback
> + * to tcp_fragment().
> + */
> +static int linear_payload_sz(bool first_skb)
> +{
> +       if (first_skb)
> +               return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
> +       return 0;
> +}
> +
> +static int select_size(const struct sock *sk, bool sg, bool first_skb)
>  {
>         const struct tcp_sock *tp = tcp_sk(sk);
>         int tmp = tp->mss_cache;
>
>         if (sg) {
>                 if (sk_can_gso(sk)) {
> -                       /* Small frames wont use a full page:
> -                        * Payload will immediately follow tcp header.
> -                        */
> -                       tmp = SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
> +                       tmp = linear_payload_sz(first_skb);
>                 } else {
>                         int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER);
>
> @@ -1161,6 +1175,8 @@ restart:
>                 }
>
>                 if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
> +                       bool first_skb;
> +
>  new_segment:
>                         /* Allocate new segment. If the interface is SG,
>                          * allocate skb fitting to single page.
> @@ -1172,10 +1188,11 @@ new_segment:
>                                 process_backlog = false;
>                                 goto restart;
>                         }
> +                       first_skb = skb_queue_empty(&sk->sk_write_queue);
>                         skb = sk_stream_alloc_skb(sk,
> -                                                 select_size(sk, sg),
> +                                                 select_size(sk, sg, first_skb),
>                                                   sk->sk_allocation,
> -                                                 skb_queue_empty(&sk->sk_write_queue));
> +                                                 first_skb);
>                         if (!skb)
>                                 goto wait_for_memory;
>
>
>
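
The "~0.5 MB" and "by eight" figures in the quoted commit message can
be checked with a little arithmetic. The sketch below is illustrative
only: it assumes 4 KiB pages, so that MAX_SKB_FRAGS works out to 17,
and all names are hypothetical.

```c
#include <stddef.h>

/* Assumptions: 4 KiB pages give MAX_SKB_FRAGS = 65536 / 4096 + 1 = 17,
 * and each frag holds up to 32 KiB, as stated in the patch comment.
 */
enum {
	ASSUMED_MAX_SKB_FRAGS = 17,
	FRAG_BYTES = 32 * 1024,
	OLD_SKB_BYTES = 64 * 1024,
};

/* Upper bound on payload carried by one skb with no data in skb->head. */
static size_t max_payload_per_skb(void)
{
	return (size_t)ASSUMED_MAX_SKB_FRAGS * FRAG_BYTES;
}

/* Number of skbs needed to hold queue_bytes, rounding up. */
static size_t skbs_needed(size_t queue_bytes, size_t per_skb)
{
	return (queue_bytes + per_skb - 1) / per_skb;
}
```

Under these assumptions one skb holds up to 557056 bytes, and a 16 MiB
write queue shrinks from 256 skbs of 64KB each to 31 skbs, roughly the
factor of eight mentioned above.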


* [PATCH] llc: switch type to bool as the timeout is only tested versus 0
From: Alan @ 2016-09-15 17:51 UTC (permalink / raw)
  To: netdev

(As asked by Dave in February)

Signed-off-by: Alan Cox <alan@linux.intel.com>

---
 net/llc/af_llc.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8ae3ed9..db916cf 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -38,7 +38,7 @@ static u16 llc_ui_sap_link_no_max[256];
 static struct sockaddr_llc llc_ui_addrnull;
 static const struct proto_ops llc_ui_ops;
 
-static long llc_ui_wait_for_conn(struct sock *sk, long timeout);
+static bool llc_ui_wait_for_conn(struct sock *sk, long timeout);
 static int llc_ui_wait_for_disc(struct sock *sk, long timeout);
 static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout);
 
@@ -551,7 +551,7 @@ static int llc_ui_wait_for_disc(struct sock *sk, long timeout)
 	return rc;
 }
 
-static long llc_ui_wait_for_conn(struct sock *sk, long timeout)
+static bool llc_ui_wait_for_conn(struct sock *sk, long timeout)
 {
 	DEFINE_WAIT(wait);
 


* Re: [PATCH] mwifiex: fix memory leak on regd when chan is zero
From: Colin Ian King @ 2016-09-15 17:26 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Amitkumar Karwar, Nishant Sarmukadam, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <87vaxx9jhk.fsf@kamboji.qca.qualcomm.com>

On 15/09/16 18:10, Kalle Valo wrote:
> Colin King <colin.king@canonical.com> writes:
> 
>> From: Colin Ian King <colin.king@canonical.com>
>>
>> When chan is zero mwifiex_create_custom_regdomain does not kfree
>> regd and we have a memory leak. Fix this by freeing regd before
>> the return.
>>
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
>> ---
>>  drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
>> index 3344a26..15a91f3 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
>> @@ -1049,8 +1049,10 @@ mwifiex_create_custom_regdomain(struct mwifiex_private *priv,
>>  		enum nl80211_band band;
>>  
>>  		chan = *buf++;
>> -		if (!chan)
>> +		if (!chan) {
>> +			kfree(regd);
>>  			return NULL;
>> +		}
> 
> Bob sent a similar fix and he also did more:
> 
> mwifiex: fix error handling in mwifiex_create_custom_regdomain
> 
> https://patchwork.kernel.org/patch/9331337/
> 
Ah, sorry for the duplication noise.

Colin
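
The leak pattern discussed in this thread, an early return inside a
loop after an allocation, can be sketched in userspace as follows. This
is not the mwifiex code: build_chan_table() and the allocation counter
are hypothetical, and malloc-style calls stand in for the kernel
allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocation counter, present only to make a leak observable. */
static int live_allocs;

static void *counted_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Copy channel numbers from buf into a fresh table. A zero channel is
 * treated as a malformed entry; the table is released before bailing out.
 */
static unsigned char *build_chan_table(const unsigned char *buf, size_t n)
{
	unsigned char *table = counted_alloc(n ? n : 1);
	size_t i;

	if (!table)
		return NULL;

	for (i = 0; i < n; i++) {
		unsigned char chan = buf[i];

		if (!chan) {
			counted_free(table);	/* without this, table leaks */
			return NULL;
		}
		table[i] = chan;
	}
	return table;
}
```

Calling build_chan_table() on input containing a zero entry returns
NULL and leaves live_allocs at 0; dropping the counted_free() call
would leave it at 1.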


* [PATCH net-next] net: l3mdev: Remove netif_index_is_l3_master
From: David Ahern @ 2016-09-15 17:18 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

No longer used after e0d56fdd73422 ("net: l3mdev: remove redundant calls")

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/l3mdev.h | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 3832099289c5..b220dabeab45 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -114,25 +114,6 @@ static inline u32 l3mdev_fib_table(const struct net_device *dev)
 	return tb_id;
 }
 
-static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
-{
-	struct net_device *dev;
-	bool rc = false;
-
-	if (ifindex == 0)
-		return false;
-
-	rcu_read_lock();
-
-	dev = dev_get_by_index_rcu(net, ifindex);
-	if (dev)
-		rc = netif_is_l3_master(dev);
-
-	rcu_read_unlock();
-
-	return rc;
-}
-
 struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6);
 
 static inline
@@ -226,11 +207,6 @@ static inline u32 l3mdev_fib_table_by_index(struct net *net, int ifindex)
 	return 0;
 }
 
-static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
-{
-	return false;
-}
-
 static inline
 struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6)
 {
-- 
2.1.4


* Re: [PATCH net-next 1/7] lwt: Add net to build_state argument
From: Roopa Prabhu @ 2016-09-15 17:17 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, tgraf, kernel-team
In-Reply-To: <1473895376-347096-2-git-send-email-tom@herbertland.com>

On 9/14/16, 4:22 PM, Tom Herbert wrote:
> Users of LWT need to know net if they want to have per net operations
> in LWT.
>
> Signed-off-by: Tom Herbert <tom@herbertland.com>
> ---
>  
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>


* [PATCH net-next] net: vrf: Remove RT_FL_TOS
From: David Ahern @ 2016-09-15 17:13 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

No longer used after d66f6c0a8f3c0 ("net: ipv4: Remove l3mdev_get_saddr")

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 55674b0e65b7..85c271c70d42 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -37,9 +37,6 @@
 #include <net/l3mdev.h>
 #include <net/fib_rules.h>
 
-#define RT_FL_TOS(oldflp4) \
-	((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
-
 #define DRV_NAME	"vrf"
 #define DRV_VERSION	"1.0"
 
-- 
2.1.4


* Re: [PATCH] mwifiex: fix memory leak on regd when chan is zero
From: Kalle Valo @ 2016-09-15 17:10 UTC (permalink / raw)
  To: Colin King
  Cc: Amitkumar Karwar, Nishant Sarmukadam, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <20160915162117.1209-1-colin.king@canonical.com>

Colin King <colin.king@canonical.com> writes:

> From: Colin Ian King <colin.king@canonical.com>
>
> When chan is zero mwifiex_create_custom_regdomain does not kfree
> regd and we have a memory leak. Fix this by freeing regd before
> the return.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
> index 3344a26..15a91f3 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
> @@ -1049,8 +1049,10 @@ mwifiex_create_custom_regdomain(struct mwifiex_private *priv,
>  		enum nl80211_band band;
>  
>  		chan = *buf++;
> -		if (!chan)
> +		if (!chan) {
> +			kfree(regd);
>  			return NULL;
> +		}

Bob sent a similar fix and he also did more:

mwifiex: fix error handling in mwifiex_create_custom_regdomain

https://patchwork.kernel.org/patch/9331337/

-- 
Kalle Valo


* [net-next PATCH] net: netlink messages for HW addr programming
From: Patrick Ruddy @ 2016-09-15 16:48 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, alexander.h.duyck, stephen, lboccass, sven,
	Patrick Ruddy

Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
AF_UNSPEC to indicate interest in specific unicast and multicast
hardware addresses. These messages are sent when addresses are
added to or deleted from the appropriate interface driver.
Add an AF_UNSPEC GETADDR function to allow the netlink notifications
to be replayed, avoiding loss of state due to application start
ordering or restart.

Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
---
 include/linux/netdevice.h |   1 +
 net/core/dev_addr_lists.c | 157 ++++++++++++++++++++++++++++++++++++++++++++--
 net/core/rtnetlink.c      |   8 ++-
 3 files changed, 161 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2095b6a..2029618 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3751,6 +3751,7 @@ int dev_mc_sync_multiple(struct net_device *to, struct net_device *from);
 void dev_mc_unsync(struct net_device *to, struct net_device *from);
 void dev_mc_flush(struct net_device *dev);
 void dev_mc_init(struct net_device *dev);
+int unspec_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb);
 
 /**
  *  __dev_mc_sync - Synchonize device's multicast list
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index c0548d2..70343e6 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -12,9 +12,17 @@
  */
 
 #include <linux/netdevice.h>
+#include <net/netlink.h>
 #include <linux/rtnetlink.h>
 #include <linux/export.h>
 #include <linux/list.h>
+#include <net/sock.h>
+
+enum unspec_addr_idx {
+	UNSPEC_UCAST = 0,
+	UNSPEC_MCAST,
+	UNSPEC_MAX
+};
 
 /*
  * General list handling functions
@@ -477,6 +485,139 @@ out:
 }
 EXPORT_SYMBOL(dev_uc_add_excl);
 
+static int fill_addr(struct sk_buff *skb, struct net_device *dev,
+		     const unsigned char *addr, u32 seq, int type,
+		     int addr_type, int ifa_flags, unsigned int flags)
+{
+	struct nlmsghdr *nlh;
+	struct ifaddrmsg *ifm;
+
+	nlh = nlmsg_put(skb, 0, seq, type, sizeof(*ifm), flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	ifm = nlmsg_data(nlh);
+	ifm->ifa_family = AF_UNSPEC;
+	ifm->ifa_prefixlen = 0;
+	ifm->ifa_flags = ifa_flags;
+	ifm->ifa_scope = RT_SCOPE_LINK;
+	ifm->ifa_index = dev->ifindex;
+	if (nla_put(skb, addr_type, dev->addr_len, addr))
+		goto nla_put_failure;
+	nlmsg_end(skb, nlh);
+	return 0;
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static inline size_t addr_nlmsg_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
+		+ nla_total_size(MAX_ADDR_LEN);
+}
+
+static void addr_notify(struct net_device *dev, const unsigned char *addr,
+			int type, int addr_type)
+{
+	struct net *net = dev_net(dev);
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(addr_nlmsg_size(), GFP_ATOMIC);
+	if (!skb)
+		goto errout;
+
+	err = fill_addr(skb, dev, addr, 0, type, addr_type, IFA_F_SECONDARY,
+			0);
+	if (err < 0) {
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(skb);
+		goto errout;
+	}
+	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+	return;
+errout:
+	if (err < 0)
+		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
+}
+
+int unspec_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+	struct hlist_head *head;
+	struct netdev_hw_addr_list *list;
+	struct netdev_hw_addr *ha;
+	int h, s_h;
+	int idx = 0, s_idx;
+	int mac_idx = 0, s_mac_idx;
+	enum unspec_addr_idx addr_idx = 0, s_addr_idx;
+	int err = 0;
+
+	s_h = cb->args[0];
+	s_idx = cb->args[1];
+	s_addr_idx = cb->args[2];
+	s_mac_idx = cb->args[3];
+
+	rcu_read_lock();
+	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
+		idx = 0;
+		head = &net->dev_index_head[h];
+		cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
+			  net->dev_base_seq;
+		hlist_for_each_entry_rcu(dev, head, index_hlist) {
+			if (idx < s_idx)
+				goto cont;
+			if (h > s_h || idx > s_idx)
+				s_mac_idx = 0;
+			for (addr_idx = 0; addr_idx < UNSPEC_MAX;
+			     addr_idx++, s_addr_idx = 0) {
+				if (addr_idx < s_addr_idx)
+					continue;
+				list = (addr_idx == UNSPEC_UCAST) ? &dev->uc :
+					&dev->mc;
+				if (netdev_hw_addr_list_empty(list))
+					continue;
+				mac_idx = 0;
+				list_for_each_entry(ha, &list->list, list) {
+					if (mac_idx < s_mac_idx) {
+						mac_idx++;
+						continue;
+					}
+					err = fill_addr(skb, dev, ha->addr,
+							cb->nlh->nlmsg_seq,
+							RTM_NEWADDR,
+							(addr_idx ==
+							 UNSPEC_UCAST) ?
+							IFA_ADDRESS :
+							IFA_MULTICAST,
+							IFA_F_SECONDARY,
+							NLM_F_MULTI);
+					if (err < 0)
+						goto done;
+					nl_dump_check_consistent(cb,
+								 nlmsg_hdr(skb)
+								 );
+					mac_idx++;
+				}
+				s_mac_idx = 0;
+			}
+cont:
+			idx++;
+		}
+	}
+done:
+	rcu_read_unlock();
+	cb->args[0] = h;
+	cb->args[1] = idx;
+	cb->args[2] = addr_idx;
+	cb->args[3] = mac_idx;
+
+	return skb->len;
+}
+
 /**
  *	dev_uc_add - Add a secondary unicast address
  *	@dev: device
@@ -492,8 +633,10 @@ int dev_uc_add(struct net_device *dev, const unsigned char *addr)
 	netif_addr_lock_bh(dev);
 	err = __hw_addr_add(&dev->uc, addr, dev->addr_len,
 			    NETDEV_HW_ADDR_T_UNICAST);
-	if (!err)
+	if (!err) {
 		__dev_set_rx_mode(dev);
+		addr_notify(dev, addr, RTM_NEWADDR, IFA_ADDRESS);
+	}
 	netif_addr_unlock_bh(dev);
 	return err;
 }
@@ -514,8 +657,10 @@ int dev_uc_del(struct net_device *dev, const unsigned char *addr)
 	netif_addr_lock_bh(dev);
 	err = __hw_addr_del(&dev->uc, addr, dev->addr_len,
 			    NETDEV_HW_ADDR_T_UNICAST);
-	if (!err)
+	if (!err) {
 		__dev_set_rx_mode(dev);
+		addr_notify(dev, addr, RTM_DELADDR, IFA_ADDRESS);
+	}
 	netif_addr_unlock_bh(dev);
 	return err;
 }
@@ -669,8 +814,10 @@ static int __dev_mc_add(struct net_device *dev, const unsigned char *addr,
 	netif_addr_lock_bh(dev);
 	err = __hw_addr_add_ex(&dev->mc, addr, dev->addr_len,
 			       NETDEV_HW_ADDR_T_MULTICAST, global, false, 0);
-	if (!err)
+	if (!err) {
 		__dev_set_rx_mode(dev);
+		addr_notify(dev, addr, RTM_NEWADDR, IFA_MULTICAST);
+	}
 	netif_addr_unlock_bh(dev);
 	return err;
 }
@@ -709,8 +856,10 @@ static int __dev_mc_del(struct net_device *dev, const unsigned char *addr,
 	netif_addr_lock_bh(dev);
 	err = __hw_addr_del_ex(&dev->mc, addr, dev->addr_len,
 			       NETDEV_HW_ADDR_T_MULTICAST, global, false);
-	if (!err)
+	if (!err) {
 		__dev_set_rx_mode(dev);
+		addr_notify(dev, addr, RTM_DELADDR, IFA_MULTICAST);
+	}
 	netif_addr_unlock_bh(dev);
 	return err;
 }
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 937e459..e6292bb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2686,8 +2686,14 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 	int idx;
 	int s_idx = cb->family;
 
-	if (s_idx == 0)
+	if (s_idx == 0) {
+		if (unspec_dump_ifaddr(skb, cb))
+			return skb->len;
+		memset(&cb->args[0], 0, sizeof(cb->args));
+		cb->prev_seq = 0;
+		cb->seq = 0;
 		s_idx = 1;
+	}
 	for (idx = 1; idx <= RTNL_FAMILY_MAX; idx++) {
 		int type = cb->nlh->nlmsg_type-RTM_BASE;
 		if (idx < s_idx || idx == PF_PACKET)
-- 
2.1.4
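
On the receiving side, a listener would need to tell these messages
apart from ordinary IPv4/IPv6 address notifications. The following is a
minimal userspace sketch, assuming the message layout proposed in this
patch (an ifaddrmsg with ifa_family set to AF_UNSPEC); is_hw_addr_msg()
is a hypothetical helper and the Linux UAPI headers are required.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

/* Returns true if nlh looks like one of the proposed HW address
 * notifications: RTM_NEWADDR or RTM_DELADDR carrying an ifaddrmsg
 * whose family is AF_UNSPEC.
 */
static bool is_hw_addr_msg(const struct nlmsghdr *nlh)
{
	const struct ifaddrmsg *ifm;

	if (nlh->nlmsg_type != RTM_NEWADDR && nlh->nlmsg_type != RTM_DELADDR)
		return false;

	/* Reject messages too short to hold the ifaddrmsg header. */
	if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifm)))
		return false;

	ifm = NLMSG_DATA(nlh);
	return ifm->ifa_family == AF_UNSPEC;
}
```

A listener bound to the RTNLGRP_LINK group could call this on each
received header and then read the address from the IFA_ADDRESS or
IFA_MULTICAST attribute, as filled in by fill_addr() above.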


* [PATCH] irda: Free skb on irda_accept error path.
From: Phil Turnbull @ 2016-09-15 16:41 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Samuel Ortiz, Phil Turnbull

skb is not freed if newsk is NULL. Rework the error path so kfree_skb()
is unconditionally called on function exit.

Fixes: c3ea9fa27413 ("[IrDA] af_irda: IRDA_ASSERT cleanups")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
---
 net/irda/af_irda.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 8d2f7c9b491d..ccc244406fb9 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -832,7 +832,7 @@ static int irda_accept(struct socket *sock, struct socket *newsock, int flags)
 	struct sock *sk = sock->sk;
 	struct irda_sock *new, *self = irda_sk(sk);
 	struct sock *newsk;
-	struct sk_buff *skb;
+	struct sk_buff *skb = NULL;
 	int err;
 
 	err = irda_create(sock_net(sk), newsock, sk->sk_protocol, 0);
@@ -900,7 +900,6 @@ static int irda_accept(struct socket *sock, struct socket *newsock, int flags)
 	err = -EPERM; /* value does not seem to make sense. -arnd */
 	if (!new->tsap) {
 		pr_debug("%s(), dup failed!\n", __func__);
-		kfree_skb(skb);
 		goto out;
 	}
 
@@ -919,7 +918,6 @@ static int irda_accept(struct socket *sock, struct socket *newsock, int flags)
 	/* Clean up the original one to keep it in listen state */
 	irttp_listen(self->tsap);
 
-	kfree_skb(skb);
 	sk->sk_ack_backlog--;
 
 	newsock->state = SS_CONNECTED;
@@ -927,6 +925,7 @@ static int irda_accept(struct socket *sock, struct socket *newsock, int flags)
 	irda_connect_response(new);
 	err = 0;
 out:
+	kfree_skb(skb);
 	release_sock(sk);
 	return err;
 }
-- 
2.9.0.rc2
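
For readers following along, the single-exit cleanup pattern this patch
moves to can be sketched in userspace C. This is only an illustration,
not the irda code: buf_alloc(), buf_free(), accept_sketch() and the
fail_stage parameter are hypothetical names, and buf_free(NULL) being a
no-op stands in for kfree_skb(NULL) being safe.

```c
#include <assert.h>
#include <stdlib.h>

static int live_buffers;	/* buffers currently allocated (leak detector) */

static void *buf_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_buffers++;
	return p;
}

/* Like kfree_skb(), freeing NULL is a harmless no-op. */
static void buf_free(void *p)
{
	if (!p)
		return;
	live_buffers--;
	assert(live_buffers >= 0);
	free(p);
}

/* fail_stage: 0 = success, 1 = fail before dequeue, 2 = fail after dequeue */
static int accept_sketch(int fail_stage)
{
	void *buf = NULL;	/* initialised so the exit path can always free it */
	int err;

	err = -1;
	if (fail_stage == 1)
		goto out;

	buf = buf_alloc();
	err = -2;
	if (!buf)
		goto out;

	err = -3;
	if (fail_stage == 2)
		goto out;	/* no per-branch free needed any more */

	err = 0;
out:
	buf_free(buf);		/* one unconditional free covers every path */
	return err;
}
```

Calling accept_sketch() with each fail_stage value should leave
live_buffers at zero, which is the property the patch restores for the
newsk == NULL path.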

^ permalink raw reply related

* Re: [PATCH net] tcp: fix overflow in __tcp_retransmit_skb()
From: Eric Dumazet @ 2016-09-15 16:35 UTC (permalink / raw)
  To: David Laight; +Cc: David Miller, netdev
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB00FDAFC@AcuExch.aculab.com>

On Thu, 2016-09-15 at 15:52 +0000, David Laight wrote:
> From: Eric Dumazet
> > Sent: 15 September 2016 16:13
> > If a TCP socket gets a large write queue, an overflow can happen
> > in a test in __tcp_retransmit_skb() preventing all retransmits.
> ...
> >  	if (atomic_read(&sk->sk_wmem_alloc) >
> > -	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> > +	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> > +		  sk->sk_sndbuf))
> >  		return -EAGAIN;
> 
> Might it also be better to split that test to (say):
> 
> 	u32 wmem_alloc = atomic_read(&sk->sk_wmem_alloc);
> 	if (unlikely((wmem_alloc > sk->sk_sndbuf))
> 		return -EAGAIN;
> 	if (unlikely(wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2)))
> 		return -EAGAIN;

Well, I find the existing code more readable, but this is just an
opinion.

Thanks.

^ permalink raw reply

* [PATCH net-next] tcp: prepare skbs for better sack shifting
From: Eric Dumazet @ 2016-09-15 16:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

With large BDP TCP flows and lossy networks, it is very important
to keep a low number of skbs in the write queue.

RACK and SACK processing can perform a linear scan of it.

We should avoid putting any payload in skb->head, so that SACK
shifting can be done if needed.

With this patch, up to ~0.5 MB can be packed per skb instead of
the 64KB initially cooked at tcp_sendmsg() time.

This reduces the number of skbs in the write queue by a factor of eight.
tcp_rack_detect_loss() likes this.

We still allow payload in skb->head for first skb put in the queue,
to not impact RPC workloads.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp.c |   31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a13fcb369f52fe85def7c9d856259bc0509f3453..7dae800092e62cec330544851289d20a68642561 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1020,17 +1020,31 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 }
 EXPORT_SYMBOL(tcp_sendpage);
 
-static inline int select_size(const struct sock *sk, bool sg)
+/* Do not bother using a page frag for very small frames.
+ * But use this heuristic only for the first skb in write queue.
+ *
+ * Having no payload in skb->head allows better SACK shifting
+ * in tcp_shift_skb_data(), reducing sack/rack overhead, because
+ * write queue has less skbs.
+ * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
+ * This also speeds up tso_fragment(), since it wont fallback
+ * to tcp_fragment().
+ */
+static int linear_payload_sz(bool first_skb)
+{
+	if (first_skb)
+		return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
+	return 0;
+}
+
+static int select_size(const struct sock *sk, bool sg, bool first_skb)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	int tmp = tp->mss_cache;
 
 	if (sg) {
 		if (sk_can_gso(sk)) {
-			/* Small frames wont use a full page:
-			 * Payload will immediately follow tcp header.
-			 */
-			tmp = SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
+			tmp = linear_payload_sz(first_skb);
 		} else {
 			int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER);
 
@@ -1161,6 +1175,8 @@ restart:
 		}
 
 		if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
+			bool first_skb;
+
 new_segment:
 			/* Allocate new segment. If the interface is SG,
 			 * allocate skb fitting to single page.
@@ -1172,10 +1188,11 @@ new_segment:
 				process_backlog = false;
 				goto restart;
 			}
+			first_skb = skb_queue_empty(&sk->sk_write_queue);
 			skb = sk_stream_alloc_skb(sk,
-						  select_size(sk, sg),
+						  select_size(sk, sg, first_skb),
 						  sk->sk_allocation,
-						  skb_queue_empty(&sk->sk_write_queue));
+						  first_skb);
 			if (!skb)
 				goto wait_for_memory;
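
The "~0.5 MB" and "factor of eight" figures in the changelog can be
checked with a little arithmetic. The sketch below is illustrative only:
it assumes MAX_SKB_FRAGS is 17 (the usual value with 4KB pages) and takes
the 32KB per-fragment size from the comment in the patch; the SKETCH_*
names and both helpers are hypothetical.

```c
#include <assert.h>

#define SKETCH_MAX_SKB_FRAGS	17		/* assumption: 65536 / 4096 + 1 */
#define SKETCH_FRAG_BYTES	(32 * 1024)	/* per-frag size quoted in the patch */
#define SKETCH_INITIAL_BYTES	(64 * 1024)	/* skb size cooked by tcp_sendmsg() */

/* Largest payload one skb can carry when everything sits in page frags. */
static long max_frag_payload(void)
{
	return (long)SKETCH_MAX_SKB_FRAGS * SKETCH_FRAG_BYTES;
}

/* Number of skbs needed to hold 'inflight' bytes at 'per_skb' bytes each. */
static long skbs_needed(long inflight, long per_skb)
{
	assert(per_skb > 0);
	return (inflight + per_skb - 1) / per_skb;
}
```

Under these assumptions max_frag_payload() is 557056 bytes (544KB), and
a 100MB write queue shrinks from 1600 skbs to 189, roughly the eightfold
reduction described above.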
 

^ permalink raw reply related

* Re: [PATCH] mwifiex: fix null pointer deference when adapter is null
From: kbuild test robot @ 2016-09-15 16:29 UTC (permalink / raw)
  To: Colin King
  Cc: kbuild-all, Amitkumar Karwar, Nishant Sarmukadam, Kalle Valo,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <20160915134238.5167-1-colin.king@canonical.com>

[-- Attachment #1: Type: text/plain, Size: 9843 bytes --]

Hi Colin,

[auto build test WARNING on wireless-drivers-next/master]
[also build test WARNING on next-20160915]
[cannot apply to v4.8-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Colin-King/mwifiex-fix-null-pointer-deference-when-adapter-is-null/20160915-231625
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git master
config: x86_64-randconfig-x013-201637 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/net/wireless/marvell/mwifiex/main.c: In function 'mwifiex_shutdown_sw':
>> drivers/net/wireless/marvell/mwifiex/main.c:1433:1: warning: label 'exit_remove' defined but not used [-Wunused-label]
    exit_remove:
    ^~~~~~~~~~~
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:set_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
   Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD
   Cyclomatic Complexity 1 include/linux/list.h:list_empty
   Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_read
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_inc
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_dec
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_add_return
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore
   Cyclomatic Complexity 1 include/linux/kasan.h:kasan_kmalloc
   Cyclomatic Complexity 28 include/linux/slab.h:kmalloc_index
   Cyclomatic Complexity 1 include/linux/slab.h:kmem_cache_alloc_trace
   Cyclomatic Complexity 1 include/linux/slab.h:kmalloc_order_trace
   Cyclomatic Complexity 68 include/linux/slab.h:kmalloc_large
   Cyclomatic Complexity 5 include/linux/slab.h:kmalloc
   Cyclomatic Complexity 1 include/linux/slab.h:kzalloc
   Cyclomatic Complexity 1 include/linux/skbuff.h:skb_end_pointer
   Cyclomatic Complexity 1 include/linux/skbuff.h:skb_queue_empty
   Cyclomatic Complexity 1 include/linux/skbuff.h:skb_shared
   Cyclomatic Complexity 1 include/linux/skbuff.h:skb_headroom
   Cyclomatic Complexity 1 include/linux/netdevice.h:netdev_get_tx_queue
   Cyclomatic Complexity 1 include/linux/netdevice.h:netdev_priv
   Cyclomatic Complexity 1 include/linux/netdevice.h:netif_tx_stop_queue
   Cyclomatic Complexity 1 include/linux/netdevice.h:netif_tx_queue_stopped
   Cyclomatic Complexity 1 include/linux/netdevice.h:netif_carrier_ok
   Cyclomatic Complexity 1 include/linux/etherdevice.h:is_multicast_ether_addr
   Cyclomatic Complexity 1 include/linux/etherdevice.h:ether_addr_copy
   Cyclomatic Complexity 1 include/linux/etherdevice.h:ether_addr_equal
   Cyclomatic Complexity 1 include/linux/etherdevice.h:ether_addr_equal_unaligned
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/util.h:MWIFIEX_SKB_TXCB
   Cyclomatic Complexity 6 drivers/net/wireless/marvell/mwifiex/main.h:mwifiex_get_priv
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/main.h:mwifiex_netdev_get_priv
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/main.h:mwifiex_is_skb_mgmt_frame
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_get_stats
   Cyclomatic Complexity 1 include/linux/workqueue.h:queue_work
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_queue_rx_work
   Cyclomatic Complexity 4 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_set_multicast_list
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_netdev_select_wmm_queue
   Cyclomatic Complexity 1 include/linux/err.h:IS_ERR
   Cyclomatic Complexity 1 include/linux/timekeeping.h:ktime_get_real
   Cyclomatic Complexity 1 include/linux/skbuff.h:__net_timestamp
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_open
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/util.h:MWIFIEX_SKB_RXCB
   Cyclomatic Complexity 1 include/linux/netdevice.h:dev_kfree_skb_any
   Cyclomatic Complexity 6 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_unregister
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_free_adapter
   Cyclomatic Complexity 3 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_terminate_workqueue
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_init_module
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_cleanup_module
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_queue_main_work
   Cyclomatic Complexity 9 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_process_rx
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_rx_work_queue
   Cyclomatic Complexity 5 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_clone_skb_for_tx_status
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_init_priv_params
   Cyclomatic Complexity 1 drivers/net/wireless/marvell/mwifiex/main.c:is_command_pending
   Cyclomatic Complexity 81 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_main_process
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_main_work_queue
   Cyclomatic Complexity 3 drivers/net/wireless/marvell/mwifiex/main.c:_mwifiex_dbg
   Cyclomatic Complexity 6 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_bypass_tx_queue
   Cyclomatic Complexity 4 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_queue_tx_pkt
   Cyclomatic Complexity 4 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_multi_chan_resync
   Cyclomatic Complexity 19 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_drv_info_dump
   Cyclomatic Complexity 9 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_upload_device_dump
   Cyclomatic Complexity 3 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_tx_timeout
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_set_mac_address
   Cyclomatic Complexity 16 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_hard_start_xmit
   Cyclomatic Complexity 3 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_close
   Cyclomatic Complexity 34 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_fw_dpc
   Cyclomatic Complexity 6 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_init_hw_fw
   Cyclomatic Complexity 14 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_reinit_sw
   Cyclomatic Complexity 20 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_shutdown_sw
   Cyclomatic Complexity 2 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_do_flr
   Cyclomatic Complexity 7 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_register
   Cyclomatic Complexity 15 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_add_card
   Cyclomatic Complexity 20 drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_remove_card

vim +/exit_remove +1433 drivers/net/wireless/marvell/mwifiex/main.c

4c5dae59 Amitkumar Karwar 2016-07-26  1417  			    atomic_read(&adapter->rx_pending),
4c5dae59 Amitkumar Karwar 2016-07-26  1418  			    atomic_read(&adapter->tx_pending),
4c5dae59 Amitkumar Karwar 2016-07-26  1419  			    atomic_read(&adapter->cmd_pending));
4c5dae59 Amitkumar Karwar 2016-07-26  1420  	}
4c5dae59 Amitkumar Karwar 2016-07-26  1421  
4c5dae59 Amitkumar Karwar 2016-07-26  1422  	for (i = 0; i < adapter->priv_num; i++) {
4c5dae59 Amitkumar Karwar 2016-07-26  1423  		priv = adapter->priv[i];
4c5dae59 Amitkumar Karwar 2016-07-26  1424  		if (!priv)
4c5dae59 Amitkumar Karwar 2016-07-26  1425  			continue;
4c5dae59 Amitkumar Karwar 2016-07-26  1426  		rtnl_lock();
4c5dae59 Amitkumar Karwar 2016-07-26  1427  		if (priv->netdev &&
4c5dae59 Amitkumar Karwar 2016-07-26  1428  		    priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED)
4c5dae59 Amitkumar Karwar 2016-07-26  1429  			mwifiex_del_virtual_intf(adapter->wiphy, &priv->wdev);
4c5dae59 Amitkumar Karwar 2016-07-26  1430  		rtnl_unlock();
4c5dae59 Amitkumar Karwar 2016-07-26  1431  	}
4c5dae59 Amitkumar Karwar 2016-07-26  1432  
4c5dae59 Amitkumar Karwar 2016-07-26 @1433  exit_remove:
4c5dae59 Amitkumar Karwar 2016-07-26  1434  	up(sem);
4c5dae59 Amitkumar Karwar 2016-07-26  1435  exit_sem_err:
4c5dae59 Amitkumar Karwar 2016-07-26  1436  	mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
2d1cb5d4 Colin Ian King   2016-09-15  1437  exit_return:
4c5dae59 Amitkumar Karwar 2016-07-26  1438  	return 0;
4c5dae59 Amitkumar Karwar 2016-07-26  1439  }
4c5dae59 Amitkumar Karwar 2016-07-26  1440  
4c5dae59 Amitkumar Karwar 2016-07-26  1441  /* This function gets called during PCIe function level reset. Required

:::::: The code at line 1433 was first introduced by commit
:::::: 4c5dae59d2e9386c706a2f3c7c2746ae277bf568 mwifiex: add PCIe function level reset support

:::::: TO: Amitkumar Karwar <akarwar@marvell.com>
:::::: CC: Kalle Valo <kvalo@codeaurora.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 29215 bytes --]

^ permalink raw reply

* [PATCH] mwifiex: fix memory leak on regd when chan is zero
From: Colin King @ 2016-09-15 16:21 UTC (permalink / raw)
  To: Amitkumar Karwar, Nishant Sarmukadam, Kalle Valo, linux-wireless,
	netdev
  Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

When chan is zero mwifiex_create_custom_regdomain does not kfree
regd and we have a memory leak. Fix this by freeing regd before
the return.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
index 3344a26..15a91f3 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
@@ -1049,8 +1049,10 @@ mwifiex_create_custom_regdomain(struct mwifiex_private *priv,
 		enum nl80211_band band;
 
 		chan = *buf++;
-		if (!chan)
+		if (!chan) {
+			kfree(regd);
 			return NULL;
+		}
 		chflags = *buf++;
 		band = (chan <= 14) ? NL80211_BAND_2GHZ : NL80211_BAND_5GHZ;
 		freq = ieee80211_channel_to_frequency(chan, band);
-- 
2.9.3

^ permalink raw reply related

* RE: [PATCH net] tcp: fix overflow in __tcp_retransmit_skb()
From: David Laight @ 2016-09-15 15:52 UTC (permalink / raw)
  To: 'Eric Dumazet', David Miller; +Cc: netdev
In-Reply-To: <1473952353.22679.25.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet
> Sent: 15 September 2016 16:13
> If a TCP socket gets a large write queue, an overflow can happen
> in a test in __tcp_retransmit_skb() preventing all retransmits.
...
>  	if (atomic_read(&sk->sk_wmem_alloc) >
> -	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> +	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> +		  sk->sk_sndbuf))
>  		return -EAGAIN;

Might it also be better to split that test to (say):

	u32 wmem_alloc = atomic_read(&sk->sk_wmem_alloc);
	if (unlikely(wmem_alloc > sk->sk_sndbuf))
		return -EAGAIN;
	if (unlikely(wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2)))
		return -EAGAIN;

It might even be worth splitting the second test as:

	if (unlikely(wmem_alloc > sk->sk_wmem_queued)
	    && wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2))
		return -EAGAIN;

	David


^ permalink raw reply

* [PATCH net] net: avoid sk_forward_alloc overflows
From: Eric Dumazet @ 2016-09-15 15:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

A malicious TCP receiver, sending SACK, can force the sender to split
skbs in write queue and increase its memory usage.

Then, when the socket is closed and its write queue purged, we might
overflow sk_forward_alloc (it becomes negative).

sk_mem_reclaim() does nothing in this case, and more than 2GB
are leaked from the TCP perspective (tcp_memory_allocated is not changed).

Warnings then trigger from inet_sock_destruct() and
sk_stream_kill_queues() seeing a non-zero sk_forward_alloc.

The whole TCP stack can be stuck because TCP is under memory pressure.

A simple fix is to preemptively reclaim from sk_mem_uncharge().

This makes sure a socket won't have more than 2 MB forward allocated,
after a burst and an idle period.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index c797c57f4d9f6b2ef6cc23f1d63210cd41c8cff4..ebf75db08e062dfe7867cc80c7699f593be16349 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1339,6 +1339,16 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
 	if (!sk_has_account(sk))
 		return;
 	sk->sk_forward_alloc += size;
+
+	/* Avoid a possible overflow.
+	 * TCP send queues can make this happen, if sk_mem_reclaim()
+	 * is not called and more than 2 GBytes are released at once.
+	 *
+	 * If we reach 2 MBytes, reclaim 1 MBytes right now, there is
+	 * no need to hold that much forward allocation anyway.
+	 */
+	if (unlikely(sk->sk_forward_alloc >= 1 << 21))
+		__sk_mem_reclaim(sk, 1 << 20);
 }
 
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
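
The effect of the new threshold can be seen in a small userspace model.
This is a simplified sketch, not the kernel's accounting: struct
sock_model, uncharge_model() and peak_after_uncharges() are hypothetical,
and the model ignores memory quanta and the global tcp_memory_allocated
counter. It only shows that reclaiming 1 MB whenever 2 MB is reached
keeps the per-socket counter far from the 2 GB overflow point.

```c
#include <assert.h>
#include <stdint.h>

struct sock_model {
	int32_t forward_alloc;	/* stands in for sk->sk_forward_alloc */
	int64_t reclaimed;	/* bytes handed back to the global pool */
};

/* Mirrors the shape of the patched sk_mem_uncharge(). */
static void uncharge_model(struct sock_model *sk, int32_t size)
{
	assert(size >= 0 && size <= (1 << 20));
	sk->forward_alloc += size;
	if (sk->forward_alloc >= (1 << 21)) {	/* reached 2 MB */
		sk->forward_alloc -= 1 << 20;	/* give 1 MB back now */
		sk->reclaimed += 1 << 20;
	}
}

/* Uncharge 'size' bytes 'n' times and report the highest value seen. */
static int32_t peak_after_uncharges(int n, int32_t size)
{
	struct sock_model sk = { 0, 0 };
	int32_t peak = 0;
	int i;

	for (i = 0; i < n; i++) {
		uncharge_model(&sk, size);
		if (sk.forward_alloc > peak)
			peak = sk.forward_alloc;
	}
	return peak;
}
```

Releasing 3 GB in 64KB pieces, i.e. peak_after_uncharges(49152, 65536),
never lets the modelled counter reach 2 MB, whereas without the reclaim
the same sequence would run past the range of a signed 32-bit int.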

^ permalink raw reply related

* Re: [net-next PATCH 00/11] iw_cxgb4,cxgbit: remove duplicate code
From: Varun Prakash @ 2016-09-15 15:16 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David Miller, Linux Netdev List, linux-rdma@vger.kernel.org,
	target-devel@vger.kernel.org, Nicholas A. Bellinger, Doug Ledford,
	SWise OGC, Indranil Choudhury
In-Reply-To: <CAJ3xEMi_w3nhqSpODsm=Cs42dg9w0Cu3Cc-0mZvauY7Coc+fwg@mail.gmail.com>

Hi Or,

On Wed, Sep 14, 2016 at 02:02:43PM +0530, Or Gerlitz wrote:
> On Tue, Sep 13, 2016 at 6:53 PM, Varun Prakash <varun@chelsio.com> wrote:
> > This patch series removes duplicate code from
> > iw_cxgb4 and cxgbit by adding common function definitions in libcxgb.
> 
> Is that bunch of misc functionalities or you can provide a more high
> level description what
> you are cleaning out. Also, what other areas are you planning to
> refactor following the review
> comments we had on the target driver?

This patch series removes duplicate function definitions
that are used in connection management.
I am looking into more improvements in connection management,
will post next series once it is ready. 

Thanks
Varun 

^ permalink raw reply

* [PATCH net] tcp: fix overflow in __tcp_retransmit_skb()
From: Eric Dumazet @ 2016-09-15 15:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

If a TCP socket gets a large write queue, an overflow can happen
in a test in __tcp_retransmit_skb() preventing all retransmits.

The flow then stalls and resets after timeouts.

Tested:

sysctl -w net.core.wmem_max=1000000000
netperf -H dest -- -s 1000000000

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdaef7fd6e47..f53d0cca5fa4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2605,7 +2605,8 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
 	if (atomic_read(&sk->sk_wmem_alloc) >
-	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
+	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
+		  sk->sk_sndbuf))
 		return -EAGAIN;
 
 	if (skb_still_in_host_queue(sk, skb))
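
To make the overflow concrete, here is a userspace sketch of the test
before and after the change. It is an illustration under assumptions,
not kernel code: old_test_blocks() and new_test_blocks() are
hypothetical helpers, plain int32_t values stand in for the socket
fields, and the signed wraparound is emulated with 64-bit arithmetic so
the demo itself avoids undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch shape: the sum is evaluated as a signed 32-bit int. */
static int old_test_blocks(int32_t wmem_alloc, int32_t wmem_queued,
			   int32_t sndbuf)
{
	int64_t wide, sum, limit;

	assert(wmem_queued >= 0 && sndbuf >= 0);
	wide = (int64_t)wmem_queued + (wmem_queued >> 2);
	/* Emulate two's-complement wraparound of the int addition. */
	sum = wide > INT32_MAX ? wide - (INT64_C(1) << 32) : wide;
	limit = sum < sndbuf ? sum : sndbuf;
	return wmem_alloc > limit;	/* nonzero means -EAGAIN */
}

/* Post-patch shape: both operands of the minimum are treated as u32. */
static int new_test_blocks(int32_t wmem_alloc, int32_t wmem_queued,
			   int32_t sndbuf)
{
	uint32_t sum, limit;

	assert(wmem_alloc >= 0 && wmem_queued >= 0 && sndbuf >= 0);
	sum = (uint32_t)wmem_queued + ((uint32_t)wmem_queued >> 2);
	limit = sum < (uint32_t)sndbuf ? sum : (uint32_t)sndbuf;
	return (uint32_t)wmem_alloc > limit;
}
```

With about 2 GB queued (reachable with the settings in the "Tested:"
section, assuming the requested send buffer is doubled as usual), the
old form sees a negative limit and refuses every retransmit, while the
u32 form does not. For small queues both forms agree.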

^ permalink raw reply related

* [PATCH v2] xen-netback: fix error handling on netback_probe()
From: Filipe Manco @ 2016-09-15 15:10 UTC (permalink / raw)
  To: netdev; +Cc: wei.liu2, xen-devel, Filipe Manco

In case of error during netback_probe() (e.g. an entry missing on the
xenstore) netback_remove() is called on the new device, which will set
the device backend state to XenbusStateClosed by calling
set_backend_state(). However, the backend state wasn't initialized by
netback_probe() at this point, which will cause an invalid transition
and set_backend_state() to BUG().

Initialize the backend state at the beginning of netback_probe() to
XenbusStateInitialising, and create two new valid state transitions on
set_backend_state(), from XenbusStateInitialising to XenbusStateClosed,
and from XenbusStateInitialising to XenbusStateInitWait.

Signed-off-by: Filipe Manco <filipe.manco@neclab.eu>
---
 drivers/net/xen-netback/xenbus.c | 46 ++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 6a31f2610c23..daf4c7867102 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -271,6 +271,11 @@ static int netback_probe(struct xenbus_device *dev,
 	be->dev = dev;
 	dev_set_drvdata(&dev->dev, be);
 
+	be->state = XenbusStateInitialising;
+	err = xenbus_switch_state(dev, XenbusStateInitialising);
+	if (err)
+		goto fail;
+
 	sg = 1;
 
 	do {
@@ -383,11 +388,6 @@ static int netback_probe(struct xenbus_device *dev,
 
 	be->hotplug_script = script;
 
-	err = xenbus_switch_state(dev, XenbusStateInitWait);
-	if (err)
-		goto fail;
-
-	be->state = XenbusStateInitWait;
 
 	/* This kicks hotplug scripts, so do it immediately. */
 	err = backend_create_xenvif(be);
@@ -492,20 +492,20 @@ static inline void backend_switch_state(struct backend_info *be,
 
 /* Handle backend state transitions:
  *
- * The backend state starts in InitWait and the following transitions are
+ * The backend state starts in Initialising and the following transitions are
  * allowed.
  *
- * InitWait -> Connected
- *
- *    ^    \         |
- *    |     \        |
- *    |      \       |
- *    |       \      |
- *    |        \     |
- *    |         \    |
- *    |          V   V
+ * Initialising -> InitWait -> Connected
+ *          \
+ *           \        ^    \         |
+ *            \       |     \        |
+ *             \      |      \       |
+ *              \     |       \      |
+ *               \    |        \     |
+ *                \   |         \    |
+ *                 V  |          V   V
  *
- *  Closed  <-> Closing
+ *                  Closed  <-> Closing
  *
  * The state argument specifies the eventual state of the backend and the
  * function transitions to that state via the shortest path.
@@ -515,6 +515,20 @@ static void set_backend_state(struct backend_info *be,
 {
 	while (be->state != state) {
 		switch (be->state) {
+		case XenbusStateInitialising:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+			case XenbusStateClosing:
+				backend_switch_state(be, XenbusStateInitWait);
+				break;
+			case XenbusStateClosed:
+				backend_switch_state(be, XenbusStateClosed);
+				break;
+			default:
+				BUG();
+			}
+			break;
 		case XenbusStateClosed:
 			switch (state) {
 			case XenbusStateInitWait:
-- 
2.7.4
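
The "shortest path" walk through the state diagram can be modelled in a
few lines of userspace C. This is a rough sketch, not the driver code:
the enum and both functions are hypothetical, the Initialising case
follows the hunk above, and the remaining cases are read off the diagram
in the comment rather than taken from xenbus.c.

```c
#include <assert.h>

enum bstate { ST_INITIALISING, ST_INITWAIT, ST_CONNECTED, ST_CLOSING, ST_CLOSED };

/* One step from 'cur' towards 'target', following the diagram's edges. */
static enum bstate next_hop(enum bstate cur, enum bstate target)
{
	switch (cur) {
	case ST_INITIALISING:	/* the two transitions added by this patch */
		return target == ST_CLOSED ? ST_CLOSED : ST_INITWAIT;
	case ST_INITWAIT:
		return target == ST_CONNECTED ? ST_CONNECTED : ST_CLOSING;
	case ST_CONNECTED:
		return ST_CLOSING;
	case ST_CLOSING:
		return ST_CLOSED;
	case ST_CLOSED:
		return target == ST_CLOSING ? ST_CLOSING : ST_INITWAIT;
	}
	return cur;
}

/* Count the hops needed to reach 'target'; Initialising is never a target. */
static int walk(enum bstate cur, enum bstate target)
{
	int hops = 0;

	while (cur != target) {
		cur = next_hop(cur, target);
		hops++;
		assert(hops < 8);	/* guards against an unreachable target */
	}
	return hops;
}
```

In this model the failed-probe case, walk(ST_INITIALISING, ST_CLOSED),
takes a single hop, and the normal path to Connected still passes
through InitWait.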

^ permalink raw reply related

