Netdev List


* [PATCH bpf-next] selftests/bpf: make sure build-id is on
From: Alexei Starovoitov @ 2018-05-15  0:11 UTC (permalink / raw)
  To: David S . Miller; +Cc: daniel, songliubraving, netdev

--build-id may not be a default linker config.
Make sure it's used when linking urandom_read test program.
Otherwise the test_stacktrace_build_id[_nmi] tests will fail.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 438d4f93875b..133ebc68cbe4 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -19,7 +19,7 @@ all: $(TEST_CUSTOM_PROGS)
 $(TEST_CUSTOM_PROGS): urandom_read
 
 urandom_read: urandom_read.c
-	$(CC) -o $(TEST_CUSTOM_PROGS) -static $<
+	$(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id
 
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
-- 
2.9.5
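
For context, --build-id makes the linker emit an ELF note of type NT_GNU_BUILD_ID with the owner name "GNU", and that note is what the stacktrace tests depend on. The following is a minimal userspace sketch, not code from the selftests, of how such a note can be located in a raw note-section image. The names find_build_id, note_hdr and SK_NT_GNU_BUILD_ID are hypothetical; only the note layout and the type value 3 come from the ELF conventions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_NT_GNU_BUILD_ID 3	/* note type written by --build-id */

struct note_hdr {		/* same layout as Elf32_Nhdr / Elf64_Nhdr */
	uint32_t namesz;
	uint32_t descsz;
	uint32_t type;
};

/* Note names and descriptors are padded to 4-byte boundaries. */
static size_t align4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/* Scan a note image; return a pointer to the build-id bytes, or NULL. */
static const unsigned char *find_build_id(const unsigned char *buf,
					  size_t len, size_t *id_len)
{
	size_t off = 0;

	while (off + sizeof(struct note_hdr) <= len) {
		struct note_hdr h;
		size_t remaining, name_sz, desc_sz;

		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		remaining = len - off;

		if (h.namesz > remaining)
			return NULL;	/* truncated or corrupt note */
		name_sz = align4(h.namesz);
		if (name_sz > remaining || h.descsz > remaining - name_sz)
			return NULL;

		if (h.type == SK_NT_GNU_BUILD_ID && h.namesz == 4 &&
		    memcmp(buf + off, "GNU", 4) == 0) {
			*id_len = h.descsz;
			return buf + off + name_sz;
		}

		desc_sz = align4(h.descsz);
		if (desc_sz > remaining - name_sz)
			return NULL;
		off += name_sz + desc_sz;
	}
	return NULL;
}
```

A binary linked without --build-id simply has no such note, so a scan like this returns NULL.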


* [PATCH net-next] erspan: set bso bit based on mirrored packet's len
From: William Tu @ 2018-05-14 23:54 UTC (permalink / raw)
  To: netdev

Before the patch, the erspan BSO bit (Bad/Short/Oversized) is not
handled.  BSO has 4 possible values:
  00 --> Good frame with no error, or unknown integrity
  11 --> Payload is a Bad Frame with CRC or Alignment Error
  01 --> Payload is a Short Frame
  10 --> Payload is an Oversized Frame

Based on the short/oversized definitions in RFC 1757, the patch sets
the BSO bit based on the mirrored packet's size.

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
---
 include/net/erspan.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/net/erspan.h b/include/net/erspan.h
index d044aa60cc76..5eb95f78ad45 100644
--- a/include/net/erspan.h
+++ b/include/net/erspan.h
@@ -219,6 +219,30 @@ static inline __be32 erspan_get_timestamp(void)
 	return htonl((u32)h_usecs);
 }
 
+/* ERSPAN BSO (Bad/Short/Oversized)
+ *   00b --> Good frame with no error, or unknown integrity
+ *   01b --> Payload is a Short Frame
+ *   10b --> Payload is an Oversized Frame
+ *   11b --> Payload is a Bad Frame with CRC or Alignment Error
+ */
+enum erspan_bso {
+	BSO_NOERROR,
+	BSO_SHORT,
+	BSO_OVERSIZED,
+	BSO_BAD,
+};
+
+static inline u8 erspan_detect_bso(struct sk_buff *skb)
+{
+	if (skb->len < ETH_ZLEN)
+		return BSO_SHORT;
+
+	if (skb->len > ETH_FRAME_LEN)
+		return BSO_OVERSIZED;
+
+	return BSO_NOERROR;
+}
+
 static inline void erspan_build_header_v2(struct sk_buff *skb,
 					  u32 id, u8 direction, u16 hwid,
 					  bool truncate, bool is_ipv4)
@@ -248,6 +272,7 @@ static inline void erspan_build_header_v2(struct sk_buff *skb,
 		vlan_tci = ntohs(qp->tci);
 	}
 
+	bso = erspan_detect_bso(skb);
 	skb_push(skb, sizeof(*ershdr) + ERSPAN_V2_MDSIZE);
 	ershdr = (struct erspan_base_hdr *)skb->data;
 	memset(ershdr, 0, sizeof(*ershdr) + ERSPAN_V2_MDSIZE);
-- 
2.7.4
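
The length classification in erspan_detect_bso() can be tried outside the kernel. The sketch below is a userspace stand-in, assuming only the two constants from <linux/if_ether.h> (ETH_ZLEN = 60, ETH_FRAME_LEN = 1514); detect_bso is a hypothetical name and takes the length directly instead of an skb.

```c
#include <stddef.h>

/* Values from <linux/if_ether.h>; both exclude the frame check sequence. */
#define ETH_ZLEN	60	/* minimum frame length */
#define ETH_FRAME_LEN	1514	/* maximum frame length */

enum erspan_bso {
	BSO_NOERROR,	/* 00b */
	BSO_SHORT,	/* 01b */
	BSO_OVERSIZED,	/* 10b */
	BSO_BAD,	/* 11b: never returned here, length alone cannot show a CRC error */
};

/* Classify a mirrored frame purely by its length. */
static enum erspan_bso detect_bso(size_t len)
{
	if (len < ETH_ZLEN)
		return BSO_SHORT;

	if (len > ETH_FRAME_LEN)
		return BSO_OVERSIZED;

	return BSO_NOERROR;
}
```

Because the enumerators are declared in numeric order, their values match the two-bit encodings listed in the commit message.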


* Re: [PATCH net-next 3/3] udp: only use paged allocation with scatter-gather
From: Eric Dumazet @ 2018-05-14 23:45 UTC (permalink / raw)
  To: Willem de Bruijn, Eric Dumazet
  Cc: Network Development, David Miller, Willem de Bruijn
In-Reply-To: <CAF=yD-KYer3RV6hB+-5LYt6VgL3LA6OpgbCBzdmnGrCvGF=ySQ@mail.gmail.com>



On 05/14/2018 04:30 PM, Willem de Bruijn wrote:

> I don't quite follow. The reported crash happens in the protocol layer,
> because of this check. With pagedlen we have not allocated
> sufficient space for the skb_put.
> 
>                 if (!(rt->dst.dev->features&NETIF_F_SG)) {
>                         unsigned int off;
> 
>                         off = skb->len;
>                         if (getfrag(from, skb_put(skb, copy),
>                                         offset, copy, off, skb) < 0) {
>                                 __skb_trim(skb, off);
>                                 err = -EFAULT;
>                                 goto error;
>                         }
>                 } else {
>                         int i = skb_shinfo(skb)->nr_frags;
> 
> Are you referring to a separate potential issue in the gso layer?
> If a bonding device advertises SG, but a slave does not, then
> skb_segment on the slave should build linear segs? I have not
> tested that.

Given that the device attribute could change under us, we need to not
crash, even if initially we thought NETIF_F_SG was available.

Unless you want to hold RTNL in UDP xmit :)

Ideally, GSO should be always on, as we did for TCP.

Otherwise, I can guarantee syzkaller will hit again.


* Re: [PATCH 01/14] net: sched: use rcu for action cookie update
From: kbuild test robot @ 2018-05-14 23:39 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: kbuild-all, netdev, davem, jhs, xiyou.wangcong, jiri, pablo,
	kadlec, fw, ast, daniel, edumazet, vladbu, keescook, linux-kernel,
	netfilter-devel, coreteam, kliteyn
In-Reply-To: <1526308035-12484-2-git-send-email-vladbu@mellanox.com>

Hi Vlad,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]
[also build test WARNING on v4.17-rc5 next-20180514]
[cannot apply to net-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Vlad-Buslov/Modify-action-API-for-implementing-lockless-actions/20180515-025420
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/sched/act_api.c:71:15: sparse: incorrect type in initializer (different address spaces) @@    expected struct tc_cookie [noderef] <asn:4>*__ret @@    got struct tc_cookie *new_cookie @@
   net/sched/act_api.c:71:15:    expected struct tc_cookie [noderef] <asn:4>*__ret
   net/sched/act_api.c:71:15:    got struct tc_cookie *new_cookie
>> net/sched/act_api.c:71:13: sparse: incorrect type in assignment (different address spaces) @@    expected struct tc_cookie *old @@    got struct tc_cookie [noderef] <asn:4>*[assigned] __ret @@
   net/sched/act_api.c:71:13:    expected struct tc_cookie *old
   net/sched/act_api.c:71:13:    got struct tc_cookie [noderef] <asn:4>*[assigned] __ret
>> net/sched/act_api.c:132:48: sparse: dereference of noderef expression

vim +71 net/sched/act_api.c

    65	
    66	static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie,
    67					  struct tc_cookie *new_cookie)
    68	{
    69		struct tc_cookie *old;
    70	
  > 71		old = xchg(old_cookie, new_cookie);
    72		if (old)
    73			call_rcu(&old->rcu, tcf_free_cookie_rcu);
    74	}
    75	
    76	/* XXX: For standalone actions, we don't need a RCU grace period either, because
    77	 * actions are always connected to filters and filters are already destroyed in
    78	 * RCU callbacks, so after a RCU grace period actions are already disconnected
    79	 * from filters. Readers later can not find us.
    80	 */
    81	static void free_tcf(struct tc_action *p)
    82	{
    83		free_percpu(p->cpu_bstats);
    84		free_percpu(p->cpu_qstats);
    85	
    86		tcf_set_action_cookie(&p->act_cookie, NULL);
    87		if (p->goto_chain)
    88			tcf_action_goto_chain_fini(p);
    89	
    90		kfree(p);
    91	}
    92	
    93	static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
    94	{
    95		spin_lock_bh(&idrinfo->lock);
    96		idr_remove(&idrinfo->action_idr, p->tcfa_index);
    97		spin_unlock_bh(&idrinfo->lock);
    98		gen_kill_estimator(&p->tcfa_rate_est);
    99		free_tcf(p);
   100	}
   101	
   102	int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
   103	{
   104		int ret = 0;
   105	
   106		ASSERT_RTNL();
   107	
   108		if (p) {
   109			if (bind)
   110				p->tcfa_bindcnt--;
   111			else if (strict && p->tcfa_bindcnt > 0)
   112				return -EPERM;
   113	
   114			p->tcfa_refcnt--;
   115			if (p->tcfa_bindcnt <= 0 && p->tcfa_refcnt <= 0) {
   116				if (p->ops->cleanup)
   117					p->ops->cleanup(p);
   118				tcf_idr_remove(p->idrinfo, p);
   119				ret = ACT_P_DELETED;
   120			}
   121		}
   122	
   123		return ret;
   124	}
   125	EXPORT_SYMBOL(__tcf_idr_release);
   126	
   127	static size_t tcf_action_shared_attrs_size(const struct tc_action *act)
   128	{
   129		u32 cookie_len = 0;
   130	
   131		if (act->act_cookie)
 > 132			cookie_len = nla_total_size(act->act_cookie->len);
   133	
   134		return  nla_total_size(0) /* action number nested */
   135			+ nla_total_size(IFNAMSIZ) /* TCA_ACT_KIND */
   136			+ cookie_len /* TCA_ACT_COOKIE */
   137			+ nla_total_size(0) /* TCA_ACT_STATS nested */
   138			/* TCA_STATS_BASIC */
   139			+ nla_total_size_64bit(sizeof(struct gnet_stats_basic))
   140			/* TCA_STATS_QUEUE */
   141			+ nla_total_size_64bit(sizeof(struct gnet_stats_queue))
   142			+ nla_total_size(0) /* TCA_OPTIONS nested */
   143			+ nla_total_size(sizeof(struct tcf_t)); /* TCA_GACT_TM */
   144	}
   145	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
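
The code flagged at line 71 swaps a pointer atomically and then retires the old value. As a rough illustration of that pattern only, here is a userspace sketch using C11 atomics. It does not use the kernel's xchg() or call_rcu(), it says nothing about how the sparse address-space warnings should be resolved, and the names cookie, swap_cookie and set_cookie are hypothetical. The immediate free() is safe only under the assumption that no concurrent reader still holds the old pointer, which is exactly what the RCU grace period guarantees in the kernel version.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct cookie {
	int len;
};

/* Publish new_cookie and hand back whatever was stored before. */
static struct cookie *swap_cookie(struct cookie *_Atomic *slot,
				  struct cookie *new_cookie)
{
	return atomic_exchange(slot, new_cookie);
}

/* Replace the cookie and release the old one (free(NULL) is a no-op). */
static void set_cookie(struct cookie *_Atomic *slot, struct cookie *new_cookie)
{
	struct cookie *old = swap_cookie(slot, new_cookie);

	free(old);	/* kernel code defers this with call_rcu() */
}
```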


* Re: [PATCH net-next 3/3] udp: only use paged allocation with scatter-gather
From: Willem de Bruijn @ 2018-05-14 23:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller, Willem de Bruijn
In-Reply-To: <a629c4fa-3666-48c2-900f-9d04d9ecfcbc@gmail.com>

On Mon, May 14, 2018 at 7:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/14/2018 04:07 PM, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Paged allocation stores most payload in skb frags. This helps udp gso
>> by avoiding copying from the gso skb to segment skb in skb_segment.
>>
>> But without scatter-gather, data must be linear, so do not use paged
>> mode unless NETIF_F_SG.
>>
>> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
>> Reported-by: Sean Tranchetti <stranche@codeaurora.org>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> ---
>>  net/ipv4/ip_output.c  | 2 +-
>>  net/ipv6/ip6_output.c | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index b5e21eb198d8..b38731d8a44f 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,
>>
>>       exthdrlen = !skb ? rt->dst.header_len : 0;
>>       mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
>> -     paged = !!cork->gso_size;
>> +     paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>>
>>       if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
>>           sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 7f4493080df6..35a940b9f208 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1262,7 +1262,7 @@ static int __ip6_append_data(struct sock *sk,
>>               dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
>>       }
>>
>> -     paged = !!cork->gso_size;
>> +     paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>>       mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
>>       orig_mtu = mtu;
>>
>>
>
> As I said, this won't help for stacked devices.
>
> Bonding might advertise NETIF_F_SG, but one slave might not.

I don't quite follow. The reported crash happens in the protocol layer,
because of this check. With pagedlen we have not allocated
sufficient space for the skb_put.

                if (!(rt->dst.dev->features&NETIF_F_SG)) {
                        unsigned int off;

                        off = skb->len;
                        if (getfrag(from, skb_put(skb, copy),
                                        offset, copy, off, skb) < 0) {
                                __skb_trim(skb, off);
                                err = -EFAULT;
                                goto error;
                        }
                } else {
                        int i = skb_shinfo(skb)->nr_frags;

Are you referring to a separate potential issue in the gso layer?
If a bonding device advertises SG, but a slave does not, then
skb_segment on the slave should build linear segs? I have not
tested that.


* Re: [PATCH v1 1/4] media: rc: introduce BPF_PROG_IR_DECODER
From: Randy Dunlap @ 2018-05-14 23:27 UTC (permalink / raw)
  To: Sean Young, linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller
In-Reply-To: <32a944171d5c48abf126259595b0088ce3122c91.1526331777.git.sean@mess.org>

On 05/14/2018 02:10 PM, Sean Young wrote:
> Add support for BPF_PROG_IR_DECODER. This type of BPF program can call

The Kconfig file below uses IR_BPF_DECODER instead of the symbol name above,
and patch 3 introduces a third variant of the name:

The context provided to a BPF_PROG_RAWIR_DECODER is a struct ir_raw_event;

> rc_keydown() to report decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
> 
> Signed-off-by: Sean Young <sean@mess.org>
> ---
>  drivers/media/rc/Kconfig          |  8 +++
>  drivers/media/rc/Makefile         |  1 +
>  drivers/media/rc/ir-bpf-decoder.c | 93 +++++++++++++++++++++++++++++++
>  include/linux/bpf_types.h         |  3 +
>  include/uapi/linux/bpf.h          | 16 +++++-
>  5 files changed, 120 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/media/rc/ir-bpf-decoder.c
> 
> diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> index eb2c3b6eca7f..10ad6167d87c 100644
> --- a/drivers/media/rc/Kconfig
> +++ b/drivers/media/rc/Kconfig
> @@ -120,6 +120,14 @@ config IR_IMON_DECODER
>  	   remote control and you would like to use it with a raw IR
>  	   receiver, or if you wish to use an encoder to transmit this IR.
>  
> +config IR_BPF_DECODER
> +	bool "Enable IR raw decoder using BPF"
> +	depends on BPF_SYSCALL
> +	depends on RC_CORE=y
> +	help
> +	   Enable this option to make it possible to load custom IR
> +	   decoders written in BPF.
> +
>  endif #RC_DECODERS
>  
>  menuconfig RC_DEVICES
> diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> index 2e1c87066f6c..12e1118430d0 100644
> --- a/drivers/media/rc/Makefile
> +++ b/drivers/media/rc/Makefile
> @@ -5,6 +5,7 @@ obj-y += keymaps/
>  obj-$(CONFIG_RC_CORE) += rc-core.o
>  rc-core-y := rc-main.o rc-ir-raw.o
>  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> +rc-core-$(CONFIG_IR_BPF_DECODER) += ir-bpf-decoder.o


-- 
~Randy


* Re: [net 1/1] net/mlx5: Fix build break when CONFIG_SMP=n
From: Randy Dunlap @ 2018-05-14 23:19 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller; +Cc: netdev, Guenter Roeck, Thomas Gleixner
In-Reply-To: <20180514223810.21197-1-saeedm@mellanox.com>

On 05/14/2018 03:38 PM, Saeed Mahameed wrote:
> Avoid using the kernel's irq_descriptor and return IRQ vector affinity
> directly from the driver.
> 
> This fixes the following build break when CONFIG_SMP=n
> 
> include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
> include/linux/mlx5/driver.h:1299:13: error:
>         ‘struct irq_desc’ has no member named ‘affinity_hint’
> 
> Fixes: 6082d9c9c94a ("net/mlx5: Fix mlx5_get_vector_affinity function")
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> CC: Randy Dunlap <rdunlap@infradead.org>
> CC: Guenter Roeck <linux@roeck-us.net>
> CC: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Israel Rukshin <israelr@mellanox.com>

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>

Thanks.

> ---
> 
> For -stable v4.14
> 
>  include/linux/mlx5/driver.h | 12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index 2a156c5dfadd..d703774982ca 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -1286,17 +1286,7 @@ enum {
>  static inline const struct cpumask *
>  mlx5_get_vector_affinity_hint(struct mlx5_core_dev *dev, int vector)
>  {
> -	struct irq_desc *desc;
> -	unsigned int irq;
> -	int eqn;
> -	int err;
> -
> -	err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
> -	if (err)
> -		return NULL;
> -
> -	desc = irq_to_desc(irq);
> -	return desc->affinity_hint;
> +	return dev->priv.irq_info[vector].mask;
>  }
>  
>  #endif /* MLX5_DRIVER_H */
> 


-- 
~Randy


* Re: [PATCH net-next 2/3] gso: limit udp gso to egress-only virtual devices
From: Willem de Bruijn @ 2018-05-14 23:12 UTC (permalink / raw)
  To: Network Development; +Cc: David Miller, Willem de Bruijn, Alexander Duyck
In-Reply-To: <20180514230747.118875-3-willemdebruijn.kernel@gmail.com>

On Mon, May 14, 2018 at 7:07 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Until the udp receive stack supports large packets (UDP GRO), GSO
> packets must not loop from the egress to the ingress path.
>
> Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual
> devices through NETIF_F_GSO_ENCAP_ALL as this included devices that
> may loop packets, such as veth and macvlan.
>
> Instead add it to specific devices that forward to another device's
> egress path: bonding and team.
>
> Fixes: 83aa025f535f ("udp: add gso support to virtual devices")
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---

> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> index 9dbd390ace34..c6a9f0cafea2 100644
> --- a/drivers/net/team/team.c
> +++ b/drivers/net/team/team.c
> @@ -1026,7 +1026,8 @@ static void __team_compute_features(struct team *team)
>         }
>
>         team->dev->vlan_features = vlan_features;
> -       team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
> +       team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
> +                                    NETIF_GSO_UDP_L4;

This has a typo: NETIF_GSO_UDP_L4 should be NETIF_F_GSO_UDP_L4. team.ko did
not build automatically for me, and I caught it with a full compile just too
late.

Need to send a v2, sorry.


* Re: [PATCH net-next 3/3] udp: only use paged allocation with scatter-gather
From: Eric Dumazet @ 2018-05-14 23:12 UTC (permalink / raw)
  To: Willem de Bruijn, netdev; +Cc: davem, Willem de Bruijn
In-Reply-To: <20180514230747.118875-4-willemdebruijn.kernel@gmail.com>



On 05/14/2018 04:07 PM, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Paged allocation stores most payload in skb frags. This helps udp gso
> by avoiding copying from the gso skb to segment skb in skb_segment.
> 
> But without scatter-gather, data must be linear, so do not use paged
> mode unless NETIF_F_SG.
> 
> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
> Reported-by: Sean Tranchetti <stranche@codeaurora.org>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/ipv4/ip_output.c  | 2 +-
>  net/ipv6/ip6_output.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index b5e21eb198d8..b38731d8a44f 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,
>  
>  	exthdrlen = !skb ? rt->dst.header_len : 0;
>  	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
> -	paged = !!cork->gso_size;
> +	paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>  
>  	if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
>  	    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 7f4493080df6..35a940b9f208 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1262,7 +1262,7 @@ static int __ip6_append_data(struct sock *sk,
>  		dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
>  	}
>  
> -	paged = !!cork->gso_size;
> +	paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>  	mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
>  	orig_mtu = mtu;
>  
> 

As I said, this won't help for stacked devices.

Bonding might advertise NETIF_F_SG, but one slave might not.


* Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path
From: Eric Dumazet @ 2018-05-14 23:10 UTC (permalink / raw)
  To: stranche, Willem de Bruijn
  Cc: Eric Dumazet, Willem de Bruijn, David Miller, Network Development,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <c86a2fbde50aea2b28e82c333f29d575@codeaurora.org>



On 05/14/2018 03:45 PM, stranche@codeaurora.org wrote:
> On 2018-05-11 17:16, Willem de Bruijn wrote:
> 
>>> Hmm, no, we absolutely need to fix GSO instead.
>>>
>>> Think of a bonding device (or any virtual device); your patch won't avoid the crash.
> 
> Hi Eric. Can you clarify what you mean by "fix GSO?" Is that just having the GSO path work
> regardless of whether or not SG is enabled for the device?
>

Yes. GSO is a fallback, and must work all the time, not panic.


* Re: [PATCH v3 bpf-next 0/2] bpf: enable stackmap with build_id in nmi
From: Daniel Borkmann @ 2018-05-14 23:09 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: kernel-team, qinteng, tobin
In-Reply-To: <20180507175049.1541963-1-songliubraving@fb.com>

On 05/07/2018 07:50 PM, Song Liu wrote:
> Changes v2 -> v3:
>   Improve syntax based on suggestion by Tobin C. Harding.
> 
> Changes v1 -> v2:
>   1. Rename some variables to (hopefully) reduce confusion;
>   2. Check irq_work status with IRQ_WORK_BUSY (instead of work->sem);
>   3. In Kconfig, let BPF_SYSCALL select IRQ_WORK;
>   4. Add static to DEFINE_PER_CPU();
>   5. Remove pr_info() in stack_map_init().
> 
> Song Liu (2):
>   bpf: enable stackmap with build_id in nmi context
>   bpf: add selftest for stackmap with build_id in NMI context
> 
>  init/Kconfig                               |   1 +
>  kernel/bpf/stackmap.c                      |  59 +++++++++++--
>  tools/testing/selftests/bpf/test_progs.c   | 134 +++++++++++++++++++++++++++++
>  tools/testing/selftests/bpf/urandom_read.c |  10 ++-
>  4 files changed, 196 insertions(+), 8 deletions(-)

Applied to bpf-next, thanks Song!


* Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path
From: Willem de Bruijn @ 2018-05-14 23:07 UTC (permalink / raw)
  To: Sean Tranchetti
  Cc: Eric Dumazet, Willem de Bruijn, David Miller, Network Development,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <c86a2fbde50aea2b28e82c333f29d575@codeaurora.org>

>> Paged skbuffs is an optimization for gso, but the feature should
>> continue to work even if gso skbs are linear, indeed (if at the cost
>> of copying during skb_segment).
>>
>> We need to make paged contingent on scatter-gather. Rough
>> patch below. That is for ipv4 only, the same will be needed for ipv6.
>>
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index b5e21eb198d8..b38731d8a44f 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,
>>
>>         exthdrlen = !skb ? rt->dst.header_len : 0;
>>         mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
>> -       paged = !!cork->gso_size;
>> +       paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>
>
> Hi Willem. That's definitely a much cleaner patch than ours since it allows
> the GSO to continue without failure.
> We tried it on both the IPv4 and IPv6 path and didn't see the crash in
> either case.

Thanks for testing. I have a small set of fixes to udp gso, including
this one. Let me send them right away.


* Re: [PATCH bpf-next v2] samples/bpf: xdp_monitor, accept short options
From: Daniel Borkmann @ 2018-05-14 23:07 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Prashant Bhole
  Cc: Alexei Starovoitov, David S . Miller, netdev
In-Reply-To: <20180514122044.598feec2@redhat.com>

On 05/14/2018 12:20 PM, Jesper Dangaard Brouer wrote:
> 
> On Mon, 14 May 2018 17:29:15 +0900 Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> wrote:
> 
>> Updated optstring parameter for getopt_long() to accept short options.
>> Also updated usage() function.
>>
>> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> 
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Applied to bpf-next, thanks everyone!


* [PATCH net-next 3/3] udp: only use paged allocation with scatter-gather
From: Willem de Bruijn @ 2018-05-14 23:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn
In-Reply-To: <20180514230747.118875-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Paged allocation stores most payload in skb frags. This helps udp gso
by avoiding copying from the gso skb to segment skb in skb_segment.

But without scatter-gather, data must be linear, so do not use paged
mode unless NETIF_F_SG.

Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
Reported-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/ip_output.c  | 2 +-
 net/ipv6/ip6_output.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b5e21eb198d8..b38731d8a44f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,
 
 	exthdrlen = !skb ? rt->dst.header_len : 0;
 	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
-	paged = !!cork->gso_size;
+	paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
 
 	if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
 	    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7f4493080df6..35a940b9f208 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1262,7 +1262,7 @@ static int __ip6_append_data(struct sock *sk,
 		dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
 	}
 
-	paged = !!cork->gso_size;
+	paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
 	mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
 	orig_mtu = mtu;
 
-- 
2.17.0.441.gb46fe60e1d-goog
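
The one-line change in both hunks makes paged allocation depend on two conditions instead of one. A minimal userspace sketch of that predicate follows. SKETCH_F_SG is an illustrative bit and use_paged_alloc a hypothetical helper; the real flag is NETIF_F_SG from <linux/netdev_features.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bit; stands in for NETIF_F_SG. */
#define SKETCH_F_SG	(UINT64_C(1) << 0)

/* Paged mode needs both a GSO size and scatter-gather on the device. */
static bool use_paged_alloc(unsigned int gso_size, uint64_t dev_features)
{
	return gso_size && (dev_features & SKETCH_F_SG);
}
```

The && operator already yields 0 or 1, so the !! used in the old expression is no longer needed.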


* [PATCH net-next 2/3] gso: limit udp gso to egress-only virtual devices
From: Willem de Bruijn @ 2018-05-14 23:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn, Alexander Duyck
In-Reply-To: <20180514230747.118875-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Until the udp receive stack supports large packets (UDP GRO), GSO
packets must not loop from the egress to the ingress path.

Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual
devices through NETIF_F_GSO_ENCAP_ALL as this included devices that
may loop packets, such as veth and macvlan.

Instead add it to specific devices that forward to another device's
egress path: bonding and team.

Fixes: 83aa025f535f ("udp: add gso support to virtual devices")
CC: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/bonding/bond_main.c | 5 +++--
 drivers/net/team/team.c         | 5 +++--
 include/linux/netdev_features.h | 1 -
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4176e1d95f47..d7b58370ae77 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1107,7 +1107,8 @@ static void bond_compute_features(struct bonding *bond)
 
 done:
 	bond_dev->vlan_features = vlan_features;
-	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
+	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
+				    NETIF_F_GSO_UDP_L4;
 	bond_dev->gso_max_segs = gso_max_segs;
 	netif_set_gso_max_size(bond_dev, gso_max_size);
 
@@ -4263,7 +4264,7 @@ void bond_setup(struct net_device *bond_dev)
 				NETIF_F_HW_VLAN_CTAG_RX |
 				NETIF_F_HW_VLAN_CTAG_FILTER;
 
-	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
+	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
 	bond_dev->features |= bond_dev->hw_features;
 }
 
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 9dbd390ace34..c6a9f0cafea2 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1026,7 +1026,8 @@ static void __team_compute_features(struct team *team)
 	}
 
 	team->dev->vlan_features = vlan_features;
-	team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
+	team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
+				     NETIF_GSO_UDP_L4;
 	team->dev->hard_header_len = max_hard_header_len;
 
 	team->dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
@@ -2117,7 +2118,7 @@ static void team_setup(struct net_device *dev)
 			   NETIF_F_HW_VLAN_CTAG_RX |
 			   NETIF_F_HW_VLAN_CTAG_FILTER;
 
-	dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
+	dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
 	dev->features |= dev->hw_features;
 }
 
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index c87c3a3453c1..623bb8ced060 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -220,7 +220,6 @@ enum {
 				 NETIF_F_GSO_GRE_CSUM |			\
 				 NETIF_F_GSO_IPXIP4 |			\
 				 NETIF_F_GSO_IPXIP6 |			\
-				 NETIF_F_GSO_UDP_L4 |			\
 				 NETIF_F_GSO_UDP_TUNNEL |		\
 				 NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
-- 
2.17.0.441.gb46fe60e1d-goog
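
The intent of this patch, dropping UDP_L4 from the shared encapsulation mask and opting in per device, can be sketched with plain bit masks. Everything below is illustrative: the SK_* bit positions and the two helper functions are hypothetical, and the real definitions live in <linux/netdev_features.h>.

```c
#include <stdint.h>

/* Illustrative bit positions, not the kernel's. */
#define SK_GSO_UDP_TUNNEL	(UINT64_C(1) << 0)
#define SK_GSO_UDP_L4		(UINT64_C(1) << 1)

/* After the patch, the shared encapsulation mask no longer carries UDP_L4. */
#define SK_GSO_ENCAP_ALL	(SK_GSO_UDP_TUNNEL)

/* Egress-only aggregators (bonding, team) add UDP_L4 explicitly. */
static uint64_t aggregator_features(void)
{
	return SK_GSO_ENCAP_ALL | SK_GSO_UDP_L4;
}

/* Devices that may loop packets to ingress only get the shared mask. */
static uint64_t looping_dev_features(void)
{
	return SK_GSO_ENCAP_ALL;
}
```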


* [PATCH net-next 1/3] udp: exclude gso from xfrm paths
From: Willem de Bruijn @ 2018-05-14 23:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn, Michal Kubecek
In-Reply-To: <20180514230747.118875-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

UDP GSO conflicts with transformations in the XFRM layer.
Return an error if GSO is attempted.

Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/udp.c | 3 ++-
 net/ipv6/udp.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ff4d4ba67735..d71f1f3e1155 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -788,7 +788,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 			return -EINVAL;
 		if (sk->sk_no_check_tx)
 			return -EINVAL;
-		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite)
+		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
+		    dst_xfrm(skb_dst(skb)))
 			return -EIO;
 
 		skb_shinfo(skb)->gso_size = cork->gso_size;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2839c1bd1e58..426c9d2b418d 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1053,7 +1053,8 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
 			return -EINVAL;
 		if (udp_sk(sk)->no_check6_tx)
 			return -EINVAL;
-		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite)
+		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
+		    dst_xfrm(skb_dst(skb)))
 			return -EIO;
 
 		skb_shinfo(skb)->gso_size = cork->gso_size;
-- 
2.17.0.441.gb46fe60e1d-goog
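
The extended condition rejects a GSO send with -EIO in three cases. The sketch below models just that check in userspace, assuming a hypothetical flattened struct (gso_send_state) in place of the skb, socket and dst state that the kernel functions inspect.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flattened view of the state the send path checks. */
struct gso_send_state {
	bool csum_partial;	/* skb->ip_summed == CHECKSUM_PARTIAL */
	bool is_udplite;
	bool has_xfrm;		/* dst_xfrm(skb_dst(skb)) is non-NULL */
};

/* Return 0 if GSO may proceed, -EIO otherwise. */
static int gso_send_check(const struct gso_send_state *s)
{
	if (!s->csum_partial || s->is_udplite || s->has_xfrm)
		return -EIO;

	return 0;
}
```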


* [PATCH net-next 0/3] udp gso fixes
From: Willem de Bruijn @ 2018-05-14 23:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

A few small fixes:
- disallow segmentation with XFRM
- do not leak gso packets into the ingress path
- fix a panic if scatter-gather is disabled

Willem de Bruijn (3):
  udp: exclude gso from xfrm paths
  gso: limit udp gso to egress-only virtual devices
  udp: only use paged allocation with scatter-gather

 drivers/net/bonding/bond_main.c | 5 +++--
 drivers/net/team/team.c         | 5 +++--
 include/linux/netdev_features.h | 1 -
 net/ipv4/ip_output.c            | 2 +-
 net/ipv4/udp.c                  | 3 ++-
 net/ipv6/ip6_output.c           | 2 +-
 net/ipv6/udp.c                  | 3 ++-
 7 files changed, 12 insertions(+), 9 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply

* Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path
From: stranche @ 2018-05-14 22:45 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Eric Dumazet, Willem de Bruijn, David Miller, Network Development,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <CAF=yD-JH8ahoLNKOVjBScRXKP4UQqQpfq89C6xq0=nwd3jQtzw@mail.gmail.com>

On 2018-05-11 17:16, Willem de Bruijn wrote:

>> Hmm, no, we absolutely need to fix GSO instead.
>> 
>> Think of a bonding device (or any virtual devices), your patch wont 
>> avoid the crash.

Hi Eric. Can you clarify what you mean by "fix GSO"? Is that just having
the GSO path work regardless of whether or not SG is enabled for the device?

> 
> Thanks for reporting the issue.
> 
> Paged skbuffs is an optimization for gso, but the feature should
> continue to work even if gso skbs are linear, indeed (if at the cost
> of copying during skb_segment).
> 
> We need to make paged contingent on scatter-gather. Rough
> patch below. That is for ipv4 only, the same will be needed for ipv6.
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index b5e21eb198d8..b38731d8a44f 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,
> 
>         exthdrlen = !skb ? rt->dst.header_len : 0;
>         mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
> -       paged = !!cork->gso_size;
> +       paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);

Hi Willem. That's definitely a much cleaner patch than ours since it 
allows the GSO to continue without failure.
We tried it on both the IPv4 and IPv6 path and didn't see the crash in 
either case.

-----
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply

* Re: [PATCH bpf-next v5 3/6] bpf: Add IPv6 Segment Routing helpers
From: Daniel Borkmann @ 2018-05-14 22:40 UTC (permalink / raw)
  To: Mathieu Xhonneux, netdev; +Cc: dlebrun, alexei.starovoitov
In-Reply-To: <7839e5fff52b4b96e5eb0ae8a72a76f8a1e76a8e.1526143526.git.m.xhonneux@gmail.com>

On 05/12/2018 07:25 PM, Mathieu Xhonneux wrote:
[...]
> +BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
> +	   const void *, from, u32, len)
> +{
> +#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
> +	struct seg6_bpf_srh_state *srh_state =
> +		this_cpu_ptr(&seg6_bpf_srh_states);
> +	void *srh_tlvs, *srh_end, *ptr;
> +	struct ipv6_sr_hdr *srh;
> +	int srhoff = 0;
> +
> +	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
> +		return -EINVAL;
> +
> +	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
> +	srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
> +	srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);

Do we need to check that this cannot go out of bounds wrt skb data?

> +	ptr = skb->data + offset;
> +	if (ptr >= srh_tlvs && ptr + len <= srh_end)
> +		srh_state->valid = 0;
> +	else if (ptr < (void *)&srh->flags ||
> +		 ptr + len > (void *)&srh->segments)
> +		return -EFAULT;
> +
> +	if (unlikely(bpf_try_make_writable(skb, offset + len)))
> +		return -EFAULT;
> +
> +	memcpy(ptr, from, len);

You have a use after free here. bpf_try_make_writable() is potentially changing
underlying skb->data (e.g. see pskb_expand_head()). Therefore memcpy()'ing into
cached ptr is invalid.
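
The hazard can be illustrated in userspace. This is only a minimal sketch, not
the kernel code: struct buf, make_writable() and store_bytes() are hypothetical
stand-ins for the skb, bpf_try_make_writable() and the helper. It assumes the
"make writable" step may move the data to a new allocation, so the destination
pointer is derived from the offset only after that step:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: one heap buffer and its length. */
struct buf {
	unsigned char *data;
	size_t len;
};

/*
 * Hypothetical stand-in for bpf_try_make_writable(). To make the hazard
 * visible it always moves the data to a fresh allocation, so any pointer
 * computed from the old b->data dangles afterwards.
 */
int make_writable(struct buf *b)
{
	unsigned char *copy;

	assert(b != NULL && b->data != NULL);
	copy = malloc(b->len);
	if (!copy)
		return -1;
	memcpy(copy, b->data, b->len);
	free(b->data);
	b->data = copy;
	return 0;
}

/* Validate with offsets, reallocate, and only then form the pointer. */
int store_bytes(struct buf *b, size_t offset, const void *from, size_t len)
{
	if (offset > b->len || len > b->len - offset)
		return -1;
	if (make_writable(b))
		return -1;
	/* b->data is re-read here, after the buffer may have moved. */
	memcpy(b->data + offset, from, len);
	return 0;
}
```

Computing the destination before make_writable() and using it afterwards would
write into freed memory in this sketch, which is the same shape as the cached
ptr above.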

> +	return 0;
> +#else /* CONFIG_IPV6_SEG6_BPF */
> +	return -EOPNOTSUPP;
> +#endif
> +}
> +
> +static const struct bpf_func_proto bpf_lwt_seg6_store_bytes_proto = {
> +	.func		= bpf_lwt_seg6_store_bytes,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_ANYTHING,
> +	.arg3_type	= ARG_PTR_TO_MEM,
> +	.arg4_type	= ARG_CONST_SIZE
> +};
> +
> +BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
> +	   u32, action, void *, param, u32, param_len)
> +{
> +#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
> +	struct seg6_bpf_srh_state *srh_state =
> +		this_cpu_ptr(&seg6_bpf_srh_states);
> +	struct ipv6_sr_hdr *srh;
> +	int srhoff = 0;
> +	int err;
> +
> +	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
> +		return -EINVAL;
> +	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
> +
> +	if (!srh_state->valid) {
> +		if (unlikely((srh_state->hdrlen & 7) != 0))
> +			return -EBADMSG;
> +
> +		srh->hdrlen = (u8)(srh_state->hdrlen >> 3);
> +		if (unlikely(!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3)))
> +			return -EBADMSG;
> +
> +		srh_state->valid = 1;
> +	}
> +
> +	switch (action) {
> +	case SEG6_LOCAL_ACTION_END_X:
> +		if (param_len != sizeof(struct in6_addr))
> +			return -EINVAL;
> +		return seg6_lookup_nexthop(skb, (struct in6_addr *)param, 0);
> +	case SEG6_LOCAL_ACTION_END_T:
> +		if (param_len != sizeof(int))
> +			return -EINVAL;
> +		return seg6_lookup_nexthop(skb, NULL, *(int *)param);
> +	case SEG6_LOCAL_ACTION_END_B6:
> +		err = bpf_push_seg6_encap(skb, BPF_LWT_ENCAP_SEG6_INLINE,
> +					  param, param_len);
> +		if (!err)
> +			srh_state->hdrlen =
> +				((struct ipv6_sr_hdr *)param)->hdrlen << 3;
> +		return err;
> +	case SEG6_LOCAL_ACTION_END_B6_ENCAP:
> +		err = bpf_push_seg6_encap(skb, BPF_LWT_ENCAP_SEG6,
> +					  param, param_len);
> +		if (!err)
> +			srh_state->hdrlen =
> +				((struct ipv6_sr_hdr *)param)->hdrlen << 3;
> +		return err;
> +	default:
> +		return -EINVAL;
> +	}
> +#else /* CONFIG_IPV6_SEG6_BPF */
> +	return -EOPNOTSUPP;
> +#endif
> +}
> +
> +static const struct bpf_func_proto bpf_lwt_seg6_action_proto = {
> +	.func		= bpf_lwt_seg6_action,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_ANYTHING,
> +	.arg3_type	= ARG_PTR_TO_MEM,
> +	.arg4_type	= ARG_CONST_SIZE
> +};
> +
> +BPF_CALL_3(bpf_lwt_seg6_adjust_srh, struct sk_buff *, skb, u32, offset,
> +	   s32, len)
> +{
> +#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
> +	struct seg6_bpf_srh_state *srh_state =
> +		this_cpu_ptr(&seg6_bpf_srh_states);
> +	void *srh_end, *srh_tlvs, *ptr;
> +	struct ipv6_sr_hdr *srh;
> +	struct ipv6hdr *hdr;
> +	int srhoff = 0;
> +	int ret;
> +
> +	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
> +		return -EINVAL;
> +	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
> +
> +	srh_tlvs = (void *)((unsigned char *)srh + sizeof(*srh) +
> +			((srh->first_segment + 1) << 4));
> +	srh_end = (void *)((unsigned char *)srh + sizeof(*srh) +
> +			srh_state->hdrlen);
> +	ptr = skb->data + offset;
> +
> +	if (unlikely(ptr < srh_tlvs || ptr > srh_end))
> +		return -EFAULT;
> +	if (unlikely(len < 0 && (void *)((char *)ptr - len) > srh_end))
> +		return -EFAULT;
> +
> +	if (len > 0) {
> +		ret = skb_cow_head(skb, len);
> +		if (unlikely(ret < 0))
> +			return ret;
> +
> +		ret = bpf_skb_net_hdr_push(skb, offset, len);
> +	} else {
> +		ret = bpf_skb_net_hdr_pop(skb, offset, -1 * len);
> +	}
> +	if (unlikely(ret < 0))
> +		return ret;

And here as well. You changed underlying pointers via skb_cow_head(), but in
the error path you leave the cached pointers that now point to already freed
buffer. Thus, you'd now be able to access the new skb data out of bounds since
cb->data_end is still the old one due to missing bpf_compute_data_pointers(skb).
Please fix and audit your whole series carefully against these types of subtle
bugs.

> +	hdr = (struct ipv6hdr *)skb->data;
> +	hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
> +
> +	bpf_compute_data_pointers(skb);
> +	srh_state->hdrlen += len;
> +	srh_state->valid = 0;
> +	return 0;
> +#else /* CONFIG_IPV6_SEG6_BPF */
> +	return -EOPNOTSUPP;
> +#endif
> +}
> +
> +static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
> +	.func		= bpf_lwt_seg6_adjust_srh,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_ANYTHING,
> +	.arg3_type	= ARG_ANYTHING,
> +};
> +
> +bool bpf_helper_changes_pkt_data(void *func)
> +{
> +	if (func == bpf_skb_vlan_push ||
> +	    func == bpf_skb_vlan_pop ||
> +	    func == bpf_skb_store_bytes ||
> +	    func == bpf_skb_change_proto ||
> +	    func == bpf_skb_change_head ||
> +	    func == bpf_skb_change_tail ||
> +	    func == bpf_skb_adjust_room ||
> +	    func == bpf_skb_pull_data ||
> +	    func == bpf_clone_redirect ||
> +	    func == bpf_l3_csum_replace ||
> +	    func == bpf_l4_csum_replace ||
> +	    func == bpf_xdp_adjust_head ||
> +	    func == bpf_xdp_adjust_meta ||
> +	    func == bpf_msg_pull_data ||
> +	    func == bpf_xdp_adjust_tail ||
> +	    func == bpf_lwt_push_encap ||
> +	    func == bpf_lwt_seg6_store_bytes ||
> +	    func == bpf_lwt_seg6_adjust_srh ||
> +	    func == bpf_lwt_seg6_action
> +	    )
> +		return true;
> +
> +	return false;
> +}
> +
>  static const struct bpf_func_proto *
>  bpf_base_func_proto(enum bpf_func_id func_id)
>  {
> @@ -4703,7 +4940,6 @@ static bool lwt_is_valid_access(int off, int size,
>  	return bpf_skb_is_valid_access(off, size, type, prog, info);
>  }
>  
> -
>  /* Attach type specific accesses */
>  static bool __sock_filter_check_attach_type(int off,
>  					    enum bpf_access_type access_type,
> diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
> index 6794ddf0547c..f0e8a762ae0c 100644
> --- a/net/ipv6/Kconfig
> +++ b/net/ipv6/Kconfig
> @@ -330,4 +330,9 @@ config IPV6_SEG6_HMAC
>  
>  	  If unsure, say N.
>  
> +config IPV6_SEG6_BPF
> +	def_bool y
> +	depends on IPV6_SEG6_LWTUNNEL
> +	depends on IPV6 = y
> +
>  endif # IPV6
> diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
> index e9b23fb924ad..ae68c1ef8fb0 100644
> --- a/net/ipv6/seg6_local.c
> +++ b/net/ipv6/seg6_local.c
> @@ -449,6 +449,8 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
>  	return err;
>  }
>  
> +DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
> +
>  static struct seg6_action_desc seg6_action_table[] = {
>  	{
>  		.action		= SEG6_LOCAL_ACTION_END,
> 

^ permalink raw reply

* [net 1/1] net/mlx5: Fix build break when CONFIG_SMP=n
From: Saeed Mahameed @ 2018-05-14 22:38 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Saeed Mahameed, Randy Dunlap, Guenter Roeck,
	Thomas Gleixner

Avoid using the kernel's irq_descriptor and return IRQ vector affinity
directly from the driver.

This fixes the following build break when CONFIG_SMP=n

include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
include/linux/mlx5/driver.h:1299:13: error:
        ‘struct irq_desc’ has no member named ‘affinity_hint’

Fixes: 6082d9c9c94a ("net/mlx5: Fix mlx5_get_vector_affinity function")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CC: Randy Dunlap <rdunlap@infradead.org>
CC: Guenter Roeck <linux@roeck-us.net>
CC: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Israel Rukshin <israelr@mellanox.com>
---

For -stable v4.14

 include/linux/mlx5/driver.h | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 2a156c5dfadd..d703774982ca 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1286,17 +1286,7 @@ enum {
 static inline const struct cpumask *
 mlx5_get_vector_affinity_hint(struct mlx5_core_dev *dev, int vector)
 {
-	struct irq_desc *desc;
-	unsigned int irq;
-	int eqn;
-	int err;
-
-	err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
-	if (err)
-		return NULL;
-
-	desc = irq_to_desc(irq);
-	return desc->affinity_hint;
+	return dev->priv.irq_info[vector].mask;
 }
 
 #endif /* MLX5_DRIVER_H */
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] net/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
From: Saeed Mahameed @ 2018-05-14 22:35 UTC (permalink / raw)
  To: David Miller
  Cc: christophe.jaillet, Saeed Mahameed, Matan Barak, Leon Romanovsky,
	Linux Netdev List, RDMA mailing list, linux-kernel,
	kernel-janitors
In-Reply-To: <20180514.145642.989041199343505570.davem@davemloft.net>

On Mon, May 14, 2018 at 11:56 AM, David Miller <davem@davemloft.net> wrote:
> From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> Date: Sat, 12 May 2018 19:09:25 +0200
>
>> 'out' is allocated with 'kvzalloc()'. 'kvfree()' must be used to free it.
>>
>> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
>
> Saeed, I assume I will see this in one of your forthcoming pull
> requests.
>
> Thanks.

If this is for net-next, I will apply v3 to mlx5-next once
Christophe adds the "Fixes" tags according to Eric's request.
If it is for net (RC), then you can go ahead and apply v3 to the net branch.

Thanks,
Saeed.

^ permalink raw reply

* [PATCH net-stable 24/24] hv_netvsc: set master device
From: Stephen Hemminger @ 2018-05-14 22:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger
In-Reply-To: <20180514223223.25433-1-sthemmin@microsoft.com>

From: Stephen Hemminger <stephen@networkplumber.org>

commit 97f3efb64323beb0690576e9d74e94998ad6e82a upstream

The hyper-v transparent bonding should have used master_dev_link.
The netvsc device should look like a master bond device not
like the upper side of a tunnel.

This makes the semantics the same so that userspace applications
looking at network devices see the correct master relationship.

Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/hyperv/netvsc_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index be4b15b63355..11b46c8d2d67 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1778,7 +1778,8 @@ static int netvsc_vf_join(struct net_device *vf_netdev,
 		goto rx_handler_failed;
 	}
 
-	ret = netdev_upper_dev_link(vf_netdev, ndev);
+	ret = netdev_master_upper_dev_link(vf_netdev, ndev,
+					   NULL, NULL);
 	if (ret != 0) {
 		netdev_err(vf_netdev,
 			   "can not set master device %s (err = %d)\n",
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-stable 23/24] hv_netvsc: Fix net device attach on older Windows hosts
From: Stephen Hemminger @ 2018-05-14 22:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Mohammed Gamal
In-Reply-To: <20180514223223.25433-1-sthemmin@microsoft.com>

From: Mohammed Gamal <mgamal@redhat.com>

commit 55be9f25be1ca5bda75c39808fc77e42691bc07f upstream

On older Windows hosts the net_device instance is returned to
the caller of rndis_filter_device_add() without having the presence
bit set first. This would cause any subsequent calls to network device
operations (e.g. MTU change, channel change) to fail after the device
is detached once, returning -ENODEV.

Instead of returning the device instance, we take the exit path where
we call netif_device_attach().
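
The control-flow pattern can be sketched in isolation. Everything below is a
hypothetical userspace model (struct sketch_dev, SKETCH_PROTO_V5 and
sketch_device_add() are illustration names, not driver symbols); the
"present" flag stands in for the presence bit that netif_device_attach() sets:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the device state relevant to the bug. */
struct sketch_dev {
	int proto_version;
	bool present;		/* models the presence bit */
	bool speed_queried;	/* models the optional link-speed query */
};

#define SKETCH_PROTO_V5 5	/* illustrative threshold, not the real constant */

struct sketch_dev *sketch_device_add(struct sketch_dev *dev)
{
	assert(dev != NULL);

	if (dev->proto_version < SKETCH_PROTO_V5)
		goto out;	/* a bare "return dev;" here would skip the attach */

	dev->speed_queried = true;

out:
	/* Shared exit path: the device is marked present in every case. */
	dev->present = true;
	return dev;
}
```

With the shared exit label, the older-host case skips only the optional step
and still leaves the function with the presence flag set.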

Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/hyperv/rndis_filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 3bfa56560286..6dde92c1c113 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1276,7 +1276,7 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
 		   rndis_device->link_state ? "down" : "up");
 
 	if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_5)
-		return net_device;
+		goto out;
 
 	rndis_filter_query_link_speed(rndis_device, net_device);
 
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-stable 22/24] hv_netvsc: Ensure correct teardown message sequence order
From: Stephen Hemminger @ 2018-05-14 22:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Mohammed Gamal
In-Reply-To: <20180514223223.25433-1-sthemmin@microsoft.com>

From: Mohammed Gamal <mgamal@redhat.com>

commit a56d99d714665591fed8527b90eef21530ea61e0 upstream

Prior to commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
the call sequence in netvsc_device_remove() was as follows (as
implemented in netvsc_destroy_buf()):
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Teardown receive buffer GPADL
3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
4- Teardown send buffer GPADL
5- Close vmbus

This didn't work for WS2016 hosts. Commit 0cf737808ae7
("hv_netvsc: netvsc_teardown_gpadl() split") rearranged the
teardown sequence as follows:
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
3- Close vmbus
4- Teardown receive buffer GPADL
5- Teardown send buffer GPADL

That worked well for WS2016 hosts, but it prevented guests on older hosts from
shutting down after changing network settings. Commit 0ef58b0a05c1
("hv_netvsc: change GPAD teardown order on older versions") ensured the
following message sequence for older hosts
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
3- Teardown receive buffer GPADL
4- Teardown send buffer GPADL
5- Close vmbus

However, with this sequence calling `ip link set eth0 mtu 1000` hangs and the
process becomes uninterruptible. On further analysis it turns out that on tearing
down the receive buffer GPADL the kernel is waiting indefinitely
in vmbus_teardown_gpadl() for a completion to be signaled.

Here is a snippet of where this occurs:
int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
{
        struct vmbus_channel_gpadl_teardown *msg;
        struct vmbus_channel_msginfo *info;
        unsigned long flags;
        int ret;

        info = kmalloc(sizeof(*info) +
                       sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
        if (!info)
                return -ENOMEM;

        init_completion(&info->waitevent);
        info->waiting_channel = channel;
[....]
        ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_gpadl_teardown),
                             true);

        if (ret)
                goto post_msg_err;

        wait_for_completion(&info->waitevent);
[....]
}

The completion is signaled from vmbus_ongpadl_torndown(), which gets called when
the corresponding message is received from the host, which apparently never happens
in that case.
This patch works around the issue by restoring the first-mentioned message sequence
for older hosts.
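
The resulting ordering can be modelled as a small table-building function.
This is a sketch under stated assumptions: the enum and teardown_order() are
hypothetical names, and the boolean argument stands in for the
vmbus_proto_version >= VERSION_WIN10 check used in the diff below:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the five teardown steps listed above. */
enum teardown_step {
	REVOKE_RECV_BUF,
	TEARDOWN_RECV_GPADL,
	REVOKE_SEND_BUF,
	TEARDOWN_SEND_GPADL,
	CLOSE_VMBUS,
};

#define TEARDOWN_STEPS 5

/*
 * Fill out[] with the step order for a host. host_is_win2016_or_later
 * models the vmbus_proto_version >= VERSION_WIN10 test.
 */
void teardown_order(int host_is_win2016_or_later,
		    enum teardown_step out[TEARDOWN_STEPS])
{
	size_t n = 0;

	out[n++] = REVOKE_RECV_BUF;
	if (!host_is_win2016_or_later)
		out[n++] = TEARDOWN_RECV_GPADL;

	out[n++] = REVOKE_SEND_BUF;
	if (!host_is_win2016_or_later)
		out[n++] = TEARDOWN_SEND_GPADL;

	out[n++] = CLOSE_VMBUS;

	/* Newer hosts tear down both GPADLs only after the channel is closed. */
	if (host_is_win2016_or_later) {
		out[n++] = TEARDOWN_RECV_GPADL;
		out[n++] = TEARDOWN_SEND_GPADL;
	}

	assert(n == TEARDOWN_STEPS);
}
```

For an older host this yields revoke/teardown pairs followed by the close; for
a WS2016 or later host both revokes come first, then the close, then both
GPADL teardowns.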

Fixes: 0ef58b0a05c1 ("hv_netvsc: change GPAD teardown order on older versions")
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/hyperv/netvsc.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 25fcba506ac5..99be63eacaeb 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -571,8 +571,17 @@ void netvsc_device_remove(struct hv_device *device)
 		= rtnl_dereference(net_device_ctx->nvdev);
 	int i;
 
+	/*
+	 * Revoke receive buffer. If host is pre-Win2016 then tear down
+	 * receive buffer GPADL. Do the same for send buffer.
+	 */
 	netvsc_revoke_recv_buf(device, net_device);
+	if (vmbus_proto_version < VERSION_WIN10)
+		netvsc_teardown_recv_gpadl(device, net_device);
+
 	netvsc_revoke_send_buf(device, net_device);
+	if (vmbus_proto_version < VERSION_WIN10)
+		netvsc_teardown_send_gpadl(device, net_device);
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -586,15 +595,13 @@ void netvsc_device_remove(struct hv_device *device)
 	 */
 	netdev_dbg(ndev, "net device safe to remove\n");
 
-	/* older versions require that buffer be revoked before close */
-	if (vmbus_proto_version < VERSION_WIN10) {
-		netvsc_teardown_recv_gpadl(device, net_device);
-		netvsc_teardown_send_gpadl(device, net_device);
-	}
-
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
 
+	/*
+	 * If host is Win2016 or higher then we do the GPADL tear down
+	 * here after VMBus is closed.
+	*/
 	if (vmbus_proto_version >= VERSION_WIN10) {
 		netvsc_teardown_recv_gpadl(device, net_device);
 		netvsc_teardown_send_gpadl(device, net_device);
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-stable 21/24] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
From: Stephen Hemminger @ 2018-05-14 22:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Mohammed Gamal
In-Reply-To: <20180514223223.25433-1-sthemmin@microsoft.com>

From: Mohammed Gamal <mgamal@redhat.com>

commit 7992894c305eaf504d005529637ff8283d0a849d upstream

Split each of the functions into two for each of send/recv buffers.
This will be needed in order to implement a fine-grained messaging
sequence to the host so that we accommodate the requirements of
different Windows versions

Fixes: 0ef58b0a05c12 ("hv_netvsc: change GPAD teardown order on older versions")
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/hyperv/netvsc.c | 46 +++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index aa9b7a912c31..25fcba506ac5 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -105,11 +105,11 @@ static void free_netvsc_device_rcu(struct netvsc_device *nvdev)
 	call_rcu(&nvdev->rcu, free_netvsc_device);
 }
 
-static void netvsc_revoke_buf(struct hv_device *device,
-			      struct netvsc_device *net_device)
+static void netvsc_revoke_recv_buf(struct hv_device *device,
+				   struct netvsc_device *net_device)
 {
-	struct nvsp_message *revoke_packet;
 	struct net_device *ndev = hv_get_drvdata(device);
+	struct nvsp_message *revoke_packet;
 	int ret;
 
 	/*
@@ -151,6 +151,14 @@ static void netvsc_revoke_buf(struct hv_device *device,
 		}
 		net_device->recv_section_cnt = 0;
 	}
+}
+
+static void netvsc_revoke_send_buf(struct hv_device *device,
+				   struct netvsc_device *net_device)
+{
+	struct net_device *ndev = hv_get_drvdata(device);
+	struct nvsp_message *revoke_packet;
+	int ret;
 
 	/* Deal with the send buffer we may have setup.
 	 * If we got a  send section size, it means we received a
@@ -194,8 +202,8 @@ static void netvsc_revoke_buf(struct hv_device *device,
 	}
 }
 
-static void netvsc_teardown_gpadl(struct hv_device *device,
-				  struct netvsc_device *net_device)
+static void netvsc_teardown_recv_gpadl(struct hv_device *device,
+				       struct netvsc_device *net_device)
 {
 	struct net_device *ndev = hv_get_drvdata(device);
 	int ret;
@@ -214,6 +222,13 @@ static void netvsc_teardown_gpadl(struct hv_device *device,
 		}
 		net_device->recv_buf_gpadl_handle = 0;
 	}
+}
+
+static void netvsc_teardown_send_gpadl(struct hv_device *device,
+				       struct netvsc_device *net_device)
+{
+	struct net_device *ndev = hv_get_drvdata(device);
+	int ret;
 
 	if (net_device->send_buf_gpadl_handle) {
 		ret = vmbus_teardown_gpadl(device->channel,
@@ -423,8 +438,10 @@ static int netvsc_init_buf(struct hv_device *device,
 	goto exit;
 
 cleanup:
-	netvsc_revoke_buf(device, net_device);
-	netvsc_teardown_gpadl(device, net_device);
+	netvsc_revoke_recv_buf(device, net_device);
+	netvsc_revoke_send_buf(device, net_device);
+	netvsc_teardown_recv_gpadl(device, net_device);
+	netvsc_teardown_send_gpadl(device, net_device);
 
 exit:
 	return ret;
@@ -554,7 +571,8 @@ void netvsc_device_remove(struct hv_device *device)
 		= rtnl_dereference(net_device_ctx->nvdev);
 	int i;
 
-	netvsc_revoke_buf(device, net_device);
+	netvsc_revoke_recv_buf(device, net_device);
+	netvsc_revoke_send_buf(device, net_device);
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -569,14 +587,18 @@ void netvsc_device_remove(struct hv_device *device)
 	netdev_dbg(ndev, "net device safe to remove\n");
 
 	/* older versions require that buffer be revoked before close */
-	if (vmbus_proto_version < VERSION_WIN10)
-		netvsc_teardown_gpadl(device, net_device);
+	if (vmbus_proto_version < VERSION_WIN10) {
+		netvsc_teardown_recv_gpadl(device, net_device);
+		netvsc_teardown_send_gpadl(device, net_device);
+	}
 
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
 
-	if (vmbus_proto_version >= VERSION_WIN10)
-		netvsc_teardown_gpadl(device, net_device);
+	if (vmbus_proto_version >= VERSION_WIN10) {
+		netvsc_teardown_recv_gpadl(device, net_device);
+		netvsc_teardown_send_gpadl(device, net_device);
+	}
 
 	/* Release all resources */
 	free_netvsc_device_rcu(net_device);
-- 
2.17.0

^ permalink raw reply related
