Netdev List
 help / color / mirror / Atom feed
* Re: [net-next PATCH 2/2] bpf: sock_map fixes for !CONFIG_BPF_SYSCALL and !STREAM_PARSER
From: David Ahern @ 2017-08-16 22:06 UTC (permalink / raw)
  To: John Fastabend, daniel, davem, eric.dumazet; +Cc: netdev
In-Reply-To: <20170816220232.25438.47447.stgit@john-Precision-Tower-5810>

On 8/16/17 4:02 PM, John Fastabend wrote:
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index aa24287..897daa0 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -3,7 +3,10 @@ obj-y := core.o
>  obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
>  obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
>  ifeq ($(CONFIG_NET),y)
> -obj-$(CONFIG_BPF_SYSCALL) += devmap.o sockmap.o
> +obj-$(CONFIG_BPF_SYSCALL) += devmap.o
> +ifeq ($(CONFIG_STREAM_PARSER),y)
> +obj-$(CONFIG_BPF_SYSCALL) += sockmap.o
> +endif
>  endif

sockmap requires KCM? I thought it just needed STREAM_PARSER. Can't
STREAM_PARSER be used outside of KCM? If so why not make STREAM_PARSER
enabled if sockmap is wanted vs requiring KCM to be compiled in to get
stream parser to get sockmap?

^ permalink raw reply

* [net-next PATCH 2/2] bpf: sock_map fixes for !CONFIG_BPF_SYSCALL and !STREAM_PARSER
From: John Fastabend @ 2017-08-16 22:02 UTC (permalink / raw)
  To: daniel, davem, eric.dumazet, dsahern; +Cc: netdev, john.fastabend
In-Reply-To: <20170816220049.25438.62373.stgit@john-Precision-Tower-5810>

Resolve issues with !CONFIG_BPF_SYSCALL and !STREAM_PARSER

net/core/filter.c: In function ‘do_sk_redirect_map’:
net/core/filter.c:1881:3: error: implicit declaration of function ‘__sock_map_lookup_elem’ [-Werror=implicit-function-declaration]
   sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
   ^
net/core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [enabled by default]
   sk = __sock_map_lookup_elem(ri->map, ri->ifindex);

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/bpf.h       |   10 +++++++++-
 include/linux/bpf_types.h |    2 ++
 kernel/bpf/Makefile       |    5 ++++-
 kernel/bpf/core.c         |    1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a4145e9..1cc6c5f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -313,7 +313,6 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
 
 /* Map specifics */
 struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
-struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
 void __dev_map_flush(struct bpf_map *map);
 
@@ -377,6 +376,15 @@ static inline void __dev_map_flush(struct bpf_map *map)
 }
 #endif /* CONFIG_BPF_SYSCALL */
 
+#if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL)
+struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
+#else
+static inline struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
+{
+	return NULL;
+}
+#endif
+
 /* verifier prototypes for helper functions called from eBPF programs */
 extern const struct bpf_func_proto bpf_map_lookup_elem_proto;
 extern const struct bpf_func_proto bpf_map_update_elem_proto;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fa80507..6f1a567 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -38,5 +38,7 @@
 BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
 #ifdef CONFIG_NET
 BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
+#ifdef CONFIG_STREAM_PARSER
 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
 #endif
+#endif
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index aa24287..897daa0 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -3,7 +3,10 @@ obj-y := core.o
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
 ifeq ($(CONFIG_NET),y)
-obj-$(CONFIG_BPF_SYSCALL) += devmap.o sockmap.o
+obj-$(CONFIG_BPF_SYSCALL) += devmap.o
+ifeq ($(CONFIG_STREAM_PARSER),y)
+obj-$(CONFIG_BPF_SYSCALL) += sockmap.o
+endif
 endif
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c69e7f5..917cc04 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1438,6 +1438,7 @@ void bpf_user_rnd_init_once(void)
 const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
 const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
+const struct bpf_func_proto bpf_sock_map_update_proto __weak;
 
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {

^ permalink raw reply related

* [net-next PATCH 1/2] bpf: sockmap state change warning fix
From: John Fastabend @ 2017-08-16 22:02 UTC (permalink / raw)
  To: daniel, davem, eric.dumazet, dsahern; +Cc: netdev, john.fastabend
In-Reply-To: <20170816220049.25438.62373.stgit@john-Precision-Tower-5810>

psock will uninitialized in default case we need to do the same psock lookup
and check as in other branch. Fixes compile warning below.

kernel/bpf/sockmap.c: In function ‘smap_state_change’:
kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  struct smap_psock *psock;

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 792f0ad..f7e5e6c 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -188,6 +188,9 @@ static void smap_state_change(struct sock *sk)
 			smap_release_sock(sk);
 		break;
 	default:
+		psock = smap_psock_sk(sk);
+		if (unlikely(!psock))
+			break;
 		smap_report_sk_error(psock, EPIPE);
 		break;
 	}

^ permalink raw reply related

* [net-next PATCH 0/2] bpf: sockmap build fixes
From: John Fastabend @ 2017-08-16 22:01 UTC (permalink / raw)
  To: daniel, davem, eric.dumazet, dsahern; +Cc: netdev, john.fastabend

Two build fixes for sockmap, this should resolve the build errors
and warnings that were reported. Thanks everyone.

---

John Fastabend (2):
      bpf: sockmap state change warning fix
      bpf: sock_map fixes for !CONFIG_BPF_SYSCALL and !STREAM_PARSER


 include/linux/bpf.h       |   10 +++++++++-
 include/linux/bpf_types.h |    2 ++
 kernel/bpf/Makefile       |    5 ++++-
 kernel/bpf/core.c         |    1 +
 kernel/bpf/sockmap.c      |    3 +++
 5 files changed, 19 insertions(+), 2 deletions(-)

^ permalink raw reply

* [PATCH net] tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
From: Neal Cardwell @ 2017-08-16 21:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng, Eric Dumazet

In some situations tcp_send_loss_probe() can realize that it's unable
to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
realizes that the RTO was eligible to fire immediately or at some
point in the past (delta_us <= 0). Previously in such cases
tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
icsk_rto, which caused needless delays of hundreds of milliseconds
(and non-linear behavior that made reproducible testing
difficult). This commit changes the logic to schedule "overdue" RTOs
ASAP, rather than at now + icsk_rto.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 53de1424c13c..bab7f0493098 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3009,8 +3009,7 @@ void tcp_rearm_rto(struct sock *sk)
 			/* delta_us may not be positive if the socket is locked
 			 * when the retrans timer fires and is rescheduled.
 			 */
-			if (delta_us > 0)
-				rto = usecs_to_jiffies(delta_us);
+			rto = usecs_to_jiffies(max_t(int, delta_us, 1));
 		}
 		inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto,
 					  TCP_RTO_MAX);
-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply related

* Re: [bug report] net: dsa: ksz: fix skb freeing
From: Andrew Lunn @ 2017-08-16 21:52 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: vivien.didelot, netdev
In-Reply-To: <20170816212719.uyszb2krtgir3xn2@mwanda>

On Thu, Aug 17, 2017 at 12:27:19AM +0300, Dan Carpenter wrote:
> Hello Vivien Didelot,
> 
> The patch e71cb9e00922: "net: dsa: ksz: fix skb freeing" from Aug 9,
> 2017, leads to the following static checker warning:
> 
> 	net/dsa/tag_ksz.c:64 ksz_xmit()
> 	warn: 'nskb' was already freed.

> net/dsa/tag_ksz.c
>     35  static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
>     36  {
>     37          struct dsa_slave_priv *p = netdev_priv(dev);
>     38          struct sk_buff *nskb;
>     39          int padlen;
>     40          u8 *tag;
>     41  
>     42          padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
>     43  
>     44          if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
>     45                  if (skb_put_padto(skb, skb->len + padlen))
>     46                          return NULL;
>     47  
>     48                  nskb = skb;
>     49          } else {
>     50                  nskb = alloc_skb(NET_IP_ALIGN + skb->len +
>     51                                   padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC);
>     52                  if (!nskb)
>     53                          return NULL;
>     54                  skb_reserve(nskb, NET_IP_ALIGN);
>     55  
>     56                  skb_reset_mac_header(nskb);
>     57                  skb_set_network_header(nskb,
>     58                                         skb_network_header(skb) - skb->head);
>     59                  skb_set_transport_header(nskb,
>     60                                           skb_transport_header(skb) - skb->head);
>     61                  skb_copy_and_csum_dev(skb, skb_put(nskb, skb->len));
>     62  
>     63                  if (skb_put_padto(nskb, nskb->len + padlen)) {
>     64                          kfree_skb(nskb);
>                                 ^^^^^^^^^^^^^^
> This patch deliberately adds a kfree_skb() here, but it looks like a
> double free, and my static checker complains.  My guess is you had
> intended to free "skb" instead of "nskb"?  I'm not positive.

I actually think it is a miss understanding about skb_put_padto()
actually does. If it returns an error code, it also frees the skb it
has been asked to modify. So free'ing nsbk is wrong. So line 64 simply
needs to be removed.

      Andrew

^ permalink raw reply

* Re: [patch net-next 1/3] idr: Use unsigned long instead of int
From: Alexey Dobriyan @ 2017-08-16 21:44 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev, linux-kernel
In-Reply-To: <1502849538-14284-2-git-send-email-chrism@mellanox.com>

	[CC trimmed]

On Tue, Aug 15, 2017 at 10:12:16PM -0400, Chris Mi wrote:
> IDR uses internally radix tree which uses unsigned long. It doesn't
> makes sense to have index as signed value.

It doesn't. But it makes sense to use "unsigned int" because it
generates smaller code on x86_64.

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: John Fastabend @ 2017-08-16 21:37 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, David Miller; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <8878cb28-a12e-cbe3-92c8-274d6c12d176@gmail.com>

On 08/16/2017 02:35 PM, David Ahern wrote:
> On 8/16/17 1:34 PM, John Fastabend wrote:
>>> I also have a build error.
>>>
>>> $ git grep -n __sock_map_lookup_elem
>>> include/linux/bpf.h:316:struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
>>> kernel/bpf/sockmap.c:558:struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
>>> net/core/filter.c:1881:         sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
>>>
>>>
>>>
>>> $ make ...
>>> ...
>>> net/core/filter.c: In function ‘do_sk_redirect_map’:
>>> net/core/filter.c:1881:3: error: implicit declaration of function ‘__sock_map_lookup_elem’ [-Werror=implicit-function-declaration]
>>>    sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
>>>    ^
>>> net/core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [enabled by default]
>>>    sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
>>>       ^
>>> cc1: some warnings being treated as errors
>>> make[2]: *** [net/core/filter.o] Error 1
>>> make[2]: *** Waiting for unfinished jobs....
>>>
>>>
>>
>> Thanks Eric, I'll have a fix shortly.
>>
> 
> And I have a different build error:
> 
> $ make O=kbuild/rcu-lock-debug/ -j24 -s
> scripts/Makefile.kasan:25: CONFIG_KASAN: compiler does not support all
> options. Trying minimal configuration
> scripts/Makefile.kasan:25: CONFIG_KASAN: compiler does not support all
> options. Trying minimal configuration
> kernel/bpf/sockmap.o: In function `smap_stop_sock':
> /home/dsa/kernel-2.git/kernel/bpf/sockmap.c:297: undefined reference to
> `strp_stop'
> kernel/bpf/sockmap.o: In function `smap_gc_work':
> /home/dsa/kernel-2.git/kernel/bpf/sockmap.c:419: undefined reference to
> `strp_done'
> kernel/bpf/sockmap.o: In function `smap_data_ready':
> /home/dsa/kernel-2.git/kernel/bpf/sockmap.c:216: undefined reference to
> `strp_data_ready'
> kernel/bpf/sockmap.o: In function `smap_init_sock':
> /home/dsa/kernel-2.git/kernel/bpf/sockmap.c:373: undefined reference to
> `strp_init'
> /home/dsa/kernel-2.git/Makefile:1000: recipe for target 'vmlinux' failed
> make[1]: *** [vmlinux] Error 1
> Makefile:145: recipe for target 'sub-make' failed
> make: *** [sub-make] Error 2
> 
> I'm guessing a missing CONFIG tie in.
> 

Yep those two are related we have the fix now just running a couple extra build
tests now to be sure. For the future I think we will tie into kbuild bot earlier.

Thanks,
John

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: David Ahern @ 2017-08-16 21:35 UTC (permalink / raw)
  To: John Fastabend, Eric Dumazet, David Miller
  Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <59949E54.8050202@gmail.com>

On 8/16/17 1:34 PM, John Fastabend wrote:
>> I also have a build error.
>>
>> $ git grep -n __sock_map_lookup_elem
>> include/linux/bpf.h:316:struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
>> kernel/bpf/sockmap.c:558:struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
>> net/core/filter.c:1881:         sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
>>
>>
>>
>> $ make ...
>> ...
>> net/core/filter.c: In function ‘do_sk_redirect_map’:
>> net/core/filter.c:1881:3: error: implicit declaration of function ‘__sock_map_lookup_elem’ [-Werror=implicit-function-declaration]
>>    sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
>>    ^
>> net/core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [enabled by default]
>>    sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
>>       ^
>> cc1: some warnings being treated as errors
>> make[2]: *** [net/core/filter.o] Error 1
>> make[2]: *** Waiting for unfinished jobs....
>>
>>
> 
> Thanks Eric, I'll have a fix shortly.
> 

And I have a different build error:

$ make O=kbuild/rcu-lock-debug/ -j24 -s
scripts/Makefile.kasan:25: CONFIG_KASAN: compiler does not support all
options. Trying minimal configuration
scripts/Makefile.kasan:25: CONFIG_KASAN: compiler does not support all
options. Trying minimal configuration
kernel/bpf/sockmap.o: In function `smap_stop_sock':
/home/dsa/kernel-2.git/kernel/bpf/sockmap.c:297: undefined reference to
`strp_stop'
kernel/bpf/sockmap.o: In function `smap_gc_work':
/home/dsa/kernel-2.git/kernel/bpf/sockmap.c:419: undefined reference to
`strp_done'
kernel/bpf/sockmap.o: In function `smap_data_ready':
/home/dsa/kernel-2.git/kernel/bpf/sockmap.c:216: undefined reference to
`strp_data_ready'
kernel/bpf/sockmap.o: In function `smap_init_sock':
/home/dsa/kernel-2.git/kernel/bpf/sockmap.c:373: undefined reference to
`strp_init'
/home/dsa/kernel-2.git/Makefile:1000: recipe for target 'vmlinux' failed
make[1]: *** [vmlinux] Error 1
Makefile:145: recipe for target 'sub-make' failed
make: *** [sub-make] Error 2

I'm guessing a missing CONFIG tie in.

^ permalink raw reply

* [PATCH net-next] macvlan: add offload features for encapsulation
From: Dimitris Michailidis @ 2017-08-16 21:34 UTC (permalink / raw)
  To: davem; +Cc: netdev

Currently macvlan devices do not set their hw_enc_features making
encapsulated Tx packets resort to SW fallbacks. Add encapsulation GSO
offloads to ->features as is done for the other GSOs and set
->hw_enc_features.

Signed-off-by: Dimitris Michailidis <dmichail@google.com>
---
 drivers/net/macvlan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index ca35c6ba7947..d2aea961e0f4 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -835,7 +835,7 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
 
 #define ALWAYS_ON_OFFLOADS \
 	(NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE | \
-	 NETIF_F_GSO_ROBUST)
+	 NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL)
 
 #define ALWAYS_ON_FEATURES (ALWAYS_ON_OFFLOADS | NETIF_F_LLTX)
 
@@ -874,6 +874,7 @@ static int macvlan_init(struct net_device *dev)
 	dev->hw_features	|= NETIF_F_LRO;
 	dev->vlan_features	= lowerdev->vlan_features & MACVLAN_FEATURES;
 	dev->vlan_features	|= ALWAYS_ON_OFFLOADS;
+	dev->hw_enc_features    |= dev->features;
 	dev->gso_max_size	= lowerdev->gso_max_size;
 	dev->gso_max_segs	= lowerdev->gso_max_segs;
 	dev->hard_header_len	= lowerdev->hard_header_len;
-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply related

* Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers
From: David Miller @ 2017-08-16 21:31 UTC (permalink / raw)
  To: dcbw
  Cc: james, futur.andy, kvalo, arend.vanspriel, maheshb, andy, netdev,
	linux-wireless, greearb
In-Reply-To: <1502918561.30484.1.camel@redhat.com>

From: Dan Williams <dcbw@redhat.com>
Date: Wed, 16 Aug 2017 16:22:41 -0500

> My biggest suggestion is that perhaps bonding should grow hysteresis
> for link speeds. Since WiFi can change speed every packet, you probably
> don't want the bond characteristics changing every couple seconds just
> in case your WiFi link is jumping around.  Ethernet won't bounce around
> that much, so the hysteresis would have no effect there.  Or, if people
> are concerned about response time to speed changes on ethernet (where
> you probably do want an instant switch-over) some new flag to indicate
> that certain devices don't have stable speeds over time.

Or just report the average of the range the wireless link can hit, and
be done with it.

I think you guys are overcomplicating things.

^ permalink raw reply

* Re: [patch net-next] net: sched: cls_flower: fix ndo_setup_tc type for stats call
From: David Miller @ 2017-08-16 21:30 UTC (permalink / raw)
  To: jiri; +Cc: netdev, jhs, xiyou.wangcong, mlxsw
In-Reply-To: <20170816151518.4554-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 16 Aug 2017 17:15:18 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> I made a stupid mistake using TC_CLSFLOWER_STATS instead of
> TC_SETUP_CLSFLOWER. Funny thing is that both are defined as "2" so it
> actually did not cause any harm. Anyway, fixing it now.
> 
> Fixes: 2572ac53c46f ("net: sched: make type an argument for ndo_setup_tc")
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Applied, thanks Jiri.

^ permalink raw reply

* [bug report] net: dsa: ksz: fix skb freeing
From: Dan Carpenter @ 2017-08-16 21:27 UTC (permalink / raw)
  To: vivien.didelot; +Cc: netdev

Hello Vivien Didelot,

The patch e71cb9e00922: "net: dsa: ksz: fix skb freeing" from Aug 9,
2017, leads to the following static checker warning:

	net/dsa/tag_ksz.c:64 ksz_xmit()
	warn: 'nskb' was already freed.

net/dsa/tag_ksz.c
    35  static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
    36  {
    37          struct dsa_slave_priv *p = netdev_priv(dev);
    38          struct sk_buff *nskb;
    39          int padlen;
    40          u8 *tag;
    41  
    42          padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
    43  
    44          if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
    45                  if (skb_put_padto(skb, skb->len + padlen))
    46                          return NULL;
    47  
    48                  nskb = skb;
    49          } else {
    50                  nskb = alloc_skb(NET_IP_ALIGN + skb->len +
    51                                   padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC);
    52                  if (!nskb)
    53                          return NULL;
    54                  skb_reserve(nskb, NET_IP_ALIGN);
    55  
    56                  skb_reset_mac_header(nskb);
    57                  skb_set_network_header(nskb,
    58                                         skb_network_header(skb) - skb->head);
    59                  skb_set_transport_header(nskb,
    60                                           skb_transport_header(skb) - skb->head);
    61                  skb_copy_and_csum_dev(skb, skb_put(nskb, skb->len));
    62  
    63                  if (skb_put_padto(nskb, nskb->len + padlen)) {
    64                          kfree_skb(nskb);
                                ^^^^^^^^^^^^^^
This patch deliberately adds a kfree_skb() here, but it looks like a
double free, and my static checker complains.  My guess is you had
intended to free "skb" instead of "nskb"?  I'm not positive.

    65                          return NULL;
    66                  }
    67  
    68                  kfree_skb(skb);
    69          }
    70  
    71          tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN);
    72          tag[0] = 0;
    73          tag[1] = 1 << p->dp->index; /* destination port */
    74  
    75          return nskb;
    76  }

regards,
dan carpenter

^ permalink raw reply

* Re: [PATCH] tun: make tun_build_skb() thread safe
From: David Miller @ 2017-08-16 21:27 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, linux-kernel, mst, eric.dumazet
In-Reply-To: <1502892873-10770-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Wed, 16 Aug 2017 22:14:33 +0800

> From: Eric Dumazet <eric.dumazet@gmail.com>
> 
> tun_build_skb() is not thread safe since it uses per queue page frag,
> this will break things when multiple threads are sending through same
> queue. Switch to use per-thread generator (no lock involved).
> 
> Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet")
> Tested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] dccp: defer ccid_hc_tx_delete() at dismantle time
From: David Miller @ 2017-08-16 21:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, gerrit
In-Reply-To: <1502892195.4936.111.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Aug 2017 07:03:15 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> syszkaller team reported another problem in DCCP [1]
> 
> Problem here is that the structure holding RTO timer
> (ccid2_hc_tx_rto_expire() handler) is freed too soon.
> 
> We can not use del_timer_sync() to cancel the timer
> since this timer wants to grab socket lock (that would risk a dead lock)
> 
> Solution is to defer the freeing of memory when all references to
> the socket were released. Socket timers do own a reference, so this
> should fix the issue.
> 
> [1]
> ==================================================================
> BUG: KASAN: use-after-free in ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
 ...
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH][V2] net/mlx4: fix spelling mistake: "availible" -> "available"
From: David Miller @ 2017-08-16 21:24 UTC (permalink / raw)
  To: colin.king; +Cc: tariqt, netdev, linux-rdma, linux-kernel
In-Reply-To: <20170816134250.6689-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Wed, 16 Aug 2017 14:42:50 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> Trivial fix to spelling mistakes in the mlx4 driver.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied.

When respinning, unless you make significant changes, always integrate
the ACK's and SIGNOFF's your change has accumulated.

^ permalink raw reply

* Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers
From: Dan Williams @ 2017-08-16 21:22 UTC (permalink / raw)
  To: james, Andreas Born
  Cc: Kalle Valo, Arend van Spriel, Mahesh Bandewar, Andy Gospodarek,
	David Miller, netdev, linux-wireless, Ben Greear
In-Reply-To: <2b6fd91a-f7cd-c2bf-6394-060d9b1f5d23@nurealm.net>

On Wed, 2017-08-16 at 14:44 -0600, James Feeney wrote:
> On 08/13/2017 11:42 AM, Andreas Born wrote:
> > On a side note I would recommend some of my own reading to you
> > about
> > patch submission in general [1] and on netdev specifically [2].
> 
> Mmm - [2] and [3], I suspect.  Thanks Andreas.  I'll be studying
> those.  Yeah,
> I'm still learning what is needed and what works.  Sometimes, just a
> note to the
> author is more than enough to resolve a problem.  Sometimes,
> discussion is
> needed.  And other times... well, certain people are infamous... but
> no problem
> here, thankfully.
> 
> > And, just wondering, who's going to eventually close that
> > bugreport?
> > https://bugzilla.kernel.org/show_bug.cgi?id=196547
> 
> I can close it when the patches actually land in the kernel.  I'm
> glad to see
> that there was an "Ack" from Mahesh.
> 
> On the topic of wireless support for kernel ethtool reporting, I'm
> wondering, is
> there is any consensus about that?
> 
> And, for instance, is there any *other* way for the bonding module to
> make
> "better link" decisions for wireless links?  As "wireless" becomes
> more capable,
> possibly more diverse, and probably more essential for computing,
> this is likely
> to become a bigger issue.
> 
> Ben Greear mentioned that he had added some support to the ath10k
> driver.  Dan
> Williams mentioned the possibility of updating the mac80211 stack for
> support.
> And Arend van Spriel suggested that the issue might best be left for
> the next
> Netconf.
> 
> Immediate problem solved, but maybe a larger issue still needs to be
> addressed?

Again, it's technically possible to add the link settings support to
wireless drivers.  But the issue is around what bonding would do with
that information in its various modes.

My biggest suggestion is that perhaps bonding should grow hysteresis
for link speeds. Since WiFi can change speed every packet, you probably
don't want the bond characteristics changing every couple seconds just
in case your WiFi link is jumping around.  Ethernet won't bounce around
that much, so the hysteresis would have no effect there.  Or, if people
are concerned about response time to speed changes on ethernet (where
you probably do want an instant switch-over) some new flag to indicate
that certain devices don't have stable speeds over time.

Dan

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: David Miller @ 2017-08-16 21:22 UTC (permalink / raw)
  To: john.fastabend; +Cc: eric.dumazet, daniel, ast, tgraf, netdev, tom
In-Reply-To: <59949E54.8050202@gmail.com>

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 16 Aug 2017 12:34:44 -0700

> Thanks Eric, I'll have a fix shortly.

The faster you fix these regressions the better :-)

^ permalink raw reply

* Re: [PATCH net V3] openvswitch: fix skb_panic due to the incorrect actions attrlen
From: David Miller @ 2017-08-16 21:13 UTC (permalink / raw)
  To: zlpnobody; +Cc: pshelar, netdev, zlpnobody, neil.mckee
In-Reply-To: <20170816053007.13991-1-zlpnobody@163.com>

From: Liping Zhang <zlpnobody@163.com>
Date: Wed, 16 Aug 2017 13:30:07 +0800

> From: Liping Zhang <zlpnobody@gmail.com>
> 
> For sw_flow_actions, the actions_len only represents the kernel part's
> size, and when we dump the actions to the userspace, we will do the
> convertions, so it's true size may become bigger than the actions_len.
> 
> But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
> to alloc the skbuff, so the user_skb's size may become insufficient and
> oops will happen like this:
 ...
> Also we can find that the actions_len is much little than the orig_len:
>   crash> struct sw_flow_actions 0xffff8812f539d000
>   struct sw_flow_actions {
>     rcu = {
>       next = 0xffff8812f5398800,
>       func = 0xffffe3b00035db32
>     },
>     orig_len = 1384,
>     actions_len = 592,
>     actions = 0xffff8812f539d01c
>   }
> 
> So as a quick fix, use the orig_len instead of the actions_len to alloc
> the user_skb.
> 
> Last, this oops happened on our system running a relative old kernel, but
> the same risk still exists on the mainline, since we use the wrong
> actions_len from the beginning.
> 
> Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
> Cc: Neil McKee <neil.mckee@inmon.com>
> Signed-off-by: Liping Zhang <zlpnobody@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] qdisc: add tracepoint qdisc:qdisc_dequeue for dequeued SKBs
From: David Miller @ 2017-08-16 21:09 UTC (permalink / raw)
  To: brouer; +Cc: netdev, pstaszewski
In-Reply-To: <150282426296.20350.8045788556432194444.stgit@firesoul>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Tue, 15 Aug 2017 21:11:03 +0200

> The main purpose of this tracepoint is to monitor bulk dequeue
> in the network qdisc layer, as it cannot be deducted from the
> existing qdisc stats.
> 
> The txq_state can be used for determining the reason for zero packet
> dequeues, see enum netdev_queue_state_t.
> 
> Notice all packets doesn't necessary activate this tracepoint. As
> qdiscs with flag TCQ_F_CAN_BYPASS, can directly invoke
> sch_direct_xmit() when qdisc_qlen is zero.
> 
> Remember that perf record supports filters like:
> 
>  perf record -e qdisc:qdisc_dequeue \
>   --filter 'ifindex == 4 && (packets > 1 || txq_state > 0)'
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

I'll apply this to net-next, thanks Jesper.

^ permalink raw reply

* Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers
From: David Miller @ 2017-08-16 21:01 UTC (permalink / raw)
  To: james
  Cc: futur.andy, kvalo, arend.vanspriel, maheshb, andy, netdev,
	linux-wireless, greearb, dcbw
In-Reply-To: <2b6fd91a-f7cd-c2bf-6394-060d9b1f5d23@nurealm.net>

From: James Feeney <james@nurealm.net>
Date: Wed, 16 Aug 2017 14:44:27 -0600

> On 08/13/2017 11:42 AM, Andreas Born wrote:
>> On a side note I would recommend some of my own reading to you about
>> patch submission in general [1] and on netdev specifically [2].
> 
> Mmm - [2] and [3], I suspect.  Thanks Andreas.  I'll be studying those.  Yeah,
> I'm still learning what is needed and what works.  Sometimes, just a note to the
> author is more than enough to resolve a problem.  Sometimes, discussion is
> needed.  And other times... well, certain people are infamous... but no problem
> here, thankfully.
> 
>> And, just wondering, who's going to eventually close that bugreport?
>> https://bugzilla.kernel.org/show_bug.cgi?id=196547
> 
> I can close it when the patches actually land in the kernel.

All of the patches are in Linus's tree.

^ permalink raw reply

* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
From: David Miller @ 2017-08-16 20:59 UTC (permalink / raw)
  To: helgaas
  Cc: thierry.reding, dingtianhong, mark.rutland, gabriele.paoloni,
	asit.k.mallick, catalin.marinas, will.deacon, linuxarm,
	alexander.duyck, ashok.raj, eric.dumazet, jeffrey.t.kirsher,
	linux-pci, ganeshgr, Bob.Shaw, leedom, patrick.j.cramer, bhelgaas,
	werner, linux-arm-kernel, amira, netdev, linux-kernel,
	David.Laight, Suravee.Suthikulpanit, robin.murphy, l.stach
In-Reply-To: <20170816200237.GE28977@bhelgaas-glaptop.roam.corp.google.com>

From: Bjorn Helgaas <helgaas@kernel.org>
Date: Wed, 16 Aug 2017 15:02:37 -0500

> Your fix looks right to me.

Someone please submit this fix formally because this change is now in
Linus's tree.

Thank you.

^ permalink raw reply

* Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers
From: James Feeney @ 2017-08-16 20:44 UTC (permalink / raw)
  To: Andreas Born
  Cc: Kalle Valo, Arend van Spriel, Mahesh Bandewar, Andy Gospodarek,
	David Miller, netdev, linux-wireless, Ben Greear, Dan Williams
In-Reply-To: <CADps9UCgqeUh41iHxCM+CTJh=Tffn-tiSYL2Q+NBb_8EppqRMA@mail.gmail.com>

On 08/13/2017 11:42 AM, Andreas Born wrote:
> On a side note I would recommend some of my own reading to you about
> patch submission in general [1] and on netdev specifically [2].

Mmm - [2] and [3], I suspect.  Thanks Andreas.  I'll be studying those.  Yeah,
I'm still learning what is needed and what works.  Sometimes, just a note to the
author is more than enough to resolve a problem.  Sometimes, discussion is
needed.  And other times... well, certain people are infamous... but no problem
here, thankfully.

> And, just wondering, who's going to eventually close that bugreport?
> https://bugzilla.kernel.org/show_bug.cgi?id=196547

I can close it when the patches actually land in the kernel.  I'm glad to see
that there was an "Ack" from Mahesh.

On the topic of wireless support for kernel ethtool reporting, I'm wondering, is
there is any consensus about that?

And, for instance, is there any *other* way for the bonding module to make
"better link" decisions for wireless links?  As "wireless" becomes more capable,
possibly more diverse, and probably more essential for computing, this is likely
to become a bigger issue.

Ben Greear mentioned that he had added some support to the ath10k driver.  Dan
Williams mentioned the possibility of updating the mac80211 stack for support.
And Arend van Spriel suggested that the issue might best be left for the next
Netconf.

Immediate problem solved, but maybe a larger issue still needs to be addressed?

James

^ permalink raw reply

* Re: [PATCH 1/6] bridge: learn dst metadata in FDB
From: Nikolay Aleksandrov @ 2017-08-16 20:38 UTC (permalink / raw)
  To: David Lamparter, netdev
  Cc: amine.kherbouche, roopa, stephen,
	bridge@lists.linux-foundation.org
In-Reply-To: <20170816170202.456851-2-equinox@diac24.net>

On 16/08/17 20:01, David Lamparter wrote:
> This implements holding dst metadata information in the bridge layer,
> but only for unicast entries in the MAC table.  Multicast is still left
> to design and implement.
> 
> Signed-off-by: David Lamparter <equinox@diac24.net>
> ---

Hi David,
Sorry but I do not agree with this change, adding a special case for VPLS in the bridge code and
hitting the fast path for everyone in a few different places for a feature that the majority
will not use does not sound acceptable to me. We've been trying hard to optimize it, trying to
avoid additional cache lines, removing tests and keeping special cases to a minimum.
I understand that you want to use the fdb tables and avoid duplication, but this is not
worth it. There're other similar use cases and they have their own private fdb tables,
that way the user can opt out and is much cleaner and separated.

As you've noted this is only an RFC so I will not point out every issue, but there seems
to be a major problem with br_fdb_update(), note that it runs without any locks except RCU.

By the way I've added the bridge maintainers to the CC list.

Thanks,
 Nik

>  include/net/dst_metadata.h | 19 +++++++++++++------
>  net/bridge/br_device.c     |  4 ++++
>  net/bridge/br_fdb.c        | 45 +++++++++++++++++++++++++++++++--------------
>  net/bridge/br_input.c      |  6 ++++--
>  net/bridge/br_private.h    |  4 +++-
>  5 files changed, 55 insertions(+), 23 deletions(-)
> 
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index a803129a4849..8858dc441458 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -56,16 +56,15 @@ static inline bool skb_valid_dst(const struct sk_buff *skb)
>  	return dst && !(dst->flags & DST_METADATA);
>  }
>  
> -static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a,
> -				       const struct sk_buff *skb_b)
> +static inline int dst_metadata_cmp(const struct dst_entry *dst_a,
> +				   const struct dst_entry *dst_b)
>  {
>  	const struct metadata_dst *a, *b;
> -
> -	if (!(skb_a->_skb_refdst | skb_b->_skb_refdst))
> +	if (!(dst_a || dst_b))
>  		return 0;
>  
> -	a = (const struct metadata_dst *) skb_dst(skb_a);
> -	b = (const struct metadata_dst *) skb_dst(skb_b);
> +	a = (const struct metadata_dst *) dst_a;
> +	b = (const struct metadata_dst *) dst_b;
>  
>  	if (!a != !b || a->type != b->type)
>  		return 1;
> @@ -83,6 +82,14 @@ static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a,
>  	}
>  }
>  
> +static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a,
> +				       const struct sk_buff *skb_b)
> +{
> +	if (!(skb_a->_skb_refdst | skb_b->_skb_refdst))
> +		return 0;
> +	return dst_metadata_cmp(skb_dst(skb_a), skb_dst(skb_b));
> +}
> +
>  void metadata_dst_free(struct metadata_dst *);
>  struct metadata_dst *metadata_dst_alloc(u8 optslen, enum metadata_type type,
>  					gfp_t flags);
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index 861ae2a165f4..534cacf02f8d 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -53,6 +53,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
>  	brstats->tx_bytes += skb->len;
>  	u64_stats_update_end(&brstats->syncp);
>  
> +	skb_dst_drop(skb);
>  	BR_INPUT_SKB_CB(skb)->brdev = dev;
>  
>  	skb_reset_mac_header(skb);
> @@ -81,6 +82,9 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
>  		else
>  			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
>  	} else if ((dst = br_fdb_find_rcu(br, dest, vid)) != NULL) {
> +		struct dst_entry *md_dst = rcu_dereference(dst->md_dst);
> +		if (md_dst)
> +			skb_dst_set_noref(skb, md_dst);
>  		br_forward(dst->dst, skb, false, true);
>  	} else {
>  		br_flood(br, skb, BR_PKT_UNICAST, false, true);
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index a79b648aac88..0751fcb89699 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -25,11 +25,13 @@
>  #include <asm/unaligned.h>
>  #include <linux/if_vlan.h>
>  #include <net/switchdev.h>
> +#include <net/dst_metadata.h>
>  #include "br_private.h"
>  
>  static struct kmem_cache *br_fdb_cache __read_mostly;
>  static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
> -		      const unsigned char *addr, u16 vid);
> +		      struct dst_entry *md_dst, const unsigned char *addr,
> +		      u16 vid);
>  static void fdb_notify(struct net_bridge *br,
>  		       const struct net_bridge_fdb_entry *, int);
>  
> @@ -174,6 +176,8 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
>  	if (f->is_static)
>  		fdb_del_hw_addr(br, f->addr.addr);
>  
> +	dst_release(rcu_access_pointer(f->md_dst));
> +
>  	hlist_del_init_rcu(&f->hlist);
>  	fdb_notify(br, f, RTM_DELNEIGH);
>  	call_rcu(&f->rcu, fdb_rcu_free);
> @@ -260,7 +264,7 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
>  
>  insert:
>  	/* insert new address,  may fail if invalid address or dup. */
> -	fdb_insert(br, p, newaddr, 0);
> +	fdb_insert(br, p, NULL, newaddr, 0);
>  
>  	if (!vg || !vg->num_vlans)
>  		goto done;
> @@ -270,7 +274,7 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
>  	 * from under us.
>  	 */
>  	list_for_each_entry(v, &vg->vlan_list, vlist)
> -		fdb_insert(br, p, newaddr, v->vid);
> +		fdb_insert(br, p, NULL, newaddr, v->vid);
>  
>  done:
>  	spin_unlock_bh(&br->hash_lock);
> @@ -289,10 +293,11 @@ void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr)
>  	if (f && f->is_local && !f->dst && !f->added_by_user)
>  		fdb_delete_local(br, NULL, f);
>  
> -	fdb_insert(br, NULL, newaddr, 0);
> +	fdb_insert(br, NULL, NULL, newaddr, 0);
>  	vg = br_vlan_group(br);
>  	if (!vg || !vg->num_vlans)
>  		goto out;
> +
>  	/* Now remove and add entries for every VLAN configured on the
>  	 * bridge.  This function runs under RTNL so the bitmap will not
>  	 * change from under us.
> @@ -303,7 +308,7 @@ void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr)
>  		f = br_fdb_find(br, br->dev->dev_addr, v->vid);
>  		if (f && f->is_local && !f->dst && !f->added_by_user)
>  			fdb_delete_local(br, NULL, f);
> -		fdb_insert(br, NULL, newaddr, v->vid);
> +		fdb_insert(br, NULL, NULL, newaddr, v->vid);
>  	}
>  out:
>  	spin_unlock_bh(&br->hash_lock);
> @@ -477,6 +482,7 @@ int br_fdb_fillbuf(struct net_bridge *br, void *buf,
>  
>  static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
>  					       struct net_bridge_port *source,
> +					       struct dst_entry *md_dst,
>  					       const unsigned char *addr,
>  					       __u16 vid,
>  					       unsigned char is_local,
> @@ -488,6 +494,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
>  	if (fdb) {
>  		memcpy(fdb->addr.addr, addr, ETH_ALEN);
>  		fdb->dst = source;
> +		rcu_assign_pointer(fdb->md_dst, dst_clone(md_dst));
>  		fdb->vlan_id = vid;
>  		fdb->is_local = is_local;
>  		fdb->is_static = is_static;
> @@ -501,7 +508,8 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
>  }
>  
>  static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
> -		  const unsigned char *addr, u16 vid)
> +		      struct dst_entry *md_dst, const unsigned char *addr,
> +		      u16 vid)
>  {
>  	struct hlist_head *head = &br->hash[br_mac_hash(addr, vid)];
>  	struct net_bridge_fdb_entry *fdb;
> @@ -521,7 +529,7 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
>  		fdb_delete(br, fdb);
>  	}
>  
> -	fdb = fdb_create(head, source, addr, vid, 1, 1);
> +	fdb = fdb_create(head, source, md_dst, addr, vid, 1, 1);
>  	if (!fdb)
>  		return -ENOMEM;
>  
> @@ -537,13 +545,14 @@ int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
>  	int ret;
>  
>  	spin_lock_bh(&br->hash_lock);
> -	ret = fdb_insert(br, source, addr, vid);
> +	ret = fdb_insert(br, source, NULL, addr, vid);
>  	spin_unlock_bh(&br->hash_lock);
>  	return ret;
>  }
>  
>  void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
> -		   const unsigned char *addr, u16 vid, bool added_by_user)
> +		   struct dst_entry *md_dst, const unsigned char *addr,
> +		   u16 vid, bool added_by_user)
>  {
>  	struct hlist_head *head = &br->hash[br_mac_hash(addr, vid)];
>  	struct net_bridge_fdb_entry *fdb;
> @@ -558,6 +567,9 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>  	      source->state == BR_STATE_FORWARDING))
>  		return;
>  
> +	if (md_dst && !(md_dst->flags & DST_METADATA))
> +		md_dst = NULL;
> +
>  	fdb = fdb_find_rcu(head, addr, vid);
>  	if (likely(fdb)) {
>  		/* attempt to update an entry for a local interface */
> @@ -567,10 +579,15 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>  					source->dev->name, addr, vid);
>  		} else {
>  			unsigned long now = jiffies;
> +			struct dst_entry *ref_md = rcu_access_pointer(fdb->md_dst);
>  
>  			/* fastpath: update of existing entry */
> -			if (unlikely(source != fdb->dst)) {
> +			if (unlikely(source != fdb->dst ||
> +			    dst_metadata_cmp(md_dst, ref_md))) {
>  				fdb->dst = source;
> +				dst_release(ref_md);
> +				rcu_assign_pointer(fdb->md_dst,
> +						dst_clone(md_dst));
>  				fdb_modified = true;
>  				/* Take over HW learned entry */
>  				if (unlikely(fdb->added_by_external_learn))
> @@ -586,7 +603,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>  	} else {
>  		spin_lock(&br->hash_lock);
>  		if (likely(!fdb_find_rcu(head, addr, vid))) {
> -			fdb = fdb_create(head, source, addr, vid, 0, 0);
> +			fdb = fdb_create(head, source, md_dst, addr, vid, 0, 0);
>  			if (fdb) {
>  				if (unlikely(added_by_user))
>  					fdb->added_by_user = 1;
> @@ -781,7 +798,7 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>  		if (!(flags & NLM_F_CREATE))
>  			return -ENOENT;
>  
> -		fdb = fdb_create(head, source, addr, vid, 0, 0);
> +		fdb = fdb_create(head, source, NULL, addr, vid, 0, 0);
>  		if (!fdb)
>  			return -ENOMEM;
>  
> @@ -844,7 +861,7 @@ static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br,
>  		}
>  		local_bh_disable();
>  		rcu_read_lock();
> -		br_fdb_update(br, p, addr, vid, true);
> +		br_fdb_update(br, p, NULL, addr, vid, true);
>  		rcu_read_unlock();
>  		local_bh_enable();
>  	} else if (ndm->ndm_flags & NTF_EXT_LEARNED) {
> @@ -1071,7 +1088,7 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
>  	head = &br->hash[br_mac_hash(addr, vid)];
>  	fdb = br_fdb_find(br, addr, vid);
>  	if (!fdb) {
> -		fdb = fdb_create(head, p, addr, vid, 0, 0);
> +		fdb = fdb_create(head, p, NULL, addr, vid, 0, 0);
>  		if (!fdb) {
>  			err = -ENOMEM;
>  			goto err_unlock;
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 7637f58c1226..df10c87b2499 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -150,7 +150,8 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
>  	/* insert into forwarding database after filtering to avoid spoofing */
>  	br = p->br;
>  	if (p->flags & BR_LEARNING)
> -		br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, false);
> +		br_fdb_update(br, p, skb_dst(skb), eth_hdr(skb)->h_source,
> +			      vid, false);
>  
>  	local_rcv = !!(br->dev->flags & IFF_PROMISC);
>  	dest = eth_hdr(skb)->h_dest;
> @@ -230,7 +231,8 @@ static void __br_handle_local_finish(struct sk_buff *skb)
>  
>  	/* check if vlan is allowed, to avoid spoofing */
>  	if (p->flags & BR_LEARNING && br_should_learn(p, skb, &vid))
> -		br_fdb_update(p->br, p, eth_hdr(skb)->h_source, vid, false);
> +		br_fdb_update(p->br, p, skb_dst(skb),
> +				eth_hdr(skb)->h_source, vid, false);
>  }
>  
>  /* note: already called with rcu_read_lock */
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index fd9ee73e0a6d..b73f34ed765f 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -164,6 +164,7 @@ struct net_bridge_vlan_group {
>  struct net_bridge_fdb_entry {
>  	struct hlist_node		hlist;
>  	struct net_bridge_port		*dst;
> +	struct dst_entry __rcu		*md_dst;
>  
>  	mac_addr			addr;
>  	__u16				vlan_id;
> @@ -524,7 +525,8 @@ int br_fdb_fillbuf(struct net_bridge *br, void *buf, unsigned long count,
>  int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
>  		  const unsigned char *addr, u16 vid);
>  void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
> -		   const unsigned char *addr, u16 vid, bool added_by_user);
> +		   struct dst_entry *md_dst, const unsigned char *addr,
> +		   u16 vid, bool added_by_user);
>  
>  int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>  		  struct net_device *dev, const unsigned char *addr, u16 vid);
> 

^ permalink raw reply

* Re: [PATCH net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs
From: Paolo Abeni @ 2017-08-16 20:20 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Matthew Dawson, Network Development, Macieira, Thiago
In-Reply-To: <CAF=yD-K8FMicoSS6u-0r_J0p0fTyn4GNwhXn7_gRSSMYmAzw5g@mail.gmail.com>

On Wed, 2017-08-16 at 11:18 -0400, Willem de Bruijn wrote:
> > If I read the above correctly, you are arguining in favor of the
> > addittional flag version, right?
> 
> I was. Though if we are going to thread the argument from the caller
> to __skb_try_recv_from_queue to avoid rereading sk->sk_peek_off,
> on second thought it might be simpler to do it through off:
[...]
> This, of course, requires restricting sk_peek_off to protect against overflow.

Ok, even if I'm not 100% sure overall this will be simpler when adding
also the overflow check.

> If I'm not mistaken, the test in udp_recvmsg currently incorrectly sets
> peeking to false when peeking at offset zero:
> 
>         peeking = off = sk_peek_offset(sk, flags);

I think you are right, does not look correct.

> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -2408,9 +2408,7 @@ EXPORT_SYMBOL(__sk_mem_reclaim);
> > 
> >  int sk_set_peek_off(struct sock *sk, int val)
> >  {
> > -       if (val < 0)
> > -               return -EINVAL;
> > -
> > +       /* a negative value will disable peeking with offset */
> >         sk->sk_peek_off = val;
> >         return 0;
> >  }
> 
> Separate patch to net-next?

Agreed.

Paolo

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox