Netdev List

* [PATCH net-next 2/2] udp: implement and use per cpu rx skbs cache
From: Paolo Abeni @ 2018-04-18 10:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet
In-Reply-To: <cover.1524045911.git.pabeni@redhat.com>

This changeset extends the idea behind commit c8c8b127091b ("udp:
under rx pressure, try to condense skbs"), trading more BH cpu
time and memory bandwidth to decrease the load on the user space
receiver.

At boot time we allocate a limited number of skbs with small
data buffers, storing them in per cpu arrays. Such skbs are never
freed.

At run time, under rx pressure, the BH tries to copy the current
skb contents into the cache - if the current cache skb is available,
and the ingress skb is small enough and without any head states.

When using the cache skb, the ingress skb is dropped by the BH
- while still hot on cache - and the cache skb is inserted into
the rx queue, after increasing its usage count. Also, the cache
array index is moved to the next entry.

The receive side is unmodified: in udp_recvmsg() the skb usage
count is decreased and the skb is _not_ freed - since the
cache keeps usage > 0. Since skb->users is hot in the cache of the
receiver at consume time - the receiver has just read skb->data,
which lies in the same cacheline - the whole skb_consume_udp() becomes
really cheap.
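
The recycling scheme described above can be modelled in user space. What
follows is only a minimal sketch, not the kernel code: struct buf,
struct buf_cache, cache_get(), cache_put() and CACHE_SLOTS are hypothetical
names, a single array stands in for one per cpu cache, and the clone check
and the memory barrier used by the patch are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_SLOTS 4	/* hypothetical size; the patch derives it from udp_mem */

struct buf {
	atomic_int users;	/* 1 == referenced by the cache only */
	char data[512];
};

struct buf_cache {
	int head;		/* next slot to hand out */
	struct buf slots[CACHE_SLOTS];
};

static void cache_init(struct buf_cache *c)
{
	c->head = 0;
	for (int i = 0; i < CACHE_SLOTS; i++)
		atomic_init(&c->slots[i].users, 1);
}

/* Producer side: take the slot at head only if nobody else holds it. */
static struct buf *cache_get(struct buf_cache *c)
{
	struct buf *b = &c->slots[c->head];

	if (atomic_load(&b->users) != 1)
		return NULL;		/* the receiver has not consumed it yet */

	c->head = (c->head + 1) % CACHE_SLOTS;
	atomic_fetch_add(&b->users, 1);	/* reference held by the rx queue */
	return b;
}

/* Consumer side: drop the queue reference; the buffer is never freed. */
static void cache_put(struct buf *b)
{
	int old = atomic_fetch_sub(&b->users, 1);

	assert(old == 2);
}
```

Once every slot is in flight cache_get() returns NULL, which corresponds to
the patch falling back to skb_condense(); a single cache_put() makes the
oldest slot available again.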

UDP receive performance under flood improves as follows:

NR RX queues	Kpps	Kpps	Delta (%)
		Before	After

1		2252	2305	2
2		2151	2569	19
4		2033	2396	17
8		1969	2329	18
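
The Delta column looks consistent with the relative gain truncated to an
integer. That is an assumption read off the numbers, not something stated
in the changelog; delta_percent() is a hypothetical helper to check it.

```c
#include <assert.h>

/* Relative gain in percent, truncated toward zero. */
static int delta_percent(int before_kpps, int after_kpps)
{
	assert(before_kpps > 0);
	return (after_kpps - before_kpps) * 100 / before_kpps;
}
```

For instance delta_percent(2033, 2396) gives 17, since 363 * 100 / 2033 is
about 17.86 and the fraction is dropped.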

Overall performance of the knotd DNS server under real traffic flood
improves as follows:

		Kpps	Kpps	Delta (%)
		Before	After

		3777	3981	5

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
--
Performance figures are with both PAGE_TABLE_ISOLATION and
RETPOLINES enabled, this is why the baseline
---
 net/ipv4/udp.c | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3fb0fbf4977d..bb1879cd51b4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -125,6 +125,26 @@ EXPORT_SYMBOL(sysctl_udp_mem);
 atomic_long_t udp_memory_allocated;
 EXPORT_SYMBOL(udp_memory_allocated);
 
+struct skb_cache_entry {
+	int size;
+	int head;
+	struct sk_buff *skbs[0];
+};
+
+static struct skb_cache_entry __percpu *skb_cache;
+
+/* Under socket memory pressure, small packets are copied to a percpu cache
+ * before enqueuing them, to decrease the load on the receiver process.
+ * To avoid excessive copy overhead we use a small skb size threshold.
+ * Each percpu cache should be able to cope with at least a socket under
+ * memory pressure. It doesn't need to handle many of them: if there are
+ * more than a few sockets under memory pressure, the user-space is most
+ * probably too lazy and there is no gain using the cache
+ */
+#define UDP_CACHE_MAX_SKB_LEN		512
+#define UDP_CACHE_MIN_SIZE		_SK_MEM_PACKETS
+#define UDP_CACHE_MAX_SIZE		(_SK_MEM_PACKETS * 3)
+
 #define MAX_UDP_PORTS 65536
 #define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
@@ -1246,6 +1266,82 @@ static void udp_skb_dtor_locked(struct sock *sk, struct sk_buff *skb)
 	udp_rmem_release(sk, udp_skb_truesize(skb), 1, true);
 }
 
+static inline struct sk_buff *udp_cache_get_skb(void)
+{
+	struct skb_cache_entry *cache;
+	struct sk_buff *skb;
+
+	if (unlikely(!skb_cache))
+		return NULL;
+
+	cache = this_cpu_ptr(skb_cache);
+	skb = cache->skbs[cache->head];
+	if (refcount_read(&skb->users) != 1)
+		return NULL;
+
+	/* peeking with offset clones the queued skbs, we must check that all
+	 * the cloned references are gone.
+	 * This barrier is paired with the implicit one in skb_unref(), while
+	 * decrementing skb->users.
+	 */
+	rmb();
+	if (unlikely(skb->cloned)) {
+		if (atomic_read(&skb_shinfo(skb)->dataref) != 1)
+			return NULL;
+		skb->cloned = 0;
+	}
+
+	cache->head++;
+	if (cache->head == cache->size)
+		cache->head = 0;
+	refcount_inc(&skb->users);
+	return skb;
+}
+
+static bool udp_copy_to_cache(struct sk_buff **s)
+{
+	struct sk_buff *skb2, *skb = *s;
+	int hlen;
+
+	/* check if we can copy the specified skb into the cache: data + l3 +
+	 * l4 must be below the cached skb size and no head states must
+	 * be attached.
+	 */
+	hlen = skb_network_header_len(skb) + sizeof(struct udphdr);
+	if ((hlen + skb->len) >= UDP_CACHE_MAX_SKB_LEN || skb_sec_path(skb))
+		return false;
+
+	skb2 = udp_cache_get_skb();
+	if (!skb2)
+		return false;
+
+	/* copy the relevant header: we skip the head states - we know no state
+	 * is attached to 'skb' - the irrelevant part of the CB, and
+	 * skb->dev - will be overwritten later by udp_set_dev_scratch()
+	 */
+	skb2->tstamp	    = skb->tstamp;
+	*UDP_SKB_CB(skb2)   = *UDP_SKB_CB(skb);
+	skb2->queue_mapping = skb->queue_mapping;
+	memcpy(&skb2->headers_start, &skb->headers_start,
+	       offsetof(struct sk_buff, headers_end) -
+	       offsetof(struct sk_buff, headers_start));
+
+	/* skip the mac header, we don't need it */
+	skb_copy_bits(skb, -hlen, skb2->head, skb->len + hlen);
+
+	/* override the relevant offsets: skb2 starts from the network hdr */
+	skb2->transport_header = hlen - sizeof(struct udphdr);
+	skb2->network_header  = 0;
+	skb2->mac_header = 0;
+	skb2->data = skb2->head + hlen;
+	skb_set_tail_pointer(skb2, skb->len);
+	skb2->len = skb->len;
+	consume_skb(skb);
+
+	*s = skb2;
+	return true;
+}
+
 /* Idea of busylocks is to let producers grab an extra spinlock
  * to relieve pressure on the receive_queue spinlock shared by consumer.
  * Under flood, this means that only one producer can be in line
@@ -1290,9 +1386,12 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 	 * - Reduce memory overhead and thus increase receive queue capacity
 	 * - Less cache line misses at copyout() time
 	 * - Less work at consume_skb() (less alien page frag freeing)
+	 * Additionally, processing skbs from the cache allows udp_recvmsg()
+	 * to 'free' them with a single atomic operation on a hot cacheline
 	 */
 	if (rmem > (sk->sk_rcvbuf >> 1)) {
-		skb_condense(skb);
+		if (!udp_copy_to_cache(&skb))
+			skb_condense(skb);
 
 		busy = busylock_acquire(sk);
 	}
@@ -2858,6 +2957,64 @@ static struct pernet_operations __net_initdata udp_sysctl_ops = {
 	.init	= udp_sysctl_init,
 };
 
+static void udp_free_cache(int nr)
+{
+	int i, cpu;
+
+	for_each_possible_cpu(cpu)
+		for (i = 0; i < nr; ++i)
+			kfree_skb(per_cpu_ptr(skb_cache, cpu)->skbs[i]);
+
+	free_percpu(skb_cache);
+	skb_cache = NULL;
+}
+
+static void udp_init_cache(unsigned long max_size)
+{
+	size_t skb_guessed_size, per_cpu_size;
+	unsigned long total_size = 0;
+	struct sk_buff *skb;
+	int i, nr, cpu = 0;
+
+	/* try to fill the cache only if we can allocate a reasonable number
+	 * of skbs
+	 */
+	skb_guessed_size = SKB_TRUESIZE(UDP_CACHE_MAX_SKB_LEN);
+	nr = min_t(unsigned long, UDP_CACHE_MAX_SIZE,
+		   max_size / (nr_cpu_ids * skb_guessed_size));
+	if (nr < UDP_CACHE_MIN_SIZE) {
+		pr_info("low memory, UDP skbs cache will not be allocated\n");
+		return;
+	}
+
+	per_cpu_size = nr * sizeof(void *) + sizeof(struct skb_cache_entry);
+	skb_cache = __alloc_percpu_gfp(per_cpu_size, L1_CACHE_BYTES,
+				       GFP_KERNEL | __GFP_ZERO);
+	if (!skb_cache) {
+		pr_warn("Can't allocate UDP skb cache\n");
+		return;
+	}
+
+	pr_info("allocating %d skbs on %d CPUs for rx cache\n", nr, nr_cpu_ids);
+	for (i = 0; i < nr && total_size < max_size; ++i) {
+		for_each_possible_cpu(cpu) {
+			skb = __alloc_skb(UDP_CACHE_MAX_SKB_LEN, GFP_KERNEL,
+					  0, cpu_to_node(cpu));
+			if (!skb) {
+				pr_warn("allocation failure, cache disabled\n");
+				udp_free_cache(nr);
+				return;
+			}
+
+			total_size += skb->truesize;
+			per_cpu_ptr(skb_cache, cpu)->skbs[i] = skb;
+		}
+	}
+
+	for_each_possible_cpu(cpu)
+		per_cpu_ptr(skb_cache, cpu)->size = nr;
+}
+
 void __init udp_init(void)
 {
 	unsigned long limit;
@@ -2871,6 +3028,7 @@ void __init udp_init(void)
 	sysctl_udp_mem[2] = sysctl_udp_mem[0] * 2;
 
 	__udp_sysctl_init(&init_net);
+	udp_init_cache(sysctl_udp_mem[0] / 100 * PAGE_SIZE);
 
 	/* 16 spinlocks per cpu */
 	udp_busylocks_log = ilog2(nr_cpu_ids) + 4;
-- 
2.14.3

* Re: tcp hang when socket fills up ?
From: Jozsef Kadlecsik @ 2018-04-18 10:27 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Florian Westphal, Michal Kubecek, netdev, Marcelo Ricardo Leitner,
	Eric Dumazet
In-Reply-To: <20180418093622.GB7492@nautica>

On Wed, 18 Apr 2018, Dominique Martinet wrote:

> Dominique Martinet wrote on Wed, Apr 18, 2018:
> > Jozsef Kadlecsik wrote on Wed, Apr 18, 2018:
> > > Yes, the state transition is wrong for simultaneous open, because the 
> > > tcp_conntracks table is not (cannot be) smart enough. Could you verify the 
> > > next untested patch?
> > 
> > Thanks for the patch; I'll give it a try (probably won't make it today
> > so will report tomorrow)
> 
> Actually had time; I can confirm (added printks) we did get in that if 
> that was pointed at, and we no longer get there now. The connection no 
> longer gets in invalid state, so that looks like it nailed it.
>
> I'm now confused what this has to do with tcp_timestamp though, since 
> setting that off also seemed to work around the issue, but if we get 
> something like that in I'll be happy anyway.

Thanks for the testing! One more line is required, however: we have to get 
the assured bit set for the connection, see the new patch below.

The tcp_conntracks state table could be fixed by introducing a new 
state, but that part is exposed to userspace (ctnetlink) and ugly 
compatibility code would be required for backward compatibility.
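
As a rough user-space illustration of the flag-based approach (instead of a
new state), the new_state handling can be modelled as follows. This is only
a sketch: the enums, struct conn and fixup_state() are hypothetical
stand-ins for the kernel types, and the old_state / assured bit handling is
left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel enums; values are illustrative. */
enum ct_state { CT_SYN_SENT2, CT_SYN_RECV, CT_ESTABLISHED };
enum ct_dir { DIR_ORIGINAL, DIR_REPLY };
enum tcp_bit { BIT_SYN_SET, BIT_SYNACK_SET, BIT_ACK_SET };

#define FLAG_SIMULTANEOUS_OPEN	0x80	/* mirrors IP_CT_TCP_SIMULTANEOUS_OPEN */

struct conn {
	uint8_t last_flags;
};

/* Adjust the state proposed by the table lookup. */
static enum ct_state fixup_state(struct conn *ct, enum ct_state new_state,
				 enum ct_dir dir, enum tcp_bit index)
{
	assert(ct);

	switch (new_state) {
	case CT_SYN_SENT2:
		/* remember that a simultaneous open was started */
		ct->last_flags |= FLAG_SIMULTANEOUS_OPEN;
		break;
	case CT_SYN_RECV:
		if (dir == DIR_REPLY && index == BIT_ACK_SET &&
		    (ct->last_flags & FLAG_SIMULTANEOUS_OPEN))
			new_state = CT_ESTABLISHED;
		break;
	default:
		break;
	}
	return new_state;
}
```

The flag lives in the existing last_flags byte, so no new value is added to
the state enum that ctnetlink exposes.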
 
diff --git a/include/uapi/linux/netfilter/nf_conntrack_tcp.h b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
index 74b9115..bcba72d 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_tcp.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
@@ -46,6 +46,9 @@ enum tcp_conntrack {
 /* Marks possibility for expected RFC5961 challenge ACK */
 #define IP_CT_EXP_CHALLENGE_ACK 		0x40
 
+/* Simultaneous open initialized */
+#define IP_CT_TCP_SIMULTANEOUS_OPEN		0x80
+
 struct nf_ct_tcp_flags {
 	__u8 flags;
 	__u8 mask;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index e97cdc1..2c1fc7e 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -981,6 +981,20 @@ static int tcp_packet(struct nf_conn *ct,
 			return NF_ACCEPT; /* Don't change state */
 		}
 		break;
+	case TCP_CONNTRACK_SYN_SENT2:
+		/* tcp_conntracks table is not smart enough to handle
+		 * simultaneous open.
+		 */
+		ct->proto.tcp.last_flags |= IP_CT_TCP_SIMULTANEOUS_OPEN;
+		break;
+	case TCP_CONNTRACK_SYN_RECV:
+		if (dir == IP_CT_DIR_REPLY && index == TCP_ACK_SET &&
+		    ct->proto.tcp.last_flags & IP_CT_TCP_SIMULTANEOUS_OPEN) {
+			/* We want to set the assured bit */
+			old_state = TCP_CONNTRACK_SYN_RECV;
+			new_state = TCP_CONNTRACK_ESTABLISHED;
+		}
+		break;
 	case TCP_CONNTRACK_CLOSE:
 		if (index == TCP_RST_SET
 		    && (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET)

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

* Re: [PATCH bpf-next v2 03/11] bpf: make mlx4 compatible w/ bpf_xdp_adjust_tail
From: Tariq Toukan @ 2018-04-18 10:53 UTC (permalink / raw)
  To: Nikita V. Shirokov, Alexei Starovoitov, Daniel Borkmann,
	Tariq Toukan
  Cc: netdev
In-Reply-To: <20180418042951.17183-4-tehnerd@tehnerd.com>



On 18/04/2018 7:29 AM, Nikita V. Shirokov wrote:
> w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
> well (only "decrease" of pointer's location is going to be supported).
> changing of this pointer will change packet's size.
> for mlx4 driver we will just calculate packet's length unconditionally
> (the same way as it's already being done in mlx5)
> 
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 5c613c6663da..efc55feddc5c 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -775,8 +775,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   
>   			act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   
> +			length = xdp.data_end - xdp.data;
>   			if (xdp.data != orig_data) {
> -				length = xdp.data_end - xdp.data;
>   				frags[0].page_offset = xdp.data -
>   					xdp.data_hard_start;
>   				va = xdp.data;
> 

Acked-by: Tariq Toukan <tariqt@mellanox.com>

Thanks.
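
For reference, the reason the length has to be recomputed even when
xdp.data is unchanged can be shown with a small user-space model. It is
only a sketch: struct pkt_view, trim_tail() and pkt_len() are hypothetical
names, not driver or BPF helper code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of the XDP buffer: only the two pointers
 * that define the packet boundaries.
 */
struct pkt_view {
	unsigned char *data;
	unsigned char *data_end;
};

/* Stand-in for a program that trims bytes from the tail only. */
static void trim_tail(struct pkt_view *p, size_t bytes)
{
	assert(bytes <= (size_t)(p->data_end - p->data));
	p->data_end -= bytes;
}

/* Length as computed after the program has run. */
static size_t pkt_len(const struct pkt_view *p)
{
	return (size_t)(p->data_end - p->data);
}
```

After trim_tail() the data pointer still equals the original one, so a
length update guarded by a data != orig_data test would be skipped even
though the packet got shorter.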

* [PATCH] net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"
From: Colin King @ 2018-04-18 11:00 UTC (permalink / raw)
  To: Dmitry Tarnyagin, David S . Miller, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 net/caif/chnl_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 53ecda10b790..13e2ae6be620 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -174,7 +174,7 @@ static void chnl_flowctrl_cb(struct cflayer *layr, enum caif_ctrlcmd flow,
 		flow == CAIF_CTRLCMD_DEINIT_RSP ? "CLOSE/DEINIT" :
 		flow == CAIF_CTRLCMD_INIT_FAIL_RSP ? "OPEN_FAIL" :
 		flow == CAIF_CTRLCMD_REMOTE_SHUTDOWN_IND ?
-		 "REMOTE_SHUTDOWN" : "UKNOWN CTRL COMMAND");
+		 "REMOTE_SHUTDOWN" : "UNKNOWN CTRL COMMAND");
 
 
 
-- 
2.17.0

* Re: [PATCH RFC net-next 00/11] udp gso
From: Paolo Abeni @ 2018-04-18 11:17 UTC (permalink / raw)
  To: Willem de Bruijn, netdev; +Cc: Willem de Bruijn
In-Reply-To: <20180417200059.30154-1-willemdebruijn.kernel@gmail.com>

On Tue, 2018-04-17 at 16:00 -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Segmentation offload reduces cycles/byte for large packets by
> amortizing the cost of protocol stack traversal.
> 
> This patchset implements GSO for UDP. A process can concatenate and
> submit multiple datagrams to the same destination in one send call
> by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
> or passing an analogous cmsg at send time.
> 
> The stack will send the entire large (up to network layer max size)
> datagram through the protocol layer. At the GSO layer, it is broken
> up in individual segments. All receive the same network layer header
> and UDP src and dst port. All but the last segment have the same UDP
> header, but the last may differ in length and checksum.

This is interesting, thanks for sharing!

I have some local patches somewhere implementing UDP GRO, but I never
tried to upstream them, since I lacked the associated GSO and I thought
that the use-case was not too relevant.

Given that your use-case is a connected socket - no per packet route
lookup - how does GSO perform compared to plain sendmmsg()? Have you
considered using and/or improving the latter?

When testing with Spectre/Meltdown mitigations in place, I expect that
the most relevant part of the gain is due to the single syscall per
burst.
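
To make the segmentation described in the quoted cover letter concrete, here
is a small sketch of how one large send maps to datagrams on the wire. It
models payload sizes only; struct seg_plan and plan_segments() are
hypothetical names and none of this is taken from the patchset.

```c
#include <assert.h>
#include <stddef.h>

struct seg_plan {
	size_t count;		/* number of datagrams on the wire */
	size_t last_len;	/* payload length of the final datagram */
};

/* Split total_len payload bytes into segments of seg_size bytes. */
static struct seg_plan plan_segments(size_t total_len, size_t seg_size)
{
	struct seg_plan p;

	assert(total_len > 0 && seg_size > 0);
	p.count = (total_len + seg_size - 1) / seg_size;
	p.last_len = total_len - (p.count - 1) * seg_size;
	return p;
}
```

With a 4000 byte send and a 1472 byte segment size (an illustrative value:
1500 byte MTU minus 20 bytes of IPv4 and 8 bytes of UDP header) this gives
three datagrams, the last one carrying 1056 bytes, all submitted with a
single syscall.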

Cheers,

Paolo

* Re: tcp hang when socket fills up ?
From: Dominique Martinet @ 2018-04-18 11:30 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Florian Westphal, Michal Kubecek, netdev, Marcelo Ricardo Leitner,
	Eric Dumazet
In-Reply-To: <alpine.DEB.2.11.1804181220320.4316@blackhole.kfki.hu>

Jozsef Kadlecsik wrote on Wed, Apr 18, 2018:
> Thanks for the testing! One more line is required, however: we have to get 
> the assured bit set for the connection, see the new patch below.

I think it actually was better before. If I understand things correctly
at this point (when we get in the case TCP_CONNTRACK_SYN_RECV) we will
have seen SYN(out) SYN(in) SYNACK(out), but not the final ACK(in) yet.

Leaving old state as it was will not set the assured bit, but that will
be set on the next packet because old_state == new_state == established
at that point and the connection will really be setup then.

I don't think anything will blow up if we do either way, but strictly
speaking I'm more comfortable with the former.
I'll test the new patch regardless, I left work so can't reproduce
anymore but will yell tomorrow if it does explode ;)

> The tcp_conntracks state table could be fixed by introducing a new 
> state, but that part is exposed to userspace (ctnetlink) and ugly 
> compatibility code would be required for backward compatibility.

I agree a new state is more work than it is worth, I'm happy to leave it
as is.

-- 
Dominique Martinet | Asmadeus

* Re: tcp hang when socket fills up ?
From: Jozsef Kadlecsik @ 2018-04-18 11:37 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Florian Westphal, Michal Kubecek, netdev, Marcelo Ricardo Leitner,
	Eric Dumazet
In-Reply-To: <20180418113058.GA9675@nautica>

On Wed, 18 Apr 2018, Dominique Martinet wrote:

> Jozsef Kadlecsik wrote on Wed, Apr 18, 2018:
> > Thanks for the testing! One more line is required, however: we have to get 
> > the assured bit set for the connection, see the new patch below.
> 
> I think it actually was better before. If I understand things correctly
> at this point (when we get in the case TCP_CONNTRACK_SYN_RECV) we will
> have seen SYN(out) SYN(in) SYNACK(out), but not the final ACK(in) yet.
> 
> Leaving old state as it was will not set the assured bit, but that will 
> be set on the next packet because old_state == new_state == established 
> at that point and the connection will really be setup then.

Yes, you are right: the first patch is better than the second one. 
Overthinking :-)

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

* Re: Fw: [Bug 199429] New: smc_shutdown(net/smc/af_smc.c) has a UAF causing null pointer vulnerability.
From: Ursula Braun @ 2018-04-18 11:46 UTC (permalink / raw)
  To: Stephen Hemminger, Ursula Braun; +Cc: netdev
In-Reply-To: <20180417195644.7d04aff0@xeon-e3>



On 04/18/2018 04:56 AM, Stephen Hemminger wrote:
> This may already be fixed.
> 
> Begin forwarded message:
> 
> Date: Wed, 18 Apr 2018 01:52:59 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 199429] New: smc_shutdown(net/smc/af_smc.c) has a UAF causing null pointer vulnerability.
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=199429
> 
>             Bug ID: 199429
>            Summary: smc_shutdown(net/smc/af_smc.c) has a UAF causing null
>                     pointer vulnerability.
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 4.16.0-rc7
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: stephen@networkplumber.org
>           Reporter: 1773876454@qq.com
>         Regression: No
> 
> Created attachment 275431
>   --> https://bugzilla.kernel.org/attachment.cgi?id=275431&action=edit  
> POC
> 
> Syzkaller hit 'general protection fault in kernel_sock_shutdown' bug.
> 
> NET: Registered protocol family 43

Thanks for reporting. This fix is needed here:

 net/smc/af_smc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1314,7 +1314,7 @@ static int smc_shutdown(struct socket *s
 	    (sk->sk_state != SMC_APPCLOSEWAIT2) &&
 	    (sk->sk_state != SMC_APPFINCLOSEWAIT))
 		goto out;
-	if (smc->use_fallback) {
+	if (smc->use_fallback || sk->sk_state == SMC_LISTEN) {
 		rc = kernel_sock_shutdown(smc->clcsock, how);
 		sk->sk_shutdown = smc->clcsock->sk->sk_shutdown;
 		if (sk->sk_shutdown == SHUTDOWN_MASK)

Kind regards, Ursula

* [PATCH] rt2x00: fix spelling mistake in various macros, UKNOWN -> UNKNOWN
From: Colin King @ 2018-04-18 11:47 UTC (permalink / raw)
  To: Stanislaw Gruszka, Helmut Schaa, Kalle Valo, linux-wireless,
	netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Rename several macros that contain misspellings of UNKNOWN

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/ralink/rt2x00/rt2800.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800.h b/drivers/net/wireless/ralink/rt2x00/rt2800.h
index 6a8c93fb6a43..8eccfbb5d6f8 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800.h
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800.h
@@ -1193,10 +1193,10 @@
 #define TX_PWR_CFG_3_MCS13		FIELD32(0x000000f0)
 #define TX_PWR_CFG_3_MCS14		FIELD32(0x00000f00)
 #define TX_PWR_CFG_3_MCS15		FIELD32(0x0000f000)
-#define TX_PWR_CFG_3_UKNOWN1		FIELD32(0x000f0000)
-#define TX_PWR_CFG_3_UKNOWN2		FIELD32(0x00f00000)
-#define TX_PWR_CFG_3_UKNOWN3		FIELD32(0x0f000000)
-#define TX_PWR_CFG_3_UKNOWN4		FIELD32(0xf0000000)
+#define TX_PWR_CFG_3_UNKNOWN1		FIELD32(0x000f0000)
+#define TX_PWR_CFG_3_UNKNOWN2		FIELD32(0x00f00000)
+#define TX_PWR_CFG_3_UNKNOWN3		FIELD32(0x0f000000)
+#define TX_PWR_CFG_3_UNKNOWN4		FIELD32(0xf0000000)
 /* bits for 3T devices */
 #define TX_PWR_CFG_3_MCS12_CH0		FIELD32(0x0000000f)
 #define TX_PWR_CFG_3_MCS12_CH1		FIELD32(0x000000f0)
@@ -1216,10 +1216,10 @@
  * TX_PWR_CFG_4:
  */
 #define TX_PWR_CFG_4			0x1324
-#define TX_PWR_CFG_4_UKNOWN5		FIELD32(0x0000000f)
-#define TX_PWR_CFG_4_UKNOWN6		FIELD32(0x000000f0)
-#define TX_PWR_CFG_4_UKNOWN7		FIELD32(0x00000f00)
-#define TX_PWR_CFG_4_UKNOWN8		FIELD32(0x0000f000)
+#define TX_PWR_CFG_4_UNKNOWN5		FIELD32(0x0000000f)
+#define TX_PWR_CFG_4_UNKNOWN6		FIELD32(0x000000f0)
+#define TX_PWR_CFG_4_UNKNOWN7		FIELD32(0x00000f00)
+#define TX_PWR_CFG_4_UNKNOWN8		FIELD32(0x0000f000)
 /* bits for 3T devices */
 #define TX_PWR_CFG_4_STBC4_CH0		FIELD32(0x0000000f)
 #define TX_PWR_CFG_4_STBC4_CH1		FIELD32(0x000000f0)
-- 
2.17.0

* Re: [PATCH] rt2x00: fix spelling mistake in various macros, UKNOWN -> UNKNOWN
From: Stanislaw Gruszka @ 2018-04-18 11:55 UTC (permalink / raw)
  To: Colin King
  Cc: Helmut Schaa, Kalle Valo, linux-wireless, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20180418114750.1978-1-colin.king@canonical.com>

On Wed, Apr 18, 2018 at 12:47:50PM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> Rename several macros that contain misspellings of UNKNOWN
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>

* [RFC PATCH] net: bridge: multicast querier per VLAN support
From: Joachim Nilsson @ 2018-04-18 12:07 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Nikolay Aleksandrov

This RFC patch¹ is an attempt to add multicast querier per VLAN support
to a VLAN aware bridge.  I'm posting it as RFC for now since non-VLAN
aware bridges are not handled, and one of my questions is if that is
complexity we need to continue supporting?

From what I understand, multicast join/report already supports per VLAN
operation, and the MDB also supports filtering per VLAN, but queries
are currently limited to per-port operation on VLAN-aware bridges.

The naive² approach of this patch relocates query timers from the bridge
to operate per VLAN; on timer expiry we send queries to all bridge ports
in the same VLAN.  Tagged port members have tagged VLAN queries.
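
For readers less familiar with the frame layout, a tagged query differs from
an untagged one only by the four byte 802.1Q header placed after the MAC
addresses. Here is a user-space sketch of that insertion; insert_vlan_tag()
and the macro names are hypothetical, and in the kernel the patch relies on
vlan_insert_tag_set_proto() instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN	12	/* destination + source MAC */
#define VLAN_TAG_LEN	4	/* TPID + TCI */
#define TPID_8021Q	0x8100

/* Insert an 802.1Q tag after the MAC addresses of an untagged frame.
 * buf must have room for len + 4 bytes. Returns the new frame length.
 */
static size_t insert_vlan_tag(uint8_t *buf, size_t len, uint16_t vid)
{
	assert(len >= MAC_ADDRS_LEN + 2 && vid < 4096);

	/* make room: shift EtherType and payload by four bytes */
	memmove(buf + MAC_ADDRS_LEN + VLAN_TAG_LEN, buf + MAC_ADDRS_LEN,
		len - MAC_ADDRS_LEN);
	buf[12] = TPID_8021Q >> 8;
	buf[13] = TPID_8021Q & 0xff;
	buf[14] = vid >> 8;	/* PCP and DEI left at zero */
	buf[15] = vid & 0xff;
	return len + VLAN_TAG_LEN;
}
```

The original EtherType ends up right after the tag, so the IP header offset
grows by four bytes, which is why the allocation size of the query has to
account for the extra header on tagged ports.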

Unlike the original patch¹, which uses a sysfs entry to set the querier
address of each VLAN, this uses the IP address of the VLAN interface when
initiating a per VLAN query.  A version of inet_select_addr() is used
for this, called inet_select_dev_addr(), not included in this patch.

Open questions/TODO:

- First of all, is this patch useful to anyone?
- The current br_multicast.c is very complex.  The support for both IPv4
  and IPv6 is a no-brainer, but it also has #ifdef VLAN_FILTERING and
  'br->vlan_enabled' ... this has likely been discussed before, but if
  we could remove those code paths I believe what's left would be quite
  a bit easier to read and maintain.
- Many per-bridge specific multicast sysfs settings may need to have a
  corresponding per-VLAN setting, e.g. snooping, query_interval, etc.
  How should we go about that? (For status reporting I have a proposal)
- Ditto per-port specific multicast sysfs settings, e.g. multicast_router
- The MLD support has been kept in sync with the rest but is completely
  untested.  In particular I suspect the wrong source IP will be used.

¹) Initially based on a patch by Cumulus Networks
   http://repo3.cumulusnetworks.com/repo/pool/cumulus/l/linux/linux-source-4.1_4.1.33-1+cl3u11_all.deb
²) This patch is currently limited to work only on bridges with VLAN
   enabled.  Care has been taken to support MLD snooping, but it is
   completely untested.

Thank you for reading this far!

Signed-off-by: Joachim Nilsson <troglobit@gmail.com>
---
 net/bridge/br_device.c    |   2 +-
 net/bridge/br_input.c     |   2 +-
 net/bridge/br_multicast.c | 456 ++++++++++++++++++++++++--------------
 net/bridge/br_private.h   |  38 +++-
 net/bridge/br_stp.c       |   5 +-
 net/bridge/br_vlan.c      |   3 +
 6 files changed, 327 insertions(+), 179 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 02f9f8aab047..ba35485032d8 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -98,7 +98,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		mdst = br_mdb_get(br, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb)))
+		    br_multicast_querier_exists(br, vid, eth_hdr(skb)))
 			br_multicast_flood(mdst, skb, false, true);
 		else
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 56bb9189c374..13d48489e0e1 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -137,7 +137,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 		mdst = br_mdb_get(br, skb, vid);
 		if ((mdst && mdst->addr.proto == htons(ETH_P_ALL)) ||
 		    ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		     br_multicast_querier_exists(br, eth_hdr(skb)))) {
+		     br_multicast_querier_exists(br, vid, eth_hdr(skb)))) {
 			if ((mdst && mdst->host_joined) ||
 			    br_multicast_is_router(br)) {
 				local_rcv = true;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 277ecd077dc4..72e47d500972 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -13,6 +13,7 @@
 #include <linux/err.h>
 #include <linux/export.h>
 #include <linux/if_ether.h>
+#include <linux/if_vlan.h>
 #include <linux/igmp.h>
 #include <linux/jhash.h>
 #include <linux/kernel.h>
@@ -37,7 +38,7 @@
 
 #include "br_private.h"
 
-static void br_multicast_start_querier(struct net_bridge *br,
+static void br_multicast_start_querier(struct net_bridge_vlan *vlan,
 				       struct bridge_mcast_own_query *query);
 static void br_multicast_add_router(struct net_bridge *br,
 				    struct net_bridge_port *port);
@@ -46,13 +47,14 @@ static void br_ip4_multicast_leave_group(struct net_bridge *br,
 					 __be32 group,
 					 __u16 vid,
 					 const unsigned char *src);
-
+static void br_ip4_multicast_query_expired(struct timer_list *t);
 static void __del_port_router(struct net_bridge_port *p);
 #if IS_ENABLED(CONFIG_IPV6)
 static void br_ip6_multicast_leave_group(struct net_bridge *br,
 					 struct net_bridge_port *port,
 					 const struct in6_addr *group,
 					 __u16 vid, const unsigned char *src);
+static void br_ip6_multicast_query_expired(struct timer_list *t);
 #endif
 unsigned int br_mdb_rehash_seq;
 
@@ -381,8 +383,30 @@ static int br_mdb_rehash(struct net_bridge_mdb_htable __rcu **mdbp, int max,
 	return 0;
 }
 
+__be32 br_multicast_inet_addr(struct net_bridge *br, u16 vid)
+{
+	struct net_device *dev;
+
+	if (!br->multicast_query_use_ifaddr)
+		return 0;
+
+	if (!vid)
+		return inet_select_addr(br->dev, 0, RT_SCOPE_LINK);
+
+	rcu_read_lock();
+	dev = __vlan_find_dev_deep_rcu(br->dev, htons(ETH_P_8021Q), vid);
+	rcu_read_unlock();
+
+	if (!dev)
+		return 0;
+
+	return inet_select_dev_addr(dev, 0, RT_SCOPE_LINK);
+}
+
 static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 						    __be32 group,
+						    __u16 vid,
+						    bool tagged,
 						    u8 *igmp_type)
 {
 	struct igmpv3_query *ihv3;
@@ -391,12 +415,17 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 	struct igmphdr *ih;
 	struct ethhdr *eth;
 	struct iphdr *iph;
+	int vh_size = 0;
+
+	/* if vid is non-zero, insert the 1Q header also */
+	if (vid && tagged)
+		vh_size = sizeof(struct vlan_hdr);
 
 	igmp_hdr_size = sizeof(*ih);
 	if (br->multicast_igmp_version == 3)
 		igmp_hdr_size = sizeof(*ihv3);
 	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*iph) +
-						 igmp_hdr_size + 4);
+						 vh_size + igmp_hdr_size + 4);
 	if (!skb)
 		goto out;
 
@@ -415,6 +444,15 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 	eth->h_proto = htons(ETH_P_IP);
 	skb_put(skb, sizeof(*eth));
 
+	if (vid && tagged) {
+		skb = vlan_insert_tag_set_proto(skb, htons(ETH_P_8021Q), vid);
+		if (!skb) {
+			kfree_skb(skb);
+			br_err(br, "Failed adding VLAN tag to IGMP query, vid:%d\n", vid);
+			return NULL;
+		}
+	}
+
 	skb_set_network_header(skb, skb->len);
 	iph = ip_hdr(skb);
 
@@ -426,8 +464,7 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 	iph->frag_off = htons(IP_DF);
 	iph->ttl = 1;
 	iph->protocol = IPPROTO_IGMP;
-	iph->saddr = br->multicast_query_use_ifaddr ?
-		     inet_select_addr(br->dev, 0, RT_SCOPE_LINK) : 0;
+	iph->saddr = br_multicast_inet_addr(br, vid);
 	iph->daddr = htonl(INADDR_ALLHOSTS_GROUP);
 	((u8 *)&iph[1])[0] = IPOPT_RA;
 	((u8 *)&iph[1])[1] = 4;
@@ -477,6 +514,8 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 #if IS_ENABLED(CONFIG_IPV6)
 static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
 						    const struct in6_addr *grp,
+						    __u16 vid,
+						    bool tagged,
 						    u8 *igmp_type)
 {
 	struct mld2_query *mld2q;
@@ -486,13 +525,18 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
 	size_t mld_hdr_size;
 	struct sk_buff *skb;
 	struct ethhdr *eth;
+	int vh_size = 0;
 	u8 *hopopt;
 
+	/* if vid is non-zero, insert the 1Q header also */
+	if (vid && tagged)
+		vh_size = sizeof(struct vlan_hdr);
+
 	mld_hdr_size = sizeof(*mldq);
 	if (br->multicast_mld_version == 2)
 		mld_hdr_size = sizeof(*mld2q);
 	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*ip6h) +
-						 8 + mld_hdr_size);
+						 vh_size + 8 + mld_hdr_size);
 	if (!skb)
 		goto out;
 
@@ -506,6 +550,15 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
 	eth->h_proto = htons(ETH_P_IPV6);
 	skb_put(skb, sizeof(*eth));
 
+	if (vid && tagged) {
+		skb = vlan_insert_tag_set_proto(skb, htons(ETH_P_8021Q), vid);
+		if (!skb) {
+			kfree_skb(skb);
+			br_err(br, "Failed adding VLAN tag to MLD query, vid:%d\n", vid);
+			return NULL;
+		}
+	}
+
 	/* IPv6 header + HbH option */
 	skb_set_network_header(skb, skb->len);
 	ip6h = ipv6_hdr(skb);
@@ -590,15 +643,17 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
 
 static struct sk_buff *br_multicast_alloc_query(struct net_bridge *br,
 						struct br_ip *addr,
+						bool tagged,
 						u8 *igmp_type)
 {
 	switch (addr->proto) {
 	case htons(ETH_P_IP):
-		return br_ip4_multicast_alloc_query(br, addr->u.ip4, igmp_type);
+		return br_ip4_multicast_alloc_query(br, addr->u.ip4, addr->vid,
+						    tagged, igmp_type);
 #if IS_ENABLED(CONFIG_IPV6)
 	case htons(ETH_P_IPV6):
-		return br_ip6_multicast_alloc_query(br, &addr->u.ip6,
-						    igmp_type);
+		return br_ip6_multicast_alloc_query(br, &addr->u.ip6, addr->vid,
+						    tagged, igmp_type);
 #endif
 	}
 	return NULL;
@@ -905,14 +960,16 @@ static void br_multicast_local_router_expired(struct timer_list *t)
 	spin_unlock(&br->multicast_lock);
 }
 
-static void br_multicast_querier_expired(struct net_bridge *br,
+static void br_multicast_querier_expired(struct net_bridge_vlan *vlan,
 					 struct bridge_mcast_own_query *query)
 {
+	struct net_bridge *br = vlan->br;
+
 	spin_lock(&br->multicast_lock);
 	if (!netif_running(br->dev) || br->multicast_disabled)
 		goto out;
 
-	br_multicast_start_querier(br, query);
+	br_multicast_start_querier(vlan, query);
 
 out:
 	spin_unlock(&br->multicast_lock);
@@ -920,17 +977,17 @@ static void br_multicast_querier_expired(struct net_bridge *br,
 
 static void br_ip4_multicast_querier_expired(struct timer_list *t)
 {
-	struct net_bridge *br = from_timer(br, t, ip4_other_query.timer);
+	struct net_bridge_vlan *v = from_timer(v, t, ip4_other_query.timer);
 
-	br_multicast_querier_expired(br, &br->ip4_own_query);
+	br_multicast_querier_expired(v, &v->ip4_own_query);
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
 static void br_ip6_multicast_querier_expired(struct timer_list *t)
 {
-	struct net_bridge *br = from_timer(br, t, ip6_other_query.timer);
+	struct net_bridge_vlan *v = from_timer(v, t, ip6_other_query.timer);
 
-	br_multicast_querier_expired(br, &br->ip6_own_query);
+	br_multicast_querier_expired(v, &v->ip6_own_query);
 }
 #endif
 
@@ -938,11 +995,17 @@ static void br_multicast_select_own_querier(struct net_bridge *br,
 					    struct br_ip *ip,
 					    struct sk_buff *skb)
 {
+	struct net_bridge_vlan *v;
+
+	v = br_vlan_find(br_vlan_group(br), ip->vid);
+	if (!v)
+		return;
+
 	if (ip->proto == htons(ETH_P_IP))
-		br->ip4_querier.addr.u.ip4 = ip_hdr(skb)->saddr;
+		v->ip4_querier.addr.u.ip4 = ip_hdr(skb)->saddr;
 #if IS_ENABLED(CONFIG_IPV6)
 	else
-		br->ip6_querier.addr.u.ip6 = ipv6_hdr(skb)->saddr;
+		v->ip6_querier.addr.u.ip6 = ipv6_hdr(skb)->saddr;
 #endif
 }
 
@@ -951,9 +1014,27 @@ static void __br_multicast_send_query(struct net_bridge *br,
 				      struct br_ip *ip)
 {
 	struct sk_buff *skb;
+	bool tagged = false;
 	u8 igmp_type;
 
-	skb = br_multicast_alloc_query(br, ip, &igmp_type);
+	if (port->state == BR_STATE_DISABLED ||
+	    port->state == BR_STATE_BLOCKING)
+		return;
+
+#ifdef CONFIG_BRIDGE_VLAN_FILTERING
+	if (port && ip->vid) {
+		struct net_bridge_vlan *v;
+
+		v = br_vlan_find(nbp_vlan_group_rcu(port), ip->vid);
+		if (!br->vlan_enabled || !v)
+			return;
+
+		if (!(v->flags & BRIDGE_VLAN_INFO_UNTAGGED))
+			tagged = true;
+	}
+#endif
+
+	skb = br_multicast_alloc_query(br, ip, tagged, &igmp_type);
 	if (!skb)
 		return;
 
@@ -972,11 +1053,12 @@ static void __br_multicast_send_query(struct net_bridge *br,
 	}
 }
 
-static void br_multicast_send_query(struct net_bridge *br,
+static void br_multicast_send_query(struct net_bridge_vlan *vlan,
 				    struct net_bridge_port *port,
 				    struct bridge_mcast_own_query *own_query)
 {
 	struct bridge_mcast_other_query *other_query = NULL;
+	struct net_bridge *br = vlan->br;
 	struct br_ip br_group;
 	unsigned long time;
 
@@ -985,22 +1067,27 @@ static void br_multicast_send_query(struct net_bridge *br,
 		return;
 
 	memset(&br_group.u, 0, sizeof(br_group.u));
-
-	if (port ? (own_query == &port->ip4_own_query) :
-		   (own_query == &br->ip4_own_query)) {
-		other_query = &br->ip4_other_query;
+	br_group.vid = vlan->vid;
+	if (own_query == &vlan->ip4_own_query) {
+		other_query = &vlan->ip4_other_query;
 		br_group.proto = htons(ETH_P_IP);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
-		other_query = &br->ip6_other_query;
+		other_query = &vlan->ip6_other_query;
 		br_group.proto = htons(ETH_P_IPV6);
 #endif
 	}
 
+	if (port) {
+		__br_multicast_send_query(br, port, &br_group);
+		return;
+	}
+
 	if (!other_query || timer_pending(&other_query->timer))
 		return;
 
-	__br_multicast_send_query(br, port, &br_group);
+	list_for_each_entry(port, &br->port_list, list)
+		__br_multicast_send_query(br, port, &br_group);
 
 	time = jiffies;
 	time += own_query->startup_sent < br->multicast_startup_query_count ?
@@ -1009,42 +1096,6 @@ static void br_multicast_send_query(struct net_bridge *br,
 	mod_timer(&own_query->timer, time);
 }
 
-static void
-br_multicast_port_query_expired(struct net_bridge_port *port,
-				struct bridge_mcast_own_query *query)
-{
-	struct net_bridge *br = port->br;
-
-	spin_lock(&br->multicast_lock);
-	if (port->state == BR_STATE_DISABLED ||
-	    port->state == BR_STATE_BLOCKING)
-		goto out;
-
-	if (query->startup_sent < br->multicast_startup_query_count)
-		query->startup_sent++;
-
-	br_multicast_send_query(port->br, port, query);
-
-out:
-	spin_unlock(&br->multicast_lock);
-}
-
-static void br_ip4_multicast_port_query_expired(struct timer_list *t)
-{
-	struct net_bridge_port *port = from_timer(port, t, ip4_own_query.timer);
-
-	br_multicast_port_query_expired(port, &port->ip4_own_query);
-}
-
-#if IS_ENABLED(CONFIG_IPV6)
-static void br_ip6_multicast_port_query_expired(struct timer_list *t)
-{
-	struct net_bridge_port *port = from_timer(port, t, ip6_own_query.timer);
-
-	br_multicast_port_query_expired(port, &port->ip6_own_query);
-}
-#endif
-
 static void br_mc_disabled_update(struct net_device *dev, bool value)
 {
 	struct switchdev_attr attr = {
@@ -1063,12 +1114,6 @@ int br_multicast_add_port(struct net_bridge_port *port)
 
 	timer_setup(&port->multicast_router_timer,
 		    br_multicast_router_expired, 0);
-	timer_setup(&port->ip4_own_query.timer,
-		    br_ip4_multicast_port_query_expired, 0);
-#if IS_ENABLED(CONFIG_IPV6)
-	timer_setup(&port->ip6_own_query.timer,
-		    br_ip6_multicast_port_query_expired, 0);
-#endif
 	br_mc_disabled_update(port->dev, port->br->multicast_disabled);
 
 	port->mcast_stats = netdev_alloc_pcpu_stats(struct bridge_mcast_stats);
@@ -1109,15 +1154,47 @@ static void __br_multicast_enable_port(struct net_bridge_port *port)
 	if (br->multicast_disabled || !netif_running(br->dev))
 		return;
 
-	br_multicast_enable(&port->ip4_own_query);
-#if IS_ENABLED(CONFIG_IPV6)
-	br_multicast_enable(&port->ip6_own_query);
-#endif
 	if (port->multicast_router == MDB_RTR_TYPE_PERM &&
 	    hlist_unhashed(&port->rlist))
 		br_multicast_add_router(br, port);
 }
 
+static void __br_multicast_vlan_init(struct net_bridge_vlan *vlan)
+{
+	vlan->ip4_querier.port = NULL;
+	vlan->ip4_other_query.delay_time = 0;
+
+	timer_setup(&vlan->ip4_other_query.timer,
+		    br_ip4_multicast_querier_expired, 0);
+	timer_setup(&vlan->ip4_own_query.timer,
+		    br_ip4_multicast_query_expired, 0);
+
+#if IS_ENABLED(CONFIG_IPV6)
+	vlan->ip6_querier.port = NULL;
+	vlan->ip6_other_query.delay_time = 0;
+	timer_setup(&vlan->ip6_other_query.timer,
+		    br_ip6_multicast_querier_expired, 0);
+	timer_setup(&vlan->ip6_own_query.timer,
+		    br_ip6_multicast_query_expired, 0);
+#endif
+}
+
+void br_multicast_enable_vlan(struct net_bridge *br, u16 vid)
+{
+	struct net_bridge_vlan *v;
+
+	v = br_vlan_find(br_vlan_group(br), vid);
+	if (!v)
+		return;
+
+	__br_multicast_vlan_init(v);
+	br_multicast_enable(&v->ip4_own_query);
+#if IS_ENABLED(CONFIG_IPV6)
+	br_multicast_enable(&v->ip6_own_query);
+#endif
+}
+
+/* called by stp to enable timers, only use it to enable router port? -jnn */
 void br_multicast_enable_port(struct net_bridge_port *port)
 {
 	struct net_bridge *br = port->br;
@@ -1127,6 +1204,7 @@ void br_multicast_enable_port(struct net_bridge_port *port)
 	spin_unlock(&br->multicast_lock);
 }
 
+/* called by stp_if */
 void br_multicast_disable_port(struct net_bridge_port *port)
 {
 	struct net_bridge *br = port->br;
@@ -1139,12 +1217,6 @@ void br_multicast_disable_port(struct net_bridge_port *port)
 			br_multicast_del_pg(br, pg);
 
 	__del_port_router(port);
-
-	del_timer(&port->multicast_router_timer);
-	del_timer(&port->ip4_own_query.timer);
-#if IS_ENABLED(CONFIG_IPV6)
-	del_timer(&port->ip6_own_query.timer);
-#endif
 	spin_unlock(&br->multicast_lock);
 }
 
@@ -1283,65 +1355,66 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
 }
 #endif
 
-static bool br_ip4_multicast_select_querier(struct net_bridge *br,
+static bool br_ip4_multicast_select_querier(struct net_bridge_vlan *vlan,
 					    struct net_bridge_port *port,
 					    __be32 saddr)
 {
-	if (!timer_pending(&br->ip4_own_query.timer) &&
-	    !timer_pending(&br->ip4_other_query.timer))
+
+	if (!timer_pending(&vlan->ip4_own_query.timer) &&
+	    !timer_pending(&vlan->ip4_other_query.timer))
 		goto update;
 
-	if (!br->ip4_querier.addr.u.ip4)
+	if (!vlan->ip4_querier.addr.u.ip4)
 		goto update;
 
-	if (ntohl(saddr) <= ntohl(br->ip4_querier.addr.u.ip4))
+	if (ntohl(saddr) <= ntohl(vlan->ip4_querier.addr.u.ip4))
 		goto update;
 
 	return false;
 
 update:
-	br->ip4_querier.addr.u.ip4 = saddr;
+	vlan->ip4_querier.addr.u.ip4 = saddr;
 
 	/* update protected by general multicast_lock by caller */
-	rcu_assign_pointer(br->ip4_querier.port, port);
+	rcu_assign_pointer(vlan->ip4_querier.port, port);
 
 	return true;
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static bool br_ip6_multicast_select_querier(struct net_bridge *br,
+static bool br_ip6_multicast_select_querier(struct net_bridge_vlan *vlan,
 					    struct net_bridge_port *port,
 					    struct in6_addr *saddr)
 {
-	if (!timer_pending(&br->ip6_own_query.timer) &&
-	    !timer_pending(&br->ip6_other_query.timer))
+	if (!timer_pending(&vlan->ip6_own_query.timer) &&
+	    !timer_pending(&vlan->ip6_other_query.timer))
 		goto update;
 
-	if (ipv6_addr_cmp(saddr, &br->ip6_querier.addr.u.ip6) <= 0)
+	if (ipv6_addr_cmp(saddr, &vlan->ip6_querier.addr.u.ip6) <= 0)
 		goto update;
 
 	return false;
 
 update:
-	br->ip6_querier.addr.u.ip6 = *saddr;
+	vlan->ip6_querier.addr.u.ip6 = *saddr;
 
 	/* update protected by general multicast_lock by caller */
-	rcu_assign_pointer(br->ip6_querier.port, port);
+	rcu_assign_pointer(vlan->ip6_querier.port, port);
 
 	return true;
 }
 #endif
 
-static bool br_multicast_select_querier(struct net_bridge *br,
+static bool br_multicast_select_querier(struct net_bridge_vlan *vlan,
 					struct net_bridge_port *port,
 					struct br_ip *saddr)
 {
 	switch (saddr->proto) {
 	case htons(ETH_P_IP):
-		return br_ip4_multicast_select_querier(br, port, saddr->u.ip4);
+		return br_ip4_multicast_select_querier(vlan, port, saddr->u.ip4);
 #if IS_ENABLED(CONFIG_IPV6)
 	case htons(ETH_P_IPV6):
-		return br_ip6_multicast_select_querier(br, port, &saddr->u.ip6);
+		return br_ip6_multicast_select_querier(vlan, port, &saddr->u.ip6);
 #endif
 	}
 
@@ -1425,17 +1498,17 @@ static void br_multicast_mark_router(struct net_bridge *br,
 		  now + br->multicast_querier_interval);
 }
 
-static void br_multicast_query_received(struct net_bridge *br,
+static void br_multicast_query_received(struct net_bridge_vlan *vlan,
 					struct net_bridge_port *port,
 					struct bridge_mcast_other_query *query,
 					struct br_ip *saddr,
 					unsigned long max_delay)
 {
-	if (!br_multicast_select_querier(br, port, saddr))
+	if (!br_multicast_select_querier(vlan, port, saddr))
 		return;
 
-	br_multicast_update_query_timer(br, query, max_delay);
-	br_multicast_mark_router(br, port);
+	br_multicast_update_query_timer(vlan->br, query, max_delay);
+	br_multicast_mark_router(vlan->br, port);
 }
 
 static int br_ip4_multicast_query(struct net_bridge *br,
@@ -1482,10 +1555,17 @@ static int br_ip4_multicast_query(struct net_bridge *br,
 	}
 
 	if (!group) {
+		struct net_bridge_vlan *v;
+
+		v = br_vlan_find(br_vlan_group(br), vid);
+		if (!v)
+			goto out;
+
 		saddr.proto = htons(ETH_P_IP);
+		saddr.vid   = vid;
 		saddr.u.ip4 = iph->saddr;
 
-		br_multicast_query_received(br, port, &br->ip4_other_query,
+		br_multicast_query_received(v, port, &v->ip4_other_query,
 					    &saddr, max_delay);
 		goto out;
 	}
@@ -1565,10 +1645,17 @@ static int br_ip6_multicast_query(struct net_bridge *br,
 	is_general_query = group && ipv6_addr_any(group);
 
 	if (is_general_query) {
+		struct net_bridge_vlan *v;
+
+		v = br_vlan_find(br_vlan_group(br), vid);
+		if (!v)
+			goto out;
+
 		saddr.proto = htons(ETH_P_IPV6);
+		saddr.vid   = vid;
 		saddr.u.ip6 = ip6h->saddr;
 
-		br_multicast_query_received(br, port, &br->ip6_other_query,
+		br_multicast_query_received(v, port, &v->ip6_other_query,
 					    &saddr, max_delay);
 		goto out;
 	} else if (!group) {
@@ -1716,20 +1803,22 @@ static void br_ip4_multicast_leave_group(struct net_bridge *br,
 					 __u16 vid,
 					 const unsigned char *src)
 {
+	struct net_bridge_vlan *v;
 	struct br_ip br_group;
-	struct bridge_mcast_own_query *own_query;
 
 	if (ipv4_is_local_multicast(group))
 		return;
 
-	own_query = port ? &port->ip4_own_query : &br->ip4_own_query;
+	v = br_vlan_find(br_vlan_group(br), vid);
+	if (!v)
+		return;
 
 	br_group.u.ip4 = group;
 	br_group.proto = htons(ETH_P_IP);
 	br_group.vid = vid;
 
-	br_multicast_leave_group(br, port, &br_group, &br->ip4_other_query,
-				 own_query, src);
+	br_multicast_leave_group(br, port, &br_group, &v->ip4_other_query,
+				 &v->ip4_own_query, src);
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1739,20 +1828,22 @@ static void br_ip6_multicast_leave_group(struct net_bridge *br,
 					 __u16 vid,
 					 const unsigned char *src)
 {
+	struct net_bridge_vlan *v;
 	struct br_ip br_group;
-	struct bridge_mcast_own_query *own_query;
 
 	if (ipv6_addr_is_ll_all_nodes(group))
 		return;
 
-	own_query = port ? &port->ip6_own_query : &br->ip6_own_query;
+	v = br_vlan_find(br_vlan_group(br), vid);
+	if (!v)
+		return;
 
 	br_group.u.ip6 = *group;
 	br_group.proto = htons(ETH_P_IPV6);
 	br_group.vid = vid;
 
-	br_multicast_leave_group(br, port, &br_group, &br->ip6_other_query,
-				 own_query, src);
+	br_multicast_leave_group(br, port, &br_group, &v->ip6_other_query,
+				 &v->ip6_own_query, src);
 }
 #endif
 
@@ -1938,37 +2029,42 @@ int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
 	return ret;
 }
 
-static void br_multicast_query_expired(struct net_bridge *br,
+static void br_multicast_query_expired(struct net_bridge_vlan *vlan,
 				       struct bridge_mcast_own_query *query,
 				       struct bridge_mcast_querier *querier)
 {
+	struct net_bridge *br = vlan->br;
+
 	spin_lock(&br->multicast_lock);
 	if (query->startup_sent < br->multicast_startup_query_count)
 		query->startup_sent++;
 
 	RCU_INIT_POINTER(querier->port, NULL);
-	br_multicast_send_query(br, NULL, query);
+	br_multicast_send_query(vlan, NULL, query);
 	spin_unlock(&br->multicast_lock);
 }
 
 static void br_ip4_multicast_query_expired(struct timer_list *t)
 {
-	struct net_bridge *br = from_timer(br, t, ip4_own_query.timer);
+	struct net_bridge_vlan *v = from_timer(v, t, ip4_own_query.timer);
 
-	br_multicast_query_expired(br, &br->ip4_own_query, &br->ip4_querier);
+	br_multicast_query_expired(v, &v->ip4_own_query, &v->ip4_querier);
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
 static void br_ip6_multicast_query_expired(struct timer_list *t)
 {
-	struct net_bridge *br = from_timer(br, t, ip6_own_query.timer);
+	struct net_bridge_vlan *v = from_timer(v, t, ip6_own_query.timer);
 
-	br_multicast_query_expired(br, &br->ip6_own_query, &br->ip6_querier);
+	br_multicast_query_expired(v, &v->ip6_own_query, &v->ip6_querier);
 }
 #endif
 
 void br_multicast_init(struct net_bridge *br)
 {
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+
 	br->hash_elasticity = 4;
 	br->hash_max = 512;
 
@@ -1985,29 +2081,22 @@ void br_multicast_init(struct net_bridge *br)
 	br->multicast_querier_interval = 255 * HZ;
 	br->multicast_membership_interval = 260 * HZ;
 
-	br->ip4_other_query.delay_time = 0;
-	br->ip4_querier.port = NULL;
 	br->multicast_igmp_version = 2;
 #if IS_ENABLED(CONFIG_IPV6)
 	br->multicast_mld_version = 1;
-	br->ip6_other_query.delay_time = 0;
-	br->ip6_querier.port = NULL;
 #endif
 	br->has_ipv6_addr = 1;
 
 	spin_lock_init(&br->multicast_lock);
 	timer_setup(&br->multicast_router_timer,
 		    br_multicast_local_router_expired, 0);
-	timer_setup(&br->ip4_other_query.timer,
-		    br_ip4_multicast_querier_expired, 0);
-	timer_setup(&br->ip4_own_query.timer,
-		    br_ip4_multicast_query_expired, 0);
-#if IS_ENABLED(CONFIG_IPV6)
-	timer_setup(&br->ip6_other_query.timer,
-		    br_ip6_multicast_querier_expired, 0);
-	timer_setup(&br->ip6_own_query.timer,
-		    br_ip6_multicast_query_expired, 0);
-#endif
+
+	vg = br_vlan_group(br);
+	if (!vg || !vg->num_vlans)
+		return;
+
+	list_for_each_entry(v, &vg->vlan_list, vlist)
+		__br_multicast_vlan_init(v);
 }
 
 static void __br_multicast_open(struct net_bridge *br,
@@ -2023,21 +2112,41 @@ static void __br_multicast_open(struct net_bridge *br,
 
 void br_multicast_open(struct net_bridge *br)
 {
-	__br_multicast_open(br, &br->ip4_own_query);
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+
+	vg = br_vlan_group(br);
+	if (!vg || !vg->num_vlans)
+		return;
+
+	list_for_each_entry(v, &vg->vlan_list, vlist) {
+		__br_multicast_vlan_init(v);
+		__br_multicast_open(br, &v->ip4_own_query);
 #if IS_ENABLED(CONFIG_IPV6)
-	__br_multicast_open(br, &br->ip6_own_query);
+		__br_multicast_open(br, &v->ip6_own_query);
 #endif
+	}
 }
 
 void br_multicast_stop(struct net_bridge *br)
 {
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+
 	del_timer_sync(&br->multicast_router_timer);
-	del_timer_sync(&br->ip4_other_query.timer);
-	del_timer_sync(&br->ip4_own_query.timer);
+
+	vg = br_vlan_group(br);
+	if (!vg || !vg->num_vlans)
+		return;
+
+	list_for_each_entry(v, &vg->vlan_list, vlist) {
+		del_timer_sync(&v->ip4_other_query.timer);
+		del_timer_sync(&v->ip4_own_query.timer);
 #if IS_ENABLED(CONFIG_IPV6)
-	del_timer_sync(&br->ip6_other_query.timer);
-	del_timer_sync(&br->ip6_own_query.timer);
+		del_timer_sync(&v->ip6_other_query.timer);
+		del_timer_sync(&v->ip6_own_query.timer);
 #endif
+	}
 }
 
 void br_multicast_dev_del(struct net_bridge *br)
@@ -2162,25 +2271,37 @@ int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)
 	return err;
 }
 
-static void br_multicast_start_querier(struct net_bridge *br,
+/* Must be called with multicast_lock */
+static void br_multicast_init_querier(struct net_bridge_vlan *vlan,
+				      struct bridge_mcast_own_query *query,
+				      unsigned long max_delay)
+{
+	struct bridge_mcast_other_query *other_query = NULL;
+
+	if (query == &vlan->ip4_own_query)
+		other_query = &vlan->ip4_other_query;
+	else
+		other_query = &vlan->ip6_other_query;
+
+	if (!timer_pending(&other_query->timer))
+		other_query->delay_time = jiffies + max_delay;
+
+	br_multicast_start_querier(vlan, query);
+}
+
+static void br_multicast_start_querier(struct net_bridge_vlan *vlan,
 				       struct bridge_mcast_own_query *query)
 {
-	struct net_bridge_port *port;
+	struct net_bridge *br = vlan->br;
 
 	__br_multicast_open(br, query);
 
-	list_for_each_entry(port, &br->port_list, list) {
-		if (port->state == BR_STATE_DISABLED ||
-		    port->state == BR_STATE_BLOCKING)
-			continue;
-
-		if (query == &br->ip4_own_query)
-			br_multicast_enable(&port->ip4_own_query);
+	if (query == &vlan->ip4_own_query)
+		br_multicast_enable(&vlan->ip4_own_query);
 #if IS_ENABLED(CONFIG_IPV6)
-		else
-			br_multicast_enable(&port->ip6_own_query);
+	else
+		br_multicast_enable(&vlan->ip6_own_query);
 #endif
-	}
 }
 
 int br_multicast_toggle(struct net_bridge *br, unsigned long val)
@@ -2248,6 +2369,8 @@ EXPORT_SYMBOL_GPL(br_multicast_router);
 
 int br_multicast_set_querier(struct net_bridge *br, unsigned long val)
 {
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
 	unsigned long max_delay;
 
 	val = !!val;
@@ -2260,19 +2383,18 @@ int br_multicast_set_querier(struct net_bridge *br, unsigned long val)
 	if (!val)
 		goto unlock;
 
-	max_delay = br->multicast_query_response_interval;
-
-	if (!timer_pending(&br->ip4_other_query.timer))
-		br->ip4_other_query.delay_time = jiffies + max_delay;
+	vg = br_vlan_group(br);
+	if (!vg || !vg->num_vlans)
+		goto unlock;
 
-	br_multicast_start_querier(br, &br->ip4_own_query);
+	max_delay = br->multicast_query_response_interval;
 
+	list_for_each_entry(v, &vg->vlan_list, vlist) {
+		br_multicast_init_querier(v, &v->ip4_own_query, max_delay);
 #if IS_ENABLED(CONFIG_IPV6)
-	if (!timer_pending(&br->ip6_other_query.timer))
-		br->ip6_other_query.delay_time = jiffies + max_delay;
-
-	br_multicast_start_querier(br, &br->ip6_own_query);
+		br_multicast_init_querier(v, &v->ip6_own_query, max_delay);
 #endif
+	}
 
 unlock:
 	spin_unlock_bh(&br->multicast_lock);
@@ -2425,6 +2547,7 @@ EXPORT_SYMBOL_GPL(br_multicast_list_adjacent);
  */
 bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto)
 {
+	struct net_bridge_vlan_group *vg;
 	struct net_bridge *br;
 	struct net_bridge_port *port;
 	struct ethhdr eth;
@@ -2438,12 +2561,16 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto)
 	if (!port || !port->br)
 		goto unlock;
 
+	vg = nbp_vlan_group_rcu(port);
+	if (!vg)
+		goto unlock;
+
 	br = port->br;
 
 	memset(&eth, 0, sizeof(eth));
 	eth.h_proto = htons(proto);
 
-	ret = br_multicast_querier_exists(br, &eth);
+	ret = br_multicast_querier_exists(br, br_get_pvid(vg), &eth);
 
 unlock:
 	rcu_read_unlock();
@@ -2462,7 +2589,8 @@ EXPORT_SYMBOL_GPL(br_multicast_has_querier_anywhere);
  */
 bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto)
 {
-	struct net_bridge *br;
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
 	struct net_bridge_port *port;
 	bool ret = false;
 
@@ -2474,18 +2602,24 @@ bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto)
 	if (!port || !port->br)
 		goto unlock;
 
-	br = port->br;
+	vg = nbp_vlan_group_rcu(port);
+	if (!vg)
+		goto unlock;
+
+	v = br_vlan_find(br_vlan_group(port->br), br_get_pvid(vg));
+	if (!v)
+		goto unlock;
 
 	switch (proto) {
 	case ETH_P_IP:
-		if (!timer_pending(&br->ip4_other_query.timer) ||
-		    rcu_dereference(br->ip4_querier.port) == port)
+		if (!timer_pending(&v->ip4_other_query.timer) ||
+		    rcu_dereference(v->ip4_querier.port) == port)
 			goto unlock;
 		break;
 #if IS_ENABLED(CONFIG_IPV6)
 	case ETH_P_IPV6:
-		if (!timer_pending(&br->ip6_other_query.timer) ||
-		    rcu_dereference(br->ip6_querier.port) == port)
+		if (!timer_pending(&v->ip6_other_query.timer) ||
+		    rcu_dereference(v->ip6_querier.port) == port)
 			goto unlock;
 		break;
 #endif
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 6e31be61d2c6..00dac1bbfaba 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -140,6 +140,17 @@ struct net_bridge_vlan {
 		struct net_bridge_vlan	*brvlan;
 	};
 
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	struct bridge_mcast_other_query	ip4_other_query;
+	struct bridge_mcast_own_query	ip4_own_query;
+	struct bridge_mcast_querier	ip4_querier;
+#if IS_ENABLED(CONFIG_IPV6)
+	struct bridge_mcast_other_query	ip6_other_query;
+	struct bridge_mcast_own_query	ip6_own_query;
+	struct bridge_mcast_querier	ip6_querier;
+#endif
+#endif
+
 	struct br_tunnel_info		tinfo;
 
 	struct list_head		vlist;
@@ -261,10 +272,6 @@ struct net_bridge_port {
 	struct rcu_head			rcu;
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
-	struct bridge_mcast_own_query	ip4_own_query;
-#if IS_ENABLED(CONFIG_IPV6)
-	struct bridge_mcast_own_query	ip6_own_query;
-#endif /* IS_ENABLED(CONFIG_IPV6) */
 	unsigned char			multicast_router;
 	struct bridge_mcast_stats	__percpu *mcast_stats;
 	struct timer_list		multicast_router_timer;
@@ -390,14 +397,8 @@ struct net_bridge {
 	struct hlist_head		router_list;
 
 	struct timer_list		multicast_router_timer;
-	struct bridge_mcast_other_query	ip4_other_query;
-	struct bridge_mcast_own_query	ip4_own_query;
-	struct bridge_mcast_querier	ip4_querier;
 	struct bridge_mcast_stats	__percpu *mcast_stats;
 #if IS_ENABLED(CONFIG_IPV6)
-	struct bridge_mcast_other_query	ip6_other_query;
-	struct bridge_mcast_own_query	ip6_own_query;
-	struct bridge_mcast_querier	ip6_querier;
 	u8				multicast_mld_version;
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 #endif
@@ -618,6 +619,7 @@ int br_multicast_add_port(struct net_bridge_port *port);
 void br_multicast_del_port(struct net_bridge_port *port);
 void br_multicast_enable_port(struct net_bridge_port *port);
 void br_multicast_disable_port(struct net_bridge_port *port);
+void br_multicast_enable_vlan(struct net_bridge *br, u16 vid);
 void br_multicast_init(struct net_bridge *br);
 void br_multicast_open(struct net_bridge *br);
 void br_multicast_stop(struct net_bridge *br);
@@ -633,6 +635,7 @@ int br_multicast_set_igmp_version(struct net_bridge *br, unsigned long val);
 #if IS_ENABLED(CONFIG_IPV6)
 int br_multicast_set_mld_version(struct net_bridge *br, unsigned long val);
 #endif
+__be32 br_multicast_inet_addr(struct net_bridge *br, u16 vid);
 struct net_bridge_mdb_entry *
 br_mdb_ip_get(struct net_bridge_mdb_htable *mdb, struct br_ip *dst);
 struct net_bridge_mdb_entry *
@@ -687,17 +690,27 @@ __br_multicast_querier_exists(struct net_bridge *br,
 	       (own_querier_enabled || timer_pending(&querier->timer));
 }
 
+static struct net_bridge_vlan_group *br_vlan_group(const struct net_bridge *br);
+struct net_bridge_vlan *br_vlan_find(struct net_bridge_vlan_group *vg, u16 vid);
+
 static inline bool br_multicast_querier_exists(struct net_bridge *br,
+					       u16 vid,
 					       struct ethhdr *eth)
 {
+	struct net_bridge_vlan *v;
+
+	v = br_vlan_find(br_vlan_group(br), vid);
+	if (!v)
+		return false;
+
 	switch (eth->h_proto) {
 	case (htons(ETH_P_IP)):
 		return __br_multicast_querier_exists(br,
-			&br->ip4_other_query, false);
+			&v->ip4_other_query, false);
 #if IS_ENABLED(CONFIG_IPV6)
 	case (htons(ETH_P_IPV6)):
 		return __br_multicast_querier_exists(br,
-			&br->ip6_other_query, true);
+			&v->ip6_other_query, true);
 #endif
 	default:
 		return false;
@@ -768,6 +781,7 @@ static inline bool br_multicast_is_router(struct net_bridge *br)
 }
 
 static inline bool br_multicast_querier_exists(struct net_bridge *br,
+					       u16 vid,
 					       struct ethhdr *eth)
 {
 	return false;
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index a1ba52d247d8..d1d6c4fb39dd 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -460,10 +460,7 @@ void br_port_state_selection(struct net_bridge *br)
 
 		if (p->state != BR_STATE_BLOCKING)
 			br_multicast_enable_port(p);
-		/* Multicast is not disabled for the port when it goes in
-		 * blocking state because the timers will expire and stop by
-		 * themselves without sending more queries.
-		 */
+
 		if (p->state == BR_STATE_FORWARDING)
 			++liveports;
 	}
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index bb9cbad4bad6..3b8fb28e9ab4 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -270,6 +270,9 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags)
 			goto out_filt;
 		}
 		vg->num_vlans++;
+
+		/* Start per VLAN IGMP/MLD querier timers */
+		br_multicast_enable_vlan(br, v->vid);
 	}
 
 	err = rhashtable_lookup_insert_fast(&vg->vlan_hash, &v->vnode,
-- 
2.17.0

^ permalink raw reply related

* [RFC net-next PATCH 0/2] bpf: followup avoid leaking info stored in frame data on page reuse
From: Jesper Dangaard Brouer @ 2018-04-18 12:10 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, Jesper Dangaard Brouer

This is a followup to fix commit:
 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")

Posting as RFC, as I want Daniel to review this before it goes in;
Daniel usually has smarter/brighter ideas of how to solve this in a
more optimal manner.

---

Jesper Dangaard Brouer (2):
      bpf: avoid clear xdp_frame area again
      bpf: disallow XDP data_meta to overlap with xdp_frame area


 net/core/filter.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

^ permalink raw reply

* [RFC net-next PATCH 1/2] bpf: avoid clear xdp_frame area again
From: Jesper Dangaard Brouer @ 2018-04-18 12:10 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, Jesper Dangaard Brouer
In-Reply-To: <152405338404.30730.9846848505925123326.stgit@firesoul>

Avoid clearing the xdp_frame area if this was already done by previous
invocations of bpf_xdp_adjust_head.

The xdp_adjust_head helper can be called multiple times by the
bpf_prog.  If the packet header size is increased (with a negative
offset), the kernel must assume the bpf_prog stored valuable
information there, and must not clear this information.

When the header is extended into the xdp_frame area, the kernel clears
this area to avoid any info leak.

The bug in the current implementation is that if the existing xdp->data
pointer has already been moved into the xdp_frame area, then memory is
cleared between the new data pointer and xdp_frame_end, which covers an
area that might contain information stored by the BPF prog (as the
current xdp->data lies between those pointers).
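
As a rough illustration only (not the kernel code), the intended
clear-length rule can be modelled in user space with byte offsets from
data_hard_start.  The function name clear_len() and the use of plain
offsets instead of pointers are assumptions made for this sketch:

```c
#include <stddef.h>

/*
 * Hypothetical user-space model of the clearing rule described above.
 * All arguments are byte offsets from data_hard_start:
 *   frame_end - end of the area reserved for struct xdp_frame
 *   cur_data  - xdp->data before this adjust_head call
 *   new_data  - xdp->data after applying the requested offset
 * Returns the number of bytes to zero starting at new_data.
 */
static size_t clear_len(size_t frame_end, size_t cur_data, size_t new_data)
{
	size_t clearlen;

	/* New head stays outside the xdp_frame area: nothing to clear */
	if (new_data >= frame_end)
		return 0;

	/* First entry into the area: clear up to the end of xdp_frame */
	clearlen = frame_end - new_data;

	/* A previous call already moved data into the xdp_frame area */
	if (cur_data < frame_end) {
		if (new_data < cur_data)
			clearlen = cur_data - new_data; /* only the new part */
		else
			clearlen = 0;	/* shrinking: keep prog's bytes */
	}

	return clearlen;
}
```

With a stand-in frame_end of 40, moving data from 64 to 32 clears 8
bytes, a later move from 32 to 24 clears only the 8 newly exposed
bytes, and moving from 32 back to 36 clears nothing.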

Fixes: 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index a374b8560bc4..15e9b5477360 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2705,6 +2705,13 @@ BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset)
 	if (data < xdp_frame_end) {
 		unsigned long clearlen = xdp_frame_end - data;

+		/* Handle if prev call adjusted xdp->data into xdp_frame area */
+		if (unlikely(xdp->data < xdp_frame_end)) {
+			if (data < xdp->data)
+				clearlen = xdp->data - data;
+			else
+				clearlen = 0;
+		}
 		memset(data, 0, clearlen);
 	}

^ permalink raw reply related

* [RFC net-next PATCH 2/2] bpf: disallow XDP data_meta to overlap with xdp_frame area
From: Jesper Dangaard Brouer @ 2018-04-18 12:10 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, Jesper Dangaard Brouer
In-Reply-To: <152405338404.30730.9846848505925123326.stgit@firesoul>

When combining xdp_adjust_head and xdp_adjust_meta, it is possible
to make data_meta overlap with the area used by xdp_frame.  Another
invocation of xdp_adjust_head can then clear that metadata, because
it clears the xdp_frame area.

The easiest solution I found was to simply not allow
xdp_buff->data_meta to overlap with area used by xdp_frame.
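
A simplified user-space sketch of that rule, following the check added
to the adjust_head path (the adjust_meta path rejects any meta pointer
below xdp_frame_end unconditionally).  The function name and the use of
byte offsets from data_hard_start are assumptions for illustration, and
the existing alignment and size checks on metalen are left out:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: metadata of length metalen sits directly in
 * front of data.  Offsets are relative to data_hard_start, and
 * frame_end marks the end of the area reserved for struct xdp_frame.
 * Returns 0 if the layout is allowed, -EINVAL otherwise.
 */
static int check_meta_outside_frame(size_t frame_end, size_t data,
				    size_t metalen)
{
	/* Metadata would start before data_hard_start */
	if (metalen > data)
		return -EINVAL;

	/* Disallow data_meta to use the xdp_frame area */
	if (metalen > 0 && data - metalen < frame_end)
		return -EINVAL;

	return 0;
}
```

With a stand-in frame_end of 40, data at 64 with 8 bytes of metadata
is accepted, while data at 44 with 8 bytes of metadata is rejected
because the metadata would start at offset 36.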

Fixes: 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 15e9b5477360..e3623e741181 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2701,6 +2701,11 @@ BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset)
 		     data > xdp->data_end - ETH_HLEN))
 		return -EINVAL;
 
+	/* Disallow data_meta to use xdp_frame area */
+	if (metalen > 0 &&
+	    unlikely((data - metalen) < xdp_frame_end))
+		return -EINVAL;
+
 	/* Avoid info leak, when reusing area prev used by xdp_frame */
 	if (data < xdp_frame_end) {
 		unsigned long clearlen = xdp_frame_end - data;
@@ -2734,6 +2739,7 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 
 BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)
 {
+	void *xdp_frame_end = xdp->data_hard_start + sizeof(struct xdp_frame);
 	void *meta = xdp->data_meta + offset;
 	unsigned long metalen = xdp->data - meta;
 
@@ -2742,6 +2748,11 @@ BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)
 	if (unlikely(meta < xdp->data_hard_start ||
 		     meta > xdp->data))
 		return -EINVAL;
+
+	/* Disallow data_meta to use xdp_frame area */
+	if (unlikely(meta < xdp_frame_end))
+		return -EINVAL;
+
 	if (unlikely((metalen & (sizeof(__u32) - 1)) ||
 		     (metalen > 32)))
 		return -EACCES;

^ permalink raw reply related

* Re: [PATCH v3 1/9] net: phy: new Asix Electronics PHY driver
From: Andrew Lunn @ 2018-04-18 12:13 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: netdev, fthain, geert, f.fainelli, linux-m68k, Michael.Karcher
In-Reply-To: <1524025616-3722-2-git-send-email-schmitzmic@gmail.com>

> +
> +/**
> + * asix_soft_reset - software reset the PHY via BMCR_RESET bit
> + * @phydev: target phy_device struct
> + *
> + * Description: Perform a software PHY reset using the standard
> + * BMCR_RESET bit and poll for the reset bit to be cleared.
> + * Toggle BMCR_RESET bit off to accommodate broken PHY implementations
> + * such as used on the Individual Computers' X-Surf 100 Zorro card.
> + *
> + * Returns: 0 on success, < 0 on failure
> + */
> +static int asix_soft_reset(struct phy_device *phydev)
> +{
> +	int ret;
> +
> +	/* Asix PHY won't reset unless reset bit toggles */
> +	ret = phy_write(phydev, MII_BMCR, 0);
> +	if (ret < 0)
> +		return ret;
> +
> +	phy_write(phydev, MII_BMCR, BMCR_RESET);
> +
> +	return phy_poll_reset(phydev);
> +}

Why not simply:

static int asix_soft_reset(struct phy_device *phydev)
{
	int ret;

	/* Asix PHY won't reset unless reset bit toggles */
	ret = phy_write(phydev, MII_BMCR, 0);
	if (ret < 0)
		return ret;

	return genphy_soft_reset(phydev);
}
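
To see why the extra write of 0 matters, here is a user space sketch of a PHY
that only resets on a 0 -> 1 edge of the reset bit. The mock_* names and the
edge-triggered behaviour are assumptions made for illustration; only MII_BMCR
and BMCR_RESET carry their usual values from the MII register definitions.

```c
#include <stdint.h>

#define MII_BMCR   0x00		/* basic mode control register */
#define BMCR_RESET 0x8000	/* software reset bit */

/* Hypothetical PHY model: resets only when BMCR_RESET goes 0 -> 1. */
struct mock_phy {
	uint16_t bmcr;
	int resets;
};

static int mock_phy_write(struct mock_phy *phy, int reg, uint16_t val)
{
	if (reg != MII_BMCR)
		return -1;

	/* Count a reset only on a rising edge of the reset bit. */
	if (!(phy->bmcr & BMCR_RESET) && (val & BMCR_RESET))
		phy->resets++;

	phy->bmcr = val;
	return 0;
}

/* Plain reset: write the reset bit without clearing it first. */
static int mock_soft_reset_plain(struct mock_phy *phy)
{
	return mock_phy_write(phy, MII_BMCR, BMCR_RESET);
}

/* Toggled reset: clear the bit, then set it, as in the driver above. */
static int mock_soft_reset_toggled(struct mock_phy *phy)
{
	int ret;

	ret = mock_phy_write(phy, MII_BMCR, 0);
	if (ret < 0)
		return ret;

	return mock_phy_write(phy, MII_BMCR, BMCR_RESET);
}
```

If the bit is already set when the reset is requested, the plain variant never
produces an edge in this model, while the toggled variant does.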

	Andrew

^ permalink raw reply

* Re: [PATCH v3 00/10] New network driver for Amiga X-Surf 100 (m68k)
From: Andrew Lunn @ 2018-04-18 12:19 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: netdev, Finn Thain, Geert Uytterhoeven, Florian Fainelli,
	Linux/m68k, Michael Karcher
In-Reply-To: <CAOmrzk+zUTmSzXWU9WoXYauBx2Z4qkAh+Y4d49faA8Tu5RRQnQ@mail.gmail.com>

On Wed, Apr 18, 2018 at 05:10:45PM +1200, Michael Schmitz wrote:
> All,
> 
> just noticed belatedly that the Makefile hunk of patch 9 no longer
> applies cleanly in 4.17-rc1, sorry. My series was based on 4.16.
> I'll resend that one, OK?

Hi Michael

Your series should be based on DaveM's net-next tree:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git

Please also have "net-next" in the patch subject. See
Documentation/networking/netdev-FAQ.txt

	Andrew

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: phy: Enable C45 PHYs with vendor specific address space
From: Andrew Lunn @ 2018-04-18 12:27 UTC (permalink / raw)
  To: Vicenţiu Galanopulo
  Cc: Florian Fainelli, robh@kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, mark.rutland@arm.com,
	davem@davemloft.net, marcel@holtmann.org,
	devicetree@vger.kernel.org, Alexandru Marginean,
	Madalin-cristian Bucur
In-Reply-To: <AM0PR04MB411620E8DB3E7EF8E55C99B5EEB60@AM0PR04MB4116.eurprd04.prod.outlook.com>

On Wed, Apr 18, 2018 at 09:38:47AM +0000, Vicenţiu Galanopulo wrote:
> 
> 
> > > Having dev-addr stored in devices_addrs, in get_phy_c45_ids(), when
> > > probing the identifiers, dev-addr can be extracted from devices_addrs
> > > and probed if devices_addrs[current_identifier] is not 0.
> > 
> > I must clearly be missing something, but why are you introducing all these
> > conditionals instead of updating the existing code to be able to operate against
> > an arbitrary dev-addr value, and then just making sure the first thing you do is
> > fetch that property from Device Tree? There is no way someone is going to be
> > testing with your specific use case in the future (except yourselves) so unless you
> > make supporting an arbitrary "dev-addr" value become part of how the code
> > works, this is going to be breaking badly.
> >
> 
> Hi Florian,
> 
> My intention was to have this patch as "plugin" and modify the existing kernel API little to none.

Hi Vicenţiu

In Linux, kernel APIs are not sacred. If you need to change them, do
so.

We want a clear, well integrated solution, with minimal
duplication.

	Andrew

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: Sowmini Varadhan @ 2018-04-18 12:31 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Samudrala, Sridhar, Network Development, Willem de Bruijn
In-Reply-To: <CAF=yD-KhNrcZBQizK+RtFq4Lx-ExntdLR69qz_2beRo8d7XOTA@mail.gmail.com>

I went through the patch set and the code looks fine: it extends existing
infra for TCP/GSO to UDP.

One thing that was not clear to me about the API: shouldn't UDP_SEGMENT
just be automatically determined in the stack from the pmtu? What's
the motivation for the socket option for this? Also, AIUI this can be
either a per-socket or a per-packet option?

However, I share Sridhar's concerns about the very fundamental change
to UDP message boundary semantics here.  There is actually no such thing
as a "segment" in udp, so in general this feature makes me a little
uneasy.  Well behaved udp applications should already be sending mtu
sized datagrams. And the not-so-well-behaved ones are probably relying
on IP fragmentation/reassembly to take care of datagram boundary semantics
for them?

As Sridhar points out, the feature is not really "negotiated" - one side
unilaterally sets the option. If the receiver is a classic/POSIX UDP
implementation, it will have no way of knowing that message boundaries
have been re-adjusted at the sender.  
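
The boundary change can be made concrete with a little arithmetic. The sketch
below assumes, as I understand the RFC, that one send of len bytes is cut into
datagrams of gso_size bytes each, with a shorter last one. The model_* helpers
are hypothetical and only count datagrams; they do not touch any socket API.

```c
#include <stddef.h>

/*
 * Number of datagrams the receiver sees for one send of `len` bytes
 * segmented at `gso_size`. A gso_size of 0 models "no segmentation".
 */
static size_t model_gso_datagrams(size_t len, size_t gso_size)
{
	if (gso_size == 0 || len == 0)
		return 1;

	return (len + gso_size - 1) / gso_size;
}

/* Payload length of the last datagram of that send. */
static size_t model_gso_last_len(size_t len, size_t gso_size)
{
	size_t rem;

	if (gso_size == 0 || len == 0)
		return len;

	rem = len % gso_size;
	return rem ? rem : gso_size;
}
```

With a 4000 byte send and a gso_size of 1472 (the UDP payload that fits a 1500
byte IPv4 MTU), the receiver gets three datagrams of 1472, 1472 and 1056 bytes
instead of one 4000 byte message, and nothing on the wire says they belonged
together.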

One thought to recover from this: use the infra being proposed in
  https://tools.ietf.org/html/draft-touch-tsvwg-udp-options-09
to include a new UDP TLV option that tracks datagram# (similar to IP ID)
to help the receiver reassemble the UDP datagram and pass it up with
the POSIX-conformant UDP message boundary. I realize that this is also
not a perfect solution: as you point out, there are risks from
packet re-ordering/drops: you may well end up just reinventing IP
frag/re-assembly when you are done (with just the slight improvement
that each "fragment" has a full UDP header, so it has a better shot
at ECMP and RSS).

--Sowmini

^ permalink raw reply

* Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: Rahul Lakkireddy @ 2018-04-18 12:31 UTC (permalink / raw)
  To: Dave Young
  Cc: Indranil Choudhury,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Nirranjan Kirubaharan,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org,
	Ganesh GR, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org
In-Reply-To: <20180418061546.GA4551-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>

On Wednesday, 04/18/18 at 11:45:46 +0530, Dave Young wrote:
> Hi Rahul,
> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> > On production servers running a variety of workloads over time, kernel
> > panics can happen sporadically after days or even months. It is
> > important to collect as many debug logs as possible to root-cause
> > and fix a problem that may not be easy to reproduce. A snapshot of
> > the underlying hardware/firmware state (like register dump, firmware
> > logs, adapter memory, etc.) at the time of the kernel panic will be very
> > helpful while debugging the culprit device driver.
> > 
> > This series of patches adds a new generic framework that enables device
> > drivers to collect a device-specific snapshot of the hardware/firmware
> > state of the underlying device in the crash recovery kernel. In the crash
> > recovery kernel, the collected logs are added as ELF notes to
> > /proc/vmcore, which is copied by user space scripts for post-analysis.
> > 
> > The sequence of actions done by device drivers to append their device
> > specific hardware/firmware logs to /proc/vmcore are as follows:
> > 
> > 1. During probe (before hardware is initialized), device drivers
> > register to the vmcore module (via vmcore_add_device_dump()), with
> > callback function, along with buffer size and log name needed for
> > firmware/hardware log collection.
> 
> I assumed the ELF notes info should be prepared during the kexec_[file_]load
> phase. But I did not read the old comments, so I am not sure whether this
> has been discussed.
> 

We must not collect dumps in the crashing kernel. Adding more things to
the crash dump path risks not collecting the vmcore at all. Eric
discussed this in more detail at:

https://lkml.org/lkml/2018/3/24/319

It is safe to collect dumps in the second kernel. Each device dump
will be exported as an ELF note in /proc/vmcore.

> If this is done in the 2nd kernel, one question is that the driver can be loaded later than vmcore init.

Yes, drivers will add their device dumps after vmcore init.

> How to guarantee the function works if vmcore reading happens before
> the driver is loaded?
> 
> Also it is possible that the kdump initramfs does not contain the driver
> module.
> 
> Am I missing something?
> 

Yes, the driver must be in the initramfs if it wants to collect and add
its device dump to /proc/vmcore in the second kernel.

> > 
> > 2. vmcore module allocates the buffer with requested size. It adds
> > an elf note and invokes the device driver's registered callback
> > function.
> > 
> > 3. Device driver collects all hardware/firmware logs into the buffer
> > and returns control back to vmcore module.
> > 
> > The device specific hardware/firmware logs can be seen as elf notes:
> > 
> > # readelf -n /proc/vmcore
> > 
> > Displaying notes found at file offset 0x00001000 with length 0x04003288:
> >   Owner                 Data size	Description
> >   VMCOREDD_cxgb4_0000:02:00.4 0x02000fd8	Unknown note type: (0x00000700)
> >   VMCOREDD_cxgb4_0000:04:00.4 0x02000fd8	Unknown note type: (0x00000700)
> >   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
> >   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
> >   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
> >   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
> >   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
> >   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
> >   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
> >   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
> >   VMCOREINFO           0x0000074f	Unknown note type: (0x00000000)
> > 
> > Patch 1 adds API to vmcore module to allow drivers to register callback
> > to collect the device specific hardware/firmware logs.  The logs will
> > be added to /proc/vmcore as elf notes.
> > 
> > Patch 2 updates read and mmap logic to append device specific hardware/
> > firmware logs as elf notes.
> > 
> > Patch 3 shows a cxgb4 driver example using the API to collect
> > hardware/firmware logs in crash recovery kernel, before hardware is
> > initialized.
> > 
> > Thanks,
> > Rahul
> > 
> > RFC v1: https://lkml.org/lkml/2018/3/2/542
> > RFC v2: https://lkml.org/lkml/2018/3/16/326
> > 
[...]
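
The three steps quoted above (register with a callback, core allocates the
buffer and adds a note, driver fills the buffer) can be modeled in user space
roughly as follows. This is only a sketch: the model_* types and functions are
hypothetical and are not the structures or signatures used by the series.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* What a driver hands over at probe time (step 1). */
struct model_dump_request {
	const char *name;			/* log name */
	size_t size;				/* buffer size needed */
	int (*collect)(void *buf, size_t size);	/* driver callback */
};

/* What the core keeps and later exports as a note (step 2). */
struct model_note {
	char name[44];
	size_t size;
	void *buf;
};

/* Core side: allocate the buffer, record the note, invoke the callback. */
static int model_add_device_dump(const struct model_dump_request *req,
				 struct model_note *note)
{
	int ret;

	if (!req || !req->name || !req->collect || !req->size)
		return -1;

	note->buf = calloc(1, req->size);
	if (!note->buf)
		return -1;

	snprintf(note->name, sizeof(note->name), "%s", req->name);
	note->size = req->size;

	/* Step 3: the driver fills the buffer and returns control. */
	ret = req->collect(note->buf, note->size);
	if (ret) {
		free(note->buf);
		note->buf = NULL;
	}
	return ret;
}

/* Driver side: stand-in for reading registers and firmware logs. */
static int model_driver_collect(void *buf, size_t size)
{
	memset(buf, 0xA5, size);
	return 0;
}
```

The point of the split is that the driver never touches /proc/vmcore itself;
it only fills a buffer it is handed, and the caller owns and frees that buffer.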

Thanks,
Rahul

^ permalink raw reply

* Re: [PATCH net 5/5] nfp: remove false positive offloads in flower vxlan
From: John Hurley @ 2018-04-18 12:31 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Jakub Kicinski, Linux Netdev List, oss-drivers, Simon Horman
In-Reply-To: <CAJ3xEMgmUFS5vEzy-sRZ7XzFsHVSaKKLDuBoba0WZvauP9ZmLw@mail.gmail.com>

On Wed, Apr 18, 2018 at 8:43 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Fri, Nov 17, 2017 at 4:06 AM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> From: John Hurley <john.hurley@netronome.com>
>>
>> Pass information to the match offload on whether or not the repr is the
>> ingress or egress dev. Only accept tunnel matches if repr is the egress dev.
>>
>> This means rules such as the following are successfully offloaded:
>> tc .. add dev vxlan0 .. enc_dst_port 4789 .. action redirect dev nfp_p0
>>
>> While rules such as the following are rejected:
>> tc .. add dev nfp_p0 .. enc_dst_port 4789 .. action redirect dev vxlan0
>
> cool
>
>
>> Also reject non tunnel flows that are offloaded to an egress dev.
>> Non tunnel matches assume that the offload dev is the ingress port and
>> offload a match accordingly.
>
> not following on the "Also" here, see below
>
>
>> diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>> index a0193e0c24a0..f5d73b83dcc2 100644
>> --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>> @@ -131,7 +131,8 @@ static bool nfp_flower_check_higher_than_mac(struct tc_cls_flower_offload *f)
>>
>>  static int
>>  nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
>> -                               struct tc_cls_flower_offload *flow)
>> +                               struct tc_cls_flower_offload *flow,
>> +                               bool egress)
>>  {
>>         struct flow_dissector_key_basic *mask_basic = NULL;
>>         struct flow_dissector_key_basic *key_basic = NULL;
>> @@ -167,6 +168,9 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
>>                         skb_flow_dissector_target(flow->dissector,
>>                                                   FLOW_DISSECTOR_KEY_ENC_CONTROL,
>>                                                   flow->key);
>> +               if (!egress)
>> +                       return -EOPNOTSUPP;
>> +
>>                 if (mask_enc_ctl->addr_type != 0xffff ||
>>                     enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
>>                         return -EOPNOTSUPP;
>> @@ -194,6 +198,9 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
>>
>>                 key_layer |= NFP_FLOWER_LAYER_VXLAN;
>>                 key_size += sizeof(struct nfp_flower_vxlan);
>> +       } else if (egress) {
>> +               /* Reject non tunnel matches offloaded to egress repr. */
>> +               return -EOPNOTSUPP;
>>         }
>
> with these two hunks we get: egress <- IFF -> encap match, right?
>
> (1) we can't offload the egress way if there isn't matching on encap headers
> (2) we can't go the matching on encap headers way if we are not egress
>

Yes, this is correct.
With the block code and egdev offload, we do not have access to the
ingress netdev when doing an offload.
We need to use the encap headers (especially the enc_port) to
distinguish the type of tunnel used and, therefore, require that the
encap matches be present before offloading.
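
Put as a truth table, the two hunks reduce to the check below. This is a
simplified sketch, not the driver code: model_check_offload() is a
hypothetical helper, and has_encap_match stands for "the flow matches on
tunnel (encap) headers".

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Accept a flow only when "matches on encap headers" and "offloaded via
 * the egress dev" are both true or both false.
 */
static int model_check_offload(bool has_encap_match, bool egress)
{
	/* Tunnel matches are only accepted on the egress repr. */
	if (has_encap_match && !egress)
		return -EOPNOTSUPP;

	/* Non tunnel matches assume the offload dev is the ingress port. */
	if (!has_encap_match && egress)
		return -EOPNOTSUPP;

	return 0;
}
```

Only the (encap, egress) and (no encap, ingress) combinations pass, which is
the "egress <- IFF -> encap match" relation asked about above.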

> what other cases are rejected by this logic?
>

Yes, some other cases may be rejected (like veth mentioned below).
However, this is better than allowing rules to be incorrectly
offloaded (as could have happened before these changes).
Currently, we are looking at offloading flows on other ingress devices
such as bonds so this will require a change to the driver code here.
IMO, the cleanest solution will also require tc core changes to either
avoid egdev offload or to have access to the ingress netdev of a rule.

> e.g If we add a rule with SW device (veth. tap) being the ingress, and
> HW device (vf rep)
> being the egress while not using skip_sw (just no flags == both) we
> get the TC stack
> go along the egdev callback from the vf rep hw device and add an
> (uplink --> vf rep) rule
> which will not be rejected if there is matching on tunnel headers, it
> will also not be rejected
> by some driver logic as the one we discussed to identify and ignore
> rules that are attempted to being added twice.
>
> Or.

^ permalink raw reply

* Re: [RFC PATCH] net: bridge: multicast querier per VLAN support
From: Nikolay Aleksandrov @ 2018-04-18 12:31 UTC (permalink / raw)
  To: Joachim Nilsson, netdev; +Cc: Stephen Hemminger, roopa
In-Reply-To: <20180418120713.GA10742@troglobit>

On 18/04/18 15:07, Joachim Nilsson wrote:
> This RFC patch¹ is an attempt to add multicast querier per VLAN support
> to a VLAN aware bridge.  I'm posting it as RFC for now since non-VLAN
> aware bridges are not handled, and one of my questions is if that is
> complexity we need to continue supporting?
> 
> From what I understand, multicast join/report already supports per-VLAN
> operation, and the MDB supports filtering per VLAN as well, but queries
> are currently limited to per-port operation on VLAN-aware bridges.
> 
> The naive² approach of this patch relocates query timers from the bridge
> to operate per VLAN, on timer expiry we send queries to all bridge ports
> in the same VLAN.  Tagged port members have tagged VLAN queries.
> 
> Unlike the original patch¹, which uses a sysfs entry to set the querier
> address of each VLAN, this uses the IP address of the VLAN interface when
> initiating a per VLAN query.  A version of inet_select_addr() is used
> for this, called inet_select_dev_addr(), not included in this patch.
> 
> Open questions/TODO:
> 
> - First of all, is this patch useful to anyone?

Obviously to us as it's based on our patch. :-)
We actually recently discussed what will be needed to make it acceptable to upstream.

> - The current br_multicast.c is very complex.  The support for both IPv4
>    and IPv6 is a no-brainer, but it also has #ifdef VLAN_FILTERING and
>    'br->vlan_enabled' ... this has likely been discussed before, but if
>    we could remove those code paths I believe what's left would be quite
>    a bit easier to read and maintain.

br->vlan_enabled has a wrapper that can be used without ifdefs, as does br_vlan_find(),
so in short: you can remove the ifdefs and use the wrappers; they'll degrade to always
false/NULL when VLANs are disabled.
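
The pattern looks roughly like this in a stand-alone form. It is a sketch with
hypothetical model_* names and a made-up MODEL_VLAN_FILTERING switch, not the
bridge's actual wrappers: the ifdef lives once around the helpers, and callers
are written without any conditional compilation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the config option; build with -DMODEL_VLAN_FILTERING=0 to disable. */
#ifndef MODEL_VLAN_FILTERING
#define MODEL_VLAN_FILTERING 1
#endif

struct model_vlan {
	unsigned short vid;
};

struct model_bridge {
	bool vlan_enabled;
	struct model_vlan *vlans;
	size_t nvlans;
};

#if MODEL_VLAN_FILTERING
static inline bool model_vlan_enabled(const struct model_bridge *br)
{
	return br->vlan_enabled;
}

static inline struct model_vlan *model_vlan_find(struct model_bridge *br,
						 unsigned short vid)
{
	size_t i;

	for (i = 0; i < br->nvlans; i++)
		if (br->vlans[i].vid == vid)
			return &br->vlans[i];
	return NULL;
}
#else
/* Stubs: always false/NULL when VLAN support is compiled out. */
static inline bool model_vlan_enabled(const struct model_bridge *br)
{
	(void)br;
	return false;
}

static inline struct model_vlan *model_vlan_find(struct model_bridge *br,
						 unsigned short vid)
{
	(void)br;
	(void)vid;
	return NULL;
}
#endif

/* Caller code needs no ifdef: it simply sees false/NULL when disabled. */
static bool model_can_query_vlan(struct model_bridge *br, unsigned short vid)
{
	return model_vlan_enabled(br) && model_vlan_find(br, vid) != NULL;
}
```

With the stubs in place the compiler can drop the dead branches on its own, so
the caller stays readable without costing anything when the feature is off.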

> - Many per-bridge specific multicast sysfs settings may need to have a
>    corresponding per-VLAN setting, e.g. snooping, query_interval, etc.
>    How should we go about that? (For status reporting I have a proposal)

We'll have to add more to the per-vlan context, but yes it has to happen.
It will be a netlink-only interface for config/retrieval, no sysfs.

> - Ditto per-port specific multicast sysfs settings, e.g. multicast_router

I'm not sure I follow this one, there is per-port mcast router config now?
Take a look at br_multicast_set_port_router().

> - The MLD support has been kept in sync with the rest but is completely
>    untested.  In particular I suspect the wrong source IP will be used.
> 
> ¹) Initially based on a patch by Cumulus Networks
>     http://repo3.cumulusnetworks.com/repo/pool/cumulus/l/linux/linux-source-4.1_4.1.33-1+cl3u11_all.deb

I knew this looked familiar when I glanced through it :)

> ²) This patch is currently limited to work only on bridges with VLAN
>     enabled.  Care has been taken to support MLD snooping, but it is
>     completely untested.
> 
> Thank you for reading this far!
> 
> Signed-off-by: Joachim Nilsson <troglobit@gmail.com>

Thanks for the effort, I see that you have done some of the required cleanups
for this to be upstreamable, but as you've noted above we need to make it
complete (with the per-vlan contexts and all).

I will review this patch in detail later and come back if there's anything.

Cheers,
  Nik

> ---
>   net/bridge/br_device.c    |   2 +-
>   net/bridge/br_input.c     |   2 +-
>   net/bridge/br_multicast.c | 456 ++++++++++++++++++++++++--------------
>   net/bridge/br_private.h   |  38 +++-
>   net/bridge/br_stp.c       |   5 +-
>   net/bridge/br_vlan.c      |   3 +
>   6 files changed, 327 insertions(+), 179 deletions(-)
> 
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index 02f9f8aab047..ba35485032d8 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -98,7 +98,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
>   
>   		mdst = br_mdb_get(br, skb, vid);
>   		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> -		    br_multicast_querier_exists(br, eth_hdr(skb)))
> +		    br_multicast_querier_exists(br, vid, eth_hdr(skb)))
>   			br_multicast_flood(mdst, skb, false, true);
>   		else
>   			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 56bb9189c374..13d48489e0e1 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -137,7 +137,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
>   		mdst = br_mdb_get(br, skb, vid);
>   		if ((mdst && mdst->addr.proto == htons(ETH_P_ALL)) ||
>   		    ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> -		     br_multicast_querier_exists(br, eth_hdr(skb)))) {
> +		     br_multicast_querier_exists(br, vid, eth_hdr(skb)))) {
>   			if ((mdst && mdst->host_joined) ||
>   			    br_multicast_is_router(br)) {
>   				local_rcv = true;
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 277ecd077dc4..72e47d500972 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -13,6 +13,7 @@
>   #include <linux/err.h>
>   #include <linux/export.h>
>   #include <linux/if_ether.h>
> +#include <linux/if_vlan.h>
>   #include <linux/igmp.h>
>   #include <linux/jhash.h>
>   #include <linux/kernel.h>
> @@ -37,7 +38,7 @@
>   
>   #include "br_private.h"
>   
> -static void br_multicast_start_querier(struct net_bridge *br,
> +static void br_multicast_start_querier(struct net_bridge_vlan *vlan,
>   				       struct bridge_mcast_own_query *query);
>   static void br_multicast_add_router(struct net_bridge *br,
>   				    struct net_bridge_port *port);
> @@ -46,13 +47,14 @@ static void br_ip4_multicast_leave_group(struct net_bridge *br,
>   					 __be32 group,
>   					 __u16 vid,
>   					 const unsigned char *src);
> -
> +static void br_ip4_multicast_query_expired(struct timer_list *t);
>   static void __del_port_router(struct net_bridge_port *p);
>   #if IS_ENABLED(CONFIG_IPV6)
>   static void br_ip6_multicast_leave_group(struct net_bridge *br,
>   					 struct net_bridge_port *port,
>   					 const struct in6_addr *group,
>   					 __u16 vid, const unsigned char *src);
> +static void br_ip6_multicast_query_expired(struct timer_list *t);
>   #endif
>   unsigned int br_mdb_rehash_seq;
>   
> @@ -381,8 +383,30 @@ static int br_mdb_rehash(struct net_bridge_mdb_htable __rcu **mdbp, int max,
>   	return 0;
>   }
>   
> +__be32 br_multicast_inet_addr(struct net_bridge *br, u16 vid)
> +{
> +	struct net_device *dev;
> +
> +	if (!br->multicast_query_use_ifaddr)
> +		return 0;
> +
> +	if (!vid)
> +		return inet_select_addr(br->dev, 0, RT_SCOPE_LINK);
> +
> +	rcu_read_lock();
> +	dev = __vlan_find_dev_deep_rcu(br->dev, htons(ETH_P_8021Q), vid);
> +	rcu_read_unlock();
> +
> +	if (!dev)
> +		return 0;
> +
> +	return inet_select_dev_addr(dev, 0, RT_SCOPE_LINK);
> +}
> +
>   static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
>   						    __be32 group,
> +						    __u16 vid,
> +						    bool tagged,
>   						    u8 *igmp_type)
>   {
>   	struct igmpv3_query *ihv3;
> @@ -391,12 +415,17 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
>   	struct igmphdr *ih;
>   	struct ethhdr *eth;
>   	struct iphdr *iph;
> +	int vh_size = 0;
> +
> +	/* if vid is non-zero, insert the 1Q header also */
> +	if (vid && tagged)
> +		vh_size = sizeof(struct vlan_hdr);
>   
>   	igmp_hdr_size = sizeof(*ih);
>   	if (br->multicast_igmp_version == 3)
>   		igmp_hdr_size = sizeof(*ihv3);
>   	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*iph) +
> -						 igmp_hdr_size + 4);
> +						 vh_size + igmp_hdr_size + 4);
>   	if (!skb)
>   		goto out;
>   
> @@ -415,6 +444,15 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
>   	eth->h_proto = htons(ETH_P_IP);
>   	skb_put(skb, sizeof(*eth));
>   
> +	if (vid && tagged) {
> +		skb = vlan_insert_tag_set_proto(skb, htons(ETH_P_8021Q), vid);
> +		if (!skb) {
> +			kfree_skb(skb);
> +			br_err(br, "Failed adding VLAN tag to IGMP query, vid:%d\n", vid);
> +			return NULL;
> +		}
> +	}
> +
>   	skb_set_network_header(skb, skb->len);
>   	iph = ip_hdr(skb);
>   
> @@ -426,8 +464,7 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
>   	iph->frag_off = htons(IP_DF);
>   	iph->ttl = 1;
>   	iph->protocol = IPPROTO_IGMP;
> -	iph->saddr = br->multicast_query_use_ifaddr ?
> -		     inet_select_addr(br->dev, 0, RT_SCOPE_LINK) : 0;
> +	iph->saddr = br_multicast_inet_addr(br, vid);
>   	iph->daddr = htonl(INADDR_ALLHOSTS_GROUP);
>   	((u8 *)&iph[1])[0] = IPOPT_RA;
>   	((u8 *)&iph[1])[1] = 4;
> @@ -477,6 +514,8 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
>   #if IS_ENABLED(CONFIG_IPV6)
>   static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
>   						    const struct in6_addr *grp,
> +						    __u16 vid,
> +						    bool tagged,
>   						    u8 *igmp_type)
>   {
>   	struct mld2_query *mld2q;
> @@ -486,13 +525,18 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
>   	size_t mld_hdr_size;
>   	struct sk_buff *skb;
>   	struct ethhdr *eth;
> +	int vh_size = 0;
>   	u8 *hopopt;
>   
> +	/* if vid is non-zero, insert the 1Q header also */
> +	if (vid && tagged)
> +		vh_size = sizeof(struct vlan_hdr);
> +
>   	mld_hdr_size = sizeof(*mldq);
>   	if (br->multicast_mld_version == 2)
>   		mld_hdr_size = sizeof(*mld2q);
>   	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*ip6h) +
> -						 8 + mld_hdr_size);
> +						 vh_size + 8 + mld_hdr_size);
>   	if (!skb)
>   		goto out;
>   
> @@ -506,6 +550,15 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
>   	eth->h_proto = htons(ETH_P_IPV6);
>   	skb_put(skb, sizeof(*eth));
>   
> +	if (vid && tagged) {
> +		skb = vlan_insert_tag_set_proto(skb, htons(ETH_P_8021Q), vid);
> +		if (!skb) {
> +			kfree_skb(skb);
> +			br_err(br, "Failed adding VLAN tag to MLD query, vid:%d\n", vid);
> +			return NULL;
> +		}
> +	}
> +
>   	/* IPv6 header + HbH option */
>   	skb_set_network_header(skb, skb->len);
>   	ip6h = ipv6_hdr(skb);
> @@ -590,15 +643,17 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
>   
>   static struct sk_buff *br_multicast_alloc_query(struct net_bridge *br,
>   						struct br_ip *addr,
> +						bool tagged,
>   						u8 *igmp_type)
>   {
>   	switch (addr->proto) {
>   	case htons(ETH_P_IP):
> -		return br_ip4_multicast_alloc_query(br, addr->u.ip4, igmp_type);
> +		return br_ip4_multicast_alloc_query(br, addr->u.ip4, addr->vid,
> +						    tagged, igmp_type);
>   #if IS_ENABLED(CONFIG_IPV6)
>   	case htons(ETH_P_IPV6):
> -		return br_ip6_multicast_alloc_query(br, &addr->u.ip6,
> -						    igmp_type);
> +		return br_ip6_multicast_alloc_query(br, &addr->u.ip6, addr->vid,
> +						    tagged, igmp_type);
>   #endif
>   	}
>   	return NULL;
> @@ -905,14 +960,16 @@ static void br_multicast_local_router_expired(struct timer_list *t)
>   	spin_unlock(&br->multicast_lock);
>   }
>   
> -static void br_multicast_querier_expired(struct net_bridge *br,
> +static void br_multicast_querier_expired(struct net_bridge_vlan *vlan,
>   					 struct bridge_mcast_own_query *query)
>   {
> +	struct net_bridge *br = vlan->br;
> +
>   	spin_lock(&br->multicast_lock);
>   	if (!netif_running(br->dev) || br->multicast_disabled)
>   		goto out;
>   
> -	br_multicast_start_querier(br, query);
> +	br_multicast_start_querier(vlan, query);
>   
>   out:
>   	spin_unlock(&br->multicast_lock);
> @@ -920,17 +977,17 @@ static void br_multicast_querier_expired(struct net_bridge *br,
>   
>   static void br_ip4_multicast_querier_expired(struct timer_list *t)
>   {
> -	struct net_bridge *br = from_timer(br, t, ip4_other_query.timer);
> +	struct net_bridge_vlan *v = from_timer(v, t, ip4_other_query.timer);
>   
> -	br_multicast_querier_expired(br, &br->ip4_own_query);
> +	br_multicast_querier_expired(v, &v->ip4_own_query);
>   }
>   
>   #if IS_ENABLED(CONFIG_IPV6)
>   static void br_ip6_multicast_querier_expired(struct timer_list *t)
>   {
> -	struct net_bridge *br = from_timer(br, t, ip6_other_query.timer);
> +	struct net_bridge_vlan *v = from_timer(v, t, ip6_other_query.timer);
>   
> -	br_multicast_querier_expired(br, &br->ip6_own_query);
> +	br_multicast_querier_expired(v, &v->ip6_own_query);
>   }
>   #endif
>   
> @@ -938,11 +995,17 @@ static void br_multicast_select_own_querier(struct net_bridge *br,
>   					    struct br_ip *ip,
>   					    struct sk_buff *skb)
>   {
> +	struct net_bridge_vlan *v;
> +
> +	v = br_vlan_find(br_vlan_group(br), ip->vid);
> +	if (!v)
> +		return;
> +
>   	if (ip->proto == htons(ETH_P_IP))
> -		br->ip4_querier.addr.u.ip4 = ip_hdr(skb)->saddr;
> +		v->ip4_querier.addr.u.ip4 = ip_hdr(skb)->saddr;
>   #if IS_ENABLED(CONFIG_IPV6)
>   	else
> -		br->ip6_querier.addr.u.ip6 = ipv6_hdr(skb)->saddr;
> +		v->ip6_querier.addr.u.ip6 = ipv6_hdr(skb)->saddr;
>   #endif
>   }
>   
> @@ -951,9 +1014,27 @@ static void __br_multicast_send_query(struct net_bridge *br,
>   				      struct br_ip *ip)
>   {
>   	struct sk_buff *skb;
> +	bool tagged = false;
>   	u8 igmp_type;
>   
> -	skb = br_multicast_alloc_query(br, ip, &igmp_type);
> +	if (port->state == BR_STATE_DISABLED ||
> +	    port->state == BR_STATE_BLOCKING)
> +		return;
> +
> +#ifdef CONFIG_BRIDGE_VLAN_FILTERING
> +	if (port && ip->vid) {
> +		struct net_bridge_vlan *v;
> +
> +		v = br_vlan_find(nbp_vlan_group_rcu(port), ip->vid);
> +		if (!br->vlan_enabled || !v)
> +			return;
> +
> +		if (!(v->flags & BRIDGE_VLAN_INFO_UNTAGGED))
> +			tagged = true;
> +	}
> +#endif
> +
> +	skb = br_multicast_alloc_query(br, ip, tagged, &igmp_type);
>   	if (!skb)
>   		return;
>   
> @@ -972,11 +1053,12 @@ static void __br_multicast_send_query(struct net_bridge *br,
>   	}
>   }
>   
> -static void br_multicast_send_query(struct net_bridge *br,
> +static void br_multicast_send_query(struct net_bridge_vlan *vlan,
>   				    struct net_bridge_port *port,
>   				    struct bridge_mcast_own_query *own_query)
>   {
>   	struct bridge_mcast_other_query *other_query = NULL;
> +	struct net_bridge *br = vlan->br;
>   	struct br_ip br_group;
>   	unsigned long time;
>   
> @@ -985,22 +1067,27 @@ static void br_multicast_send_query(struct net_bridge *br,
>   		return;
>   
>   	memset(&br_group.u, 0, sizeof(br_group.u));
> -
> -	if (port ? (own_query == &port->ip4_own_query) :
> -		   (own_query == &br->ip4_own_query)) {
> -		other_query = &br->ip4_other_query;
> +	br_group.vid = vlan->vid;
> +	if (own_query == &vlan->ip4_own_query) {
> +		other_query = &vlan->ip4_other_query;
>   		br_group.proto = htons(ETH_P_IP);
>   #if IS_ENABLED(CONFIG_IPV6)
>   	} else {
> -		other_query = &br->ip6_other_query;
> +		other_query = &vlan->ip6_other_query;
>   		br_group.proto = htons(ETH_P_IPV6);
>   #endif
>   	}
>   
> +	if (port) {
> +		__br_multicast_send_query(br, port, &br_group);
> +		return;
> +	}
> +
>   	if (!other_query || timer_pending(&other_query->timer))
>   		return;
>   
> -	__br_multicast_send_query(br, port, &br_group);
> +	list_for_each_entry(port, &br->port_list, list)
> +		__br_multicast_send_query(br, port, &br_group);
>   
>   	time = jiffies;
>   	time += own_query->startup_sent < br->multicast_startup_query_count ?
> @@ -1009,42 +1096,6 @@ static void br_multicast_send_query(struct net_bridge *br,
>   	mod_timer(&own_query->timer, time);
>   }
>   
> -static void
> -br_multicast_port_query_expired(struct net_bridge_port *port,
> -				struct bridge_mcast_own_query *query)
> -{
> -	struct net_bridge *br = port->br;
> -
> -	spin_lock(&br->multicast_lock);
> -	if (port->state == BR_STATE_DISABLED ||
> -	    port->state == BR_STATE_BLOCKING)
> -		goto out;
> -
> -	if (query->startup_sent < br->multicast_startup_query_count)
> -		query->startup_sent++;
> -
> -	br_multicast_send_query(port->br, port, query);
> -
> -out:
> -	spin_unlock(&br->multicast_lock);
> -}
> -
> -static void br_ip4_multicast_port_query_expired(struct timer_list *t)
> -{
> -	struct net_bridge_port *port = from_timer(port, t, ip4_own_query.timer);
> -
> -	br_multicast_port_query_expired(port, &port->ip4_own_query);
> -}
> -
> -#if IS_ENABLED(CONFIG_IPV6)
> -static void br_ip6_multicast_port_query_expired(struct timer_list *t)
> -{
> -	struct net_bridge_port *port = from_timer(port, t, ip6_own_query.timer);
> -
> -	br_multicast_port_query_expired(port, &port->ip6_own_query);
> -}
> -#endif
> -
>   static void br_mc_disabled_update(struct net_device *dev, bool value)
>   {
>   	struct switchdev_attr attr = {
> @@ -1063,12 +1114,6 @@ int br_multicast_add_port(struct net_bridge_port *port)
>   
>   	timer_setup(&port->multicast_router_timer,
>   		    br_multicast_router_expired, 0);
> -	timer_setup(&port->ip4_own_query.timer,
> -		    br_ip4_multicast_port_query_expired, 0);
> -#if IS_ENABLED(CONFIG_IPV6)
> -	timer_setup(&port->ip6_own_query.timer,
> -		    br_ip6_multicast_port_query_expired, 0);
> -#endif
>   	br_mc_disabled_update(port->dev, port->br->multicast_disabled);
>   
>   	port->mcast_stats = netdev_alloc_pcpu_stats(struct bridge_mcast_stats);
> @@ -1109,15 +1154,47 @@ static void __br_multicast_enable_port(struct net_bridge_port *port)
>   	if (br->multicast_disabled || !netif_running(br->dev))
>   		return;
>   
> -	br_multicast_enable(&port->ip4_own_query);
> -#if IS_ENABLED(CONFIG_IPV6)
> -	br_multicast_enable(&port->ip6_own_query);
> -#endif
>   	if (port->multicast_router == MDB_RTR_TYPE_PERM &&
>   	    hlist_unhashed(&port->rlist))
>   		br_multicast_add_router(br, port);
>   }
>   
> +static void __br_multicast_vlan_init(struct net_bridge_vlan *vlan)
> +{
> +	vlan->ip4_querier.port = NULL;
> +	vlan->ip4_other_query.delay_time = 0;
> +
> +	timer_setup(&vlan->ip4_other_query.timer,
> +		    br_ip4_multicast_querier_expired, 0);
> +	timer_setup(&vlan->ip4_own_query.timer,
> +		    br_ip4_multicast_query_expired, 0);
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +	vlan->ip6_querier.port = NULL;
> +	vlan->ip6_other_query.delay_time = 0;
> +	timer_setup(&vlan->ip6_other_query.timer,
> +		    br_ip6_multicast_querier_expired, 0);
> +	timer_setup(&vlan->ip6_own_query.timer,
> +		    br_ip6_multicast_query_expired, 0);
> + #endif
> +}
> +
> +void br_multicast_enable_vlan(struct net_bridge *br, u16 vid)
> +{
> +	struct net_bridge_vlan *v;
> +
> +	v = br_vlan_find(br_vlan_group(br), vid);
> +	if (!v)
> +		return;
> +
> +	__br_multicast_vlan_init(v);
> +	br_multicast_enable(&v->ip4_own_query);
> +#if IS_ENABLED(CONFIG_IPV6)
> +	br_multicast_enable(&v->ip6_own_query);
> +#endif
> +}
> +
> +/* called by stp to enable timers, only use it to enable router port? -jnn */
>   void br_multicast_enable_port(struct net_bridge_port *port)
>   {
>   	struct net_bridge *br = port->br;
> @@ -1127,6 +1204,7 @@ void br_multicast_enable_port(struct net_bridge_port *port)
>   	spin_unlock(&br->multicast_lock);
>   }
>   
> +/* called by stp_if */
>   void br_multicast_disable_port(struct net_bridge_port *port)
>   {
>   	struct net_bridge *br = port->br;
> @@ -1139,12 +1217,6 @@ void br_multicast_disable_port(struct net_bridge_port *port)
>   			br_multicast_del_pg(br, pg);
>   
>   	__del_port_router(port);
> -
> -	del_timer(&port->multicast_router_timer);
> -	del_timer(&port->ip4_own_query.timer);
> -#if IS_ENABLED(CONFIG_IPV6)
> -	del_timer(&port->ip6_own_query.timer);
> -#endif
>   	spin_unlock(&br->multicast_lock);
>   }
>   
> @@ -1283,65 +1355,66 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
>   }
>   #endif
>   
> -static bool br_ip4_multicast_select_querier(struct net_bridge *br,
> +static bool br_ip4_multicast_select_querier(struct net_bridge_vlan *vlan,
>   					    struct net_bridge_port *port,
>   					    __be32 saddr)
>   {
> -	if (!timer_pending(&br->ip4_own_query.timer) &&
> -	    !timer_pending(&br->ip4_other_query.timer))
> +
> +	if (!timer_pending(&vlan->ip4_own_query.timer) &&
> +	    !timer_pending(&vlan->ip4_other_query.timer))
>   		goto update;
>   
> -	if (!br->ip4_querier.addr.u.ip4)
> +	if (!vlan->ip4_querier.addr.u.ip4)
>   		goto update;
>   
> -	if (ntohl(saddr) <= ntohl(br->ip4_querier.addr.u.ip4))
> +	if (ntohl(saddr) <= ntohl(vlan->ip4_querier.addr.u.ip4))
>   		goto update;
>   
>   	return false;
>   
>   update:
> -	br->ip4_querier.addr.u.ip4 = saddr;
> +	vlan->ip4_querier.addr.u.ip4 = saddr;
>   
>   	/* update protected by general multicast_lock by caller */
> -	rcu_assign_pointer(br->ip4_querier.port, port);
> +	rcu_assign_pointer(vlan->ip4_querier.port, port);
>   
>   	return true;
>   }
>   
>   #if IS_ENABLED(CONFIG_IPV6)
> -static bool br_ip6_multicast_select_querier(struct net_bridge *br,
> +static bool br_ip6_multicast_select_querier(struct net_bridge_vlan *vlan,
>   					    struct net_bridge_port *port,
>   					    struct in6_addr *saddr)
>   {
> -	if (!timer_pending(&br->ip6_own_query.timer) &&
> -	    !timer_pending(&br->ip6_other_query.timer))
> +	if (!timer_pending(&vlan->ip6_own_query.timer) &&
> +	    !timer_pending(&vlan->ip6_other_query.timer))
>   		goto update;
>   
> -	if (ipv6_addr_cmp(saddr, &br->ip6_querier.addr.u.ip6) <= 0)
> +	if (ipv6_addr_cmp(saddr, &vlan->ip6_querier.addr.u.ip6) <= 0)
>   		goto update;
>   
>   	return false;
>   
>   update:
> -	br->ip6_querier.addr.u.ip6 = *saddr;
> +	vlan->ip6_querier.addr.u.ip6 = *saddr;
>   
>   	/* update protected by general multicast_lock by caller */
> -	rcu_assign_pointer(br->ip6_querier.port, port);
> +	rcu_assign_pointer(vlan->ip6_querier.port, port);
>   
>   	return true;
>   }
>   #endif
>   
> -static bool br_multicast_select_querier(struct net_bridge *br,
> +static bool br_multicast_select_querier(struct net_bridge_vlan *vlan,
>   					struct net_bridge_port *port,
>   					struct br_ip *saddr)
>   {
>   	switch (saddr->proto) {
>   	case htons(ETH_P_IP):
> -		return br_ip4_multicast_select_querier(br, port, saddr->u.ip4);
> +		return br_ip4_multicast_select_querier(vlan, port, saddr->u.ip4);
>   #if IS_ENABLED(CONFIG_IPV6)
>   	case htons(ETH_P_IPV6):
> -		return br_ip6_multicast_select_querier(br, port, &saddr->u.ip6);
> +		return br_ip6_multicast_select_querier(vlan, port, &saddr->u.ip6);
>   #endif
>   	}
>   
> @@ -1425,17 +1498,17 @@ static void br_multicast_mark_router(struct net_bridge *br,
>   		  now + br->multicast_querier_interval);
>   }
>   
> -static void br_multicast_query_received(struct net_bridge *br,
> +static void br_multicast_query_received(struct net_bridge_vlan *vlan,
>   					struct net_bridge_port *port,
>   					struct bridge_mcast_other_query *query,
>   					struct br_ip *saddr,
>   					unsigned long max_delay)
>   {
> -	if (!br_multicast_select_querier(br, port, saddr))
> +	if (!br_multicast_select_querier(vlan, port, saddr))
>   		return;
>   
> -	br_multicast_update_query_timer(br, query, max_delay);
> -	br_multicast_mark_router(br, port);
> +	br_multicast_update_query_timer(vlan->br, query, max_delay);
> +	br_multicast_mark_router(vlan->br, port);
>   }
>   
>   static int br_ip4_multicast_query(struct net_bridge *br,
> @@ -1482,10 +1555,17 @@ static int br_ip4_multicast_query(struct net_bridge *br,
>   	}
>   
>   	if (!group) {
> +		struct net_bridge_vlan *v;
> +
> +		v = br_vlan_find(br_vlan_group(br), vid);
> +		if (!v)
> +			goto out;
> +
>   		saddr.proto = htons(ETH_P_IP);
> +		saddr.vid   = vid;
>   		saddr.u.ip4 = iph->saddr;
>   
> -		br_multicast_query_received(br, port, &br->ip4_other_query,
> +		br_multicast_query_received(v, port, &v->ip4_other_query,
>   					    &saddr, max_delay);
>   		goto out;
>   	}
> @@ -1565,10 +1645,17 @@ static int br_ip6_multicast_query(struct net_bridge *br,
>   	is_general_query = group && ipv6_addr_any(group);
>   
>   	if (is_general_query) {
> +		struct net_bridge_vlan *v;
> +
> +		v = br_vlan_find(br_vlan_group(br), vid);
> +		if (!v)
> +			goto out;
> +
>   		saddr.proto = htons(ETH_P_IPV6);
> +		saddr.vid   = vid;
>   		saddr.u.ip6 = ip6h->saddr;
>   
> -		br_multicast_query_received(br, port, &br->ip6_other_query,
> +		br_multicast_query_received(v, port, &v->ip6_other_query,
>   					    &saddr, max_delay);
>   		goto out;
>   	} else if (!group) {
> @@ -1716,20 +1803,22 @@ static void br_ip4_multicast_leave_group(struct net_bridge *br,
>   					 __u16 vid,
>   					 const unsigned char *src)
>   {
> +	struct net_bridge_vlan *v;
>   	struct br_ip br_group;
> -	struct bridge_mcast_own_query *own_query;
>   
>   	if (ipv4_is_local_multicast(group))
>   		return;
>   
> -	own_query = port ? &port->ip4_own_query : &br->ip4_own_query;
> +	v = br_vlan_find(br_vlan_group(br), vid);
> +	if (!v)
> +		return;
>   
>   	br_group.u.ip4 = group;
>   	br_group.proto = htons(ETH_P_IP);
>   	br_group.vid = vid;
>   
> -	br_multicast_leave_group(br, port, &br_group, &br->ip4_other_query,
> -				 own_query, src);
> +	br_multicast_leave_group(br, port, &br_group, &v->ip4_other_query,
> +				 &v->ip4_own_query, src);
>   }
>   
>   #if IS_ENABLED(CONFIG_IPV6)
> @@ -1739,20 +1828,22 @@ static void br_ip6_multicast_leave_group(struct net_bridge *br,
>   					 __u16 vid,
>   					 const unsigned char *src)
>   {
> +	struct net_bridge_vlan *v;
>   	struct br_ip br_group;
> -	struct bridge_mcast_own_query *own_query;
>   
>   	if (ipv6_addr_is_ll_all_nodes(group))
>   		return;
>   
> -	own_query = port ? &port->ip6_own_query : &br->ip6_own_query;
> +	v = br_vlan_find(br_vlan_group(br), vid);
> +	if (!v)
> +		return;
>   
>   	br_group.u.ip6 = *group;
>   	br_group.proto = htons(ETH_P_IPV6);
>   	br_group.vid = vid;
>   
> -	br_multicast_leave_group(br, port, &br_group, &br->ip6_other_query,
> -				 own_query, src);
> +	br_multicast_leave_group(br, port, &br_group, &v->ip6_other_query,
> +				 &v->ip6_own_query, src);
>   }
>   #endif
>   
> @@ -1938,37 +2029,42 @@ int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
>   	return ret;
>   }
>   
> -static void br_multicast_query_expired(struct net_bridge *br,
> +static void br_multicast_query_expired(struct net_bridge_vlan *vlan,
>   				       struct bridge_mcast_own_query *query,
>   				       struct bridge_mcast_querier *querier)
>   {
> +	struct net_bridge *br = vlan->br;
> +
>   	spin_lock(&br->multicast_lock);
>   	if (query->startup_sent < br->multicast_startup_query_count)
>   		query->startup_sent++;
>   
>   	RCU_INIT_POINTER(querier->port, NULL);
> -	br_multicast_send_query(br, NULL, query);
> +	br_multicast_send_query(vlan, NULL, query);
>   	spin_unlock(&br->multicast_lock);
>   }
>   
>   static void br_ip4_multicast_query_expired(struct timer_list *t)
>   {
> -	struct net_bridge *br = from_timer(br, t, ip4_own_query.timer);
> +	struct net_bridge_vlan *v = from_timer(v, t, ip4_own_query.timer);
>   
> -	br_multicast_query_expired(br, &br->ip4_own_query, &br->ip4_querier);
> +	br_multicast_query_expired(v, &v->ip4_own_query, &v->ip4_querier);
>   }
>   
>   #if IS_ENABLED(CONFIG_IPV6)
>   static void br_ip6_multicast_query_expired(struct timer_list *t)
>   {
> -	struct net_bridge *br = from_timer(br, t, ip6_own_query.timer);
> +	struct net_bridge_vlan *v = from_timer(v, t, ip6_own_query.timer);
>   
> -	br_multicast_query_expired(br, &br->ip6_own_query, &br->ip6_querier);
> +	br_multicast_query_expired(v, &v->ip6_own_query, &v->ip6_querier);
>   }
>   #endif
>   
>   void br_multicast_init(struct net_bridge *br)
>   {
> +	struct net_bridge_vlan_group *vg;
> +	struct net_bridge_vlan *v;
> +
>   	br->hash_elasticity = 4;
>   	br->hash_max = 512;
>   
> @@ -1985,29 +2081,22 @@ void br_multicast_init(struct net_bridge *br)
>   	br->multicast_querier_interval = 255 * HZ;
>   	br->multicast_membership_interval = 260 * HZ;
>   
> -	br->ip4_other_query.delay_time = 0;
> -	br->ip4_querier.port = NULL;
>   	br->multicast_igmp_version = 2;
>   #if IS_ENABLED(CONFIG_IPV6)
>   	br->multicast_mld_version = 1;
> -	br->ip6_other_query.delay_time = 0;
> -	br->ip6_querier.port = NULL;
>   #endif
>   	br->has_ipv6_addr = 1;
>   
>   	spin_lock_init(&br->multicast_lock);
>   	timer_setup(&br->multicast_router_timer,
>   		    br_multicast_local_router_expired, 0);
> -	timer_setup(&br->ip4_other_query.timer,
> -		    br_ip4_multicast_querier_expired, 0);
> -	timer_setup(&br->ip4_own_query.timer,
> -		    br_ip4_multicast_query_expired, 0);
> -#if IS_ENABLED(CONFIG_IPV6)
> -	timer_setup(&br->ip6_other_query.timer,
> -		    br_ip6_multicast_querier_expired, 0);
> -	timer_setup(&br->ip6_own_query.timer,
> -		    br_ip6_multicast_query_expired, 0);
> -#endif
> +
> +	vg = br_vlan_group(br);
> +	if (!vg || !vg->num_vlans)
> +		return;
> +
> +	list_for_each_entry(v, &vg->vlan_list, vlist)
> +		__br_multicast_vlan_init(v);
>   }
>   
>   static void __br_multicast_open(struct net_bridge *br,
> @@ -2023,21 +2112,41 @@ static void __br_multicast_open(struct net_bridge *br,
>   
>   void br_multicast_open(struct net_bridge *br)
>   {
> -	__br_multicast_open(br, &br->ip4_own_query);
> +	struct net_bridge_vlan_group *vg;
> +	struct net_bridge_vlan *v;
> +
> +	vg = br_vlan_group(br);
> +	if (!vg || !vg->num_vlans)
> +		return;
> +
> +	list_for_each_entry(v, &vg->vlan_list, vlist) {
> +		__br_multicast_vlan_init(v);
> +		__br_multicast_open(br, &v->ip4_own_query);
>   #if IS_ENABLED(CONFIG_IPV6)
> -	__br_multicast_open(br, &br->ip6_own_query);
> +		__br_multicast_open(br, &v->ip6_own_query);
>   #endif
> +	}
>   }
>   
>   void br_multicast_stop(struct net_bridge *br)
>   {
> +	struct net_bridge_vlan_group *vg;
> +	struct net_bridge_vlan *v;
> +
>   	del_timer_sync(&br->multicast_router_timer);
> -	del_timer_sync(&br->ip4_other_query.timer);
> -	del_timer_sync(&br->ip4_own_query.timer);
> +
> +	vg = br_vlan_group(br);
> +	if (!vg || !vg->num_vlans)
> +		return;
> +
> +	list_for_each_entry(v, &vg->vlan_list, vlist) {
> +		del_timer_sync(&v->ip4_other_query.timer);
> +		del_timer_sync(&v->ip4_own_query.timer);
>   #if IS_ENABLED(CONFIG_IPV6)
> -	del_timer_sync(&br->ip6_other_query.timer);
> -	del_timer_sync(&br->ip6_own_query.timer);
> +		del_timer_sync(&v->ip6_other_query.timer);
> +		del_timer_sync(&v->ip6_own_query.timer);
>   #endif
> +	}
>   }
>   
>   void br_multicast_dev_del(struct net_bridge *br)
> @@ -2162,25 +2271,37 @@ int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)
>   	return err;
>   }
>   
> -static void br_multicast_start_querier(struct net_bridge *br,
> +/* Must be called with multicast_lock */
> +static void br_multicast_init_querier(struct net_bridge_vlan *vlan,
> +				      struct bridge_mcast_own_query *query,
> +				      unsigned long max_delay)
> +{
> +	struct bridge_mcast_other_query *other_query = NULL;
> +
> +	if (query == &vlan->ip4_own_query)
> +		other_query = &vlan->ip4_other_query;
> +	else
> +		other_query = &vlan->ip6_other_query;
> +
> +	if (!timer_pending(&other_query->timer))
> +		other_query->delay_time = jiffies + max_delay;
> +
> +	br_multicast_start_querier(vlan, query);
> +}
> +
> +static void br_multicast_start_querier(struct net_bridge_vlan *vlan,
>   				       struct bridge_mcast_own_query *query)
>   {
> -	struct net_bridge_port *port;
> +	struct net_bridge *br = vlan->br;
>   
>   	__br_multicast_open(br, query);
>   
> -	list_for_each_entry(port, &br->port_list, list) {
> -		if (port->state == BR_STATE_DISABLED ||
> -		    port->state == BR_STATE_BLOCKING)
> -			continue;
> -
> -		if (query == &br->ip4_own_query)
> -			br_multicast_enable(&port->ip4_own_query);
> +	if (query == &vlan->ip4_own_query)
> +		br_multicast_enable(&vlan->ip4_own_query);
>   #if IS_ENABLED(CONFIG_IPV6)
> -		else
> -			br_multicast_enable(&port->ip6_own_query);
> +	else
> +		br_multicast_enable(&vlan->ip6_own_query);
>   #endif
> -	}
>   }
>   
>   int br_multicast_toggle(struct net_bridge *br, unsigned long val)
> @@ -2248,6 +2369,8 @@ EXPORT_SYMBOL_GPL(br_multicast_router);
>   
>   int br_multicast_set_querier(struct net_bridge *br, unsigned long val)
>   {
> +	struct net_bridge_vlan_group *vg;
> +	struct net_bridge_vlan *v;
>   	unsigned long max_delay;
>   
>   	val = !!val;
> @@ -2260,19 +2383,18 @@ int br_multicast_set_querier(struct net_bridge *br, unsigned long val)
>   	if (!val)
>   		goto unlock;
>   
> -	max_delay = br->multicast_query_response_interval;
> -
> -	if (!timer_pending(&br->ip4_other_query.timer))
> -		br->ip4_other_query.delay_time = jiffies + max_delay;
> +	vg = br_vlan_group(br);
> +	if (!vg || !vg->num_vlans)
> +		goto unlock;
>   
> -	br_multicast_start_querier(br, &br->ip4_own_query);
> +	max_delay = br->multicast_query_response_interval;
>   
> +	list_for_each_entry(v, &vg->vlan_list, vlist) {
> +		br_multicast_init_querier(v, &v->ip4_own_query, max_delay);
>   #if IS_ENABLED(CONFIG_IPV6)
> -	if (!timer_pending(&br->ip6_other_query.timer))
> -		br->ip6_other_query.delay_time = jiffies + max_delay;
> -
> -	br_multicast_start_querier(br, &br->ip6_own_query);
> +		br_multicast_init_querier(v, &v->ip6_own_query, max_delay);
>   #endif
> +	}
>   
>   unlock:
>   	spin_unlock_bh(&br->multicast_lock);
> @@ -2425,6 +2547,7 @@ EXPORT_SYMBOL_GPL(br_multicast_list_adjacent);
>    */
>   bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto)
>   {
> +	struct net_bridge_vlan_group *vg;
>   	struct net_bridge *br;
>   	struct net_bridge_port *port;
>   	struct ethhdr eth;
> @@ -2438,12 +2561,16 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto)
>   	if (!port || !port->br)
>   		goto unlock;
>   
> +	vg = nbp_vlan_group_rcu(port);
> +	if (!vg)
> +		goto unlock;
> +
>   	br = port->br;
>   
>   	memset(&eth, 0, sizeof(eth));
>   	eth.h_proto = htons(proto);
>   
> -	ret = br_multicast_querier_exists(br, &eth);
> +	ret = br_multicast_querier_exists(br, br_get_pvid(vg), &eth);
>   
>   unlock:
>   	rcu_read_unlock();
> @@ -2462,7 +2589,8 @@ EXPORT_SYMBOL_GPL(br_multicast_has_querier_anywhere);
>    */
>   bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto)
>   {
> -	struct net_bridge *br;
> +	struct net_bridge_vlan_group *vg;
> +	struct net_bridge_vlan *v;
>   	struct net_bridge_port *port;
>   	bool ret = false;
>   
> @@ -2474,18 +2602,24 @@ bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto)
>   	if (!port || !port->br)
>   		goto unlock;
>   
> -	br = port->br;
> +	vg = nbp_vlan_group_rcu(port);
> +	if (!vg)
> +		goto unlock;
> +
> +	v = br_vlan_find(br_vlan_group(port->br), br_get_pvid(vg));
> +	if (!v)
> +		goto unlock;
>   
>   	switch (proto) {
>   	case ETH_P_IP:
> -		if (!timer_pending(&br->ip4_other_query.timer) ||
> -		    rcu_dereference(br->ip4_querier.port) == port)
> +		if (!timer_pending(&v->ip4_other_query.timer) ||
> +		    rcu_dereference(v->ip4_querier.port) == port)
>   			goto unlock;
>   		break;
>   #if IS_ENABLED(CONFIG_IPV6)
>   	case ETH_P_IPV6:
> -		if (!timer_pending(&br->ip6_other_query.timer) ||
> -		    rcu_dereference(br->ip6_querier.port) == port)
> +		if (!timer_pending(&v->ip6_other_query.timer) ||
> +		    rcu_dereference(v->ip6_querier.port) == port)
>   			goto unlock;
>   		break;
>   #endif
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 6e31be61d2c6..00dac1bbfaba 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -140,6 +140,17 @@ struct net_bridge_vlan {
>   		struct net_bridge_vlan	*brvlan;
>   	};
>   
> +#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> +	struct bridge_mcast_other_query	ip4_other_query;
> +	struct bridge_mcast_own_query	ip4_own_query;
> +	struct bridge_mcast_querier	ip4_querier;
> +#if IS_ENABLED(CONFIG_IPV6)
> +	struct bridge_mcast_other_query	ip6_other_query;
> +	struct bridge_mcast_own_query	ip6_own_query;
> +	struct bridge_mcast_querier	ip6_querier;
> +#endif
> +#endif
> +
>   	struct br_tunnel_info		tinfo;
>   
>   	struct list_head		vlist;
> @@ -261,10 +272,6 @@ struct net_bridge_port {
>   	struct rcu_head			rcu;
>   
>   #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> -	struct bridge_mcast_own_query	ip4_own_query;
> -#if IS_ENABLED(CONFIG_IPV6)
> -	struct bridge_mcast_own_query	ip6_own_query;
> -#endif /* IS_ENABLED(CONFIG_IPV6) */
>   	unsigned char			multicast_router;
>   	struct bridge_mcast_stats	__percpu *mcast_stats;
>   	struct timer_list		multicast_router_timer;
> @@ -390,14 +397,8 @@ struct net_bridge {
>   	struct hlist_head		router_list;
>   
>   	struct timer_list		multicast_router_timer;
> -	struct bridge_mcast_other_query	ip4_other_query;
> -	struct bridge_mcast_own_query	ip4_own_query;
> -	struct bridge_mcast_querier	ip4_querier;
>   	struct bridge_mcast_stats	__percpu *mcast_stats;
>   #if IS_ENABLED(CONFIG_IPV6)
> -	struct bridge_mcast_other_query	ip6_other_query;
> -	struct bridge_mcast_own_query	ip6_own_query;
> -	struct bridge_mcast_querier	ip6_querier;
>   	u8				multicast_mld_version;
>   #endif /* IS_ENABLED(CONFIG_IPV6) */
>   #endif
> @@ -618,6 +619,7 @@ int br_multicast_add_port(struct net_bridge_port *port);
>   void br_multicast_del_port(struct net_bridge_port *port);
>   void br_multicast_enable_port(struct net_bridge_port *port);
>   void br_multicast_disable_port(struct net_bridge_port *port);
> +void br_multicast_enable_vlan(struct net_bridge *br, u16 vid);
>   void br_multicast_init(struct net_bridge *br);
>   void br_multicast_open(struct net_bridge *br);
>   void br_multicast_stop(struct net_bridge *br);
> @@ -633,6 +635,7 @@ int br_multicast_set_igmp_version(struct net_bridge *br, unsigned long val);
>   #if IS_ENABLED(CONFIG_IPV6)
>   int br_multicast_set_mld_version(struct net_bridge *br, unsigned long val);
>   #endif
> +__be32 br_multicast_inet_addr(struct net_bridge *br, u16 vid);
>   struct net_bridge_mdb_entry *
>   br_mdb_ip_get(struct net_bridge_mdb_htable *mdb, struct br_ip *dst);
>   struct net_bridge_mdb_entry *
> @@ -687,17 +690,27 @@ __br_multicast_querier_exists(struct net_bridge *br,
>   	       (own_querier_enabled || timer_pending(&querier->timer));
>   }
>   
> +static struct net_bridge_vlan_group *br_vlan_group(const struct net_bridge *br);
> +struct net_bridge_vlan *br_vlan_find(struct net_bridge_vlan_group *vg, u16 vid);
> +
>   static inline bool br_multicast_querier_exists(struct net_bridge *br,
> +					       u16 vid,
>   					       struct ethhdr *eth)
>   {
> +	struct net_bridge_vlan *v;
> +
> +	v = br_vlan_find(br_vlan_group(br), vid);
> +	if (!v)
> +		return false;
> +
>   	switch (eth->h_proto) {
>   	case (htons(ETH_P_IP)):
>   		return __br_multicast_querier_exists(br,
> -			&br->ip4_other_query, false);
> +			&v->ip4_other_query, false);
>   #if IS_ENABLED(CONFIG_IPV6)
>   	case (htons(ETH_P_IPV6)):
>   		return __br_multicast_querier_exists(br,
> -			&br->ip6_other_query, true);
> +			&v->ip6_other_query, true);
>   #endif
>   	default:
>   		return false;
> @@ -768,6 +781,7 @@ static inline bool br_multicast_is_router(struct net_bridge *br)
>   }
>   
>   static inline bool br_multicast_querier_exists(struct net_bridge *br,
> +					       u16 vid,
>   					       struct ethhdr *eth)
>   {
>   	return false;
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index a1ba52d247d8..d1d6c4fb39dd 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -460,10 +460,7 @@ void br_port_state_selection(struct net_bridge *br)
>   
>   		if (p->state != BR_STATE_BLOCKING)
>   			br_multicast_enable_port(p);
> -		/* Multicast is not disabled for the port when it goes in
> -		 * blocking state because the timers will expire and stop by
> -		 * themselves without sending more queries.
> -		 */
> +
>   		if (p->state == BR_STATE_FORWARDING)
>   			++liveports;
>   	}
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index bb9cbad4bad6..3b8fb28e9ab4 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -270,6 +270,9 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags)
>   			goto out_filt;
>   		}
>   		vg->num_vlans++;
> +
> +		/* Start per VLAN IGMP/MLD querier timers */
> +		br_multicast_enable_vlan(br, v->vid);
>   	}
>   
>   	err = rhashtable_lookup_insert_fast(&vg->vlan_hash, &v->vnode,
> 
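
For readers following the per-VLAN conversion quoted above, here is a minimal userspace sketch of the IPv4 querier election that br_ip4_multicast_select_querier() performs once the state lives in the VLAN. The struct and function names below (vlan_querier_model, select_querier_model) are hypothetical stand-ins, not kernel symbols, and the timers are reduced to plain booleans. It only mirrors the three checks visible in the hunk: no timer pending, no querier recorded, or a lower-or-equal source address.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VLAN querier state added by the patch. */
struct vlan_querier_model {
	uint32_t querier_addr;		/* network byte order, 0 = none recorded */
	bool own_timer_pending;		/* models timer_pending(&ip4_own_query.timer) */
	bool other_timer_pending;	/* models timer_pending(&ip4_other_query.timer) */
};

/* Returns true when saddr (network byte order) becomes the elected querier. */
static bool select_querier_model(struct vlan_querier_model *v, uint32_t saddr)
{
	/* No query timer is running: accept the sender. */
	if (!v->own_timer_pending && !v->other_timer_pending)
		goto update;

	/* No querier address has been recorded yet. */
	if (!v->querier_addr)
		goto update;

	/* The lowest (or equal) source address wins the election. */
	if (ntohl(saddr) <= ntohl(v->querier_addr))
		goto update;

	return false;

update:
	v->querier_addr = saddr;
	return true;
}
```

Because each VLAN carries its own copy of this state, a query seen on one VLAN no longer influences the election on another. Locking (multicast_lock) and the RCU-protected port pointer are deliberately left out of the model.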

* Re: [PATCH bpf-next v2 00/11] introduction of bpf_xdp_adjust_tail
From: Daniel Borkmann @ 2018-04-18 12:37 UTC (permalink / raw)
  To: Nikita V. Shirokov, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20180418042951.17183-1-tehnerd@tehnerd.com>

On 04/18/2018 06:29 AM, Nikita V. Shirokov wrote:
> In this patch series I'm adding a new bpf helper which allows manipulating
> xdp's data_end pointer. Right now only "shrinking" (reducing the packet's
> size by moving the pointer) is supported (and I see no use case for "growing").
> The main use case for such a helper is to be able to generate control (ICMP)
> messages from XDP context. Such messages usually contain the first N bytes
> of the original packet as a payload, and this is exactly what this helper
> would allow us to do (see patch 3 for a sample program, where we generate
> an ICMP "packet too big" message). This helper could be useful for load
> balancing applications where, after additional encapsulation, the resulting
> packet could be bigger than the interface MTU.
> Aside from the new helper, this patch series contains minor changes in device
> drivers (for the ones which require them), so that they recalculate the
> packet's length not only when the head pointer was adjusted, but also when
> the tail pointer was.

None of the patches in this set carries a Signed-off-by from you, which is
mandatory before anything can be applied. Please add them.

Thanks,
Daniel

> v1->v2:
>  * fixed kbuild warning
>  * made offset == 0 invalid for bpf_xdp_adjust_tail
>  * split the bpf_prog_test_run fix and selftests into separate commits
>  * added SPDX licence where applicable
>  * some reshuffling of the patch order (tests are now at the end)
> 
> 
> Nikita V. Shirokov (11):
>   bpf: making bpf_prog_test run aware of possible data_end ptr change
>   bpf: adding tests for bpf_xdp_adjust_tail
>   bpf: adding bpf_xdp_adjust_tail helper
>   bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail
>   bpf: make mlx4 compatible w/ bpf_xdp_adjust_tail
>   bpf: make bnxt compatible w/ bpf_xdp_adjust_tail
>   bpf: make cavium thunder compatible w/ bpf_xdp_adjust_tail
>   bpf: make netronome nfp compatible w/ bpf_xdp_adjust_tail
>   bpf: make tun compatible w/ bpf_xdp_adjust_tail
>   bpf: make virtio compatible w/ bpf_xdp_adjust_tail
>   bpf: add bpf_xdp_adjust_tail sample prog
> 
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c      |   2 +-
>  drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   2 +-
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c         |   2 +-
>  .../net/ethernet/netronome/nfp/nfp_net_common.c    |   2 +-
>  drivers/net/tun.c                                  |   3 +-
>  drivers/net/virtio_net.c                           |   7 +-
>  include/uapi/linux/bpf.h                           |  10 +-
>  net/bpf/test_run.c                                 |   3 +-
>  net/core/dev.c                                     |  10 +-
>  net/core/filter.c                                  |  29 +++-
>  samples/bpf/Makefile                               |   4 +
>  samples/bpf/xdp_adjust_tail_kern.c                 | 152 +++++++++++++++++++++
>  samples/bpf/xdp_adjust_tail_user.c                 | 142 +++++++++++++++++++
>  tools/include/uapi/linux/bpf.h                     |  10 +-
>  tools/testing/selftests/bpf/Makefile               |   2 +-
>  tools/testing/selftests/bpf/bpf_helpers.h          |   5 +
>  tools/testing/selftests/bpf/test_adjust_tail.c     |  30 ++++
>  tools/testing/selftests/bpf/test_progs.c           |  32 +++++
>  18 files changed, 435 insertions(+), 12 deletions(-)
>  create mode 100644 samples/bpf/xdp_adjust_tail_kern.c
>  create mode 100644 samples/bpf/xdp_adjust_tail_user.c
>  create mode 100644 tools/testing/selftests/bpf/test_adjust_tail.c
> 
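
To illustrate the semantics described in the quoted cover letter, here is a minimal userspace model of a shrink-only tail adjustment. It is not the helper from the series: xdp_buff_model, adjust_tail_model and MODEL_ETH_HLEN are hypothetical names, and the rule that the packet may not shrink below an Ethernet header is an assumption made for this sketch. What does come from the cover letter and the v1->v2 notes is that only shrinking is supported and that an offset of 0 is rejected.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_ETH_HLEN 14	/* assumed minimum: one Ethernet header */

/* Hypothetical userspace stand-in for the XDP buffer bounds. */
struct xdp_buff_model {
	unsigned char *data;		/* first byte of the packet */
	unsigned char *data_end;	/* one past the last byte */
};

/* Move data_end by offset bytes; only negative offsets (shrinking) succeed. */
static int adjust_tail_model(struct xdp_buff_model *xdp, int offset)
{
	ptrdiff_t len = xdp->data_end - xdp->data;

	/* Growing is unsupported, and 0 is treated as invalid as well. */
	if (offset >= 0)
		return -EINVAL;

	/* Assumption: refuse to cut into the Ethernet header. */
	if (len + offset < MODEL_ETH_HLEN)
		return -EINVAL;

	xdp->data_end += offset;
	return 0;
}
```

A program building an ICMP error would call the real helper with a negative offset so that only the leading bytes of the original packet remain as payload; a failed call leaves data_end untouched, as in the model.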

* Re: [PATCH v2 bpf-next 0/3] Add missing types to bpftool, libbpf
From: Daniel Borkmann @ 2018-04-18 12:42 UTC (permalink / raw)
  To: Andrey Ignatov, ast; +Cc: kubakici, quentin.monnet, netdev, kernel-team
In-Reply-To: <cover.1523985784.git.rdna@fb.com>

On 04/17/2018 07:28 PM, Andrey Ignatov wrote:
> v1->v2:
> - add new types to bpftool-cgroup man page;
> - add new types to bash completion for bpftool;
> - don't add types that should not be in bpftool cgroup.
> 
> Add support for various BPF prog types and attach types that have been
> added to kernel recently but not to bpftool or libbpf yet.
> 
> Andrey Ignatov (3):
>   bpftool: Support new prog types and attach types
>   libbpf: Support guessing post_bind{4,6} progs
>   libbpf: Type functions for raw tracepoints

Applied to bpf-next, thanks Andrey!

* Re: [Regression] net/phy/micrel.c v4.9.94
From: Andrew Lunn @ 2018-04-18 12:43 UTC (permalink / raw)
  To: Chris Ruehl; +Cc: f.fainelli, netdev
In-Reply-To: <22ccc548-0000-1873-1ea0-1aad140d7131@gtsys.com.hk>

> If I look at the patch I think it should call kszphy_config_init() not _reset()
> in the resume function:
> 
> 
> @@ -715,8 +723,14 @@ static int kszphy_suspend(struct phy_device *phydev)
> 
>  static int kszphy_resume(struct phy_device *phydev)
>  {
> +	int ret;
> +
>  	genphy_resume(phydev);
> 
> -	ret = kszphy_config_reset(phydev);
> +       ret = kszphy_config_init(phydev);
> +	if (ret)
> +		return ret;
> +
> 

Hi Chris

I think a patch for this has already been posted. If I remember
correctly, the PHY you have does not call probe, hence phydev->priv is
a NULL pointer, so dereferencing priv->rmii_ref_clk_sel fails.

It would be good to find the patch and make sure it has been accepted,
and marked for stable.

    Andrew
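
The failure mode described above can be sketched in userspace as follows. This is not the posted fix, which is not quoted in this thread; it only shows the general shape of guarding a private-data pointer that is NULL when probe never ran. All names here (kszphy_priv_model, phy_device_model, config_reset_model, the applied counter) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private data. */
struct kszphy_priv_model {
	bool rmii_ref_clk_sel;	/* whether a reference clock setting exists */
	int applied;		/* counts how often the setting was restored */
};

/* Hypothetical stand-in for struct phy_device. */
struct phy_device_model {
	struct kszphy_priv_model *priv;	/* NULL when probe never ran */
};

/* Restore settings on resume, tolerating PHYs without private data. */
static int config_reset_model(struct phy_device_model *phydev)
{
	struct kszphy_priv_model *priv = phydev->priv;

	/* Nothing was stored for unprobed PHYs, so there is nothing to restore. */
	if (!priv)
		return 0;

	if (priv->rmii_ref_clk_sel)
		priv->applied++;

	return 0;
}
```

Without the NULL check, the resume path would dereference priv for a PHY variant that has no probe callback, which matches the regression reported here.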

* Re: [PATCH] samples/bpf: correct comment in sock_example.c
From: Daniel Borkmann @ 2018-04-18 12:47 UTC (permalink / raw)
  To: Wang Sheng-Hui, ast, netdev
In-Reply-To: <20180417022520.2412-1-shhuiw@foxmail.com>

On 04/17/2018 04:25 AM, Wang Sheng-Hui wrote:
> The program runs against the loopback interface "lo", not "eth0".
> Correct the comment.
> 
> Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>

Applied to bpf-next, thanks Wang!
