Netdev List

* Re: [PATCH net-next] mcast: ensure L-L IPv6 packets are accepted by bridge
From: Nikolay Aleksandrov @ 2019-08-14 16:58 UTC
  To: pruddy, Ido Schimmel; +Cc: netdev, roopa, linus.luessing
In-Reply-To: <620d3cfbe58e3ae87ef1d5e7f2aa1588cac3e64a.camel@vyatta.att-mail.com>

On 8/14/19 7:40 PM, Patrick Ruddy wrote:
> Thanks both for the quick replies, answers inline...
> 
> On Wed, 2019-08-14 at 02:55 +0300, Nikolay Aleksandrov wrote:
>> On 8/13/19 10:53 PM, Ido Schimmel wrote:
>>> + Bridge maintainers, Linus
>>>
>>
>> Good catch Ido, thanks!
>> First I'd say the subject needs to reflect that this is a bridge change
>> better, please rearrange it like so - bridge: mcast: ...
>> More below,
>>
>>> On Tue, Aug 13, 2019 at 03:18:04PM +0100, Patrick Ruddy wrote:
>>>> At present only all-nodes IPv6 multicast packets are accepted by
>>>> a bridge interface that is not in multicast router mode. Since
>>>> other protocols, e.g. OSPFv3, IPv6 ND, can be running in the
>>>> absence of multicast forwarding, change the test to allow
>>>> all of the FFx2::/16 range to be accepted when not in multicast
>>>> router mode. This aligns the code with IPv4 link-local reception
>>>> and RFC 4291.
>>>
>>> Can you please quote the relevant part from RFC 4291?
>>>
>>>> Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
>>>> ---
>>>>  include/net/addrconf.h    | 15 +++++++++++++++
>>>>  net/bridge/br_multicast.c |  2 +-
>>>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
>>>> index becdad576859..05b42867e969 100644
>>>> --- a/include/net/addrconf.h
>>>> +++ b/include/net/addrconf.h
>>>> @@ -434,6 +434,21 @@ static inline void addrconf_addr_solict_mult(const struct in6_addr *addr,
>>>>  		      htonl(0xFF000000) | addr->s6_addr32[3]);
>>>>  }
>>>>  
>>>> +/*
>>>> + *      link local multicast address range ffx2::/16 rfc4291
>>>> + */
>>>> +static inline bool ipv6_addr_is_ll_mcast(const struct in6_addr *addr)
>>>> +{
>>>> +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
>>>> +	__be64 *p = (__be64 *)addr;
>>>> +	return ((p[0] & cpu_to_be64(0xff0f000000000000UL))
>>>> +		^ cpu_to_be64(0xff02000000000000UL)) == 0UL;
>>>> +#else
>>>> +	return ((addr->s6_addr32[0] & htonl(0xff0f0000)) ^
>>>> +		htonl(0xff020000)) == 0;
>>>> +#endif
>>>> +}
>>>> +
>>>>  static inline bool ipv6_addr_is_ll_all_nodes(const struct in6_addr *addr)
>>>>  {
>>>>  #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
>>>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
>>>> index 9b379e110129..ed3957381fa2 100644
>>>> --- a/net/bridge/br_multicast.c
>>>> +++ b/net/bridge/br_multicast.c
>>>> @@ -1664,7 +1664,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
>>>>  	err = ipv6_mc_check_mld(skb);
>>>>  
>>>>  	if (err == -ENOMSG) {
>>>> -		if (!ipv6_addr_is_ll_all_nodes(&ipv6_hdr(skb)->daddr))
>>>> +		if (!ipv6_addr_is_ll_mcast(&ipv6_hdr(skb)->daddr))
>>>>  			BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
>>>
>>> IIUC, you want IPv6 link-local packets to be locally received, but this
>>> also changes how these packets are flooded. RFC 4541 says that packets
>>
>> Indeed, we'll start flooding them all, not just the all hosts address.
>> If that is at all required it'll definitely have to be optional.
>>
>>> addressed to the all hosts address are a special case and should be
>>> forwarded to all ports:
>>>
>>> "In IPv6, the data forwarding rules are more straight forward because MLD is
>>> mandated for addresses with scope 2 (link-scope) or greater. The only exception
>>> is the address FF02::1 which is the all hosts link-scope address for which MLD
>>> messages are never sent. Packets with the all hosts link-scope address should
>>> be forwarded on all ports."
>>>
>>
>> I wonder what the problem is with the host joining such a group on behalf of the bridge?
>> Then you'll receive the traffic at least locally, and the RFC says it itself: MLD is mandated
>> for the other link-local addresses.
>> It's very late here and maybe I'm missing something.. :)
>>
> The group is being joined by MLD at the L3 level but the packets are
> not being passed up to the L3 interface because there is an MLD querier
> on the network
> 

That shouldn't matter if the host has joined the group, there is a specific
check for that. If the host has joined the group and we have an mdst then
we'll hit this code:
                mdst = br_mdb_get(br, skb, vid);
                if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
                    br_multicast_querier_exists(br, eth_hdr(skb))) {
                        if ((mdst && mdst->host_joined) ||
                            br_multicast_is_router(br)) {
                                local_rcv = true;
                                br->dev->stats.multicast++;
                        }
                        mcast_hit = true;
                } else {

local_rcv becomes true and the packet is passed up, so what is the problem?
Did you perhaps miss refreshing the group, so that it has expired in the bridge?
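
The branch quoted above can be modelled as a small standalone function to make the reasoning easier to follow. This is only a sketch: `struct rx_state`, `enum rcv_result` and `classify()` are hypothetical names, the booleans stand in for the kernel expressions in the snippet, and the `else` branch, which is not shown in the quote, is deliberately not modelled.

```c
#include <stdbool.h>

/* Hypothetical outcome of the quoted branch. */
enum rcv_result {
	RCV_LOCAL,		/* branch sets local_rcv = true */
	RCV_UNCHANGED,		/* branch taken, local_rcv left as it was */
	RCV_OTHER_BRANCH,	/* the else branch, not modelled here */
};

/* Hypothetical stand-ins for the conditions tested in the snippet. */
struct rx_state {
	bool has_mdst;		/* br_mdb_get() returned an entry */
	bool host_joined;	/* mdst->host_joined */
	bool mrouters_only;	/* BR_INPUT_SKB_CB_MROUTERS_ONLY(skb) */
	bool querier_exists;	/* br_multicast_querier_exists(...) */
	bool bridge_is_router;	/* br_multicast_is_router(br) */
};

static enum rcv_result classify(const struct rx_state *s)
{
	if ((s->has_mdst || s->mrouters_only) && s->querier_exists) {
		if ((s->has_mdst && s->host_joined) || s->bridge_is_router)
			return RCV_LOCAL;
		return RCV_UNCHANGED;
	}
	return RCV_OTHER_BRANCH;
}
```

With a querier present, a packet whose group has an mdb entry with host_joined set yields RCV_LOCAL in this model, which is the case being argued above.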


> snippet from /proc/net/igmp6
> ...
> 40   sw1             ff0200000000000000000001ff008700     1 00000004 0
> 40   sw1             ff020000000000000000000000000002     1 00000004 0
> 40   sw1             ff020000000000000000000000000001     1 0000000C 0
> 40   sw1             ff010000000000000000000000000001     1 00000008 0
> 41   lo1             ff020000000000000000000000000001     1 0000000C 0
> 41   lo1             ff010000000000000000000000000001     1 00000008 0
> 42   sw1.1           ff020000000000000000000000000006     1 00000004 0
> 42   sw1.1           ff020000000000000000000000000005     1 00000004 0
> 42   sw1.1           ff0200000000000000000001ff000000     2 00000004 0
> 42   sw1.1           ff0200000000000000000001ff008700     1 00000004 0
> 42   sw1.1           ff0200000000000000000001ff000099     1 00000004 0
> 42   sw1.1           ff020000000000000000000000000002     1 00000004 0
> 42   sw1.1           ff020000000000000000000000000001     1 0000000C 0
> 42   sw1.1           ff010000000000000000000000000001     1 00000008 0
> ...
> 
> the bridge is sw1 and the L3 interface is sw1.1
> 
> Ido is correct about the flooding - I will update the patch with the
> comments and reissue.
> 
> Thanks again
> 
> -pr
>>  
>>> Maybe you want something like:
>>>
>>
>> I think we can do without the new field, either pass local_rcv into br_multicast_rcv() or
>> set it based on return value. The extra test will have to remain unfortunately, but we
>> can reduce the tests by one if carefully done.
>>
>>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>>> index 09b1dd8cd853..9f312a73f61c 100644
>>> --- a/net/bridge/br_input.c
>>> +++ b/net/bridge/br_input.c
>>> @@ -132,7 +132,8 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
>>>  		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
>>>  		    br_multicast_querier_exists(br, eth_hdr(skb))) {
>>>  			if ((mdst && mdst->host_joined) ||
>>> -			    br_multicast_is_router(br)) {
>>> +			    br_multicast_is_router(br) ||
>>> +			    BR_INPUT_SKB_CB_LOCAL_RECEIVE(skb)) {
>>>  				local_rcv = true;
>>>  				br->dev->stats.multicast++;
>>>  			}
>>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
>>> index 9b379e110129..f03cecf6174e 100644
>>> --- a/net/bridge/br_multicast.c
>>> +++ b/net/bridge/br_multicast.c
>>> @@ -1667,6 +1667,9 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
>>>  		if (!ipv6_addr_is_ll_all_nodes(&ipv6_hdr(skb)->daddr))
>>>  			BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
>>>  
>>> +		if (ipv6_addr_is_ll_mcast(&ipv6_hdr(skb)->daddr))
>>> +			BR_INPUT_SKB_CB(skb)->local_receive = 1;
>>> +
>>>  		if (ipv6_addr_is_all_snoopers(&ipv6_hdr(skb)->daddr)) {
>>>  			err = br_ip6_multicast_mrd_rcv(br, port, skb);
>>>  
>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>> index b7a4942ff1b3..d76394ca4059 100644
>>> --- a/net/bridge/br_private.h
>>> +++ b/net/bridge/br_private.h
>>> @@ -426,6 +426,7 @@ struct br_input_skb_cb {
>>>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>>>  	u8 igmp;
>>>  	u8 mrouters_only:1;
>>> +	u8 local_receive:1;
>>>  #endif
>>>  	u8 proxyarp_replied:1;
>>>  	u8 src_port_isolated:1;
>>> @@ -445,8 +446,10 @@ struct br_input_skb_cb {
>>>  
>>>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>>>  # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(BR_INPUT_SKB_CB(__skb)->mrouters_only)
>>> +# define BR_INPUT_SKB_CB_LOCAL_RECEIVE(__skb)	(BR_INPUT_SKB_CB(__skb)->local_receive)
>>>  #else
>>>  # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(0)
>>> +# define BR_INPUT_SKB_CB_LOCAL_RECEIVE(__skb)	(0)
>>>  #endif
>>>  
>>>  #define br_printk(level, br, format, args...)	\
>>>
> 
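
For readers who want to experiment with the ffx2::/16 test from the proposed ipv6_addr_is_ll_mcast() outside the kernel, a minimal userspace sketch follows. It assumes the POSIX `struct in6_addr` and `inet_pton()`; the function names `addr_is_ll_mcast()` and `text_is_ll_mcast()` are hypothetical and not part of the patch. The check is byte-wise rather than word-wise, but it masks the same bits: the first byte must be 0xff, the scope nibble must be 2, and the flags nibble (the "x") is ignored.

```c
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* True if addr lies in ffx2::/16: multicast prefix 0xff, scope nibble 2.
 * The flags nibble in the upper half of the second byte is masked out.
 */
static bool addr_is_ll_mcast(const struct in6_addr *addr)
{
	return addr->s6_addr[0] == 0xff &&
	       (addr->s6_addr[1] & 0x0f) == 0x02;
}

/* Hypothetical convenience wrapper: parse a textual IPv6 address and
 * classify it. Unparsable input is reported as "not link-local multicast".
 */
static bool text_is_ll_mcast(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return false;
	return addr_is_ll_mcast(&addr);
}
```

Under this check ff02::5 and ff02::6 (the OSPFv3 groups visible in the /proc/net/igmp6 snippet above) match, while ff01::1 (interface-local scope) and ff05::2 (site-local scope) do not.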



* Re: [PATCH net] ipv6: Fix return value of ipv6_mc_may_pull() for malformed packets
From: David Miller @ 2019-08-14 16:58 UTC
  To: sbrivio; +Cc: gnault, haliu, edumazet, linus.luessing, netdev
In-Reply-To: <dc0d0b1bc3c67e2a1346b0dd1f68428eb956fbb7.1565649789.git.sbrivio@redhat.com>

From: Stefano Brivio <sbrivio@redhat.com>
Date: Tue, 13 Aug 2019 00:46:01 +0200

> Commit ba5ea614622d ("bridge: simplify ip_mc_check_igmp() and
> ipv6_mc_check_mld() calls") replaces direct calls to pskb_may_pull()
> in br_ipv6_multicast_mld2_report() with calls to ipv6_mc_may_pull(),
> that returns -EINVAL on buffers too short to be valid IPv6 packets,
> while maintaining the previous handling of the return code.
> 
> This leads to the direct opposite of the intended effect: if the
> packet is malformed, -EINVAL evaluates as true, and we'll happily
> proceed with the processing.
> 
> Return 0 if the packet is too short, in the same way as this was
> fixed for IPv4 by commit 083b78a9ed64 ("ip: fix ip_mc_may_pull()
> return value").
> 
> I don't have a reproducer for this, unlike the one referred to by
> the IPv4 commit, but this is clearly broken.
> 
> Fixes: ba5ea614622d ("bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Applied and queued up for -stable.
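
The return-value mix-up described in the commit message can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`may_pull_buggy()`, `may_pull_fixed()`, `parse_report()`) rather than the kernel functions; the only assumption is that the caller treats the helper's result as a boolean, as the commit message describes.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*may_pull_fn)(size_t have, size_t need);

/* Mirrors the reported problem: a too-short buffer yields -EINVAL,
 * which is non-zero and therefore "true" to a boolean-style caller.
 */
static int may_pull_buggy(size_t have, size_t need)
{
	if (have < need)
		return -EINVAL;
	return 1;
}

/* Mirrors the fix: a too-short buffer yields 0, i.e. "false". */
static int may_pull_fixed(size_t have, size_t need)
{
	if (have < need)
		return 0;
	return 1;
}

/* Hypothetical caller that keeps the old boolean handling. */
static int parse_report(may_pull_fn may_pull, size_t have, size_t need)
{
	if (!may_pull(have, need))
		return -EINVAL;	/* reject malformed input */
	return 0;		/* continue processing */
}
```

With the buggy helper, parse_report() accepts a 4-byte buffer when 8 bytes are needed; with the fixed helper the same input is rejected.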


* [PATCH] netfilter: nft_bitwise: Adjust parentheses to fix memcmp size argument
From: Nathan Chancellor @ 2019-08-14 16:58 UTC
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal
  Cc: David S. Miller, netfilter-devel, coreteam, netdev, linux-kernel,
	clang-built-linux, Nathan Chancellor, kbuild test robot

clang warns:

net/netfilter/nft_bitwise.c:138:50: error: size argument in 'memcmp'
call is a comparison [-Werror,-Wmemsize-comparison]
        if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
                                      ~~~~~~~~~~~~~~~~~~^~
net/netfilter/nft_bitwise.c:138:6: note: did you mean to compare the
result of 'memcmp' instead?
        if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
            ^
                                                       )
net/netfilter/nft_bitwise.c:138:32: note: explicitly cast the argument
to size_t to silence this warning
        if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
                                      ^
                                      (size_t)(
1 error generated.

Adjust the parentheses so that the result of the sizeof is used for the
size argument in memcmp, rather than the result of the comparison (which
would always be true because sizeof is a non-zero number).

Fixes: bd8699e9e292 ("netfilter: nft_bitwise: add offload support")
Link: https://github.com/ClangBuiltLinux/linux/issues/638
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 net/netfilter/nft_bitwise.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c
index 1f04ed5c518c..974300178fa9 100644
--- a/net/netfilter/nft_bitwise.c
+++ b/net/netfilter/nft_bitwise.c
@@ -135,8 +135,8 @@ static int nft_bitwise_offload(struct nft_offload_ctx *ctx,
 {
 	const struct nft_bitwise *priv = nft_expr_priv(expr);
 
-	if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
-	    priv->sreg != priv->dreg))
+	if (memcmp(&priv->xor, &zero, sizeof(priv->xor)) ||
+	    priv->sreg != priv->dreg)
 		return -EOPNOTSUPP;
 
 	memcpy(&ctx->regs[priv->dreg].mask, &priv->mask, sizeof(priv->mask));
-- 
2.23.0.rc2
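
The effect of the misplaced parenthesis can be seen in a small standalone sketch. The struct and function names below are hypothetical and only loosely modelled on the patch; the size expression is computed into a separate variable so that the example does not itself trigger the clang warning quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the private data used in the patch. */
struct bitwise_regs {
	unsigned char xor_val[16];
	int sreg;
	int dreg;
};

static const unsigned char zero[16];

/* Misplaced parenthesis: the size handed to memcmp() is the result of
 * (sizeof(...) || cond), which is always 1, so only the first byte is
 * compared and the register check never reaches the if condition.
 */
static bool unsupported_buggy(const struct bitwise_regs *r)
{
	size_t n = (sizeof(r->xor_val) || r->sreg != r->dreg);

	return memcmp(r->xor_val, zero, n) != 0;
}

/* Corrected form: full-size comparison, then the register check. */
static bool unsupported_fixed(const struct bitwise_regs *r)
{
	return memcmp(r->xor_val, zero, sizeof(r->xor_val)) != 0 ||
	       r->sreg != r->dreg;
}
```

A value that is non-zero only beyond the first byte, or a mismatch between sreg and dreg, is caught by the fixed variant and missed by the buggy one.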



* Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map entries
From: Quentin Monnet @ 2019-08-14 16:58 UTC
  To: Edward Cree, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, netdev, oss-drivers
In-Reply-To: <bdb4b47b-25fa-eb96-aa8d-dd4f4b012277@solarflare.com>

2019-08-14 17:45 UTC+0100 ~ Edward Cree <ecree@solarflare.com>
> On 14/08/2019 10:42, Quentin Monnet wrote:
>> 2019-08-13 18:51 UTC-0700 ~ Alexei Starovoitov
>> <alexei.starovoitov@gmail.com>
>>> The same can be achieved by 'bpftool map dump|grep key|wc -l', no?
>> To some extent (with subtleties for some other map types); and we use a
>> similar command line as a workaround for now. But because of the rate of
>> inserts/deletes in the map, the process often reports a number higher
>> than the max number of entries (we observed up to ~750k when max_entries
>> is 500k), even if the map is only half-full on average during the count.
>> In the worst case (though not frequent), an entry is deleted just before
>> we get the next key from it, and iteration starts all over again. This
>> is not reliable to determine how much space is left in the map.
>>
>> I cannot see a solution that would provide a more accurate count from
>> user space, when the map is under pressure?
> This might be a really dumb suggestion, but: you're wanting to collect a
>  summary statistic over an in-kernel data structure in a single syscall,
>  because making a series of syscalls to examine every entry is slow and
>  racy.  Isn't that exactly a job for an in-kernel virtual machine, and
>  could you not supply an eBPF program which the kernel runs on each entry
>  in the map, thus supporting people who want to calculate something else
>  (mean, min and max, whatever) instead of count?
> 

Hi Edward, I like the approach, thanks for the suggestion.

But I did not mention that we were using offloaded maps: Tracing the
kernel would probably work for programs running on the host, but this is
not a solution we could extend to hardware offload.

Best regards,
Quentin


* Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map entries
From: Alexei Starovoitov @ 2019-08-14 16:58 UTC
  To: Edward Cree
  Cc: Quentin Monnet, Alexei Starovoitov, Daniel Borkmann, bpf,
	Network Development, oss-drivers
In-Reply-To: <bdb4b47b-25fa-eb96-aa8d-dd4f4b012277@solarflare.com>

On Wed, Aug 14, 2019 at 9:45 AM Edward Cree <ecree@solarflare.com> wrote:
>
> On 14/08/2019 10:42, Quentin Monnet wrote:
> > 2019-08-13 18:51 UTC-0700 ~ Alexei Starovoitov
> > <alexei.starovoitov@gmail.com>
> >> The same can be achieved by 'bpftool map dump|grep key|wc -l', no?
> > To some extent (with subtleties for some other map types); and we use a
> > similar command line as a workaround for now. But because of the rate of
> > inserts/deletes in the map, the process often reports a number higher
> > than the max number of entries (we observed up to ~750k when max_entries
> > is 500k), even if the map is only half-full on average during the count.
> > In the worst case (though not frequent), an entry is deleted just before
> > we get the next key from it, and iteration starts all over again. This
> > is not reliable to determine how much space is left in the map.
> >
> > I cannot see a solution that would provide a more accurate count from
> > user space, when the map is under pressure?
> This might be a really dumb suggestion, but: you're wanting to collect a
>  summary statistic over an in-kernel data structure in a single syscall,
>  because making a series of syscalls to examine every entry is slow and
>  racy.  Isn't that exactly a job for an in-kernel virtual machine, and
>  could you not supply an eBPF program which the kernel runs on each entry
>  in the map, thus supporting people who want to calculate something else
>  (mean, min and max, whatever) instead of count?

Pretty much my suggestion as well :)

It seems the better fix for your NAT threshold is to keep a count of
the elements in the map in a separate global variable that the
bpf program manually increments and decrements.
bpftool will dump it just like a regular map with a single element
(I believe it doesn't recognize global variables properly yet),
and BTF will be there to pick exactly that 'count' variable.
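
The counting idea can be sketched as a plain userspace analogy using C11 atomics. This is not BPF code and makes no claim about how a BPF program or bpftool would expose the variable; every name below is hypothetical. It only illustrates keeping the count next to the inserts and deletes instead of deriving it by walking the map.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global counter, updated wherever entries come and go. */
static atomic_long entry_count;

static void on_entry_inserted(void)
{
	atomic_fetch_add(&entry_count, 1);
}

static void on_entry_deleted(void)
{
	atomic_fetch_sub(&entry_count, 1);
}

/* Reading the count is a single load; no iteration over entries. */
static long entries_in_use(void)
{
	return atomic_load(&entry_count);
}

/* Hypothetical threshold check: true once usage reaches the given
 * percentage of max_entries.
 */
static bool usage_at_least(long max_entries, long percent)
{
	return entries_in_use() * 100 >= max_entries * percent;
}
```

Because the reader performs one load instead of a key-by-key walk, concurrent inserts and deletes cannot inflate the reported number the way the iteration described earlier in the thread can.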


* Re: [PATCH v4 9/9] Input: add IOC3 serio driver
From: Jonas Gorski @ 2019-08-14 16:57 UTC
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, Dmitry Torokhov,
	Lee Jones, David S. Miller, Srinivas Kandagatla, Alessandro Zummo,
	Alexandre Belloni, Greg Kroah-Hartman, Jiri Slaby,
	Evgeniy Polyakov, linux-mips, linux-kernel, linux-input,
	Network Development, linux-rtc, linux-serial
In-Reply-To: <20190814163733.82f624e342d061866ba8ff87@suse.de>

On Wed, 14 Aug 2019 at 16:37, Thomas Bogendoerfer <tbogendoerfer@suse.de> wrote:
>
> On Wed, 14 Aug 2019 15:20:14 +0200
> Jonas Gorski <jonas.gorski@gmail.com> wrote:
>
> > > +       d = devm_kzalloc(&pdev->dev, sizeof(*d), GFP_KERNEL);
> >
> > &pdev->dev => dev
>
> will change.
>
> >
> > > +       if (!d)
> > > +               return -ENOMEM;
> > > +
> > > +       sk = kzalloc(sizeof(*sk), GFP_KERNEL);
> >
> > any reason not to devm_kzalloc this as well? Then you won't need to
> > manually free it in the error cases.
>
> it has a different lifetime than the device, so it may not be allocated
> via devm_kzalloc
>
> > > +static int ioc3kbd_remove(struct platform_device *pdev)
> > > +{
> > > +       struct ioc3kbd_data *d = platform_get_drvdata(pdev);
> > > +
> > > +       devm_free_irq(&pdev->dev, d->irq, d);
> > > +       serio_unregister_port(d->kbd);
> > > +       serio_unregister_port(d->aux);
> > > +       return 0;
> > > +}
> >
> > and on that topic, won't you need to kfree d->kbd and d->aux here?
>
> that's done in serio_release_port() by the serio core.

I see. But in that case, don't the kfree's after the
serio_unregister_port's in the error path of the .probe function cause
a double free?


Regards
Jonas


* Re: [PATCH net-next] r8169: fix sporadic transmit timeout issue
From: David Miller @ 2019-08-14 16:54 UTC
  To: hkallweit1; +Cc: nic_swsd, netdev, edumazet, holger
In-Reply-To: <e343933b-1965-4617-3011-6290ed30d4ae@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Mon, 12 Aug 2019 20:47:40 +0200

> Holger reported sporadic transmit timeouts and it turned out that one
> path misses ringing the doorbell. Fix was suggested by Eric.
> 
> Fixes: ef14358546b1 ("r8169: make use of xmit_more")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied.


* fallout from net-next netfilter changes
From: David Miller @ 2019-08-14 16:53 UTC
  To: pablo; +Cc: netfilter-devel, netdev


This started happening after Jakub's pull of your net-next changes
yesterday:

./include/uapi/linux/netfilter_ipv6/ip6t_LOG.h:5:2: warning: #warning "Please update iptables, this file will be removed soon!" [-Wcpp]
 #warning "Please update iptables, this file will be removed soon!"
  ^~~~~~~
In file included from <command-line>:
./include/uapi/linux/netfilter_ipv4/ipt_LOG.h:5:2: warning: #warning "Please update iptables, this file will be removed soon!" [-Wcpp]
 #warning "Please update iptables, this file will be removed soon!"
  ^~~~~~~

It's probably from the standard kernel build UAPI header checks.

Please fix this.


* Re: [PATCH net-next v6 6/6] net: mscc: PTP Hardware Clock (PHC) support
From: David Miller @ 2019-08-14 16:49 UTC
  To: antoine.tenart
  Cc: richardcochran, alexandre.belloni, UNGLinuxDriver, netdev,
	thomas.petazzoni, allan.nielsen, andrew
In-Reply-To: <20190812144537.14497-7-antoine.tenart@bootlin.com>

From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Mon, 12 Aug 2019 16:45:37 +0200

> This patch adds support for PTP Hardware Clock (PHC) to the Ocelot
> switch for both PTP 1-step and 2-step modes.
> 
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>

Richard, I really need your review on this patch.

Thank you.


* [PATCH bpf-next 4/4] selftests/bpf: test_progs: remove asserts from subtests
From: Stanislav Fomichev @ 2019-08-14 16:47 UTC
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko
In-Reply-To: <20190814164742.208909-1-sdf@google.com>

Otherwise they can bring the whole process down.

Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/bpf_obj_id.c     | 30 ++++++++++++++-----
 .../selftests/bpf/prog_tests/map_lock.c       | 20 ++++++++-----
 .../selftests/bpf/prog_tests/spinlock.c       | 10 ++++---
 .../bpf/prog_tests/stacktrace_build_id.c      | 11 +++++--
 .../bpf/prog_tests/stacktrace_build_id_nmi.c  | 11 +++++--
 5 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
index f57e0c625de3..4ec8c4e9e9a1 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
@@ -48,16 +48,23 @@ void test_bpf_obj_id(void)
 		/* test_obj_id.o is a dumb prog. It should never fail
 		 * to load.
 		 */
-		if (err)
+		if (err) {
 			test__fail();
-		assert(!err);
+			continue;
+		}
 
 		/* Insert a magic value to the map */
 		map_fds[i] = bpf_find_map(__func__, objs[i], "test_map_id");
-		assert(map_fds[i] >= 0);
+		if (map_fds[i] < 0) {
+			test__fail();
+			goto done;
+		}
 		err = bpf_map_update_elem(map_fds[i], &array_key,
 					  &array_magic_value, 0);
-		assert(!err);
+		if (err) {
+			test__fail();
+			goto done;
+		}
 
 		/* Check getting map info */
 		info_len = sizeof(struct bpf_map_info) * 2;
@@ -96,9 +103,15 @@ void test_bpf_obj_id(void)
 		prog_infos[i].map_ids = ptr_to_u64(map_ids + i);
 		prog_infos[i].nr_map_ids = 2;
 		err = clock_gettime(CLOCK_REALTIME, &real_time_ts);
-		assert(!err);
+		if (err) {
+			test__fail();
+			goto done;
+		}
 		err = clock_gettime(CLOCK_BOOTTIME, &boot_time_ts);
-		assert(!err);
+		if (err) {
+			test__fail();
+			goto done;
+		}
 		err = bpf_obj_get_info_by_fd(prog_fds[i], &prog_infos[i],
 					     &info_len);
 		load_time = (real_time_ts.tv_sec - boot_time_ts.tv_sec)
@@ -224,7 +237,10 @@ void test_bpf_obj_id(void)
 		nr_id_found++;
 
 		err = bpf_map_lookup_elem(map_fd, &array_key, &array_value);
-		assert(!err);
+		if (err) {
+			test__fail();
+			goto done;
+		}
 
 		err = bpf_obj_get_info_by_fd(map_fd, &map_info, &info_len);
 		CHECK(err || info_len != sizeof(struct bpf_map_info) ||
diff --git a/tools/testing/selftests/bpf/prog_tests/map_lock.c b/tools/testing/selftests/bpf/prog_tests/map_lock.c
index 12123ff1f31f..e7663721fb57 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_lock.c
@@ -56,17 +56,21 @@ void test_map_lock(void)
 	bpf_map_update_elem(map_fd[0], &key, vars, BPF_F_LOCK);
 
 	for (i = 0; i < 4; i++)
-		assert(pthread_create(&thread_id[i], NULL,
-				      &spin_lock_thread, &prog_fd) == 0);
+		if (pthread_create(&thread_id[i], NULL,
+				   &spin_lock_thread, &prog_fd))
+			goto close_prog;
 	for (i = 4; i < 6; i++)
-		assert(pthread_create(&thread_id[i], NULL,
-				      &parallel_map_access, &map_fd[i - 4]) == 0);
+		if (pthread_create(&thread_id[i], NULL,
+				   &parallel_map_access, &map_fd[i - 4]))
+			goto close_prog;
 	for (i = 0; i < 4; i++)
-		assert(pthread_join(thread_id[i], &ret) == 0 &&
-		       ret == (void *)&prog_fd);
+		if (pthread_join(thread_id[i], &ret) ||
+		    ret != (void *)&prog_fd)
+			goto close_prog;
 	for (i = 4; i < 6; i++)
-		assert(pthread_join(thread_id[i], &ret) == 0 &&
-		       ret == (void *)&map_fd[i - 4]);
+		if (pthread_join(thread_id[i], &ret) ||
+		    ret != (void *)&map_fd[i - 4])
+			goto close_prog;
 	goto close_prog_noerr;
 close_prog:
 	test__fail();
diff --git a/tools/testing/selftests/bpf/prog_tests/spinlock.c b/tools/testing/selftests/bpf/prog_tests/spinlock.c
index e843336713e8..5f32a913f732 100644
--- a/tools/testing/selftests/bpf/prog_tests/spinlock.c
+++ b/tools/testing/selftests/bpf/prog_tests/spinlock.c
@@ -16,11 +16,13 @@ void test_spinlock(void)
 		goto close_prog;
 	}
 	for (i = 0; i < 4; i++)
-		assert(pthread_create(&thread_id[i], NULL,
-				      &spin_lock_thread, &prog_fd) == 0);
+		if (pthread_create(&thread_id[i], NULL,
+				   &spin_lock_thread, &prog_fd))
+			goto close_prog;
+
 	for (i = 0; i < 4; i++)
-		assert(pthread_join(thread_id[i], &ret) == 0 &&
-		       ret == (void *)&prog_fd);
+		if (pthread_join(thread_id[i], &ret) || ret != (void *)&prog_fd)
+			goto close_prog;
 	goto close_prog_noerr;
 close_prog:
 	test__fail();
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
index ac44fda84833..d74464faebd7 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
@@ -51,9 +51,14 @@ void test_stacktrace_build_id(void)
 		  "err %d errno %d\n", err, errno))
 		goto disable_pmu;
 
-	assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
-	       == 0);
-	assert(system("./urandom_read") == 0);
+	if (system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")) {
+		test__fail();
+		goto disable_pmu;
+	}
+	if (system("./urandom_read")) {
+		test__fail();
+		goto disable_pmu;
+	}
 	/* disable stack trace collection */
 	key = 0;
 	val = 1;
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
index 9557b7dfb782..e886911928bc 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
@@ -82,9 +82,14 @@ void test_stacktrace_build_id_nmi(void)
 		  "err %d errno %d\n", err, errno))
 		goto disable_pmu;
 
-	assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
-	       == 0);
-	assert(system("taskset 0x1 ./urandom_read 100000") == 0);
+	if (system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")) {
+		test__fail();
+		goto disable_pmu;
+	}
+	if (system("taskset 0x1 ./urandom_read 100000")) {
+		test__fail();
+		goto disable_pmu;
+	}
 	/* disable stack trace collection */
 	key = 0;
 	val = 1;
-- 
2.23.0.rc1.153.gdeed80330f-goog
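
The pattern applied throughout this patch, recording a failure and jumping to cleanup instead of calling assert(), can be shown in a reduced form. The sketch below is illustrative only: `failures`, `cleanups` and `run_subtest()` are hypothetical and do not use the selftest helpers from the patch.

```c
/* Hypothetical per-run state standing in for the test harness. */
static int failures;
static int cleanups;

/* step_err simulates the result of some setup step inside a subtest.
 * On error the failure is recorded and control jumps to the cleanup
 * label; nothing aborts the process, so later subtests still run.
 */
static void run_subtest(int step_err)
{
	if (step_err) {
		failures++;
		goto done;
	}

	/* ... remaining steps of the subtest would go here ... */

done:
	cleanups++;	/* resources are released on both paths */
}
```

Calling run_subtest(-1) followed by run_subtest(0) leaves one recorded failure and two completed cleanups, whereas an assert() in the first call would have terminated the process before the second subtest ran.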



* [PATCH bpf-next 3/4] selftests/bpf: test_progs: remove global fail/success counts
From: Stanislav Fomichev @ 2019-08-14 16:47 UTC
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko
In-Reply-To: <20190814164742.208909-1-sdf@google.com>

Now that we have a global per-test/per-environment state, there
is no longer any need to have global fail/success counters
(and there is no need to save/get the diff before/after the
test).

Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/bpf_obj_id.c     |  2 +-
 .../bpf/prog_tests/bpf_verif_scale.c          | 10 +---
 .../selftests/bpf/prog_tests/flow_dissector.c |  2 +-
 .../bpf/prog_tests/get_stack_raw_tp.c         |  2 +-
 .../selftests/bpf/prog_tests/global_data.c    | 10 ++--
 .../selftests/bpf/prog_tests/l4lb_all.c       |  4 +-
 .../selftests/bpf/prog_tests/map_lock.c       |  8 +--
 .../selftests/bpf/prog_tests/pkt_access.c     |  2 +-
 .../selftests/bpf/prog_tests/pkt_md_access.c  |  2 +-
 .../bpf/prog_tests/queue_stack_map.c          |  4 +-
 .../bpf/prog_tests/reference_tracking.c       |  2 +-
 .../selftests/bpf/prog_tests/spinlock.c       |  2 +-
 .../selftests/bpf/prog_tests/stacktrace_map.c |  2 +-
 .../bpf/prog_tests/stacktrace_map_raw_tp.c    |  2 +-
 .../bpf/prog_tests/task_fd_query_rawtp.c      |  2 +-
 .../bpf/prog_tests/task_fd_query_tp.c         |  2 +-
 .../selftests/bpf/prog_tests/tcp_estats.c     |  2 +-
 tools/testing/selftests/bpf/prog_tests/xdp.c  |  2 +-
 .../bpf/prog_tests/xdp_adjust_tail.c          |  2 +-
 .../selftests/bpf/prog_tests/xdp_noinline.c   |  4 +-
 tools/testing/selftests/bpf/test_progs.c      | 55 ++++++++-----------
 tools/testing/selftests/bpf/test_progs.h      | 26 +++++++--
 22 files changed, 75 insertions(+), 74 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
index fb5840a62548..f57e0c625de3 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
@@ -49,7 +49,7 @@ void test_bpf_obj_id(void)
 		 * to load.
 		 */
 		if (err)
-			error_cnt++;
+			test__fail();
 		assert(!err);
 
 		/* Insert a magic value to the map */
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
index 1a1eae356f81..217988243077 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
@@ -28,8 +28,6 @@ static int check_load(const char *file, enum bpf_prog_type type)
 	attr.prog_flags = BPF_F_TEST_RND_HI32;
 	err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
 	bpf_object__close(obj);
-	if (err)
-		error_cnt++;
 	return err;
 }
 
@@ -105,12 +103,8 @@ void test_bpf_verif_scale(void)
 			continue;
 
 		err = check_load(test->file, test->attach_type);
-		if (test->fails) { /* expected to fail */
-			if (err)
-				error_cnt--;
-			else
-				error_cnt++;
-		}
+		if (err && !test->fails)
+			test__fail();
 	}
 
 	if (env.verifier_stats)
diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
index 6892b88ae065..e9d882c05ded 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
@@ -453,7 +453,7 @@ void test_flow_dissector(void)
 	err = bpf_flow_load(&obj, "./bpf_flow.o", "flow_dissector",
 			    "jmp_table", "last_dissection", &prog_fd, &keys_fd);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c b/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
index 3d59b3c841fe..afc60f62e2a8 100644
--- a/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
+++ b/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
@@ -137,7 +137,7 @@ void test_get_stack_raw_tp(void)
 
 	goto close_prog_noerr;
 close_prog:
-	error_cnt++;
+	test__fail();
 close_prog_noerr:
 	if (!IS_ERR_OR_NULL(link))
 		bpf_link__destroy(link);
diff --git a/tools/testing/selftests/bpf/prog_tests/global_data.c b/tools/testing/selftests/bpf/prog_tests/global_data.c
index d011079fb0bf..db13bfee6bb9 100644
--- a/tools/testing/selftests/bpf/prog_tests/global_data.c
+++ b/tools/testing/selftests/bpf/prog_tests/global_data.c
@@ -8,7 +8,7 @@ static void test_global_data_number(struct bpf_object *obj, __u32 duration)
 
 	map_fd = bpf_find_map(__func__, obj, "result_number");
 	if (map_fd < 0) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
@@ -45,7 +45,7 @@ static void test_global_data_string(struct bpf_object *obj, __u32 duration)
 
 	map_fd = bpf_find_map(__func__, obj, "result_string");
 	if (map_fd < 0) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
@@ -82,7 +82,7 @@ static void test_global_data_struct(struct bpf_object *obj, __u32 duration)
 
 	map_fd = bpf_find_map(__func__, obj, "result_struct");
 	if (map_fd < 0) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
@@ -113,13 +113,13 @@ static void test_global_data_rdonly(struct bpf_object *obj, __u32 duration)
 
 	map = bpf_object__find_map_by_name(obj, "test_glo.rodata");
 	if (!map || !bpf_map__is_internal(map)) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
 	map_fd = bpf_map__fd(map);
 	if (map_fd < 0) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/l4lb_all.c b/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
index 20ddca830e68..724bb40de1f8 100644
--- a/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
+++ b/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
@@ -31,7 +31,7 @@ static void test_l4lb(const char *file)
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
@@ -73,7 +73,7 @@ static void test_l4lb(const char *file)
 		pkts += stats[i].pkts;
 	}
 	if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
-		error_cnt++;
+		test__fail();
 		printf("test_l4lb:FAIL:stats %lld %lld\n", bytes, pkts);
 	}
 out:
diff --git a/tools/testing/selftests/bpf/prog_tests/map_lock.c b/tools/testing/selftests/bpf/prog_tests/map_lock.c
index ee99368c595c..12123ff1f31f 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_lock.c
@@ -10,12 +10,12 @@ static void *parallel_map_access(void *arg)
 		err = bpf_map_lookup_elem_flags(map_fd, &key, vars, BPF_F_LOCK);
 		if (err) {
 			printf("lookup failed\n");
-			error_cnt++;
+			test__fail();
 			goto out;
 		}
 		if (vars[0] != 0) {
 			printf("lookup #%d var[0]=%d\n", i, vars[0]);
-			error_cnt++;
+			test__fail();
 			goto out;
 		}
 		rnd = vars[1];
@@ -24,7 +24,7 @@ static void *parallel_map_access(void *arg)
 				continue;
 			printf("lookup #%d var[1]=%d var[%d]=%d\n",
 			       i, rnd, j, vars[j]);
-			error_cnt++;
+			test__fail();
 			goto out;
 		}
 	}
@@ -69,7 +69,7 @@ void test_map_lock(void)
 		       ret == (void *)&map_fd[i - 4]);
 	goto close_prog_noerr;
 close_prog:
-	error_cnt++;
+	test__fail();
 close_prog_noerr:
 	bpf_object__close(obj);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/pkt_access.c b/tools/testing/selftests/bpf/prog_tests/pkt_access.c
index 4ecfd721a044..9ef4e4ffb379 100644
--- a/tools/testing/selftests/bpf/prog_tests/pkt_access.c
+++ b/tools/testing/selftests/bpf/prog_tests/pkt_access.c
@@ -10,7 +10,7 @@ void test_pkt_access(void)
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/pkt_md_access.c b/tools/testing/selftests/bpf/prog_tests/pkt_md_access.c
index ac0d43435806..c354b9d21f4f 100644
--- a/tools/testing/selftests/bpf/prog_tests/pkt_md_access.c
+++ b/tools/testing/selftests/bpf/prog_tests/pkt_md_access.c
@@ -10,7 +10,7 @@ void test_pkt_md_access(void)
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c b/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c
index e60cd5ff1f55..48a8cd144bd1 100644
--- a/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c
@@ -28,7 +28,7 @@ static void test_queue_stack_map_by_type(int type)
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
@@ -44,7 +44,7 @@ static void test_queue_stack_map_by_type(int type)
 	for (i = 0; i < MAP_SIZE; i++) {
 		err = bpf_map_update_elem(map_in_fd, NULL, &vals[i], 0);
 		if (err) {
-			error_cnt++;
+			test__fail();
 			goto out;
 		}
 	}
diff --git a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
index 4a4f428d1a78..f6987e3dd28c 100644
--- a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
+++ b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
@@ -11,7 +11,7 @@ void test_reference_tracking(void)
 
 	obj = bpf_object__open(file);
 	if (IS_ERR(obj)) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/spinlock.c b/tools/testing/selftests/bpf/prog_tests/spinlock.c
index 114ebe6a438e..e843336713e8 100644
--- a/tools/testing/selftests/bpf/prog_tests/spinlock.c
+++ b/tools/testing/selftests/bpf/prog_tests/spinlock.c
@@ -23,7 +23,7 @@ void test_spinlock(void)
 		       ret == (void *)&prog_fd);
 	goto close_prog_noerr;
 close_prog:
-	error_cnt++;
+	test__fail();
 close_prog_noerr:
 	bpf_object__close(obj);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
index fc539335c5b3..9dba1cc3da60 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
@@ -70,7 +70,7 @@ void test_stacktrace_map(void)
 
 	goto disable_pmu_noerr;
 disable_pmu:
-	error_cnt++;
+	test__fail();
 disable_pmu_noerr:
 	bpf_link__destroy(link);
 close_prog:
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
index fbfa8e76cf63..4e7cf2e663f7 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
@@ -60,7 +60,7 @@ void test_stacktrace_map_raw_tp(void)
 
 	goto close_prog_noerr;
 close_prog:
-	error_cnt++;
+	test__fail();
 close_prog_noerr:
 	if (!IS_ERR_OR_NULL(link))
 		bpf_link__destroy(link);
diff --git a/tools/testing/selftests/bpf/prog_tests/task_fd_query_rawtp.c b/tools/testing/selftests/bpf/prog_tests/task_fd_query_rawtp.c
index 958a3d88de99..d9ad1aa8a026 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_fd_query_rawtp.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_fd_query_rawtp.c
@@ -72,7 +72,7 @@ void test_task_fd_query_rawtp(void)
 
 	goto close_prog_noerr;
 close_prog:
-	error_cnt++;
+	test__fail();
 close_prog_noerr:
 	bpf_object__close(obj);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/task_fd_query_tp.c b/tools/testing/selftests/bpf/prog_tests/task_fd_query_tp.c
index f9b70e81682b..76209f2386c8 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_fd_query_tp.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_fd_query_tp.c
@@ -68,7 +68,7 @@ static void test_task_fd_query_tp_core(const char *probe_name,
 close_pmu:
 	close(pmu_fd);
 close_prog:
-	error_cnt++;
+	test__fail();
 close_prog_noerr:
 	bpf_object__close(obj);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_estats.c b/tools/testing/selftests/bpf/prog_tests/tcp_estats.c
index bb8759d69099..e241e5d7c71f 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_estats.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_estats.c
@@ -11,7 +11,7 @@ void test_tcp_estats(void)
 	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
 	CHECK(err, "", "err %d errno %d\n", err, errno);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp.c b/tools/testing/selftests/bpf/prog_tests/xdp.c
index a74167289545..7c9f89fa1d02 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp.c
@@ -17,7 +17,7 @@ void test_xdp(void)
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index 922aa0a19764..a479a3303c3b 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -11,7 +11,7 @@ void test_xdp_adjust_tail(void)
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c b/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
index 15f7c272edb0..10bef9d5ab81 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
@@ -32,7 +32,7 @@ void test_xdp_noinline(void)
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
 	if (err) {
-		error_cnt++;
+		test__fail();
 		return;
 	}
 
@@ -74,7 +74,7 @@ void test_xdp_noinline(void)
 		pkts += stats[i].pkts;
 	}
 	if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
-		error_cnt++;
+		test__fail();
 		printf("test_xdp_noinline:FAIL:stats %lld %lld\n",
 		       bytes, pkts);
 	}
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 1993f2ce0d23..ad90e45768ce 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -8,24 +8,6 @@
 
 /* defined in test_progs.h */
 struct test_env env;
-int error_cnt, pass_cnt;
-
-struct prog_test_def {
-	const char *test_name;
-	int test_num;
-	void (*run_test)(void);
-	bool force_log;
-	int pass_cnt;
-	int error_cnt;
-	bool tested;
-
-	const char *subtest_name;
-	int subtest_num;
-
-	/* store counts before subtest started */
-	int old_pass_cnt;
-	int old_error_cnt;
-};
 
 static bool should_run(struct test_selector *sel, int num, const char *name)
 {
@@ -74,14 +56,14 @@ static const char *test_status_string(bool success)
 void test__end_subtest()
 {
 	struct prog_test_def *test = env.test;
-	int sub_error_cnt = error_cnt - test->old_error_cnt;
+	int sub_fail_cnt = test->fail_cnt - test->old_fail_cnt;
 
-	if (sub_error_cnt)
-		env.fail_cnt++;
+	if (sub_fail_cnt)
+		test->fail_cnt++;
 	else
 		env.sub_succ_cnt++;
 
-	dump_test_log(test, sub_error_cnt);
+	dump_test_log(test, sub_fail_cnt);
 
 	fprintf(env.stdout, "#%3d/%-3d %4s %s:%s\n",
 		test->test_num, test->subtest_num,
@@ -111,8 +93,8 @@ bool test__start_subtest(const char *name)
 		return false;
 
 	test->subtest_name = name;
-	env.test->old_pass_cnt = pass_cnt;
-	env.test->old_error_cnt = error_cnt;
+	env.test->old_succ_cnt = env.test->succ_cnt;
+	env.test->old_fail_cnt = env.test->fail_cnt;
 
 	return true;
 }
@@ -126,6 +108,19 @@ void test__skip(void)
 	env.skip_cnt++;
 }
 
+void __test__fail(const char *file, int line)
+{
+	if (env.test->subtest_name)
+		fprintf(stderr, "%s:%s failed at %s:%d, errno=%d\n",
+			env.test->test_name, env.test->subtest_name,
+			file, line, errno);
+	else
+		fprintf(stderr, "%s failed at %s:%d, errno=%d\n",
+			env.test->test_name, file, line, errno);
+
+	env.test->fail_cnt++;
+}
+
 struct ipv4_packet pkt_v4 = {
 	.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
 	.iph.ihl = 5,
@@ -150,7 +145,7 @@ int bpf_find_map(const char *test, struct bpf_object *obj, const char *name)
 	map = bpf_object__find_map_by_name(obj, name);
 	if (!map) {
 		printf("%s:FAIL:map '%s' not found\n", test, name);
-		error_cnt++;
+		test__fail();
 		return -1;
 	}
 	return bpf_map__fd(map);
@@ -509,8 +504,6 @@ int main(int argc, char **argv)
 	stdio_hijack();
 	for (i = 0; i < prog_test_cnt; i++) {
 		struct prog_test_def *test = &prog_test_defs[i];
-		int old_pass_cnt = pass_cnt;
-		int old_error_cnt = error_cnt;
 
 		env.test = test;
 		test->test_num = i + 1;
@@ -525,14 +518,12 @@ int main(int argc, char **argv)
 			test__end_subtest();
 
 		test->tested = true;
-		test->pass_cnt = pass_cnt - old_pass_cnt;
-		test->error_cnt = error_cnt - old_error_cnt;
-		if (test->error_cnt)
+		if (test->fail_cnt)
 			env.fail_cnt++;
 		else
 			env.succ_cnt++;
 
-		dump_test_log(test, test->error_cnt);
+		dump_test_log(test, test->fail_cnt);
 
 		fprintf(env.stdout, "#%3d     %4s %s\n",
 			test->test_num,
@@ -546,5 +537,5 @@ int main(int argc, char **argv)
 	free(env.test_selector.num_set);
 	free(env.subtest_selector.num_set);
 
-	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
+	return env.fail_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
 }
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 9defd35cb6c0..7b05921784a4 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -38,7 +38,23 @@ typedef __u16 __sum16;
 #include "trace_helpers.h"
 #include "flow_dissector_load.h"
 
-struct prog_test_def;
+struct prog_test_def {
+	const char *test_name;
+	int test_num;
+	void (*run_test)(void);
+	bool force_log;
+	bool tested;
+
+	const char *subtest_name;
+	int subtest_num;
+
+	int succ_cnt;
+	int fail_cnt;
+
+	/* store counts before subtest started */
+	int old_succ_cnt;
+	int old_fail_cnt;
+};
 
 struct test_selector {
 	const char *name;
@@ -67,13 +83,13 @@ struct test_env {
 	int skip_cnt; /* skipped tests */
 };
 
-extern int error_cnt;
-extern int pass_cnt;
 extern struct test_env env;
 
 extern void test__force_log();
 extern bool test__start_subtest(const char *name);
 extern void test__skip(void);
+#define test__fail() __test__fail(__FILE__, __LINE__)
+extern void __test__fail(const char *file, int line);
 
 #define MAGIC_BYTES 123
 
@@ -96,11 +112,11 @@ extern struct ipv6_packet pkt_v6;
 #define _CHECK(condition, tag, duration, format...) ({			\
 	int __ret = !!(condition);					\
 	if (__ret) {							\
-		error_cnt++;						\
+		test__fail();						\
 		printf("%s:FAIL:%s ", __func__, tag);			\
 		printf(format);						\
 	} else {							\
-		pass_cnt++;						\
+		env.test->succ_cnt++;					\
 		printf("%s:PASS:%s %d nsec\n",				\
 		       __func__, tag, duration);			\
 	}								\
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related
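Illustrative note on the patch above: the per-test failure accounting it introduces can be sketched in standalone C as follows. This is a simplified model, not the code from the series. The names demo_test, demo_env, demo_fail(), demo_fail_at(), demo_start_subtest() and demo_end_subtest() are hypothetical stand-ins for struct prog_test_def, struct test_env, test__fail(), __test__fail(), test__start_subtest() and test__end_subtest(), and only the fields needed for counting failures are kept.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-ins for struct prog_test_def and
 * struct test_env; only the fields needed for failure accounting.
 */
struct demo_test {
	const char *test_name;
	const char *subtest_name;	/* NULL when not inside a subtest */
	int fail_cnt;			/* failures recorded for this test */
	int old_fail_cnt;		/* snapshot taken at subtest start */
};

struct demo_env {
	struct demo_test *test;		/* currently running test */
	int sub_succ_cnt;		/* subtests that recorded no failure */
};

static struct demo_env demo_env;

/* Record one failure against the current test and report the call site. */
static void demo_fail_at(const char *file, int line)
{
	struct demo_test *t = demo_env.test;

	if (t->subtest_name)
		fprintf(stderr, "%s:%s failed at %s:%d, errno=%d\n",
			t->test_name, t->subtest_name, file, line, errno);
	else
		fprintf(stderr, "%s failed at %s:%d, errno=%d\n",
			t->test_name, file, line, errno);
	t->fail_cnt++;
}

/* The macro captures the caller's file and line, as test__fail() does. */
#define demo_fail() demo_fail_at(__FILE__, __LINE__)

/* Remember how many failures the test had before the subtest began. */
static void demo_start_subtest(const char *name)
{
	demo_env.test->subtest_name = name;
	demo_env.test->old_fail_cnt = demo_env.test->fail_cnt;
}

/* Returns the number of failures recorded since demo_start_subtest(). */
static int demo_end_subtest(void)
{
	struct demo_test *t = demo_env.test;
	int sub_fail_cnt = t->fail_cnt - t->old_fail_cnt;

	if (!sub_fail_cnt)
		demo_env.sub_succ_cnt++;
	t->subtest_name = NULL;
	return sub_fail_cnt;
}
```

A caller would point demo_env.test at a struct demo_test, bracket each subtest with demo_start_subtest() and demo_end_subtest(), and call demo_fail() wherever the old code incremented error_cnt. Keeping the counter inside the test structure means a subtest's result is a difference of two fields of the same object, with no global state involved.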

* [PATCH bpf-next 2/4] selftests/bpf: test_progs: test__skip
From: Stanislav Fomichev @ 2019-08-14 16:47 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko
In-Reply-To: <20190814164742.208909-1-sdf@google.com>

Export test__skip() to indicate skipped tests and use it in
test_send_signal_nmi().

Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/prog_tests/send_signal.c | 1 +
 tools/testing/selftests/bpf/test_progs.c             | 9 +++++++--
 tools/testing/selftests/bpf/test_progs.h             | 2 ++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index 1575f0a1f586..40c2c5efdd3e 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -204,6 +204,7 @@ static int test_send_signal_nmi(void)
 		if (errno == ENOENT) {
 			printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n",
 			       __func__);
+			test__skip();
 			return 0;
 		}
 		/* Let the test fail with a more informative message */
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 1a7a2a0c0a11..1993f2ce0d23 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -121,6 +121,11 @@ void test__force_log() {
 	env.test->force_log = true;
 }
 
+void test__skip(void)
+{
+	env.skip_cnt++;
+}
+
 struct ipv4_packet pkt_v4 = {
 	.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
 	.iph.ihl = 5,
@@ -535,8 +540,8 @@ int main(int argc, char **argv)
 			test->test_name);
 	}
 	stdio_restore();
-	printf("Summary: %d/%d PASSED, %d FAILED\n",
-	       env.succ_cnt, env.sub_succ_cnt, env.fail_cnt);
+	printf("Summary: %d/%d PASSED, %d SKIPPED, %d FAILED\n",
+	       env.succ_cnt, env.sub_succ_cnt, env.skip_cnt, env.fail_cnt);
 
 	free(env.test_selector.num_set);
 	free(env.subtest_selector.num_set);
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 37d427f5a1e5..9defd35cb6c0 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -64,6 +64,7 @@ struct test_env {
 	int succ_cnt; /* successful tests */
 	int sub_succ_cnt; /* successful sub-tests */
 	int fail_cnt; /* total failed tests + sub-tests */
+	int skip_cnt; /* skipped tests */
 };
 
 extern int error_cnt;
@@ -72,6 +73,7 @@ extern struct test_env env;
 
 extern void test__force_log();
 extern bool test__start_subtest(const char *name);
+extern void test__skip(void);
 
 #define MAGIC_BYTES 123
 
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related

* [PATCH bpf-next 1/4] selftests/bpf: test_progs: change formatting of the condensed output
From: Stanislav Fomichev @ 2019-08-14 16:47 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko
In-Reply-To: <20190814164742.208909-1-sdf@google.com>

This makes the output visually simpler to follow.
Also, highlight failures in red when outputting to a tty.

Before:
  #1 attach_probe:FAIL
  #2 bpf_obj_id:OK
  #3/1 bpf_verif_scale:loop3.o:OK
  #3/2 bpf_verif_scale:test_verif_scale1.o:OK
  #3/3 bpf_verif_scale:test_verif_scale2.o:OK
  #3/4 bpf_verif_scale:test_verif_scale3.o:OK
  #3/5 bpf_verif_scale:pyperf50.o:OK
  #3/6 bpf_verif_scale:pyperf100.o:OK
  #3/7 bpf_verif_scale:pyperf180.o:OK
  #3/8 bpf_verif_scale:pyperf600.o:OK
  #3/9 bpf_verif_scale:pyperf600_nounroll.o:OK
  #3/10 bpf_verif_scale:loop1.o:OK
  #3/11 bpf_verif_scale:loop2.o:OK
  #3/12 bpf_verif_scale:loop4.o:OK
  #3/13 bpf_verif_scale:loop5.o:OK
  #3/14 bpf_verif_scale:strobemeta.o:OK
  #3/15 bpf_verif_scale:strobemeta_nounroll1.o:OK
  #3/16 bpf_verif_scale:strobemeta_nounroll2.o:OK
  #3/17 bpf_verif_scale:test_sysctl_loop1.o:OK
  #3/18 bpf_verif_scale:test_sysctl_loop2.o:OK
  #3/19 bpf_verif_scale:test_xdp_loop.o:OK
  #3/20 bpf_verif_scale:test_seg6_loop.o:OK
  #3 bpf_verif_scale:OK
  #4 flow_dissector:OK

After:
  #  1     FAIL attach_probe
  #  2       OK bpf_obj_id
  #  3/1     OK bpf_verif_scale:loop3.o
  #  3/2     OK bpf_verif_scale:test_verif_scale1.o
  #  3/3     OK bpf_verif_scale:test_verif_scale2.o
  #  3/4     OK bpf_verif_scale:test_verif_scale3.o
  #  3/5     OK bpf_verif_scale:pyperf50.o
  #  3/6     OK bpf_verif_scale:pyperf100.o
  #  3/7     OK bpf_verif_scale:pyperf180.o
  #  3/8     OK bpf_verif_scale:pyperf600.o
  #  3/9     OK bpf_verif_scale:pyperf600_nounroll.o
  #  3/10    OK bpf_verif_scale:loop1.o
  #  3/11    OK bpf_verif_scale:loop2.o
  #  3/12    OK bpf_verif_scale:loop4.o
  #  3/13    OK bpf_verif_scale:loop5.o
  #  3/14    OK bpf_verif_scale:strobemeta.o
  #  3/15    OK bpf_verif_scale:strobemeta_nounroll1.o
  #  3/16    OK bpf_verif_scale:strobemeta_nounroll2.o
  #  3/17    OK bpf_verif_scale:test_sysctl_loop1.o
  #  3/18    OK bpf_verif_scale:test_sysctl_loop2.o
  #  3/19    OK bpf_verif_scale:test_xdp_loop.o
  #  3/20    OK bpf_verif_scale:test_seg6_loop.o
  #  3       OK bpf_verif_scale
  #  4       OK flow_dissector

Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/test_progs.c | 29 +++++++++++++++++++-----
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 12895d03d58b..1a7a2a0c0a11 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -56,6 +56,21 @@ static void dump_test_log(const struct prog_test_def *test, bool failed)
 	fseeko(stdout, 0, SEEK_SET); /* rewind */
 }
 
+static const char *test_status_string(bool success)
+{
+#define COLOR_RED	"\033[31m"
+#define COLOR_RESET	"\033[m"
+	if (success)
+		return "OK";
+
+	if (isatty(fileno(env.stdout)))
+		return COLOR_RED "FAIL" COLOR_RESET;
+	else
+		return "FAIL";
+#undef COLOR_RED
+#undef COLOR_RESET
+}
+
 void test__end_subtest()
 {
 	struct prog_test_def *test = env.test;
@@ -68,9 +83,10 @@ void test__end_subtest()
 
 	dump_test_log(test, sub_error_cnt);
 
-	fprintf(env.stdout, "#%d/%d %s:%s\n",
-	       test->test_num, test->subtest_num,
-	       test->subtest_name, sub_error_cnt ? "FAIL" : "OK");
+	fprintf(env.stdout, "#%3d/%-3d %4s %s:%s\n",
+		test->test_num, test->subtest_num,
+		test_status_string(test->fail_cnt == 0),
+		test->test_name, test->subtest_name);
 }
 
 bool test__start_subtest(const char *name)
@@ -513,9 +529,10 @@ int main(int argc, char **argv)
 
 		dump_test_log(test, test->error_cnt);
 
-		fprintf(env.stdout, "#%d %s:%s\n",
-			test->test_num, test->test_name,
-			test->error_cnt ? "FAIL" : "OK");
+		fprintf(env.stdout, "#%3d     %4s %s\n",
+			test->test_num,
+			test_status_string(test->fail_cnt == 0),
+			test->test_name);
 	}
 	stdio_restore();
 	printf("Summary: %d/%d PASSED, %d FAILED\n",
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related
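Illustrative note on the patch above: the status string selection and the fixed-width line format can be tried outside the selftests with a small standalone sketch. The helper names status_string(), format_subtest_line() and print_subtest_line() are hypothetical; the format string and the escape sequences are the ones from the diff. The tty check is passed in as a parameter here so the formatting can be exercised without a terminal, which is an assumption of this sketch rather than how test_progs.c is structured.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

#define COLOR_RED	"\033[31m"
#define COLOR_RESET	"\033[m"

/* Pick the status text; colorize failures only when writing to a tty. */
static const char *status_string(bool success, bool to_tty)
{
	if (success)
		return "OK";
	return to_tty ? COLOR_RED "FAIL" COLOR_RESET : "FAIL";
}

/* Format one subtest line with the same format string as the patch. */
static int format_subtest_line(char *buf, size_t len, int test_num,
			       int subtest_num, bool success, bool to_tty,
			       const char *test_name, const char *subtest_name)
{
	return snprintf(buf, len, "#%3d/%-3d %4s %s:%s\n",
			test_num, subtest_num,
			status_string(success, to_tty),
			test_name, subtest_name);
}

/* Write the line to a stream, checking whether that stream is a tty. */
static void print_subtest_line(FILE *out, int test_num, int subtest_num,
			       bool success, const char *test_name,
			       const char *subtest_name)
{
	char buf[256];
	bool to_tty = isatty(fileno(out)) != 0;

	format_subtest_line(buf, sizeof(buf), test_num, subtest_num,
			    success, to_tty, test_name, subtest_name);
	fputs(buf, out);
}
```

For example, print_subtest_line(stdout, 3, 1, true, "bpf_verif_scale", "loop3.o") writes the "#  3/1     OK bpf_verif_scale:loop3.o" line shown in the "After" listing. Since the colored string is longer than four characters, %4s adds no padding to it; the visible text still lines up because "FAIL" is already four characters wide.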

* [PATCH bpf-next 0/4] selftests/bpf: test_progs: misc fixes
From: Stanislav Fomichev @ 2019-08-14 16:47 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko

* make output a bit easier to follow
* add test__skip to indicate skipped tests
* remove global success/error counts (use environment)
* remove asserts from the tests

Cc: Andrii Nakryiko <andriin@fb.com>

Stanislav Fomichev (4):
  selftests/bpf: test_progs: change formatting of the condensed output
  selftests/bpf: test_progs: test__skip
  selftests/bpf: test_progs: remove global fail/success counts
  selftests/bpf: test_progs: remove asserts from subtests

 .../selftests/bpf/prog_tests/bpf_obj_id.c     | 32 +++++--
 .../bpf/prog_tests/bpf_verif_scale.c          | 10 +-
 .../selftests/bpf/prog_tests/flow_dissector.c |  2 +-
 .../bpf/prog_tests/get_stack_raw_tp.c         |  2 +-
 .../selftests/bpf/prog_tests/global_data.c    | 10 +-
 .../selftests/bpf/prog_tests/l4lb_all.c       |  4 +-
 .../selftests/bpf/prog_tests/map_lock.c       | 28 +++---
 .../selftests/bpf/prog_tests/pkt_access.c     |  2 +-
 .../selftests/bpf/prog_tests/pkt_md_access.c  |  2 +-
 .../bpf/prog_tests/queue_stack_map.c          |  4 +-
 .../bpf/prog_tests/reference_tracking.c       |  2 +-
 .../selftests/bpf/prog_tests/send_signal.c    |  1 +
 .../selftests/bpf/prog_tests/spinlock.c       | 12 ++-
 .../bpf/prog_tests/stacktrace_build_id.c      | 11 ++-
 .../bpf/prog_tests/stacktrace_build_id_nmi.c  | 11 ++-
 .../selftests/bpf/prog_tests/stacktrace_map.c |  2 +-
 .../bpf/prog_tests/stacktrace_map_raw_tp.c    |  2 +-
 .../bpf/prog_tests/task_fd_query_rawtp.c      |  2 +-
 .../bpf/prog_tests/task_fd_query_tp.c         |  2 +-
 .../selftests/bpf/prog_tests/tcp_estats.c     |  2 +-
 tools/testing/selftests/bpf/prog_tests/xdp.c  |  2 +-
 .../bpf/prog_tests/xdp_adjust_tail.c          |  2 +-
 .../selftests/bpf/prog_tests/xdp_noinline.c   |  4 +-
 tools/testing/selftests/bpf/test_progs.c      | 93 +++++++++++--------
 tools/testing/selftests/bpf/test_progs.h      | 28 +++++-
 25 files changed, 165 insertions(+), 107 deletions(-)

-- 
2.23.0.rc1.153.gdeed80330f-goog

^ permalink raw reply

* Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map entries
From: Edward Cree @ 2019-08-14 16:45 UTC (permalink / raw)
  To: Quentin Monnet, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, netdev, oss-drivers
In-Reply-To: <ab11a9f2-0fbd-d35f-fee1-784554a2705a@netronome.com>

On 14/08/2019 10:42, Quentin Monnet wrote:
> 2019-08-13 18:51 UTC-0700 ~ Alexei Starovoitov
> <alexei.starovoitov@gmail.com>
>> The same can be achieved by 'bpftool map dump|grep key|wc -l', no?
> To some extent (with subtleties for some other map types); and we use a
> similar command line as a workaround for now. But because of the rate of
> inserts/deletes in the map, the process often reports a number higher
> than the max number of entries (we observed up to ~750k when max_entries
> is 500k), even if the map is only half-full on average during the count.
> In the worst case (though not frequent), an entry is deleted just before
> we get the next key from it, and iteration starts all over again. This
> is not reliable to determine how much space is left in the map.
>
> I cannot see a solution that would provide a more accurate count from
> user space, when the map is under pressure?
This might be a really dumb suggestion, but: you're wanting to collect a
 summary statistic over an in-kernel data structure in a single syscall,
 because making a series of syscalls to examine every entry is slow and
 racy.  Isn't that exactly a job for an in-kernel virtual machine, and
 could you not supply an eBPF program which the kernel runs on each entry
 in the map, thus supporting people who want to calculate something else
 (mean, min and max, whatever) instead of count?

^ permalink raw reply

* Re: [PATCH net-next] mcast: ensure L-L IPv6 packets are accepted by bridge
From: Patrick Ruddy @ 2019-08-14 16:40 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Ido Schimmel; +Cc: netdev, roopa, linus.luessing
In-Reply-To: <43ed59db-9228-9132-b9a5-31c8d1e8e9e9@cumulusnetworks.com>

Thanks both for the quick replies, answers inline...

On Wed, 2019-08-14 at 02:55 +0300, Nikolay Aleksandrov wrote:
> On 8/13/19 10:53 PM, Ido Schimmel wrote:
> > + Bridge maintainers, Linus
> > 
> 
> Good catch Ido, thanks!
> First I'd say the subject needs to reflect that this is a bridge change
> better, please rearrange it like so - bridge: mcast: ...
> More below,
> 
> > On Tue, Aug 13, 2019 at 03:18:04PM +0100, Patrick Ruddy wrote:
> > > At present only all-nodes IPv6 multicast packets are accepted by
> > > a bridge interface that is not in multicast router mode. Since
> > > other protocols can be running in the absence of multicast
> > > forwarding e.g. OSPFv3 IPv6 ND. Change the test to allow
> > > all of the FFx2::/16 range to be accepted when not in multicast
> > > router mode. This aligns the code with IPv4 link-local reception
> > > and RFC4291
> > 
> > Can you please quote the relevant part from RFC 4291?
> > 
> > > Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
> > > ---
> > >  include/net/addrconf.h    | 15 +++++++++++++++
> > >  net/bridge/br_multicast.c |  2 +-
> > >  2 files changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> > > index becdad576859..05b42867e969 100644
> > > --- a/include/net/addrconf.h
> > > +++ b/include/net/addrconf.h
> > > @@ -434,6 +434,21 @@ static inline void addrconf_addr_solict_mult(const struct in6_addr *addr,
> > >  		      htonl(0xFF000000) | addr->s6_addr32[3]);
> > >  }
> > >  
> > > +/*
> > > + *      link local multicast address range ffx2::/16 rfc4291
> > > + */
> > > +static inline bool ipv6_addr_is_ll_mcast(const struct in6_addr *addr)
> > > +{
> > > +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
> > > +	__be64 *p = (__be64 *)addr;
> > > +	return ((p[0] & cpu_to_be64(0xff0f000000000000UL))
> > > +		^ cpu_to_be64(0xff02000000000000UL)) == 0UL;
> > > +#else
> > > +	return ((addr->s6_addr32[0] & htonl(0xff0f0000)) ^
> > > +		htonl(0xff020000)) == 0;
> > > +#endif
> > > +}
> > > +
> > >  static inline bool ipv6_addr_is_ll_all_nodes(const struct in6_addr *addr)
> > >  {
> > >  #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
> > > diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> > > index 9b379e110129..ed3957381fa2 100644
> > > --- a/net/bridge/br_multicast.c
> > > +++ b/net/bridge/br_multicast.c
> > > @@ -1664,7 +1664,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
> > >  	err = ipv6_mc_check_mld(skb);
> > >  
> > >  	if (err == -ENOMSG) {
> > > -		if (!ipv6_addr_is_ll_all_nodes(&ipv6_hdr(skb)->daddr))
> > > +		if (!ipv6_addr_is_ll_mcast(&ipv6_hdr(skb)->daddr))
> > >  			BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
> > 
> > IIUC, you want IPv6 link-local packets to be locally received, but this
> > also changes how these packets are flooded. RFC 4541 says that packets
> 
> Indeed, we'll start flooding them all, not just the all hosts address.
> If that is at all required it'll definitely have to be optional.
> 
> > addressed to the all hosts address are a special case and should be
> > forwarded to all ports:
> > 
> > "In IPv6, the data forwarding rules are more straight forward because MLD is
> > mandated for addresses with scope 2 (link-scope) or greater. The only exception
> > is the address FF02::1 which is the all hosts link-scope address for which MLD
> > messages are never sent. Packets with the all hosts link-scope address should
> > be forwarded on all ports."
> > 
> 
> I wonder what is the problem for the host to join such group on behalf of the bridge ?
> Then you'll receive the traffic at least locally and the RFC says it itself - MLD is mandated
> for the other link-local addresses.
> It's very late here and maybe I'm missing something.. :)
> 
The group is being joined by MLD at the L3 level, but the packets are
not being passed up to the L3 interface because there is an MLD querier
on the network.

Snippet from /proc/net/igmp6:
...
40   sw1             ff0200000000000000000001ff008700     1 00000004 0
40   sw1             ff020000000000000000000000000002     1 00000004 0
40   sw1             ff020000000000000000000000000001     1 0000000C 0
40   sw1             ff010000000000000000000000000001     1 00000008 0
41   lo1             ff020000000000000000000000000001     1 0000000C 0
41   lo1             ff010000000000000000000000000001     1 00000008 0
42   sw1.1           ff020000000000000000000000000006     1 00000004 0
42   sw1.1           ff020000000000000000000000000005     1 00000004 0
42   sw1.1           ff0200000000000000000001ff000000     2 00000004 0
42   sw1.1           ff0200000000000000000001ff008700     1 00000004 0
42   sw1.1           ff0200000000000000000001ff000099     1 00000004 0
42   sw1.1           ff020000000000000000000000000002     1 00000004 0
42   sw1.1           ff020000000000000000000000000001     1 0000000C 0
42   sw1.1           ff010000000000000000000000000001     1 00000008 0
...

The bridge is sw1 and the L3 interface is sw1.1.

Ido is correct about the flooding - I will update the patch with the
comments and reissue.

Thanks again

-pr
>  
> > Maybe you want something like:
> > 
> 
> I think we can do without the new field, either pass local_rcv into br_multicast_rcv() or
> set it based on return value. The extra test will have to remain unfortunately, but we
> can reduce the tests by one if carefully done.
> 
> > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> > index 09b1dd8cd853..9f312a73f61c 100644
> > --- a/net/bridge/br_input.c
> > +++ b/net/bridge/br_input.c
> > @@ -132,7 +132,8 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
> >  		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> >  		    br_multicast_querier_exists(br, eth_hdr(skb))) {
> >  			if ((mdst && mdst->host_joined) ||
> > -			    br_multicast_is_router(br)) {
> > +			    br_multicast_is_router(br) ||
> > +			    BR_INPUT_SKB_CB_LOCAL_RECEIVE(skb)) {
> >  				local_rcv = true;
> >  				br->dev->stats.multicast++;
> >  			}
> > diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> > index 9b379e110129..f03cecf6174e 100644
> > --- a/net/bridge/br_multicast.c
> > +++ b/net/bridge/br_multicast.c
> > @@ -1667,6 +1667,9 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
> >  		if (!ipv6_addr_is_ll_all_nodes(&ipv6_hdr(skb)->daddr))
> >  			BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
> >  
> > +		if (ipv6_addr_is_ll_mcast(&ipv6_hdr(skb)->daddr))
> > +			BR_INPUT_SKB_CB(skb)->local_receive = 1;
> > +
> >  		if (ipv6_addr_is_all_snoopers(&ipv6_hdr(skb)->daddr)) {
> >  			err = br_ip6_multicast_mrd_rcv(br, port, skb);
> >  
> > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> > index b7a4942ff1b3..d76394ca4059 100644
> > --- a/net/bridge/br_private.h
> > +++ b/net/bridge/br_private.h
> > @@ -426,6 +426,7 @@ struct br_input_skb_cb {
> >  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> >  	u8 igmp;
> >  	u8 mrouters_only:1;
> > +	u8 local_receive:1;
> >  #endif
> >  	u8 proxyarp_replied:1;
> >  	u8 src_port_isolated:1;
> > @@ -445,8 +446,10 @@ struct br_input_skb_cb {
> >  
> >  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> >  # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(BR_INPUT_SKB_CB(__skb)->mrouters_only)
> > +# define BR_INPUT_SKB_CB_LOCAL_RECEIVE(__skb)	(BR_INPUT_SKB_CB(__skb)->local_receive)
> >  #else
> >  # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(0)
> > +# define BR_INPUT_SKB_CB_LOCAL_RECEIVE(__skb)	(0)
> >  #endif
> >  
> >  #define br_printk(level, br, format, args...)	\
> > 
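
For reference, a minimal user-space sketch of the ffx2::/16 test discussed
in this thread. It is not the kernel helper from the patch; the names
is_ll_mcast() and is_ll_mcast_str() are made up for illustration. It assumes
only the RFC 4291 layout of a multicast address: first byte 0xff, then a
4-bit flags field, then a 4-bit scope field where 2 means link-local.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* User-space sketch (hypothetical name, not the kernel helper): true when
 * addr is in ffx2::/16, i.e. a multicast address (first byte 0xff) whose
 * scope nibble is 2 (link-local). The flags nibble "x" is ignored.
 */
static bool is_ll_mcast(const struct in6_addr *addr)
{
	return addr->s6_addr[0] == 0xff && (addr->s6_addr[1] & 0x0f) == 0x02;
}

/* Convenience wrapper for textual addresses; false on parse failure. */
static bool is_ll_mcast_str(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return false;
	return is_ll_mcast(&addr);
}
```

With this check, ff02::1, ff02::2 and the solicited-node groups from the
dump above all match, while the interface-local ff01::1 does not.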



* Re: [PATCH net] batman-adv: fix uninit-value in batadv_netlink_get_ifindex()
From: David Miller @ 2019-08-14 16:36 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, syzkaller, mareklindner, sw, a
In-Reply-To: <20190812115727.72149-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 12 Aug 2019 04:57:27 -0700

> batadv_netlink_get_ifindex() needs to make sure user passed
> a correct u32 attribute.
 ...
> Fixes: b60620cf567b ("batman-adv: netlink: hardif query")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Simon, I assume I will get this ultimately from you.

Thanks.


* Re: [PATCH net-next,v4 07/12] net: sched: use flow block API
From: Edward Cree @ 2019-08-14 16:32 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel
In-Reply-To: <20190709205550.3160-8-pablo@netfilter.org>

On 09/07/2019 21:55, Pablo Neira Ayuso wrote:
> This patch adds tcf_block_setup() which uses the flow block API.
>
> This infrastructure takes the flow block callbacks coming from the
> driver and register/unregister to/from the cls_api core.
>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> <snip>
> @@ -796,13 +804,20 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
>  				 struct netlink_ext_ack *extack)
>  {
>  	struct tc_block_offload bo = {};
> +	int err;
>  
>  	bo.net = dev_net(dev);
>  	bo.command = command;
>  	bo.binder_type = ei->binder_type;
>  	bo.block = block;
>  	bo.extack = extack;
> -	return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
> +	INIT_LIST_HEAD(&bo.cb_list);
> +
> +	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
> +	if (err < 0)
> +		return err;
> +
> +	return tcf_block_setup(block, &bo);
>  }
>  
>  static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
> @@ -1636,6 +1651,77 @@ void tcf_block_cb_unregister(struct tcf_block *block,
>  }
>  EXPORT_SYMBOL(tcf_block_cb_unregister);
>  
> +static int tcf_block_bind(struct tcf_block *block,
> +			  struct flow_block_offload *bo)
> +{
> +	struct flow_block_cb *block_cb, *next;
> +	int err, i = 0;
> +
> +	list_for_each_entry(block_cb, &bo->cb_list, list) {
> +		err = tcf_block_playback_offloads(block, block_cb->cb,
> +						  block_cb->cb_priv, true,
> +						  tcf_block_offload_in_use(block),
> +						  bo->extack);
> +		if (err)
> +			goto err_unroll;
> +
> +		i++;
> +	}
> +	list_splice(&bo->cb_list, &block->cb_list);
> +
> +	return 0;
> +
> +err_unroll:
> +	list_for_each_entry_safe(block_cb, next, &bo->cb_list, list) {
> +		if (i-- > 0) {
> +			list_del(&block_cb->list);
> +			tcf_block_playback_offloads(block, block_cb->cb,
> +						    block_cb->cb_priv, false,
> +						    tcf_block_offload_in_use(block),
> +						    NULL);
> +		}
> +		flow_block_cb_free(block_cb);
> +	}
> +
> +	return err;
> +}
Why has the replay been moved from the function called by the driver
 (__tcf_block_cb_register()) to work done by the driver's caller based on
 what the driver has left on this flow_block_offload.cb_list?  This makes
 it impossible for the driver to (say) unregister a block outside of an
 explicit request from ndo_setup_tc().
In my under-development driver, I have a teardown path called on PCI
 remove, which calls tcf_block_cb_unregister() on all my block bindings
 (of which the driver keeps track), to ensure that no flow rules are still
 in place when unregister_netdev() is called; this is needed because some
 of the driver's state for certain rules involves taking a reference on
 the netdevice (dev_hold()).  Your structural changes here make that
 impossible; is there any reason why they're necessary?

-Ed


* Re: [PATCH RFC 2/4] net: phy: allow to bind genphy driver at probe time
From: Florian Fainelli @ 2019-08-14 16:30 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <7225e653-6f93-63fc-8d61-a712318d1949@gmail.com>

On 8/13/19 4:02 PM, Heiner Kallweit wrote:
> On 14.08.2019 00:53, Florian Fainelli wrote:
>> On 8/13/19 2:25 PM, Heiner Kallweit wrote:
>>> In cases like a fixed phy that is never attached to a net_device we
>>> may want to bind the genphy driver at probe time. Setting a PHY ID of
>>> 0xffffffff to bind the genphy driver would fail due to a check in
>>> get_phy_device(). Therefore let's change the PHY ID the genphy driver
>>> binds to to 0xfffffffe. This still shouldn't match any real PHY,
>>> and it will pass the check in get_phy_device().
>>>
>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>> ---
>>>  drivers/net/phy/phy_device.c | 3 +--
>>>  include/linux/phy.h          | 4 ++++
>>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>> index 163295dbc..54f80af31 100644
>>> --- a/drivers/net/phy/phy_device.c
>>> +++ b/drivers/net/phy/phy_device.c
>>> @@ -2388,8 +2388,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n)
>>>  EXPORT_SYMBOL(phy_drivers_unregister);
>>>  
>>>  static struct phy_driver genphy_driver = {
>>> -	.phy_id		= 0xffffffff,
>>> -	.phy_id_mask	= 0xffffffff,
>>> +	PHY_ID_MATCH_EXACT(GENPHY_ID),
>>>  	.name		= "Generic PHY",
>>>  	.soft_reset	= genphy_no_soft_reset,
>>>  	.get_features	= genphy_read_abilities,
>>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>>> index 5ac7d2137..3b07bce78 100644
>>> --- a/include/linux/phy.h
>>> +++ b/include/linux/phy.h
>>> @@ -37,6 +37,10 @@
>>>  #define PHY_1000BT_FEATURES	(SUPPORTED_1000baseT_Half | \
>>>  				 SUPPORTED_1000baseT_Full)
>>>  
>>> +#define GENPHY_ID_HIGH		0xffffU
>>> +#define GENPHY_ID_LOW		0xfffeU
>>> +#define GENPHY_ID		((GENPHY_ID_HIGH << 16) | GENPHY_ID_LOW)
>>
>> This is a possible user ABI change here, if there is anything that
>> relies on reading 0xffff_ffff as a valid PHY OUI, you would be breaking
>> it. We might as well try to assign ourselves a specific PHY OUI, very
>> much like the Linux USB hubs show up with a Linux Foundation vendor ID.
>>
> 
> I see the point. However in get_phy_device() we have the following check
> that should cause a PHY with ID 0xffff_ffff to be ignored. Therefore
> I doubt there's any such PHY ID in use.
> 
> 	/* If the phy_id is mostly Fs, there is no device there */
> 	if ((phy_id & 0x1fffffff) == 0x1fffffff)
> 		return ERR_PTR(-ENODEV);

Indeed, it looks like the phy_id reported through sysfs for fixed PHY is
actually 0, so your change should be fine then, thanks!
-- 
Florian
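
As a quick sanity check of the argument above, here is a small user-space
sketch that applies the quoted get_phy_device() mask test to both IDs. The
function name phy_id_looks_absent() is made up for illustration; the
GENPHY_ID macros are copied from the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* GENPHY_ID as composed in the quoted patch. */
#define GENPHY_ID_HIGH		0xffffU
#define GENPHY_ID_LOW		0xfffeU
#define GENPHY_ID		((GENPHY_ID_HIGH << 16) | GENPHY_ID_LOW)

/* Hypothetical helper mirroring the quoted check: an ID whose low 29 bits
 * are all ones is treated as "no device there".
 */
static bool phy_id_looks_absent(uint32_t phy_id)
{
	return (phy_id & 0x1fffffffu) == 0x1fffffffu;
}
```

0xffffffff is rejected by the mask test, while 0xfffffffe has its lowest
bit clear and therefore passes it.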


* [PATCH] lan78xx: Fix memory leaks
From: Wenwen Wang @ 2019-08-14 16:23 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Woojung Huh, Microchip Linux Driver Support, David S. Miller,
	open list:USB LAN78XX ETHERNET DRIVER,
	open list:USB NETWORKING DRIVERS, open list

In lan78xx_probe(), a new urb is allocated through usb_alloc_urb() and
saved to 'dev->urb_intr'. However, in the following execution, if an error
occurs, 'dev->urb_intr' is not deallocated, leading to memory leaks. To fix
this issue, invoke usb_free_urb() to free the allocated urb before
returning from the function.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
---
 drivers/net/usb/lan78xx.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 3d92ea6..f033fee 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3792,7 +3792,7 @@ static int lan78xx_probe(struct usb_interface *intf,
 	ret = register_netdev(netdev);
 	if (ret != 0) {
 		netif_err(dev, probe, netdev, "couldn't register the device\n");
-		goto out3;
+		goto out4;
 	}
 
 	usb_set_intfdata(intf, dev);
@@ -3807,12 +3807,14 @@ static int lan78xx_probe(struct usb_interface *intf,
 
 	ret = lan78xx_phy_init(dev);
 	if (ret < 0)
-		goto out4;
+		goto out5;
 
 	return 0;
 
-out4:
+out5:
 	unregister_netdev(netdev);
+out4:
+	usb_free_urb(dev->urb_intr);
 out3:
 	lan78xx_unbind(dev, intf);
 out2:
-- 
2.7.4
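
The label ordering in the patch above follows the usual goto-unwind ladder:
each failure jumps to the label that undoes everything acquired so far, in
reverse order. A minimal user-space sketch of that pattern follows. The
names struct fake_dev and fake_probe() are hypothetical stand-ins, not
lan78xx code, and malloc()/free() merely play the role of
usb_alloc_urb()/usb_free_urb().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's resources. */
struct fake_dev {
	void *urb;		/* plays the role of dev->urb_intr */
	bool registered;	/* plays the role of the registered netdev */
};

/* fail_at selects the failing step: 1 = register, 2 = phy init, 0 = none. */
static int fake_probe(struct fake_dev *dev, int fail_at)
{
	int ret;

	dev->urb = malloc(64);
	if (!dev->urb)
		return -1;

	if (fail_at == 1) {
		ret = -2;
		goto out_free_urb;	/* nothing registered yet */
	}
	dev->registered = true;

	if (fail_at == 2) {
		ret = -3;
		goto out_unregister;	/* undo registration, then the urb */
	}

	return 0;

out_unregister:
	dev->registered = false;
out_free_urb:
	free(dev->urb);
	dev->urb = NULL;
	return ret;
}
```

Whichever step fails, the allocation made at the top is released before
returning, which is the property the patch restores for 'dev->urb_intr'.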



* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-14 16:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, Kirti Wankhede, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cjia@nvidia.com, Jiri Pirko,
	netdev@vger.kernel.org
In-Reply-To: <20190814085746.26b5f2a3@x1.home>



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, August 14, 2019 8:28 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Cornelia Huck <cohuck@redhat.com>; Kirti Wankhede
> <kwankhede@nvidia.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia@nvidia.com; Jiri Pirko <jiri@mellanox.com>;
> netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Wed, 14 Aug 2019 13:45:49 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Cornelia Huck <cohuck@redhat.com>
> > > Sent: Wednesday, August 14, 2019 6:39 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Alex Williamson <alex.williamson@redhat.com>; Kirti Wankhede
> > > <kwankhede@nvidia.com>; kvm@vger.kernel.org; linux-
> > > kernel@vger.kernel.org; cjia@nvidia.com; Jiri Pirko
> > > <jiri@mellanox.com>; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Wed, 14 Aug 2019 12:27:01 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > + Jiri, + netdev
> > > > To get perspective on the ndo->phys_port_name for the representor
> > > > netdev
> > > of mdev.
> > > >
> > > > Hi Cornelia,
> > > >
> > > > > -----Original Message-----
> > > > > From: Cornelia Huck <cohuck@redhat.com>
> > > > > Sent: Wednesday, August 14, 2019 1:32 PM
> > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > Cc: Alex Williamson <alex.williamson@redhat.com>; Kirti Wankhede
> > > > > <kwankhede@nvidia.com>; kvm@vger.kernel.org; linux-
> > > > > kernel@vger.kernel.org; cjia@nvidia.com
> > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >
> > > > > On Wed, 14 Aug 2019 05:54:36 +0000 Parav Pandit
> > > > > <parav@mellanox.com> wrote:
> > > > >
> > > > > > > > I get that part. I prefer to remove the UUID itself from
> > > > > > > > the structure and therefore removing this API makes lot more
> sense?
> > > > > > >
> > > > > > > Mdev and support tools around mdev are based on UUIDs
> > > > > > > because it's
> > > > > defined
> > > > > > > in the documentation.
> > > > > > When we introduce newer device naming scheme, it will update
> > > > > > the
> > > > > documentation also.
> > > > > > May be that is the time to move to .rst format too.
> > > > >
> > > > > You are aware that there are existing tools that expect a uuid
> > > > > naming scheme, right?
> > > > >
> > > > Yes, Alex mentioned too.
> > > > The good tool that I am aware of is [1], which is 4 months old.
> > > > Not sure if it is
> > > part of any distros yet.
> > > >
> > > > README also says, that it is in 'early in development. So we have
> > > > scope to
> > > improve it for non UUID names, but lets discuss that more below.
> > >
> > > The up-to-date reference for mdevctl is
> > > https://github.com/mdevctl/mdevctl. There is currently an effort to
> > > get this packaged in Fedora.
> > >
> > Awesome.
> >
> > > >
> > > > > >
> > > > > > > I don't think it's as simple as saying "voila, UUID
> > > > > > > dependencies are removed, users are free to use arbitrary
> > > > > > > strings".  We'd need to create some kind of naming policy,
> > > > > > > what characters are allowed so that we can potentially expand
> > > > > > > the creation parameters as has been proposed a couple times,
> > > > > > > how do we deal with collisions and races, and why should we
> > > > > > > make such a change when a UUID is a perfectly reasonable
> > > > > > > devices name.  Thanks,
> > > > > > >
> > > > > > Sure, we should define a policy on device naming to be more relaxed.
> > > > > > We have enough examples in-kernel.
> > > > > > Few that I am aware of are netdev (vxlan, macvlan, ipvlan, lot
> > > > > > more), rdma
> > > > > etc which has arbitrary device names and ID based device names.
> > > > > >
> > > > > > Collisions and race is already taken care today in the mdev core.
> > > > > > Same
> > > > > unique device names continue.
> > > > >
> > > > > I'm still completely missing a rationale _why_ uuids are
> > > > > supposedly bad/restricting/etc.
> > > > There is nothing bad about uuid based naming.
> > > > Its just too long name to derive phys_port_name of a netdev.
> > > > In details below.
> > > >
> > > > For a given mdev of networking type, we would like to have
> > > > (a) representor netdevice [2]
> > > > (b) associated devlink port [3]
> > > >
> > > > Currently these representor netdevice exist only for the PCIe SR-IOV VFs.
> > > > It is further getting extended for mdev without SR-IOV.
> > > >
> > > > Each of the devlink port is attached to representor netdevice [4].
> > > >
> > > > This netdevice phys_port_name should be a unique derived from some
> > > property of mdev.
> > > > Udev/systemd uses phys_port_name to derive unique representor
> > > > netdev
> > > name.
> > > > This netdev name is further use by orchestration and switching
> > > > software in
> > > user space.
> > > > One such distro supported switching software is ovs [4], which
> > > > relies on the
> > > persistent device name of the representor netdevice.
> > >
> > > Ok, let me rephrase this to check that I understand this correctly.
> > > I'm not sure about some of the terms you use here (even after
> > > looking at the linked doc/code), but that's probably still ok.
> > >
> > > We want to derive an unique (and probably persistent?) netdev name
> > > so that userspace can refer to a representor netdevice. Makes sense.
> > > For generating that name, udev uses the phys_port_name (which
> > > represents the devlink port, IIUC). Also makes sense.
> > >
> > You understood it correctly.
> >
> > > >
> > > > phys_port_name has limitation to be only 15 characters long.
> > > > UUID doesn't fit in phys_port_name.
> > >
> > > Understood. But why do we need to derive the phys_port_name from the
> > > mdev device name? This netdevice use case seems to be just one use
> > > case for using mdev devices? If this is a specialized mdev type for
> > > this setup, why not just expose a shorter identifier via an extra attribute?
> > >
> > Representor netdev, represents mdev's switch port (like PCI SRIOV VF's switch
> port).
> > So user must be able to relate this two objects in similar manner as SRIOV
> VFs.
> > Phys_port_name is derived from the PCI PF and VF numbering scheme.
> > Similarly mdev's such port should be derived from mdev's id/name/attribute.
> >
> > > > Longer UUID names are creating snow ball effect, not just in
> > > > networking stack
> > > but many user space tools too.
> > >
> > > This snowball effect mainly comes from the device name ->
> > > phys_port_name setup, IIUC.
> > >
> > Right.
> >
> > > > (as opposed to recently introduced mdevctl, are they more mdev
> > > > tools which has dependency on UUID name?)
> > >
> > > I am aware that people have written scripts etc. to manage their mdevs.
> > > Given that the mdev infrastructure has been around for quite some
> > > time, I'd say the chance of some of those scripts relying on uuid names is
> non-zero.
> > >
> > Ok. but those scripts have never managed networking devices.
> > So those scripts won't break because they will always create mdev devices
> using UUID.
> > When they use these new networking devices, they need more things than
> their scripts.
> > So user space upgrade for such mixed mode case is reasonable.
> 
> Tools like mdevctl are agnostic of the type of mdev device they're managing, it
> shouldn't matter that they've never managed a networking mdev previously, it
> follows the standards of mdev management.
> 
> > > >
> > > > Instead of mdev subsystem creating such effect, one option we are
> > > considering is to have shorter mdev names.
> > > > (Similar to netdev, rdma, nvme devices).
> > > > Such as mdev1, mdev2000 etc.
> 
> Note that these are kernel generated names, as are the other examples.
No. I probably gave the wrong examples.
Mdev user provided names can be 'foo', 'bar', 'foo1'.

> In the case of mdev, the user is providing the UUID, which becomes the device
> name.  When a user writes to the create attribute, there needs to be
> determinism that the user can identify the device they created vs another that
> may have been created concurrently.  I don't see that we can put users in the
> path of managing device instance numbers.
No. It's just user-provided names.

> 
> > > > Second option I was considering is to have an optional alias for
> > > > UUID based
> > > mdev.
> > > > This name alias is given at time of mdev creation.
> > > > Devlink port's phys_port_name is derived out of this shorter mdev
> > > > name
> > > alias.
> > > > This way, mdev remains to be UUID based with optional extension.
> > > > However, I prefer first option to relax mdev naming scheme.
> > >
> > > Actually, I think that second option makes much more sense, as you
> > > avoid potentially breaking existing tooling.
> > Let's first understand of what exactly will break with existing tool
> > if they see non_uuid based device.
> 
> Do we really want a mixed namespace of device names, some UUID, some...
> something else?  That seems like a mess.
> 
So you prefer an alias as an attribute? If so, it should be an optional additional parameter at create time,
because it is desirable not to invent new callbacks for setting such attributes (and rewrite them).

> > Existing tooling continue to work with UUID devices.
> > Do you have example of what can break if they see non_uuid based
> > device name? I think you are clear, but to be sure, UUID based
> > creation will continue to be there. Optionally mdev will be created
> > with alpha-numeric string, if we don't it as additional attribute.
> 
> I'm not onboard with a UUID being just one of the possible naming strings via
> which we can create mdev devices.  I think that becomes untenable for
> userspace.  I don't think a sufficient argument has been made against the alias
> approach, which seems to keep the UUID as a canonical name, providing a
> consistent namespace, augmented with user or kernel provided short alias.
> Thanks,
> 
If I understand you correctly, you prefer the alias approach, which keeps the UUID naming scheme intact in mdev?

> Alex
> 
> > > >
> > > > > We want to uniquely identify a device, across different types of
> > > > > vendor drivers. An uuid is a unique identifier and even a
> > > > > well-defined one. Tools (e.g. mdevctl) are relying on it for
> > > > > mdev devices
> > > today.
> > > > >
> > > > > What is the problem you're trying to solve?
> > > > Unique device naming is still achieved without UUID scheme by
> > > > various
> > > subsystems in kernel using alpha-numeric string.
> > > > Having such string based continue to provide unique names.
> > > >
> > > > I hope I described the problem and two solutions above.
> > > >
> > > > [1] https://github.com/awilliam/mdevctl
> > > > [2]
> > > > https://elixir.bootlin.com/linux/v5.3-rc4/source/drivers/net/ether
> > > > net/
> > > > mellanox/mlx5/core/en_rep.c [3]
> > > > http://man7.org/linux/man-pages/man8/devlink-port.8.html
> > > > [4]
> > > > https://elixir.bootlin.com/linux/v5.3-rc4/source/net/core/devlink.
> > > > c#L6
> > > > 921
> > > > [5] https://www.openvswitch.org/
> > > >
> >
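
To make the length argument in this thread concrete, here is a tiny sketch.
It assumes the 15-character phys_port_name limit stated above and the
canonical 36-character textual UUID form; PORT_NAME_MAX and
fits_port_name() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed limit, taken from the thread: at most 15 characters. */
#define PORT_NAME_MAX 15

/* Hypothetical helper: would this name fit as a phys_port_name? */
static bool fits_port_name(const char *name)
{
	return strlen(name) <= PORT_NAME_MAX;
}
```

A textual UUID (8-4-4-4-12 hex digits plus four dashes) is 36 characters
long and does not fit, whereas short names such as "mdev2000" or "foo1" do.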



* Re: [PATCH bpf-next 1/3] libbpf: add asm/unistd.h to xsk to get __NR_mmap2
From: Yonghong Song @ 2019-08-14 16:17 UTC (permalink / raw)
  To: Björn Töpel, Andrii Nakryiko, Magnus Karlsson,
	Björn Töpel, David S. Miller, Jesper Dangaard Brouer,
	john fastabend, Jakub Kicinski, Daniel Borkmann, Networking, bpf,
	Xdp, open list
In-Reply-To: <CAJ+HfNiqu7WEoBFnfK3znU4tVyAmpPVabTjTSKH1ZVo2W1rrXg@mail.gmail.com>



On 8/14/19 6:32 AM, Björn Töpel wrote:
> On Wed, 14 Aug 2019 at 13:57, Ivan Khoronzhuk
> <ivan.khoronzhuk@linaro.org> wrote:
>>
>> On Wed, Aug 14, 2019 at 12:24:05PM +0300, Ivan Khoronzhuk wrote:
>>> On Tue, Aug 13, 2019 at 04:38:13PM -0700, Andrii Nakryiko wrote:
>>>
>>> Hi, Andrii
>>>
>>>> On Tue, Aug 13, 2019 at 3:24 AM Ivan Khoronzhuk
>>>> <ivan.khoronzhuk@linaro.org> wrote:
>>>>>
>>>>> That's needed to get __NR_mmap2 when mmap2 syscall is used.
>>>>>
>>>>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>>>> ---
>>>>> tools/lib/bpf/xsk.c | 1 +
>>>>> 1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
>>>>> index 5007b5d4fd2c..f2fc40f9804c 100644
>>>>> --- a/tools/lib/bpf/xsk.c
>>>>> +++ b/tools/lib/bpf/xsk.c
>>>>> @@ -12,6 +12,7 @@
>>>>> #include <stdlib.h>
>>>>> #include <string.h>
>>>>> #include <unistd.h>
>>>>> +#include <asm/unistd.h>
>>>>
>>>> asm/unistd.h is not present in Github libbpf projection. Is there any
>>>
>>> Look on includes from
>>> tools/lib/bpf/libbpf.c
>>> tools/lib/bpf/bpf.c
>>>
>>> That's how it's done... Copying headers to arch/arm will not
>>> solve this, it includes both of them anyway, and anyway it needs
>>> asm/unistd.h inclusion here, only because xsk.c needs __NR_*
>>>
>>>
>>
>> There is one more radical solution for this I can send, but I'm not sure how it
>> can impact other syscalls/arches...
>>
>> Looks like:
>>
>>
>> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
>> index 9312066a1ae3..8b2f8ff7ce44 100644
>> --- a/tools/lib/bpf/Makefile
>> +++ b/tools/lib/bpf/Makefile
>> @@ -113,6 +113,7 @@ override CFLAGS += -Werror -Wall
>>   override CFLAGS += -fPIC
>>   override CFLAGS += $(INCLUDES)
>>   override CFLAGS += -fvisibility=hidden
>> +override CFLAGS += -D_FILE_OFFSET_BITS=64
>>
> 
> Hmm, isn't this a glibc-ism? Does it work for, say, musl or bionic?
> 
> If this is portable, and works on 32-, and 64-bit archs, I'm happy
> with the patch. :-)

Seconded. Defining -D_FILE_OFFSET_BITS=64 looks like a well-known
fix for 32-bit systems that need to deal with files > 2GB.
I remember using it in the distant past. The link below
also explains the case.
https://digital-domain.net/largefiles.html

Testing on musl is necessary, as Arnaldo's perf test suite
does indeed test it. Probably bionic too; I'm not really familiar with it.
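
A small sketch of the two things at play here, under the assumption of a
libc that honours _FILE_OFFSET_BITS (glibc does; behaviour elsewhere is
exactly what needs testing). off_t_bits() and mmap2_pgoff() are
hypothetical names; the shift mirrors the page_shift computation in the
quoted xsk_mmap() helper.

```c
#define _FILE_OFFSET_BITS 64	/* must come before any system header */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Width of off_t, in bits, as seen by this translation unit. */
static int off_t_bits(void)
{
	return (int)(sizeof(off_t) * 8);
}

/* Convert a byte offset into page units, the way the quoted xsk_mmap()
 * does before calling mmap2. page_size is assumed to be a power of two.
 */
static uint64_t mmap2_pgoff(uint64_t byte_offset, long page_size)
{
	unsigned int page_shift = (unsigned int)__builtin_ffsl(page_size) - 1;

	return byte_offset >> page_shift;
}
```

With a 64-bit off_t, an offset above 4 GiB such as 0x100000000 can be
passed to plain mmap() directly; without it, the same offset has to be
expressed in page units (0x100000 for 4096-byte pages) for mmap2.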

> 
> 
> Björn
> 
>>   ifeq ($(VERBOSE),1)
>>     Q =
>> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
>> index f2fc40f9804c..ff2d03b8380d 100644
>> --- a/tools/lib/bpf/xsk.c
>> +++ b/tools/lib/bpf/xsk.c
>> @@ -75,23 +75,6 @@ struct xsk_nl_info {
>>          int fd;
>>   };
>>
>> -/* For 32-bit systems, we need to use mmap2 as the offsets are 64-bit.
>> - * Unfortunately, it is not part of glibc.
>> - */
>> -static inline void *xsk_mmap(void *addr, size_t length, int prot, int flags,
>> -                            int fd, __u64 offset)
>> -{
>> -#ifdef __NR_mmap2
>> -       unsigned int page_shift = __builtin_ffs(getpagesize()) - 1;
>> -       long ret = syscall(__NR_mmap2, addr, length, prot, flags, fd,
>> -                          (off_t)(offset >> page_shift));
>> -
>> -       return (void *)ret;
>> -#else
>> -       return mmap(addr, length, prot, flags, fd, offset);
>> -#endif
>> -}
>> -
>>   int xsk_umem__fd(const struct xsk_umem *umem)
>>   {
>>          return umem ? umem->fd : -EINVAL;
>> @@ -211,10 +194,9 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
>>                  goto out_socket;
>>          }
>>
>> -       map = xsk_mmap(NULL, off.fr.desc +
>> -                      umem->config.fill_size * sizeof(__u64),
>> -                      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
>> -                      umem->fd, XDP_UMEM_PGOFF_FILL_RING);
>> +       map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64),
>> +                  PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
>> +                  XDP_UMEM_PGOFF_FILL_RING);
>>          if (map == MAP_FAILED) {
>>                  err = -errno;
>>                  goto out_socket;
>> @@ -228,10 +210,9 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
>>          fill->ring = map + off.fr.desc;
>>          fill->cached_cons = umem->config.fill_size;
>>
>> -       map = xsk_mmap(NULL,
>> -                      off.cr.desc + umem->config.comp_size * sizeof(__u64),
>> -                      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
>> -                      umem->fd, XDP_UMEM_PGOFF_COMPLETION_RING);
>> +       map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64),
>> +                  PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
>> +                  XDP_UMEM_PGOFF_COMPLETION_RING);
>>          if (map == MAP_FAILED) {
>>                  err = -errno;
>>                  goto out_mmap;
>> @@ -552,11 +533,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
>>          }
>>
>>          if (rx) {
>> -               rx_map = xsk_mmap(NULL, off.rx.desc +
>> -                                 xsk->config.rx_size * sizeof(struct xdp_desc),
>> -                                 PROT_READ | PROT_WRITE,
>> -                                 MAP_SHARED | MAP_POPULATE,
>> -                                 xsk->fd, XDP_PGOFF_RX_RING);
>> +               rx_map = mmap(NULL, off.rx.desc +
>> +                             xsk->config.rx_size * sizeof(struct xdp_desc),
>> +                             PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
>> +                             xsk->fd, XDP_PGOFF_RX_RING);
>>                  if (rx_map == MAP_FAILED) {
>>                          err = -errno;
>>                          goto out_socket;
>> @@ -571,11 +551,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
>>          xsk->rx = rx;
>>
>>          if (tx) {
>> -               tx_map = xsk_mmap(NULL, off.tx.desc +
>> -                                 xsk->config.tx_size * sizeof(struct xdp_desc),
>> -                                 PROT_READ | PROT_WRITE,
>> -                                 MAP_SHARED | MAP_POPULATE,
>> -                                 xsk->fd, XDP_PGOFF_TX_RING);
>> +               tx_map = mmap(NULL, off.tx.desc +
>> +                             xsk->config.tx_size * sizeof(struct xdp_desc),
>> +                             PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
>> +                             xsk->fd, XDP_PGOFF_TX_RING);
>>                  if (tx_map == MAP_FAILED) {
>>                          err = -errno;
>>                          goto out_mmap_rx;
>>
>>
>> If maintainers are ready to accept this I can send.
>> What do you say?
>>
>> --
>> Regards,
>> Ivan Khoronzhuk


* Re: [PATCH net-next,v4 08/12] drivers: net: use flow block API
From: Edward Cree @ 2019-08-14 16:17 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel
In-Reply-To: <20190813195126.ilwtoljk2csco73m@salvia>

On 13/08/2019 20:51, Pablo Neira Ayuso wrote:
> On Mon, Aug 12, 2019 at 06:50:09PM +0100, Edward Cree wrote:
>> Pablo, can you explain (because this commit message doesn't) why these per-
>>  driver lists are needed, and what the information/state is that has module
>>  (rather than, say, netdevice) scope?
> The idea is to update drivers to support one flow_block per subsystem,
> one for ethtool, one for tc, and so on. So far, existing drivers only
> allow for binding one single flow_block to one of the existing
> subsystems. So this limitation applies at driver level.
That argues for per-driver _code_, not for per-driver _state_.  For instance,
 each driver could (more logically) store this information in the netdev
 private data, rather than a static global.  Or even, since each driver
 instance has a unique cb_ident = netdev_priv(net_dev), this doesn't need to
 be local to the driver at all and could just belong to the device owning the
 flow_block (which isn't necessarily the device doing the offload, per
 indirect blocks).

TBH I'm still not clear why you need a flow_block per subsystem, rather than
 just having multiple subsystems feed their offload requests through the same
 flow_block but with different enum tc_setup_type or enum tc_fl_command or
 some other indication that this is "netfilter" rather than "tc" asking for a
 tc_cls_flower_offload.

I'd also like to concur with what Jakub said on v2: "this series is really
 hard to follow... the number of things called some combination of block cb
 and list makes my head hurt :/".

This really needs a design document explaining what all the bits are, how
 they fit together, and why they need to be like that.


* Re: tc - mirred ingress not supported at the moment
From: Stephen Hemminger @ 2019-08-14 16:14 UTC (permalink / raw)
  To: Martin Olsson; +Cc: Cong Wang, netdev
In-Reply-To: <CAAT+qEbDAuQWGZa5BQYMZfBRQM+mDS=CMb9GTPz6Nxz_WD0M8Q@mail.gmail.com>

On Wed, 14 Aug 2019 11:25:25 +0200
Martin Olsson <martin.olsson+netdev@sentorsecurity.com> wrote:

> Hi Cong!
> 
> Ah sorry.
> Already implemented. Great!
> 
> Hmmm. Then why doesn't the manual at
> https://www.linux.org/docs/man8/tc-mirred.html reflect the changes?
> That was the place I checked to see if ingress was still not implemented.
> In the commit you point at, the sentence "Currently only egress is
> implemented" has been removed.

The man pages on linux.org are not controlled by kernel/iproute developers.
Not sure who builds/owns these and don't care.


> Question:
> Is there any form of performance penalty if I send the mirrored
> traffic to the ingress queue of the destination interface rather than
> to the egress queue?
> I mean, in the kernel there is the possibility to perform far more
> actions on the ingress queue than on the egress, but if I leave both
> queues at their defaults, will mirrored packets to ingress use more
> CPU cycles than to the egress destination, or are they more or less
> identical?
> 
> 
> Question 2:
> Given the commit
> https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=5eca0a3701223619a513c7209f7d9335ca1b4cfa,
> how can I see in what kernel version it was added?

Look at the tags

$ git tag --contains 5eca0a3701223619a513c7209f7d9335ca1b4cfa 
v4.10.0
v4.11.0
v4.12.0
v4.13.0
v4.14.0
v4.14.1
v4.15.0
v4.16.0
v4.17.0
v4.18.0
v4.19.0
v4.20.0
v5.0.0
v5.1.0
v5.2.0

https://stackoverflow.com/questions/27886537/how-to-find-out-which-releases-contain-a-given-git-commit


* Re: [PATCH net-next 0/4] net: bridge: mdb: allow dump/add/del of host-joined entries
From: Nikolay Aleksandrov @ 2019-08-14 16:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, roopa, bridge
In-Reply-To: <20190814144024.9710-1-nikolay@cumulusnetworks.com>

On 8/14/19 5:40 PM, Nikolay Aleksandrov wrote:
> Hi,
> This set makes the bridge dump host-joined mdb entries, they should be
> treated as normal entries since they take a slot and are aging out.
> We already have notifications for them but we couldn't dump them until
> now so they remained hidden. We dump them similar to how they're
> notified, in order to keep user-space compatibility with the dumped
> objects (e.g. iproute2 dumps mdbs in a format which can be fed into
> add/del commands) we allow host-joined groups also to be added/deleted via
> mdb commands. That can later be used for L2 mcast MAC manipulation as
> was recently discussed. Note that iproute2 changes are not necessary,
> this set will work with the current user-space mdb code.
> 
> Patch 01 - a trivial comment move
> Patch 02 - factors out the mdb filling code so it can be
>            re-used for the host-joined entries
> Patch 03 - dumps host-joined entries
> Patch 04 - allows manipulation of host-joined entries via standard mdb
>            calls
> 
> Thanks,
>  Nik
> 
> Nikolay Aleksandrov (4):
>   net: bridge: mdb: move vlan comments
>   net: bridge: mdb: factor out mdb filling
>   net: bridge: mdb: dump host-joined entries as well
>   net: bridge: mdb: allow add/delete for host-joined groups
> 
>  net/bridge/br_mdb.c       | 171 +++++++++++++++++++++++++-------------
>  net/bridge/br_multicast.c |  24 ++++--
>  net/bridge/br_private.h   |   2 +
>  3 files changed, 133 insertions(+), 64 deletions(-)
> 

Self-NAK
There's a double notification sent for manual add/del of host groups.
It's a trivial fix, I'll spin v2 later after running more tests.


