Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] selftests: mptcp: remove duplicate include in mptcp_inq.c
From: Mat Martineau @ 2021-12-10 18:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Matthieu Baerts, cgel.zte, davem, shuah, netdev, mptcp,
	linux-kselftest, linux-kernel, Ye Guojin, ZealRobot
In-Reply-To: <20211210075701.06bfced2@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Fri, 10 Dec 2021, Jakub Kicinski wrote:

> On Fri, 10 Dec 2021 16:36:06 +0100 Matthieu Baerts wrote:
>>> Actually, I take that back, let's hear from Mat, he may want to take
>>> the patch via his tree.
>>
>> We "rebase" our tree on top of net-next every night. I think for such
>> small patches with no behaviour change and sent directly to netdev ML,
>> it is probably best to apply them directly. I can check with Mat if it
>> is an issue if you prefer.
>
> Please do, I'm happy to apply the patch but Mat usually prefers to take
> things thru MPTCP tree.
>

Jakub -

It is ok with me if you apply this now, for the reasons Matthieu cited.

The usual division of labor between Matthieu and I as MPTCP co-maintainers 
usually has me upstreaming the patches to netdev, but I do trust 
Matthieu's judgement on sending out Reviewed-by tags and advising direct 
appliction to the netdev trees! Also, much like you & David, having offset 
timezones can be helpful.

Also appreciate your awareness of the normal patch flow for MPTCP, and 
that you're checking that we're all on the same page.


>> I would have applied it in our MPTCP tree if we were sending PR, not to
>> bother you for such patches but I guess it is best not to have us
>> sending this patch a second time later :)
>>
>> BTW, if you prefer us sending PR over batches of patches, please tell us!
>
> Small preference for patches. It's good to have the code on the ML for
> everyone to look at and mixed PR + patches are a tiny bit more clicking
> for me.
>

Good to know.


Thanks!

--
Mat Martineau
Intel

^ permalink raw reply

* RE: [PATCH] samples: bpf: fix tracex2 due to empty sys_write count argument
From: John Fastabend @ 2021-12-10 18:00 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko
  Cc: bpf, netdev
In-Reply-To: <20211210111918.4904-1-danieltimlee@gmail.com>

Daniel T. Lee wrote:
> Currently from syscall entry, argument can't be fetched correctly as a
> result of register cleanup.
> 
>     commit 6b8cf5cc9965 ("x86/entry/64/compat: Clear registers for compat syscalls, to reduce speculation attack surface")
> 
> For example in upper commit, registers are cleaned prior to syscall.
> To be more specific, sys_write syscall has count size as a third argument.
> But this can't be fetched from __x64_sys_enter/__s390x_sys_enter due to
> register cleanup. (e.g. [x86] xorl %r8d, %r8d / [s390x] xgr %r7, %r7)
> 
> This commit fix this problem by modifying the trace event to ksys_write
> instead of sys_write syscall entry.
> 
>     # Wrong example of 'write()' syscall argument fetching
>     # ./tracex2
>     ...
>     pid 50909 cmd dd uid 0
>            syscall write() stats
>      byte_size       : count     distribution
>        1 -> 1        : 4968837  |************************************* |
> 
>     # Successful example of 'write()' syscall argument fetching
>     # (dd's write bytes at a time defaults to 512)
>     # ./tracex2
>     ...
>     pid 3095 cmd dd uid 0
>            syscall write() stats
>      byte_size       : count     distribution
>     ...
>      256 -> 511      : 0        |                                      |
>      512 -> 1023     : 4968844  |************************************* |
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
>  samples/bpf/tracex2_kern.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c
> index 5bc696bac27d..96dff3bea227 100644
> --- a/samples/bpf/tracex2_kern.c
> +++ b/samples/bpf/tracex2_kern.c
> @@ -78,7 +78,7 @@ struct {
>  	__uint(max_entries, 1024);
>  } my_hist_map SEC(".maps");
>  
> -SEC("kprobe/" SYSCALL(sys_write))
> +SEC("kprobe/ksys_write")
>  int bpf_prog3(struct pt_regs *ctx)
>  {
>  	long write_size = PT_REGS_PARM3(ctx);
> -- 
> 2.32.0
> 

LGTM

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* [PATCH net] netdevsim: don't overwrite read only ethtool parms
From: Filip Pokryvka @ 2021-12-10 17:50 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S . Miller, Antonio Cardace, Michal Kubecek,
	Filip Pokryvka

Ethtool ring feature has _max_pending attributes read-only.
Set only read-write attributes in nsim_set_ringparam.

This patch is useful, if netdevsim device is set-up using NetworkManager,
because NetworkManager sends 0 as MAX values, as it is pointless to
retrieve them in extra call, because they should be read-only. Then,
the device is left in incosistent state (value > MAX).

Fixes: a7fc6db099b5 ("netdevsim: support ethtool ring and coalesce settings")
Signed-off-by: Filip Pokryvka <fpokryvk@redhat.com>
---
 drivers/net/netdevsim/ethtool.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netdevsim/ethtool.c b/drivers/net/netdevsim/ethtool.c
index b03a0513e..2e7c1cc16 100644
--- a/drivers/net/netdevsim/ethtool.c
+++ b/drivers/net/netdevsim/ethtool.c
@@ -77,7 +77,10 @@ static int nsim_set_ringparam(struct net_device *dev,
 {
 	struct netdevsim *ns = netdev_priv(dev);
 
-	memcpy(&ns->ethtool.ring, ring, sizeof(ns->ethtool.ring));
+	ns->ethtool.ring.rx_pending = ring->rx_pending;
+	ns->ethtool.ring.rx_jumbo_pending = ring->rx_jumbo_pending;
+	ns->ethtool.ring.rx_mini_pending = ring->rx_mini_pending;
+	ns->ethtool.ring.tx_pending = ring->tx_pending;
 	return 0;
 }
 
-- 
2.27.0


^ permalink raw reply related

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
From: Tonghao Zhang @ 2021-12-10 17:46 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Eric Dumazet,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Arnd Bergmann
In-Reply-To: <CAMDZJNV3-y5jkUAJJ--10PcicKpGMwKS_3gG9O7srjomO3begw@mail.gmail.com>

On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > xiangxia.m.yue@ wrote:
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > Try to resolve the issues as below:
> > > * We look up and then check tc_skip_classify flag in net
> > >   sched layer, even though skb don't want to be classified.
> > >   That case may consume a lot of cpu cycles. This patch
> > >   is useful when there are a lot of filters with different
> > >   prio. There is ~5 prio in in production, ~1% improvement.
> > >
> > >   Rules as below:
> > >   $ for id in $(seq 1 5); do
> > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >   $ done
> > >
> > > * bpf_redirect may be invoked in egress path. If we don't
> > >   check the flags and then return immediately, the packets
> > >   will loopback.
> >
> > This would be the naive case right? Meaning the BPF program is
> > doing a redirect without any logic or is buggy?
> >
> > Can you map out how this happens for me, I'm not fully sure I
> > understand the exact concern. Is it possible for BPF programs
> > that used to see packets no longer see the packet as expected?
> >
> > Is this the path you are talking about?
> Hi John
> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> netdev, right ?
> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>
> >  rx ethx  ->
> >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> >        dev_queue_xmit() ->
> >           sch_handle_egress() ->
> >             execute BPF program again
> >
> > I can't see why you want to skip that second tc BPF program,
> > or for that matter any tc filter there. In general how do you
> > know that is the correct/expected behavior? Before the above
> > change it would have been called, what if its doing useful
> > work.
> bpf_redirect works fine on ingress with ifb
> __netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect (ifb0)
> -> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
> but
> __netif_receive_skb_core --> skb_skip_tc_classify(so the packets will
> execute the BPF progam again)
so the packets will NOT execute the BPF progam again)

> > Also its not clear how your ifb setup is built or used. That
> > might help understand your use case. I would just remove the
> > IFB altogether and the above discussion is mute.
> tc filter add dev veth1 egress bpf direct-action obj
> test_bpf_redirect_ifb.o sec redirect_ifb
>
> the test_bpf_redirect_ifb  bpf progam:
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 DiDi Global */
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +
> +SEC("redirect_ifb")
> +int redirect(struct __sk_buff *skb)
> +{
> +       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
> +}
> +
> +char __license[] SEC("license") = "GPL";
>
> The 3/3 is selftest:
> https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
>
> > Thanks,
> > John
>
>
>
> --
> Best regards, Tonghao



-- 
Best regards, Tonghao

^ permalink raw reply

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
From: Tonghao Zhang @ 2021-12-10 17:43 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Eric Dumazet,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Arnd Bergmann
In-Reply-To: <61b385c5c21c3_203252085a@john.notmuch>

On Sat, Dec 11, 2021 at 12:52 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> John Fastabend wrote:
> > xiangxia.m.yue@ wrote:
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > Try to resolve the issues as below:
> > > * We look up and then check tc_skip_classify flag in net
> > >   sched layer, even though skb don't want to be classified.
> > >   That case may consume a lot of cpu cycles. This patch
> > >   is useful when there are a lot of filters with different
> > >   prio. There is ~5 prio in in production, ~1% improvement.
> > >
> > >   Rules as below:
> > >   $ for id in $(seq 1 5); do
> > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >   $ done
> > >
> > > * bpf_redirect may be invoked in egress path. If we don't
> > >   check the flags and then return immediately, the packets
> > >   will loopback.
> >
> > This would be the naive case right? Meaning the BPF program is
> > doing a redirect without any logic or is buggy?
> >
> > Can you map out how this happens for me, I'm not fully sure I
> > understand the exact concern. Is it possible for BPF programs
> > that used to see packets no longer see the packet as expected?
> >
> > Is this the path you are talking about?
> >
> >  rx ethx  ->
> >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> >        dev_queue_xmit() ->
> >           sch_handle_egress() ->
> >             execute BPF program again
> >
> > I can't see why you want to skip that second tc BPF program,
> > or for that matter any tc filter there. In general how do you
> > know that is the correct/expected behavior? Before the above
> > change it would have been called, what if its doing useful
> > work.
> >
> > Also its not clear how your ifb setup is built or used. That
> > might help understand your use case. I would just remove the
> > IFB altogether and the above discussion is mute.
> >
> > Thanks,
> > John
>
> After a bit further thought (and coffee) I think this will
> break some programs that exist today. Consider the case
> where I pop a header off and resubmit to the same device
> intentionally to reprocess the pkt without the header. I've
> used this pattern in BPF a few times.
No, ifb netdev sets the skb->tc_skip_classify = 1, that means we
should not process the skb again, no matter on egress or ingress.
if the bpf programs don't set the skb->tc_skip_classify = 1, then we
can process them again.

-- 
Best regards, Tonghao

^ permalink raw reply

* Re: [PATCH net-next] net: dev: Always serialize on Qdisc::busylock in __dev_xmit_skb() on PREEMPT_RT.
From: Sebastian Andrzej Siewior @ 2021-12-10 17:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, netdev, David S. Miller, Jakub Kicinski,
	Thomas Gleixner
In-Reply-To: <CANn89i+zyeMJVhNmEEhwE0oaRvD-m0ZR9w1+ScsvpWZEuP9G5Q@mail.gmail.com>

On 2021-12-10 08:52:55 [-0800], Eric Dumazet wrote:
> 
> Problem is that if you have a qdisc, qdisc_dequeue() will not know that the
> packet queued by high prio thread needs to shortcut all prior packets and
> be sent right away.

That is correct.

> Because of that, it seems just better that a high prio thread returns
> immediately and let the dirty work (dequeue packets and send them to devices)
> be done by other threads ?

Ah okay. Lets say you have a task sending a packet every 100ms. And you
want that packet to leave the NIC asap. This is your task with
SCHED_FIFO 70 priority. IMPORTANT otherwise.
Lets say you have a ssh session on the same NIC running at SCHED_OTHER.
Nothing important obviously.

The NIC is idle, ssh sends a keep-alive, 80 bytes, one skb. 
    SCHED_OTHER, SSH, CPU0                                    FIFO 70, IMPORTANT, CPU1
  __dev_xmit_skb()
    -> spin_lock(root_lock);
    -> qdisc_run_begin()
      -> __test_and_set_bit(__QDISC_STATE2_RUNNING);
    -> sch_direct_xmit()
        *PREEMPT* (by something else)
                                                            __dev_xmit_skb()
                                                            -> spin_lock(root_lock);
                                                           <--- PI-boost
      -> spin_unlock(root_lock);   
         *PREEMPT* again     ---> PI boost ends after unlock
                                                            -> qdisc_run_begin()
                                                              -> __test_and_set_bit(__QDISC_STATE2_RUNNING) (nope);
                                                             -> dev_qdisc_enqueue()
                                                             -> qdisc_run_begin() returns false
                                                            -> spin_unlock(root_lock);

at this point, we don't return to the SSH thread, instead other RT tasks
are running. That means that the SSH thread, which owns the qdisc and
the NIC, remains preempted and the NIC idle.

300ms pass by, the IMPORTANT thread queued two additional packets.
Now, a few usecs later, all SCHED_FIFO tasks go idle on CPU0, and the
SSH thread can continue with dev_hard_start_xmit() and send its packet, 
    -> dev_hard_start_xmit()
    __qdisc_run() (yes, and the three additional packets)
    …

By always acquiring the busy-lock, in the previous scenario, the task
with the high-priority allows the low-priority task to run until
qdisc_run_end() at which point it releases the qdisc so the
high-priority task can actually send packets.

One side note: This scenario requires two CPUs (one for low priority
task that gets preempted and owns the qdisc, and one one for high
priority task that wants to sends packets).
With one CPU only the local_bh_disable() invocation would ensure that
low-priority thread left the bh-disable section entirely.

> > > Thanks!
> > >
> > > Paolo
> >

Sebastian

^ permalink raw reply

* Re: [PATCH net-next v1] net: dsa: mv88e6xxx: Trap PTP traffic
From: Kurt Kanzenbach @ 2021-12-10 17:39 UTC (permalink / raw)
  To: Tobias Waldekranz, Andrew Lunn, Vivien Didelot
  Cc: Florian Fainelli, Vladimir Oltean, David S. Miller,
	Jakub Kicinski, netdev, Richard Cochran
In-Reply-To: <87y24t1fvk.fsf@waldekranz.com>

On Fri Dec 10 2021, Tobias Waldekranz wrote:
> On Thu, Dec 09, 2021 at 18:33, Kurt Kanzenbach <kurt@kmk-computers.de> wrote:
>> A time aware switch should trap PTP traffic to the management CPU. User space
>> daemons such as ptp4l will process these messages to implement Boundary (or
>> Transparent) Clocks.
>>
>> At the moment the mv88e6xxx driver for mv88e6341 doesn't trap these messages
>> which leads to confusion when multiple end devices are connected to the
>> switch. Therefore, setup PTP traps. Leverage the already implemented policy
>> mechanism based on destination addresses. Configure the traps only if
>> timestamping is enabled so that non time aware use case is still functioning.
>
> Do we know how PTP is supposed to work in relation to things like STP?
> I.e should you be able to run PTP over a link that is currently in
> blocking? It seems like being able to sync your clock before a topology
> change occurs would be nice. In that case, these addresses should be
> added to the ATU as MGMT instead of policy entries.

Given the fact that the l2 p2p address is already considered as
management traffic (see mv88e6390_g1_mgmt_rsvd2cpu()) maybe all PTP
addresses could be treated as such.

[snip]

>> +static int mv88e6341_setup_ptp_traps(struct mv88e6xxx_chip *chip, int port,
>> +				     bool enable)
>> +{
>> +	static const u8 ptp_destinations[][ETH_ALEN] = {
>> +		{ 0x01, 0x1b, 0x19, 0x00, 0x00, 0x00 }, /* L2 PTP */
>> +		{ 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e }, /* L2 P2P */
>> +		{ 0x01, 0x00, 0x5e, 0x00, 0x01, 0x81 }, /* IPv4 PTP */
>> +		{ 0x01, 0x00, 0x5e, 0x00, 0x00, 0x6b }, /* IPv4 P2P */
>> +		{ 0x33, 0x33, 0x00, 0x00, 0x01, 0x81 }, /* IPv6 PTP */
>> +		{ 0x33, 0x33, 0x00, 0x00, 0x00, 0x6b }, /* IPv6 P2P */
>
> How does the L3 entries above play together with IGMP/MLD? I.e. what
> happens if, after launching ptp4l, an IGMP report comes in on lanX,
> requesting that same group? Would the policy entry not be overwritten by
> mv88e6xxx_port_mdb_add?

Just tested this. Yes it is overwritten without any detection or
errors. Actually I did test UDP as well and didn't notice it. It
obviously depends on the order of events.

>
> Eventually I think we will have many interfaces to configure static
> entries in the ATU:
>
> 1. MDB entries from a bridge (already in place)
> 2. User-configured entries through ethtool's rxnfc (already in place)
> 3. Driver-internal consumers (this patch, MRP, etc.)
> 4. User-configured entries through TC.
>
> Seems to me like we need to start tracking the owners for these to stop
> them from stomping on one another.

Agreed. Some mechanism is required. Any idea how to implement it? In
case of PTP the management/policy entries should take precedence.

>
>> +	};
>> +	int ret, i;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(ptp_destinations); ++i) {
>> +		struct mv88e6xxx_policy policy = { };
>> +
>> +		policy.mapping	= MV88E6XXX_POLICY_MAPPING_DA;
>> +		policy.action	= enable ? MV88E6XXX_POLICY_ACTION_TRAP :
>> +			MV88E6XXX_POLICY_ACTION_NORMAL;
>> +		policy.port	= port;
>> +		policy.vid	= 0;
>> +		ether_addr_copy(policy.addr, ptp_destinations[i]);
>> +
>> +		ret = mv88e6xxx_policy_apply(chip, port, &policy);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  const struct mv88e6xxx_ptp_ops mv88e6165_ptp_ops = {
>>  	.clock_read = mv88e6165_ptp_clock_read,
>>  	.global_enable = mv88e6165_global_enable,
>> @@ -419,6 +450,34 @@ const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops = {
>>  	.cc_mult_dem = MV88E6XXX_CC_MULT_DEM,
>>  };
>>  
>> +const struct mv88e6xxx_ptp_ops mv88e6341_ptp_ops = {
>> +	.clock_read = mv88e6352_ptp_clock_read,
>> +	.ptp_enable = mv88e6352_ptp_enable,
>> +	.ptp_verify = mv88e6352_ptp_verify,
>> +	.event_work = mv88e6352_tai_event_work,
>> +	.port_enable = mv88e6352_hwtstamp_port_enable,
>> +	.port_disable = mv88e6352_hwtstamp_port_disable,
>> +	.setup_ptp_traps = mv88e6341_setup_ptp_traps,
>
> Is there any reason why this could not be added to the existing ops for
> 6352 instead? Their ATU's are compatible, IIRC.

No particular reason except that I don't have access to a 6352 device to
test it.

Thanks,
Kurt

^ permalink raw reply

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
From: Tonghao Zhang @ 2021-12-10 17:37 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Eric Dumazet,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Arnd Bergmann
In-Reply-To: <61b383c6373ca_1f50e20816@john.notmuch>

On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> xiangxia.m.yue@ wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > Try to resolve the issues as below:
> > * We look up and then check tc_skip_classify flag in net
> >   sched layer, even though skb don't want to be classified.
> >   That case may consume a lot of cpu cycles. This patch
> >   is useful when there are a lot of filters with different
> >   prio. There is ~5 prio in in production, ~1% improvement.
> >
> >   Rules as below:
> >   $ for id in $(seq 1 5); do
> >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >   $ done
> >
> > * bpf_redirect may be invoked in egress path. If we don't
> >   check the flags and then return immediately, the packets
> >   will loopback.
>
> This would be the naive case right? Meaning the BPF program is
> doing a redirect without any logic or is buggy?
>
> Can you map out how this happens for me, I'm not fully sure I
> understand the exact concern. Is it possible for BPF programs
> that used to see packets no longer see the packet as expected?
>
> Is this the path you are talking about?
Hi John
Tx ethx -> __dev_queue_xmit -> sch_handle_egress
->  execute BPF program on ethx with bpf_redirect(ifb0) ->
-> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
the packets loopbacks, that means bpf_redirect doesn't work with ifb
netdev, right ?
so in sch_handle_egress, I add the check skb_skip_tc_classify().

>  rx ethx  ->
>    execute BPF program on ethx with bpf_redirect(ifb0) ->
>      __skb_dequeue @ifb tc_skip_classify = 1 ->
>        dev_queue_xmit() ->
>           sch_handle_egress() ->
>             execute BPF program again
>
> I can't see why you want to skip that second tc BPF program,
> or for that matter any tc filter there. In general how do you
> know that is the correct/expected behavior? Before the above
> change it would have been called, what if its doing useful
> work.
bpf_redirect works fine on ingress with ifb
__netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect (ifb0)
-> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
but
__netif_receive_skb_core --> skb_skip_tc_classify(so the packets will
execute the BPF progam again)

> Also its not clear how your ifb setup is built or used. That
> might help understand your use case. I would just remove the
> IFB altogether and the above discussion is mute.
tc filter add dev veth1 egress bpf direct-action obj
test_bpf_redirect_ifb.o sec redirect_ifb

the test_bpf_redirect_ifb  bpf progam:
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 DiDi Global */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("redirect_ifb")
+int redirect(struct __sk_buff *skb)
+{
+       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
+}
+
+char __license[] SEC("license") = "GPL";

The 3/3 is selftest:
https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/

> Thanks,
> John



-- 
Best regards, Tonghao

^ permalink raw reply

* Re: [PATCH v1 bpf 1/1] libbpf: don't force user-supplied ifname string to be of fixed size
From: Andrii Nakryiko @ 2021-12-10 17:37 UTC (permalink / raw)
  To: Emmanuel Deloget
  Cc: Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Networking, bpf, open list
In-Reply-To: <15676ff5-5c5c-fd06-308f-10611c01f6a9@eho.link>

On Thu, Dec 9, 2021 at 9:55 AM Emmanuel Deloget
<emmanuel.deloget@eho.link> wrote:
>
> Hello,
>
> On 09/12/2021 18:17, Andrii Nakryiko wrote:
> > On Thu, Dec 9, 2021 at 4:03 AM Emmanuel Deloget
> > <emmanuel.deloget@eho.link> wrote:
> >>
> >> When calling either xsk_socket__create_shared() or xsk_socket__create()
> >> the user supplies a const char *ifname which is implicitely supposed to
> >> be a pointer to the start of a char[IFNAMSIZ] array. The internal
> >> function xsk_create_ctx() then blindly copy IFNAMSIZ bytes from this
> >> string into the xsk context.
> >>
> >> This is counter-intuitive and error-prone.
> >>
> >> For example,
> >>
> >>          int r = xsk_socket__create(..., "eth0", ...)
> >>
> >> may result in an invalid object because of the blind copy. The "eth0"
> >> string might be followed by random data from the ro data section,
> >> resulting in ctx->ifname being filled with the correct interface name
> >> then a bunch and invalid bytes.
> >>
> >> The same kind of issue arises when the ifname string is located on the
> >> stack:
> >>
> >>          char ifname[] = "eth0";
> >>          int r = xsk_socket__create(..., ifname, ...);
> >>
> >> Or comes from the command line
> >>
> >>          const char *ifname = argv[n];
> >>          int r = xsk_socket__create(..., ifname, ...);
> >>
> >> In both case we'll fill ctx->ifname with random data from the stack.
> >>
> >> In practice, we saw that this issue caused various small errors which,
> >> in then end, prevented us to setup a valid xsk context that would have
> >> allowed us to capture packets on our interfaces. We fixed this issue in
> >> our code by forcing our char ifname[] to be of size IFNAMSIZ but that felt
> >> weird and unnecessary.
> >
> > I might be missing something, but the eth0 example above would include
> > terminating zero at the right place, so ifname will still have
> > "eth0\0" which is a valid string. Yes there will be some garbage after
> > that, but it shouldn't matter. It could cause ASAN to complain about
> > reading beyond allocated memory, of course, but I'm curious what
> > problems you actually ran into in practice.
>
> I cannot be extremely precise on what was happening as I did not
> investigate past this (and this fixes our issue) but I suspect that
> having weird bytes in ctx->ifname polutes ifr.ifr_name as initialized in
> xsk_get_max_queues(). ioctl(SIOCETHTOOL) was then giving us an error.
> Now, I haven't looked how the kernel implements this ioctl() so I'm not
> going to say that there is a problem here as well.
>
> And since the issue is now about 2 weeks old it's now a bit murky - and
> I don't have much time to put myself in the same setup in order to
> produce a better investigation (sorry for that).
>

Ok, fine, but I still don't see how memcpy() could have caused
correctness errors. The string will be zero-terminated properly, so it
is a valid C string. The only issue I see is reading past allocated
memory (which might trigger SIGSEGV or will make ASAN complain).
Anyways, let's try strncpy() and fix this. Please send it against
bpf-next for the respin, please.


> >>
> >> Fixes: 2f6324a3937f8 (libbpf: Support shared umems between queues and devices)
> >> Signed-off-by: Emmanuel Deloget <emmanuel.deloget@eho.link>
> >> ---
> >>   tools/lib/bpf/xsk.c | 7 +++++--
> >>   1 file changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> >> index 81f8fbc85e70..8dda80bcefcc 100644
> >> --- a/tools/lib/bpf/xsk.c
> >> +++ b/tools/lib/bpf/xsk.c
> >> @@ -944,6 +944,7 @@ static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
> >>   {
> >>          struct xsk_ctx *ctx;
> >>          int err;
> >> +       size_t ifnamlen;
> >>
> >>          ctx = calloc(1, sizeof(*ctx));
> >>          if (!ctx)
> >> @@ -965,8 +966,10 @@ static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
> >>          ctx->refcount = 1;
> >>          ctx->umem = umem;
> >>          ctx->queue_id = queue_id;
> >> -       memcpy(ctx->ifname, ifname, IFNAMSIZ - 1);
> >> -       ctx->ifname[IFNAMSIZ - 1] = '\0';
> >> +
> >> +       ifnamlen = strnlen(ifname, IFNAMSIZ);
> >> +       memcpy(ctx->ifname, ifname, ifnamlen);
> >
> > maybe use strncpy instead of strnlen + memcpy? keep the guaranteed
> > zero termination (and keep '\0', why did you change it?)
>
> Well, strncpy() calls were replaced by memcpy() a while ago (see
> 3015b500ae42 (libbpf: Use memcpy instead of strncpy to please GCC) for
> example but there are a few other examples ; most of the changes were
> made to please gcc8) so I thought that it would be a bad idea :). What
> would be the consensus on this?

3015b500ae42 ("libbpf: Use memcpy instead of strncpy to please GCC")
is different, there we are copying from properly sized array which our
code controls. memcpy() is totally reasonable there. Here we can't do
memcpy, unfortunately. Let's try strncpy(), if GCC will start
complaining about this, then GCC is clearly wrong and we'll just
disable this warning altogether (I don't remember if it ever found any
real issues anyways).


>
> Regarding '\0', I'll change that.
>
> > Also, note that xsk.c is deprecated in libbpf and has been moved into
> > libxdp, so please contribute a similar fix there.
>
> Will do.
>
> >> +       ctx->ifname[IFNAMSIZ - 1] = 0;
> >>
> >>          ctx->fill = fill;
> >>          ctx->comp = comp;
> >> --
> >> 2.32.0
> >>
>
> BTW, is there a reason why this patch failed to pass the bpf/vmtest-bpf
> test on patchwork?
>

Unrelated bpftool-related check, that isn't supposed to pass on bpf
tree. That one can be ignored.

> Best regards,
>
> -- Emmanuel Deloget

^ permalink raw reply

* [PATCH 03/12] selftests/cgroup: remove ARRAY_SIZE define from cgroup_util.h
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from cgroup_util.h and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/cgroup/cgroup_util.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h
index 82e59cdf16e7..4f66d10626d2 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/cgroup_util.h
@@ -2,9 +2,9 @@
 #include <stdbool.h>
 #include <stdlib.h>
 
-#define PAGE_SIZE 4096
+#include "../kselftest.h"
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#define PAGE_SIZE 4096
 
 #define MB(x) (x << 20)
 
-- 
2.32.0


^ permalink raw reply related

* [PATCH 00/12] selftests: Remove ARRAY_SIZE duplicate defines
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm

ARRAY_SIZE is defined in several selftests. There are about 25+
duplicate defines in various selftests source and header files.
This patch series removes the duplicated defines.

Several tests that define ARRAY_SIZE also include kselftest.h or
kselftest_harness.h. Remove ARRAY_SIZE defines from them.

Some tests that define ARRAY_SIZE don't include headers that define
it. Remove ARRAY_SIZE define and include kselftest.h

The first patch in this series:

- Adds ARRAY_SIZE define to kselftest.h
- Adds ifndef guard around ARRAY_SIZE define in
  tools/include/linux/kernel.h and kselftest_harness.h
- Patches 2-12 do the cleanup and depend on patch 1, hence
  will have to go through kselftest tree.

Shuah Khan (12):
  tools: fix ARRAY_SIZE defines in tools and selftests hdrs
  selftests/arm64: remove ARRAY_SIZE define from vec-syscfg.c
  selftests/cgroup: remove ARRAY_SIZE define from cgroup_util.h
  selftests/core: remove ARRAY_SIZE define from close_range_test.c
  selftests/ir: remove ARRAY_SIZE define from ir_loopback.c
  selftests/landlock: remove ARRAY_SIZE define from common.h
  selftests/net: remove ARRAY_SIZE define from individual tests
  selftests/rseq: remove ARRAY_SIZE define from individual tests
  selftests/seccomp: remove ARRAY_SIZE define from seccomp_benchmark
  selftests/sparc64: remove ARRAY_SIZE define from adi-test
  selftests/timens: remove ARRAY_SIZE define from individual tests
  selftests/vm: remove ARRAY_SIZE define from individual tests

 tools/include/linux/kernel.h                          | 2 ++
 tools/testing/selftests/arm64/fp/vec-syscfg.c         | 2 --
 tools/testing/selftests/cgroup/cgroup_util.h          | 4 ++--
 tools/testing/selftests/core/close_range_test.c       | 4 ----
 tools/testing/selftests/ir/ir_loopback.c              | 1 -
 tools/testing/selftests/kselftest.h                   | 4 ++++
 tools/testing/selftests/kselftest_harness.h           | 2 ++
 tools/testing/selftests/landlock/common.h             | 4 ----
 tools/testing/selftests/net/gro.c                     | 3 ++-
 tools/testing/selftests/net/ipsec.c                   | 1 -
 tools/testing/selftests/net/reuseport_bpf.c           | 4 +---
 tools/testing/selftests/net/rxtimestamp.c             | 2 +-
 tools/testing/selftests/net/socket.c                  | 3 ++-
 tools/testing/selftests/net/tcp_fastopen_backup_key.c | 6 ++----
 tools/testing/selftests/rseq/basic_percpu_ops_test.c  | 3 +--
 tools/testing/selftests/rseq/rseq.c                   | 3 +--
 tools/testing/selftests/seccomp/seccomp_benchmark.c   | 2 +-
 tools/testing/selftests/sparc64/drivers/adi-test.c    | 4 ----
 tools/testing/selftests/timens/procfs.c               | 2 --
 tools/testing/selftests/timens/timens.c               | 2 --
 tools/testing/selftests/vm/mremap_test.c              | 1 -
 tools/testing/selftests/vm/pkey-helpers.h             | 3 ++-
 tools/testing/selftests/vm/va_128TBswitch.c           | 2 +-
 23 files changed, 24 insertions(+), 40 deletions(-)

-- 
2.32.0


^ permalink raw reply

* [PATCH] selftests/bpf: remove ARRAY_SIZE defines from tests
From: Shuah Khan @ 2021-12-10 17:34 UTC (permalink / raw)
  To: shuah, ast, daniel, andrii
  Cc: Shuah Khan, linux-kselftest, netdev, bpf, linux-kernel

ARRAY_SIZE is defined in multiple test files. Remove the definitions
and include header file for the define instead.

Remove ARRAY_SIZE define and add include bpf_util.h to bring in the
define.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/bpf/progs/netif_receive_skb.c | 5 +----
 tools/testing/selftests/bpf/progs/profiler.inc.h      | 5 +----
 tools/testing/selftests/bpf/progs/test_sysctl_loop1.c | 5 +----
 tools/testing/selftests/bpf/progs/test_sysctl_loop2.c | 4 +---
 tools/testing/selftests/bpf/progs/test_sysctl_prog.c  | 5 +----
 5 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/netif_receive_skb.c b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
index 1d8918dfbd3f..7a5ebd330689 100644
--- a/tools/testing/selftests/bpf/progs/netif_receive_skb.c
+++ b/tools/testing/selftests/bpf/progs/netif_receive_skb.c
@@ -5,6 +5,7 @@
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
 #include <bpf/bpf_core_read.h>
+#include <bpf/bpf_util.h>
 
 #include <errno.h>
 
@@ -23,10 +24,6 @@ bool skip = false;
 #define BADPTR			0
 #endif
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x)	(sizeof(x) / sizeof((x)[0]))
-#endif
-
 struct {
 	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
 	__uint(max_entries, 1);
diff --git a/tools/testing/selftests/bpf/progs/profiler.inc.h b/tools/testing/selftests/bpf/progs/profiler.inc.h
index 4896fdf816f7..aad30994ecd7 100644
--- a/tools/testing/selftests/bpf/progs/profiler.inc.h
+++ b/tools/testing/selftests/bpf/progs/profiler.inc.h
@@ -4,6 +4,7 @@
 #include <bpf/bpf_core_read.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
+#include <bpf/bpf_util.h>
 
 #include "profiler.h"
 
@@ -132,10 +133,6 @@ struct {
 	__uint(max_entries, 16);
 } disallowed_exec_inodes SEC(".maps");
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
-#endif
-
 static INLINE bool IS_ERR(const void* ptr)
 {
 	return IS_ERR_VALUE((unsigned long)ptr);
diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
index 553a282d816a..c7c512e0af79 100644
--- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
@@ -8,10 +8,7 @@
 #include <linux/bpf.h>
 
 #include <bpf/bpf_helpers.h>
-
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
+#include <bpf/bpf_util.h>
 
 /* tcp_mem sysctl has only 3 ints, but this test is doing TCP_MEM_LOOPS */
 #define TCP_MEM_LOOPS 28  /* because 30 doesn't fit into 512 bytes of stack */
diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_loop2.c b/tools/testing/selftests/bpf/progs/test_sysctl_loop2.c
index 2b64bc563a12..57cda15d0032 100644
--- a/tools/testing/selftests/bpf/progs/test_sysctl_loop2.c
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop2.c
@@ -8,10 +8,8 @@
 #include <linux/bpf.h>
 
 #include <bpf/bpf_helpers.h>
+#include <bpf/bpf_util.h>
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
 
 /* tcp_mem sysctl has only 3 ints, but this test is doing TCP_MEM_LOOPS */
 #define TCP_MEM_LOOPS 20  /* because 30 doesn't fit into 512 bytes of stack */
diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_prog.c b/tools/testing/selftests/bpf/progs/test_sysctl_prog.c
index 5489823c83fc..6047c39eb457 100644
--- a/tools/testing/selftests/bpf/progs/test_sysctl_prog.c
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_prog.c
@@ -8,6 +8,7 @@
 #include <linux/bpf.h>
 
 #include <bpf/bpf_helpers.h>
+#include <bpf/bpf_util.h>
 
 /* Max supported length of a string with unsigned long in base 10 (pow2 - 1). */
 #define MAX_ULONG_STR_LEN 0xF
@@ -15,10 +16,6 @@
 /* Max supported length of sysctl value string (pow2). */
 #define MAX_VALUE_STR_LEN 0x40
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
-
 const char tcp_mem_name[] = "net/ipv4/tcp_mem";
 static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
 {
-- 
2.32.0


^ permalink raw reply related

* [PATCH 08/12] selftests/rseq: remove ARRAY_SIZE define from individual tests
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from rseq tests and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/rseq/basic_percpu_ops_test.c | 3 +--
 tools/testing/selftests/rseq/rseq.c                  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
index eb3f6db36d36..b953a52ff706 100644
--- a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
+++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
@@ -9,10 +9,9 @@
 #include <string.h>
 #include <stddef.h>
 
+#include "../kselftest.h"
 #include "rseq.h"
 
-#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
-
 struct percpu_lock_entry {
 	intptr_t v;
 } __attribute__((aligned(128)));
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 7159eb777fd3..fb440dfca158 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -27,10 +27,9 @@
 #include <signal.h>
 #include <limits.h>
 
+#include "../kselftest.h"
 #include "rseq.h"
 
-#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
-
 __thread volatile struct rseq __rseq_abi = {
 	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
 };
-- 
2.32.0


^ permalink raw reply related

* [PATCH 07/12] selftests/net: remove ARRAY_SIZE define from individual tests
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from net tests and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/net/gro.c                     | 3 ++-
 tools/testing/selftests/net/ipsec.c                   | 1 -
 tools/testing/selftests/net/reuseport_bpf.c           | 4 +---
 tools/testing/selftests/net/rxtimestamp.c             | 2 +-
 tools/testing/selftests/net/socket.c                  | 3 ++-
 tools/testing/selftests/net/tcp_fastopen_backup_key.c | 6 ++----
 6 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index cf37ce86b0fd..221525ccbe1d 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -57,10 +57,11 @@
 #include <string.h>
 #include <unistd.h>
 
+#include "../kselftest.h"
+
 #define DPORT 8000
 #define SPORT 1500
 #define PAYLOAD_LEN 100
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
 #define NUM_PACKETS 4
 #define START_SEQ 100
 #define START_ACK 100
diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c
index 3d7dde2c321b..cc10c10c5ed9 100644
--- a/tools/testing/selftests/net/ipsec.c
+++ b/tools/testing/selftests/net/ipsec.c
@@ -41,7 +41,6 @@
 
 #define pr_err(fmt, ...)	printk(fmt ": %m", ##__VA_ARGS__)
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
 #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
 
 #define IPV4_STR_SZ	16	/* xxx.xxx.xxx.xxx is longest + \0 */
diff --git a/tools/testing/selftests/net/reuseport_bpf.c b/tools/testing/selftests/net/reuseport_bpf.c
index b5277106df1f..072d709c96b4 100644
--- a/tools/testing/selftests/net/reuseport_bpf.c
+++ b/tools/testing/selftests/net/reuseport_bpf.c
@@ -24,9 +24,7 @@
 #include <sys/resource.h>
 #include <unistd.h>
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
-#endif
+#include "../kselftest.h"
 
 struct test_params {
 	int recv_family;
diff --git a/tools/testing/selftests/net/rxtimestamp.c b/tools/testing/selftests/net/rxtimestamp.c
index e4613ce4ed69..9eb42570294d 100644
--- a/tools/testing/selftests/net/rxtimestamp.c
+++ b/tools/testing/selftests/net/rxtimestamp.c
@@ -18,7 +18,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/errqueue.h>
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#include "../kselftest.h"
 
 struct options {
 	int so_timestamp;
diff --git a/tools/testing/selftests/net/socket.c b/tools/testing/selftests/net/socket.c
index afca1ead677f..db1aeb8c5d1e 100644
--- a/tools/testing/selftests/net/socket.c
+++ b/tools/testing/selftests/net/socket.c
@@ -7,6 +7,8 @@
 #include <sys/socket.h>
 #include <netinet/in.h>
 
+#include "../kselftest.h"
+
 struct socket_testcase {
 	int	domain;
 	int	type;
@@ -31,7 +33,6 @@ static struct socket_testcase tests[] = {
 	{ AF_INET, SOCK_STREAM, IPPROTO_UDP, -EPROTONOSUPPORT, 1  },
 };
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
 #define ERR_STRING_SZ	64
 
 static int run_tests(void)
diff --git a/tools/testing/selftests/net/tcp_fastopen_backup_key.c b/tools/testing/selftests/net/tcp_fastopen_backup_key.c
index 9c55ec44fc43..c1cb0c75156a 100644
--- a/tools/testing/selftests/net/tcp_fastopen_backup_key.c
+++ b/tools/testing/selftests/net/tcp_fastopen_backup_key.c
@@ -26,6 +26,8 @@
 #include <fcntl.h>
 #include <time.h>
 
+#include "../kselftest.h"
+
 #ifndef TCP_FASTOPEN_KEY
 #define TCP_FASTOPEN_KEY 33
 #endif
@@ -34,10 +36,6 @@
 #define PROC_FASTOPEN_KEY "/proc/sys/net/ipv4/tcp_fastopen_key"
 #define KEY_LENGTH 16
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
-#endif
-
 static bool do_ipv6;
 static bool do_sockopt;
 static bool do_rotate;
-- 
2.32.0


^ permalink raw reply related

* [PATCH 12/12] selftests/vm: remove ARRAY_SIZE define from individual tests
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from vm tests and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/vm/mremap_test.c    | 1 -
 tools/testing/selftests/vm/pkey-helpers.h   | 3 ++-
 tools/testing/selftests/vm/va_128TBswitch.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c
index 0624d1bd71b5..7c0b0617b9f8 100644
--- a/tools/testing/selftests/vm/mremap_test.c
+++ b/tools/testing/selftests/vm/mremap_test.c
@@ -20,7 +20,6 @@
 #define VALIDATION_DEFAULT_THRESHOLD 4	/* 4MB */
 #define VALIDATION_NO_THRESHOLD 0	/* Verify the entire region */
 
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
 
 struct config {
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index 622a85848f61..92f3be3dd8e5 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -13,6 +13,8 @@
 #include <ucontext.h>
 #include <sys/mman.h>
 
+#include "../kselftest.h"
+
 /* Define some kernel-like types */
 #define  u8 __u8
 #define u16 __u16
@@ -175,7 +177,6 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write)
 	dprintf4("pkey_reg now: %016llx\n", read_pkey_reg());
 }
 
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
 #define ALIGN_UP(x, align_to)	(((x) + ((align_to)-1)) & ~((align_to)-1))
 #define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
 #define ALIGN_PTR_UP(p, ptr_align_to)	\
diff --git a/tools/testing/selftests/vm/va_128TBswitch.c b/tools/testing/selftests/vm/va_128TBswitch.c
index 83acdff26a13..da6ec3b53ea8 100644
--- a/tools/testing/selftests/vm/va_128TBswitch.c
+++ b/tools/testing/selftests/vm/va_128TBswitch.c
@@ -9,7 +9,7 @@
 #include <sys/mman.h>
 #include <string.h>
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#include "../kselftest.h"
 
 #ifdef __powerpc64__
 #define PAGE_SIZE	(64 << 10)
-- 
2.32.0


^ permalink raw reply related

* [PATCH 10/12] selftests/sparc64: remove ARRAY_SIZE define from adi-test
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definition from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from adi-test and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/sparc64/drivers/adi-test.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tools/testing/selftests/sparc64/drivers/adi-test.c b/tools/testing/selftests/sparc64/drivers/adi-test.c
index 95d93c6a88a5..84e5d9fd20b0 100644
--- a/tools/testing/selftests/sparc64/drivers/adi-test.c
+++ b/tools/testing/selftests/sparc64/drivers/adi-test.c
@@ -24,10 +24,6 @@
 #define DEBUG_LEVEL_4_BIT	(0x0008)
 #define DEBUG_TIMING_BIT	(0x1000)
 
-#ifndef ARRAY_SIZE
-# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
-
 /* bit mask of enabled bits to print */
 #define DEBUG 0x0001
 
-- 
2.32.0


^ permalink raw reply related

* [PATCH 11/12] selftests/timens: remove ARRAY_SIZE define from individual tests
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from timens tests and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/timens/procfs.c | 2 --
 tools/testing/selftests/timens/timens.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/tools/testing/selftests/timens/procfs.c b/tools/testing/selftests/timens/procfs.c
index f2519154208a..1833ca97eb24 100644
--- a/tools/testing/selftests/timens/procfs.c
+++ b/tools/testing/selftests/timens/procfs.c
@@ -24,8 +24,6 @@
 #define DAY_IN_SEC			(60*60*24)
 #define TEN_DAYS_IN_SEC			(10*DAY_IN_SEC)
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
-
 static int child_ns, parent_ns;
 
 static int switch_ns(int fd)
diff --git a/tools/testing/selftests/timens/timens.c b/tools/testing/selftests/timens/timens.c
index 52b6a1185f52..387220791a05 100644
--- a/tools/testing/selftests/timens/timens.c
+++ b/tools/testing/selftests/timens/timens.c
@@ -22,8 +22,6 @@
 #define DAY_IN_SEC			(60*60*24)
 #define TEN_DAYS_IN_SEC			(10*DAY_IN_SEC)
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
-
 struct test_clock {
 	clockid_t id;
 	char *name;
-- 
2.32.0


^ permalink raw reply related

* [PATCH 09/12] selftests/seccomp: remove ARRAY_SIZE define from seccomp_benchmark
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from seccomp_benchmark and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/seccomp/seccomp_benchmark.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_benchmark.c b/tools/testing/selftests/seccomp/seccomp_benchmark.c
index 6e5102a7d7c9..5b5c9d558dee 100644
--- a/tools/testing/selftests/seccomp/seccomp_benchmark.c
+++ b/tools/testing/selftests/seccomp/seccomp_benchmark.c
@@ -18,7 +18,7 @@
 #include <sys/syscall.h>
 #include <sys/types.h>
 
-#define ARRAY_SIZE(a)    (sizeof(a) / sizeof(a[0]))
+#include "../kselftest.h"
 
 unsigned long long timing(clockid_t clk_id, unsigned long long samples)
 {
-- 
2.32.0


^ permalink raw reply related

* [PATCH 06/12] selftests/landlock: remove ARRAY_SIZE define from common.h
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from common.h and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/landlock/common.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
index 20e2a9286d71..183b7e8e1b95 100644
--- a/tools/testing/selftests/landlock/common.h
+++ b/tools/testing/selftests/landlock/common.h
@@ -17,10 +17,6 @@
 
 #include "../kselftest_harness.h"
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
-
 /*
  * TEST_F_FORK() is useful when a test drop privileges but the corresponding
  * FIXTURE_TEARDOWN() requires them (e.g. to remove files from a directory
-- 
2.32.0


^ permalink raw reply related

* [PATCH 05/12] selftests/ir: remove ARRAY_SIZE define from ir_loopback.c
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from ir_loopback.c and pickup the one defined
in kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/ir/ir_loopback.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/ir/ir_loopback.c b/tools/testing/selftests/ir/ir_loopback.c
index af7f9c7d59bc..06256c96df12 100644
--- a/tools/testing/selftests/ir/ir_loopback.c
+++ b/tools/testing/selftests/ir/ir_loopback.c
@@ -26,7 +26,6 @@
 #include "../kselftest.h"
 
 #define TEST_SCANCODES	10
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
 #define SYSFS_PATH_MAX 256
 #define DNAME_PATH_MAX 256
 
-- 
2.32.0


^ permalink raw reply related

* [PATCH 04/12] selftests/core: remove ARRAY_SIZE define from close_range_test.c
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from close_range_test.c and pickup the one defined
in kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/core/close_range_test.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c
index aa7d13d91963..749239930ca8 100644
--- a/tools/testing/selftests/core/close_range_test.c
+++ b/tools/testing/selftests/core/close_range_test.c
@@ -50,10 +50,6 @@ static inline int sys_close_range(unsigned int fd, unsigned int max_fd,
 	return syscall(__NR_close_range, fd, max_fd, flags);
 }
 
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
-
 TEST(core_close_range)
 {
 	int i, ret;
-- 
2.32.0


^ permalink raw reply related

* [PATCH 02/12] selftests/arm64: remove ARRAY_SIZE define from vec-syscfg.c
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.

Remove ARRAY_SIZE from vec-syscfg.c and pickup the one defined in
kselftest.h.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/arm64/fp/vec-syscfg.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/selftests/arm64/fp/vec-syscfg.c b/tools/testing/selftests/arm64/fp/vec-syscfg.c
index 272b888e018e..c90658811a83 100644
--- a/tools/testing/selftests/arm64/fp/vec-syscfg.c
+++ b/tools/testing/selftests/arm64/fp/vec-syscfg.c
@@ -21,8 +21,6 @@
 #include "../../kselftest.h"
 #include "rdvl.h"
 
-#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
-
 #define ARCH_MIN_VL SVE_VL_MIN
 
 struct vec_data {
-- 
2.32.0


^ permalink raw reply related

* [PATCH 01/12] tools: fix ARRAY_SIZE defines in tools and selftests hdrs
From: Shuah Khan @ 2021-12-10 17:33 UTC (permalink / raw)
  To: catalin.marinas, will, shuah, keescook, mic, davem, kuba, peterz,
	paulmck, boqun.feng, akpm
  Cc: Shuah Khan, linux-kselftest, linux-security-module, netdev,
	linux-mm
In-Reply-To: <cover.1639156389.git.skhan@linuxfoundation.org>

tools/include/linux/kernel.h and kselftest_harness.h are missing
ifndef guard around ARRAY_SIZE define. Fix them to avoid duplicate
define errors during compile when another file defines it. This
problem was found when compiling selftests that include a header
with ARRAY_SIZE define.

ARRAY_SIZE is defined in several selftests. There are about 25+
duplicate defines in various selftests source and header files.
Add ARRAY_SIZE to kselftest.h in preparation for removing duplicate
ARRAY_SIZE defines from individual test files.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/include/linux/kernel.h                | 2 ++
 tools/testing/selftests/kselftest.h         | 4 ++++
 tools/testing/selftests/kselftest_harness.h | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/tools/include/linux/kernel.h b/tools/include/linux/kernel.h
index 3e8df500cfbd..9701e8307db0 100644
--- a/tools/include/linux/kernel.h
+++ b/tools/include/linux/kernel.h
@@ -92,7 +92,9 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
 int scnprintf(char * buf, size_t size, const char * fmt, ...);
 int scnprintf_pad(char * buf, size_t size, const char * fmt, ...);
 
+#ifndef ARRAY_SIZE
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
+#endif
 
 #define current_gfp_context(k) 0
 #define synchronize_rcu()
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 8d50483fe204..f1180987492c 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -48,6 +48,10 @@
 #include <stdarg.h>
 #include <stdio.h>
 
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#endif
+
 /* define kselftest exit codes */
 #define KSFT_PASS  0
 #define KSFT_FAIL  1
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index ae0f0f33b2a6..75164e23f036 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -671,7 +671,9 @@
 #define EXPECT_STRNE(expected, seen) \
 	__EXPECT_STR(expected, seen, !=, 0)
 
+#ifndef ARRAY_SIZE
 #define ARRAY_SIZE(a)	(sizeof(a) / sizeof(a[0]))
+#endif
 
 /* Support an optional handler after and ASSERT_* or EXPECT_*.  The approach is
  * not thread-safe, but it should be fine in most sane test scenarios.
-- 
2.32.0


^ permalink raw reply related

* Re: [RFC PATCH v2 net-next 0/4] DSA master state tracking
From: Ansuel Smith @ 2021-12-10 17:29 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev@vger.kernel.org, David S. Miller, Jakub Kicinski,
	Andrew Lunn, Vivien Didelot, Florian Fainelli
In-Reply-To: <20211210171530.xh7lajqsvct7dd3r@skbuf>

On Fri, Dec 10, 2021 at 05:15:30PM +0000, Vladimir Oltean wrote:
> On Fri, Dec 10, 2021 at 06:10:45PM +0100, Ansuel Smith wrote:
> > On Fri, Dec 10, 2021 at 05:02:42PM +0000, Vladimir Oltean wrote:
> > > On Fri, Dec 10, 2021 at 04:37:52AM +0100, Ansuel Smith wrote:
> > > > On Thu, Dec 09, 2021 at 07:39:23PM +0200, Vladimir Oltean wrote:
> > > > > This patch set is provided solely for review purposes (therefore not to
> > > > > be applied anywhere) and for Ansuel to test whether they resolve the
> > > > > slowdown reported here:
> > > > > https://patchwork.kernel.org/project/netdevbpf/cover/20211207145942.7444-1-ansuelsmth@gmail.com/
> > > > > 
> > > > > The patches posted here are mainly to offer a consistent
> > > > > "master_state_change" chain of events to switches, without duplicates,
> > > > > and always starting with operational=true and ending with
> > > > > operational=false. This way, drivers should know when they can perform
> > > > > Ethernet-based register access, and need not care about more than that.
> > > > > 
> > > > > Changes in v2:
> > > > > - dropped some useless patches
> > > > > - also check master operstate.
> > > > > 
> > > > > Vladimir Oltean (4):
> > > > >   net: dsa: provide switch operations for tracking the master state
> > > > >   net: dsa: stop updating master MTU from master.c
> > > > >   net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}
> > > > >   net: dsa: replay master state events in
> > > > >     dsa_tree_{setup,teardown}_master
> > > > > 
> > > > >  include/net/dsa.h  | 11 +++++++
> > > > >  net/dsa/dsa2.c     | 80 +++++++++++++++++++++++++++++++++++++++++++---
> > > > >  net/dsa/dsa_priv.h | 13 ++++++++
> > > > >  net/dsa/master.c   | 29 ++---------------
> > > > >  net/dsa/slave.c    | 27 ++++++++++++++++
> > > > >  net/dsa/switch.c   | 15 +++++++++
> > > > >  6 files changed, 145 insertions(+), 30 deletions(-)
> > > > > 
> > > > > -- 
> > > > > 2.25.1
> > > > > 
> > > > 
> > > > Hi, I tested this v2 and I still have 2 ethernet mdio failing on init.
> > > > I don't think we have other way to track this. Am I wrong?
> > > > 
> > > > All works correctly with this and promisc_on_master.
> > > > If you have other test, feel free to send me other stuff to test.
> > > > 
> > > > (I'm starting to think the fail is caused by some delay that the switch
> > > > require to actually start accepting packet or from the reinit? But I'm
> > > > not sure... don't know if you notice something from the pcap)
> > > 
> > > I've opened the pcap just now. The Ethernet MDIO packets are
> > > non-standard. When the DSA master receives them, it expects the first 6
> > > octets to be the MAC DA, because that's the format of an Ethernet frame.
> > > But the packets have this other format, according to your own writing:
> > > 
> > > /* Specific define for in-band MDIO read/write with Ethernet packet */
> > > #define QCA_HDR_MDIO_SEQ_LEN           4 /* 4 byte for the seq */
> > > #define QCA_HDR_MDIO_COMMAND_LEN       4 /* 4 byte for the command */
> > > #define QCA_HDR_MDIO_DATA1_LEN         4 /* First 4 byte for the mdio data */
> > > #define QCA_HDR_MDIO_HEADER_LEN        (QCA_HDR_MDIO_SEQ_LEN + \
> > >                                        QCA_HDR_MDIO_COMMAND_LEN + \
> > >                                        QCA_HDR_MDIO_DATA1_LEN)
> > > 
> > > #define QCA_HDR_MDIO_DATA2_LEN         12 /* Other 12 byte for the mdio data */
> > > #define QCA_HDR_MDIO_PADDING_LEN       34 /* Padding to reach the min Ethernet packet */
> > > 
> > > The first 6 octets change like crazy in your pcap. Definitely can't add
> > > that to the RX filter of the DSA master.
> > > 
> > > So yes, promisc_on_master is precisely what you need, it exists for
> > > situations like this.
> > > 
> > > Considering this, I guess it works?
> > 
> > Yes it works! We can totally accept 2 mdio timeout out of a good way to
> > track the master port. It's probably related to other stuff like switch
> > delay or other.
> > 
> > Wonder the next step is wait for this to be accepted and then I can
> > propose a v3 of my patch? Or net-next is closed now and I should just
> > send v3 RFC saying it does depend on this?
> 
> Wait a minute, I don't think I understood your previous reply.
> With promisc_on_master, is there or is there not any timeout?

With promisc_on_master I have only 2 timeout.

> My understanding was this: DSA tells you when the master is up and
> operational. That information is correct, except it isn't sufficient and
> you don't see the replies back. Later during boot, you have some init
> scripts triggered by user space that create a bridge interface and put
> the switch ports under the bridge. The bridge puts the switch interfaces
> in promiscuous mode, because that's what bridges do. Then DSA propagates
> the promiscuous mode from the switch ports to the DSA master, and once
> the master is promiscuous, the Ethernet MDIO starts working too.
> Now, with promisc_on_master set, the DSA master is already promiscuous
> by the time DSA tells you that it's up and running. Hence your message
> that "All works correctly with this and promisc_on_master."
> What did I misunderstand?

You got all correct. But still I have these 2 timeout at the very start.
Let me give you another pastebin to make this more clear. [0]
Transaction done is when the Ethernet packet is received and processed.
I added some pr with the events received by switch.c

I should check if the tagger receive some packet before the
"function timeout". 
What I mean with "acceptable state" is that aside from the 2
timeout everything else works correctly withtout any slowdown in the
init process.

[0] https://pastebin.com/VfGB5hAQ

-- 
	Ansuel

^ permalink raw reply

* Re: [PATCH 1/2] xfrm: interface with if_id 0 should return error
From: Eyal Birger @ 2021-12-10 17:22 UTC (permalink / raw)
  To: antony.antony
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Jakub Kicinski,
	Linux Kernel Network Developers
In-Reply-To: <0bfebd4e5f317cbf301750d5dd5cc706d4385d7f.1639064087.git.antony.antony@secunet.com>

On Thu, Dec 9, 2021 at 5:36 PM Antony Antony <antony.antony@secunet.com> wrote:
>
> xfrm interface if_id = 0 would cause xfrm policy lookup errors since
> commit 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces")
>
> Now fail to create an xfrm interface when if_id = 0
>
> With this commit:
>  ip link add ipsec0  type xfrm dev lo  if_id 0
>  Error: if_id must be non zero.
>
> Signed-off-by: Antony Antony <antony.antony@secunet.com>
> ---
>  net/xfrm/xfrm_interface.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
> index 41de46b5ffa9..57448fc519fc 100644
> --- a/net/xfrm/xfrm_interface.c
> +++ b/net/xfrm/xfrm_interface.c
> @@ -637,11 +637,16 @@ static int xfrmi_newlink(struct net *src_net, struct net_device *dev,
>                         struct netlink_ext_ack *extack)
>  {
>         struct net *net = dev_net(dev);
> -       struct xfrm_if_parms p;
> +       struct xfrm_if_parms p = {};
>         struct xfrm_if *xi;
>         int err;
>
>         xfrmi_netlink_parms(data, &p);
> +       if (!p.if_id) {
> +               NL_SET_ERR_MSG(extack, "if_id must be non zero");
> +               return -EINVAL;
> +       }
> +
>         xi = xfrmi_locate(net, &p);
>         if (xi)
>                 return -EEXIST;
> @@ -666,7 +671,12 @@ static int xfrmi_changelink(struct net_device *dev, struct nlattr *tb[],
>  {
>         struct xfrm_if *xi = netdev_priv(dev);
>         struct net *net = xi->net;
> -       struct xfrm_if_parms p;
> +       struct xfrm_if_parms p = {};
> +
> +       if (!p.if_id) {
> +               NL_SET_ERR_MSG(extack, "if_id must be non zero");
> +               return -EINVAL;
> +       }
>
>         xfrmi_netlink_parms(data, &p);
>         xi = xfrmi_locate(net, &p);

Looks good. Maybe this needs a "Fixes:" tag?

Reviewed-by: Eyal Birger <eyal.birger@gmail.com>

Thanks,
Eyal.

> --
> 2.30.2
>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox