Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Miroslav Lichvar @ 2019-08-20  9:49 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: netdev, linux-kernel, Andrew Lunn, Richard Cochran,
	Florian Fainelli, Heiner Kallweit, Vladimir Oltean,
	David S. Miller
In-Reply-To: <20190820084833.6019-3-hubert.feurstein@vahle.at>

On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:

> +	/* PTP offset compensation:
> +	 * After the MDIO access is completed (from the chip perspective), the
> +	 * switch chip will snapshot the PHC timestamp. To make sure our system
> +	 * timestamp corresponds to the PHC timestamp, we have to add the
> +	 * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> +	 * takes the average of pre_ts and post_ts to calculate the final
> +	 * system timestamp. With this in mind, we have to add ptp_sts_offset
> +	 * twice to post_ts, in order to not introduce an constant time offset.
> +	 */
> +	if (sts)
> +		timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);

This correction looks good to me.

Is the MDIO write delay constant in reality, or does it at least have
an upper bound? That is, is it always true that the post_ts timestamp
does not point to a time before the PHC timestamp was actually taken?

This is important to not break the estimation of maximum error in the
measured offset. Applications using the ioctl may assume that the
maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
phc2sys). That would not work if the delay could be occasionally 50
microseconds for instance, i.e. the post_ts timestamp would be earlier
than the PHC timestamp.

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [PATCH] can: flexcan: free error skb if enqueueing failed
From: Sean Nyekjaer @ 2019-08-20  9:49 UTC (permalink / raw)
  To: Martin Hundebøll, Wolfgang Grandegger, Marc Kleine-Budde,
	linux-can
  Cc: David S . Miller, netdev, Joakim Zhang
In-Reply-To: <20190715185308.104333-1-martin@geanix.com>

CC'ing Joakim Zhang

On 15/07/2019 20.53, Martin Hundebøll wrote:
> If the call to can_rx_offload_queue_sorted() fails, the passed skb isn't
> consumed, so the caller must do so.
> 
> Fixes: 30164759db1b ("can: flexcan: make use of rx-offload's irq_offload_fifo")
> Signed-off-by: Martin Hundebøll <martin@geanix.com>
> ---
>   drivers/net/can/flexcan.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> index 1c66fb2ad76b..21f39e805d42 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -688,7 +688,8 @@ static void flexcan_irq_bus_err(struct net_device *dev, u32 reg_esr)
>   	if (tx_errors)
>   		dev->stats.tx_errors++;
>   
> -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> +		kfree_skb(skb);
>   }
>   
>   static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
> @@ -732,7 +733,8 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
>   	if (unlikely(new_state == CAN_STATE_BUS_OFF))
>   		can_bus_off(dev);
>   
> -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> +		kfree_skb(skb);
>   }
>   
>   static inline struct flexcan_priv *rx_offload_to_priv(struct can_rx_offload *offload)
> 

^ permalink raw reply

* [PATCH bpf-next] bpf: add BTF ids in procfs for file descriptors to BTF objects
From: Quentin Monnet @ 2019-08-20  9:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet

Implement the show_fdinfo hook for BTF FDs file operations, and make it
print the id and the size of the BTF object. This allows for a quick
retrieval of the BTF id from its FD; or it can help understanding what
type of object (BTF) the file descriptor points to.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 kernel/bpf/btf.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5fcc7a17eb5a..39e184f1b27c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3376,6 +3376,19 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 	btf_type_ops(t)->seq_show(btf, t, type_id, obj, 0, m);
 }
 
+#ifdef CONFIG_PROC_FS
+static void bpf_btf_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	const struct btf *btf = filp->private_data;
+
+	seq_printf(m,
+		   "btf_id:\t%u\n"
+		   "data_size:\t%u\n",
+		   btf->id,
+		   btf->data_size);
+}
+#endif
+
 static int btf_release(struct inode *inode, struct file *filp)
 {
 	btf_put(filp->private_data);
@@ -3383,6 +3396,9 @@ static int btf_release(struct inode *inode, struct file *filp)
 }
 
 const struct file_operations btf_fops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= bpf_btf_show_fdinfo,
+#endif
 	.release	= btf_release,
 };
 
-- 
2.17.1


^ permalink raw reply related

* RE: Help needed - Kernel lockup while running ipsec
From: Vakul Garg @ 2019-08-20  9:52 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190820093800.GN2588@breakpoint.cc>



> -----Original Message-----
> From: Florian Westphal <fw@strlen.de>
> Sent: Tuesday, August 20, 2019 3:08 PM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> Subject: Re: Help needed - Kernel lockup while running ipsec
> 
> Vakul Garg <vakul.garg@nxp.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Florian Westphal <fw@strlen.de>
> > > Sent: Tuesday, August 20, 2019 2:53 PM
> > > To: Vakul Garg <vakul.garg@nxp.com>
> > > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > >
> > > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > > running single
> > > > > static ipsec tunnel.
> > > > > > The problem reproduces mostly after running 8-10 hours of
> > > > > > ipsec encap
> > > > > test (on my dual core arm board).
> > > > > >
> > > > > > I found that in function xfrm_policy_lookup_bytype(), the
> > > > > > policy in variable
> > > > > 'ret' shows refcnt=0 under problem situation.
> > > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype()
> > > > > > and hence the
> > > > > lockup.
> > > > > >
> > > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > > Is it legitimate for 'refcnt' to become '0'? Under what
> > > > > > condition can it
> > > > > become '0'?
> > > > >
> > > > > Yes, when policy is destroyed and the last user calls
> > > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > > >
> > > > It seems that policy reference count never gets decremented during
> > > > packet
> > > ipsec encap.
> > > > It is getting incremented for every frame that hits the policy.
> > > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > >
> > > Thats a bug.  Does this affect 4.14 only or does this happen on
> > > current tree as well?
> >
> > I am yet to try it on 4.19.
> > Can you help me with the right fix? Which part of code should it get
> decremented?
> > I am not conversant with xfrm code.
> 
> Normally policy reference counts get decremented when the skb is free'd, via
> dst destruction (xfrm_dst_destroy()).
> 
> Do you see a dst leak as well?

Can you please guide me how to detect it?

(I am checking refcount on recent kernel and will let you know.)

^ permalink raw reply

* Re: [PATCH net-next 3/6] net: dsa: Delete the VID from the upstream port as well
From: Vladimir Oltean @ 2019-08-20  9:54 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Florian Fainelli, Andrew Lunn, Ido Schimmel, Roopa Prabhu,
	nikolay, David S. Miller, netdev
In-Reply-To: <20190820015138.GB975@t480s.localdomain>

Hi Vivien!

On Tue, 20 Aug 2019 at 08:51, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>
> Vladimir,
>
> On Tue, 20 Aug 2019 02:59:59 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> > Commit b2f81d304cee ("net: dsa: add CPU and DSA ports as VLAN members")
> > is littering a lot. After deleting a VLAN added on a DSA port, it still
> > remains installed in the hardware filter of the upstream port. Fix this.
>
> Littering a lot, really?
>
> FYI we are not removing the target VLAN from the hardware yet because it would
> be too expensive to cache data in DSA core in order to know if the VID is not
> used by any other slave port of the fabric anymore, and thus safe to remove.
>
> Keeping the VID programmed for DSA and CPU ports is simpler for the moment,
> as an hardware VLAN with only these ports as members is unlikely to harm.
>
> >
> > Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> > ---
> >  net/dsa/switch.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> > index 09d9286b27cc..84ab2336131e 100644
> > --- a/net/dsa/switch.c
> > +++ b/net/dsa/switch.c
> > @@ -295,11 +295,20 @@ static int dsa_switch_vlan_del(struct dsa_switch *ds,
> >                              struct dsa_notifier_vlan_info *info)
> >  {
> >       const struct switchdev_obj_port_vlan *vlan = info->vlan;
> > +     int port;
> >
> >       if (!ds->ops->port_vlan_del)
> >               return -EOPNOTSUPP;
> >
> > +     /* Build a mask of VLAN members */
> > +     bitmap_zero(ds->bitmap, ds->num_ports);
> >       if (ds->index == info->sw_index)
> > +             set_bit(info->port, ds->bitmap);
> > +     for (port = 0; port < ds->num_ports; port++)
> > +             if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> > +                     set_bit(port, ds->bitmap);
> > +
> > +     for_each_set_bit(port, ds->bitmap, ds->num_ports)
> >               return ds->ops->port_vlan_del(ds, info->port, vlan);
>
> You return right away from the loop? You use info->port instead of port?
>
> >
> >       return 0;
>
> Even if you patch wasn't badly broken, "bridge vlan del" targeting a single
> switch port would also remove the VLAN from the CPU port and thus breaking
> offloaded 802.1q. It would also remove it from the DSA ports interconnecting
> multiple switches, thus breaking the 802.1q conduit for the whole fabric.
>
> So you're not fixing anything here, but you're breaking single-chip and
> cross-chip hardware VLAN. Seriously wtf is this patch?
>
> NAK!
>
>
>         Vivien

I can agree that this isn't one of my brightest moments. But at least
we get to see Cunningham's law in action :)
When dsa_8021q is cleaning up the switch's VLAN table for the bridge
to use it, it is good to really clean it up, i.e. not leave any VLAN
installed on the upstream ports.
But I think this is just an academical concern at this point. In
vlan_filtering mode, the CPU port will accept VLAN frames with the
dsa_8021q ID's, but they will eventually get dropped due to no
destination. The real breaker is the pvid change. If something like
patch 4/6 gets accepted I will drop this one.

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Christophe de Dinechin @ 2019-08-20  9:58 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Alex Williamson, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia@nvidia.com, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB48668B6221E477A873688CDBD1AB0@AM0PR05MB4866.eurprd05.prod.outlook.com>


Parav Pandit writes:

> + Dave.
>
> Hi Jiri, Dave, Alex, Kirti, Cornelia,
>
> Please provide your feedback on it, how shall we proceed?
>
> Hence, I would like to discuss below options.
>
> Option-1: mdev index
> Introduce an optional mdev index/handle as u32 during mdev create time.
> User passes mdev index/handle as input.
>
> phys_port_name=mIndex=m%u
> mdev_index will be available in sysfs as mdev attribute for udev to name the mdev's netdev.
>
> example mdev create command:
> UUID=$(uuidgen)
> echo $UUID index=10 > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> example netdevs:
> repnetdev=ens2f0_m10	/*ens2f0 is parent PF's netdevice */
> mdev_netdev=enm10
>
> Pros:
> 1. mdevctl and any other existing tools are unaffected.
> 2. netdev stack, ovs and other switching platforms are unaffected.
> 3. achieves unique phys_port_name for representor netdev
> 4. achieves unique mdev eth netdev name for the mdev using udev/systemd extension.
> 5. Aligns well with mdev and netdev subsystem and similar to existing sriov bdf's.
>
> Option-2: shorter mdev name
> Extend mdev to have shorter mdev device name in addition to UUID.
> such as 'foo', 'bar'.
> Mdev will continue to have UUID.
> phys_port_name=mdev_name
>
> Pros:
> 1. All same as option-1, except mdevctl needs upgrade for newer usage.
> It is common practice to upgrade iproute2 package along with the kernel.
> Similar practice to be done with mdevctl.
> 2. Newer users of mdevctl who wants to work with non_UUID names, will use newer mdevctl/tools.
> Cons:
> 1. Dual naming scheme of mdev might affect some of the existing tools.
> It's unclear how/if it actually affects.
> mdevctl [2] is very recently developed and can be enhanced for dual naming scheme.
>
> Option-3: mdev uuid alias
> Instead of shorter mdev name or mdev index, have alpha-numeric name alias.
> Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
> example mdev create command:
> UUID=$(uuidgen)
> echo $UUID alias=foo > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> example netdevs:
> examle netdevs:
> repnetdev = ens2f0_mfoo
> mdev_netdev=enmfoo
>
> Pros:
> 1. All same as option-1.
> 2. Doesn't affect existing mdev naming scheme.
> Cons:
> 1. Index scheme of option-1 is better which can number large number of mdevs with fewer characters, simplifying the management tool.

I believe that Alex pointed out another "Cons" to all three options,
which is that it forces user-space to resolve potential race conditions
when creating an index or short name or alias.

Also, what happens if `index=10` is not provided on the command-line?
Does that make the device unusable for your purpose?


--
Cheers,
Christophe de Dinechin (IRC c3d)

^ permalink raw reply

* Re: [PATCH] ipvs: change type of delta and previous_delta in ip_vs_seq.
From: Julian Anastasov @ 2019-08-20 10:00 UTC (permalink / raw)
  To: zhang kai; +Cc: Wensong Zhang, Simon Horman, lvs-devel, netdev
In-Reply-To: <20190820003718.GA16620@toolchain>


	Hello,

	Cc list trimmed...

On Tue, 20 Aug 2019, zhang kai wrote:

> In NAT forwarding mode, Applications may decrease the size of packets,
> and TCP sequences will get smaller, so both of variables will be negetive
> values in this case.

	As long as nobody cares about their sign, the type should not
matter. You can not solve all signed/unsigned mismatches with such
small patch. Or you are seeing some problem, may be in debug?

> 
> Signed-off-by: zhang kai <zhangkaiheb@126.com>
> ---
>  include/net/ip_vs.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 3759167f91f5..de7e75063c7c 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -346,8 +346,8 @@ enum ip_vs_sctp_states {
>   */
>  struct ip_vs_seq {
>  	__u32			init_seq;	/* Add delta from this seq */
> -	__u32			delta;		/* Delta in sequence numbers */
> -	__u32			previous_delta;	/* Delta in sequence numbers
> +	__s32			delta;		/* Delta in sequence numbers */
> +	__s32			previous_delta;	/* Delta in sequence numbers
>  						 * before last resized pkt */
>  };
>  
> -- 
> 2.17.1

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:01 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Igor Russkikh, Andrew Lunn, Antoine Tenart, davem@davemloft.net,
	f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190816132959.GC8697@bistromath.localdomain>

Hi Sabrina,

On Fri, Aug 16, 2019 at 03:29:59PM +0200, Sabrina Dubroca wrote:
> 2019-08-13, 16:18:40 +0000, Igor Russkikh wrote:
> > On 13.08.2019 16:17, Andrew Lunn wrote:
> 
> > That could be a strong limitation in
> > cases when user sees HW macsec offload is broken or work differently, and he/she
> > wants to replace it with SW one.
> 
> Agreed, I think an offload that cannot be disabled is quite problematic.
> 
> > MACSec is a complex feature, and it may happen something is missing in HW.
> > Trivial example is 256bit encryption, which is not always a musthave in HW
> > implementations.
> 
> +1
> 
> > 2) I think, Antoine, its not totally true that otherwise the user macsec API
> > will be broken/changed. netlink api is the same, the only thing we may want to
> > add is an optional parameter to force selection of SW macsec engine.
> 
> Yes, I think we need an offload on/off parameter (and IMO it should
> probably be off by default). Then, if offloading is requested but
> cannot be satisfied (unsupported key length, too many SAs, etc), or if
> incompatible settings are requested (mixing offloaded and
> non-offloaded SCs on a device that cannot do it), return an error.
> 
> If we also export that offload parameter during netlink dumps, we can
> inspect the state of the system, which helps for debugging.

So it seems the ability to enable or disable the offloading on a given
interface is the main missing feature. I'll add that, however I'll
probably (at least at first):

- Have the interface to be fully offloaded or fully handled in s/w (with
  errors being thrown if a given configuration isn't supported). Having
  both at the same time on a given interface would be tricky because of
  the MACsec validation parameter.

- Won't allow to enable/disable the offloading of there are rules in
  place, as we're not sure the same rules would be accepted by the other
  implementation.

I'm not sure if we should allow to mix the implementations on a given
physical interface (by having two MACsec interfaces attached) as the
validation would be impossible to do (we would have no idea if a
packet was correctly handled by the offloading part or just not being
a MACsec packet in the first place, in Rx).

I agree the offloading should be disabled by default, and only enabled
by an user explicitly.

Thanks,
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:03 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Igor Russkikh, Antoine Tenart, davem@davemloft.net,
	sd@queasysnail.net, f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190813162823.GH15047@lunn.ch>

Hi Andrew,

On Tue, Aug 13, 2019 at 06:28:23PM +0200, Andrew Lunn wrote:
> > 1) With current implementation it's impossible to install SW macsec engine onto
> > the device which supports HW offload. That could be a strong limitation in
> > cases when user sees HW macsec offload is broken or work differently, and he/she
> > wants to replace it with SW one.
> > MACSec is a complex feature, and it may happen something is missing in HW.
> > Trivial example is 256bit encryption, which is not always a musthave in HW
> > implementations.
> 
> It would also be nice to add extra information to the netlink API to
> indicate if HW or SW is being used. In other places where we offload
> to accelerators we have such additional information.

Agreed, in addition to being able to enable/disable the offloading we
should have a way to know if a MACsec interface is being offloaded or
not.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* [PATCH bpf-next] xsk: proper socket state check in xsk_poll
From: Björn Töpel @ 2019-08-20 10:04 UTC (permalink / raw)
  To: syzbot+c82697e3043781e08802, ast, daniel, netdev
  Cc: bjorn.topel, bpf, davem, hawk, jakub.kicinski, john.fastabend,
	jonathan.lemon, kafai, linux-kernel, magnus.karlsson,
	songliubraving, syzkaller-bugs, xdp-newbies, yhs, hdanton
In-Reply-To: <0000000000009167320590823a8c@google.com>

From: Björn Töpel <bjorn.topel@intel.com>

The poll() implementation for AF_XDP sockets did not perform the
proper state checks, prior accessing the socket umem. This patch fixes
that by performing a xsk_is_bound() check.

Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..08bed5e92af4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	return err;
 }
 
+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+	struct net_device *dev = READ_ONCE(xs->dev);
+
+	return dev && xs->state == XSK_BOUND;
+}
+
 static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
 	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 
-	if (unlikely(!xs->dev))
+	if (unlikely(!xsk_is_bound(xs)))
 		return -ENXIO;
 	if (unlikely(!(xs->dev->flags & IFF_UP)))
 		return -ENETDOWN;
@@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
 	struct net_device *dev = xs->dev;
 	struct xdp_umem *umem = xs->umem;
 
+	if (unlikely(!xsk_is_bound(xs)))
+		return mask;
+
 	if (umem->need_wakeup)
 		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
 						umem->need_wakeup);
@@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 {
 	struct net_device *dev = xs->dev;
 
-	if (!dev || xs->state != XSK_BOUND)
+	if (!xsk_is_bound(xs))
 		return;
 
 	xs->state = XSK_UNBOUND;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:07 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Antoine Tenart, Igor Russkikh, davem@davemloft.net,
	andrew@lunn.ch, f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190816132500.GA8697@bistromath.localdomain>

Hi Sabrina,

On Fri, Aug 16, 2019 at 03:25:00PM +0200, Sabrina Dubroca wrote:
> 2019-08-13, 10:58:17 +0200, Antoine Tenart wrote:
> > 
> > As for the need for xmit / handle_frame ops (for a MAC w/ MACsec
> > offloading), I'd say the xmit / handle_frame ops of the real net device
> > driver could be used as the one of the MACsec virtual interface do not
> > do much (regardless of the implementation choice discussed above).
> 
> There's no "handle_frame" op on a real device. macsec_handle_frame is
> an rx_handler specificity that grabs packets from a real device and
> sends them into a virtual device stacked on top of it. A real device
> just hands packets over to the stack via NAPI.

Right, so if there is a need for a custom implementation of xmit /
hamdle_frame we could let the offloading driver provide it. I'll
probably let the first MAC implementation add it, as we agreed not to
add an API with no user (and mine is done in the PHY).

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH net-next v2 5/9] net: phy: add MACsec ops in phy_device
From: Antoine Tenart @ 2019-08-20 10:07 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Antoine Tenart, davem, sd, andrew, hkallweit1, netdev,
	linux-kernel, thomas.petazzoni, alexandre.belloni, allan.nielsen,
	camelia.groza, Simon.Edelhaus
In-Reply-To: <1521a28b-a0af-b3fb-d1bf-af82ec2f3d47@gmail.com>

Hi Florian,

On Wed, Aug 14, 2019 at 04:15:03PM -0700, Florian Fainelli wrote:
> On 8/8/19 7:05 AM, Antoine Tenart wrote:
> > This patch adds a reference to MACsec ops in the phy_device, to allow
> > PHYs to support offloading MACsec operations. The phydev lock will be
> > held while calling those helpers.
> > 
> > Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
> > ---
> >  include/linux/phy.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > index 462b90b73f93..6947a19587e4 100644
> > --- a/include/linux/phy.h
> > +++ b/include/linux/phy.h
> > @@ -22,6 +22,10 @@
> >  #include <linux/workqueue.h>
> >  #include <linux/mod_devicetable.h>
> >  
> > +#ifdef CONFIG_MACSEC
> > +#include <net/macsec.h>
> > +#endif
> 
> #if IS_ENABLED(CONFIG_MACSEC)
> 
> > +
> >  #include <linux/atomic.h>
> >  
> >  #define PHY_DEFAULT_FEATURES	(SUPPORTED_Autoneg | \
> > @@ -345,6 +349,7 @@ struct phy_c45_device_ids {
> >   * attached_dev: The attached enet driver's device instance ptr
> >   * adjust_link: Callback for the enet controller to respond to
> >   * changes in the link state.
> > + * macsec_ops: MACsec offloading ops.
> >   *
> >   * speed, duplex, pause, supported, advertising, lp_advertising,
> >   * and autoneg are used like in mii_if_info
> > @@ -438,6 +443,11 @@ struct phy_device {
> >  
> >  	void (*phy_link_change)(struct phy_device *, bool up, bool do_carrier);
> >  	void (*adjust_link)(struct net_device *dev);
> > +
> > +#if defined(CONFIG_MACSEC)
> > +	/* MACsec management functions */
> > +	const struct macsec_ops *macsec_ops;
> > +#endif
> 
> #if IS_ENABLED(CONFIG_MACSEC)
> 
> likewise.

I'll fix it.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* RE: [PATCH] can: flexcan: free error skb if enqueueing failed
From: Joakim Zhang @ 2019-08-20 10:12 UTC (permalink / raw)
  To: Sean Nyekjaer, Martin Hundebøll, Wolfgang Grandegger,
	Marc Kleine-Budde, linux-can@vger.kernel.org
  Cc: David S . Miller, netdev@vger.kernel.org
In-Reply-To: <6bddb702-e9ba-1c9e-7d7a-eb974d2e0fdd@geanix.com>


> -----Original Message-----
> From: Sean Nyekjaer <sean@geanix.com>
> Sent: 2019年8月20日 17:50
> To: Martin Hundebøll <martin@geanix.com>; Wolfgang Grandegger
> <wg@grandegger.com>; Marc Kleine-Budde <mkl@pengutronix.de>;
> linux-can@vger.kernel.org
> Cc: David S . Miller <davem@davemloft.net>; netdev@vger.kernel.org; Joakim
> Zhang <qiangqing.zhang@nxp.com>
> Subject: Re: [PATCH] can: flexcan: free error skb if enqueueing failed
> 
> CC'ing Joakim Zhang

Looks good, so add my tag:
Acked-by: Joakim Zhang <qiangqing.zhang@nxp.com>

Best Regards,
Joakim Zhang
> On 15/07/2019 20.53, Martin Hundebøll wrote:
> > If the call to can_rx_offload_queue_sorted() fails, the passed skb
> > isn't consumed, so the caller must do so.
> >
> > Fixes: 30164759db1b ("can: flexcan: make use of rx-offload's
> > irq_offload_fifo")
> > Signed-off-by: Martin Hundebøll <martin@geanix.com>
> > ---
> >   drivers/net/can/flexcan.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> > index 1c66fb2ad76b..21f39e805d42 100644
> > --- a/drivers/net/can/flexcan.c
> > +++ b/drivers/net/can/flexcan.c
> > @@ -688,7 +688,8 @@ static void flexcan_irq_bus_err(struct net_device
> *dev, u32 reg_esr)
> >   	if (tx_errors)
> >   		dev->stats.tx_errors++;
> >
> > -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> > +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> > +		kfree_skb(skb);
> >   }
> >
> >   static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
> > @@ -732,7 +733,8 @@ static void flexcan_irq_state(struct net_device *dev,
> u32 reg_esr)
> >   	if (unlikely(new_state == CAN_STATE_BUS_OFF))
> >   		can_bus_off(dev);
> >
> > -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> > +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> > +		kfree_skb(skb);
> >   }
> >
> >   static inline struct flexcan_priv *rx_offload_to_priv(struct
> > can_rx_offload *offload)
> >

^ permalink raw reply

* Re: [PATCH 2/3] macb: Update compatibility string for SiFive FU540-C000
From: Nicolas.Ferre @ 2019-08-20 10:16 UTC (permalink / raw)
  To: schwab, paul.walmsley, davem, jakub.kicinski
  Cc: yash.shah, robh+dt, netdev, devicetree, linux-kernel, linux-riscv,
	mark.rutland, palmer, aou, ynezz, sachin.ghadi
In-Reply-To: <mvm5zmskxs3.fsf@suse.de>

On 20/08/2019 at 11:10, Andreas Schwab wrote:
> External E-Mail
> 
> 
> On Aug 13 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
> 
>> Dave, Nicolas,
>>
>> On Mon, 22 Jul 2019, Yash Shah wrote:
>>
>>> On Fri, Jul 19, 2019 at 5:36 PM <Nicolas.Ferre@microchip.com> wrote:
>>>>
>>>> On 19/07/2019 at 13:10, Yash Shah wrote:
>>>>> Update the compatibility string for SiFive FU540-C000 as per the new
>>>>> string updated in the binding doc.
>>>>> Reference: https://lkml.org/lkml/2019/7/17/200
>>>>
>>>> Maybe referring to lore.kernel.org is better:
>>>> https://lore.kernel.org/netdev/CAJ2_jOFEVZQat0Yprg4hem4jRrqkB72FKSeQj4p8P5KA-+rgww@mail.gmail.com/
>>>
>>> Sure. Will keep that in mind for future reference.
>>>
>>>>
>>>>> Signed-off-by: Yash Shah <yash.shah@sifive.com>
>>>>
>>>> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
>>>
>>> Thanks.
>>
>> Am assuming you'll pick this up for the -net tree for v5.4-rc1 or earlier.
>> If not, please let us know.
> 
> This is still missing in v5.4-rc5, which means that networking is broken.

Andreas, Paul,

The patchwork state for the 2 first patches of this series is "Changes 
Requested". It's probably due to my advice of using lore.kernel.org (or 
something else).

I'm perfectly fine in accepting the patches are they are today but can't 
change their patchwork state myself. We would need Dave or Jakub to take 
them.

Dave, Jakub,

All tags are collected in patchwork and all should be fine on DT (Rob) 
side as well.
Please tell me if you're waiting some sign from me.

Best regards,
-- 
Nicolas Ferre

^ permalink raw reply

* Re: [PATCH net-next 0/6] Dynamic toggling of vlan_filtering for SJA1105 DSA
From: Vladimir Oltean @ 2019-08-20 10:18 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Florian Fainelli, Andrew Lunn, Ido Schimmel, Roopa Prabhu,
	nikolay, David S. Miller, netdev
In-Reply-To: <20190820040443.GB4919@t480s.localdomain>

On Tue, 20 Aug 2019 at 11:04, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>
> On Tue, 20 Aug 2019 02:59:56 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> > This patchset addresses a few limitations in DSA and the bridge core
> > that made it impossible for this sequence of commands to work:
> >
> >   ip link add name br0 type bridge
> >   ip link set dev swp2 master br0
> >   echo 1 > /sys/class/net/br0/bridge/vlan_filtering
> >
> > Only this sequence was previously working:
> >
> >   ip link add name br0 type bridge vlan_filtering 1
> >   ip link set dev swp2 master br0
>
> This is not quite true, these sequences of commands do "work". What I see
> though is that with the first sequence, the PVID 1 won't be programmed in
> the hardware. But the second sequence does program the hardware.
>
> But following bridge members will be correctly programmed with the VLAN
> though. The sequence below programs the hardware with VLAN 1 for swp3 as
> well as CPU and DSA ports, but not for swp2:
>
>     ip link add name br0 type bridge
>     ip link set dev swp2 master br0
>     echo 1 > /sys/class/net/br0/bridge/vlan_filtering
>     ip link set dev swp3 master br0
>
> This is unfortunately also true for any 802.1Q VLANs. For example, only VID
> 43 is programmed with the following sequence, but not VID 1 and VID 42:
>
>     ip link add name br0 type bridge
>     ip link set dev swp2 master br0
>     bridge vlan add dev swp2 vid 42
>     echo 1 > /sys/class/net/br0/bridge/vlan_filtering
>     bridge vlan add dev swp2 vid 43
>
> So I understand that because VLANs are not propagated by DSA to the hardware
> when VLAN filtering is disabled, a port may not be programmed with its
> bridge's default PVID, and this is causing a problem for tag_8021q.
>
> Please reword so that we understand better what is the issue being fixed here.
>

Not exactly, no.
The fact that a port in DSA is not programmed with the bridge's
default_pvid is just a fact.
Right now, letting the bridge install the default_pvid into the ports
even breaks dsa_8021q when enslaving the ports to a vlan_filtering=0
bridge, but that is perhaps only a timing issue and can be resolved in
other ways.

To understand my issue I need to make a side-note.
The sja1105 needs to be reset at runtime for changing some settings. I
am working on a separate patchset that tries to make those switch
resets as seamless as possible. The end goal is to not disrupt
linuxptp (ptp4l, phc2sys) operation when enabling the tc-taprio
offload. That means:
- saving and restoring the PTP time after reset
- preventing switch reset to happen during TX timestamping
- preventing any clock operation on the PTP timer during switch reset
Since toggling vlan_filtering also needs to reset the switch, I am
simply using that as a handy tool to test how seamless the switch
reset is.
But currently, toggling vlan_filtering breaks traffic, due to nobody
restoring the correct (from the bridge's perspective) pvid on the
ports. So I need to fix this to continue the testing.
On the user ports, I can just query the bridge, and that's exactly
what I'm doing.
On the CPU and DSA ports, I can't query the bridge because the bridge
knows nothing about them. And I also can't query the pvid from the DSA
core - it can change it, but not know what it is. So I need to be
proactive and make DSA avoid changing their pvid needlessly.
With this set of changes, the goal was for traffic to 'just work' when
changing the vlan_filtering setting back and forth. This would also
benefit other future users of dsa_8021q.
There is still work to be done. dsa_8021q uses VID range 1024-2559
privately. If the bridge had been using any VLAN in that range during
vlan_filtering=1, then dsa_8021q should also restore that VLAN.
Currently I'm not doing that because I'm still not clear whether it's
just as simple as calling br_vlan_get_info(slave, {rx,tx}_vid,
&vinfo); and adding that back (I don't completely understand the
interaction with the implicit rules on the CPU ports - I am concerned
that if the switch driver sees the VID is already installed on the CPU
port, it will just skip the command).
Ido Schimmel mentioned that static FDB entries with VLAN != PVID need
to be preserved. I am already doing something in that direction there
- in vlan_filtering=0 mode I am switching to shared VLAN learning
mode. This means the switch is ignoring the VLAN from the FDB entries,
and just uses the DMAC for L2 routing. When going back to the
vlan_filtering=1 mode, the VLAN from the static FDB entries goes back
in use. I think that's ok.
There is also stuff that cannot be reasonably expected to be preserved
across switch resets. Learned FDB entries is one thing. Traffic
(including regular, but not management, traffic originated from the
CPU) will be temporarily dropped.

> >
> > On SJA1105, the situation is further complicated by the fact that
> > toggling vlan_filtering is causing a switch reset. However, the hardware
> > state restoration logic is already there in the driver. It is a matter
> > of the layers above which need a few fixups.
> >
> > Also see this discussion thread:
> > https://www.spinics.net/lists/netdev/msg581042.html
> >
> > Patch 1/6 is not functionally related but also related to dsa_8021q
> > handling of VLANs and this is a good opportunity to bring up the subject
> > for discussion.
>
> So please send 1/6 as a separate patch and bring up the discussion there.
>

Ok.

>
> Thanks,
>
>         Vivien

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH net-next 4/6] net: dsa: Don't program the VLAN as pvid on the upstream port
From: Vladimir Oltean @ 2019-08-20 10:22 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Florian Fainelli, Andrew Lunn, Ido Schimmel, Roopa Prabhu,
	nikolay, David S. Miller, netdev
In-Reply-To: <20190820020709.GD975@t480s.localdomain>

Hi Vivien,

On Tue, 20 Aug 2019 at 09:07, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>
> On Tue, 20 Aug 2019 03:00:00 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> > Commit b2f81d304cee ("net: dsa: add CPU and DSA ports as VLAN members")
> > programs the VLAN from the bridge into the specified port as well as the
> > upstream port, with the same set of flags.
> >
> > Consider the typical case of installing pvid 1 on user port 1, pvid 2 on
> > user port 2, etc. The upstream port would end up having a pvid equal to
> > the last user port whose pvid was programmed from the bridge. Less than
> > useful.
> >
> > So just don't change the pvid of the upstream port and let it be
> > whatever the driver set it internally to be.
> >
> > Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> > ---
> >  net/dsa/switch.c | 16 ++++++++++++----
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> > index 84ab2336131e..02ccc53f1926 100644
> > --- a/net/dsa/switch.c
> > +++ b/net/dsa/switch.c
> > @@ -239,17 +239,21 @@ dsa_switch_vlan_prepare_bitmap(struct dsa_switch *ds,
> >                              const struct switchdev_obj_port_vlan *vlan,
> >                              const unsigned long *bitmap)
> >  {
> > +     struct switchdev_obj_port_vlan v = *vlan;
> >       int port, err;
> >
> >       if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
> >               return -EOPNOTSUPP;
> >
> >       for_each_set_bit(port, bitmap, ds->num_ports) {
> > -             err = dsa_port_vlan_check(ds, port, vlan);
> > +             if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> > +                     v.flags &= ~BRIDGE_VLAN_INFO_PVID;
>
> So you keep the BRIDGE_VLAN_INFO_PVID flag cleared for all other ports that
> come after any CPU or DSA port?
>

It looks like the convenient hardware decision of making the CPU port
on my board also be the numerically highest one strikes again :)
I always find bugs when I change the CPU port to another number.
This is another example (also related to the inclusion of upstream
ports in the VLAN bitmap, like they all seem to be):
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d34d2baa9173f6e0c0f22d005d18e83d1cb54d8d

> > +
> > +             err = dsa_port_vlan_check(ds, port, &v);
> >               if (err)
> >                       return err;
> >
> > -             err = ds->ops->port_vlan_prepare(ds, port, vlan);
> > +             err = ds->ops->port_vlan_prepare(ds, port, &v);
> >               if (err)
> >                       return err;
> >       }
> > @@ -262,10 +266,14 @@ dsa_switch_vlan_add_bitmap(struct dsa_switch *ds,
> >                          const struct switchdev_obj_port_vlan *vlan,
> >                          const unsigned long *bitmap)
> >  {
> > +     struct switchdev_obj_port_vlan v = *vlan;
> >       int port;
> >
> > -     for_each_set_bit(port, bitmap, ds->num_ports)
> > -             ds->ops->port_vlan_add(ds, port, vlan);
> > +     for_each_set_bit(port, bitmap, ds->num_ports) {
> > +             if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> > +                     v.flags &= ~BRIDGE_VLAN_INFO_PVID;
>
> Same here. Did you intend to initialize your switchdev_obj_port_vlan structure
> _within_ the for_each_set_bit loop maybe?
>

Thanks for pointing this out.

> > +             ds->ops->port_vlan_add(ds, port, &v);
> > +     }
> >  }
> >
> >  static int dsa_switch_vlan_add(struct dsa_switch *ds,
>
> Do you even test your patches?

No.

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH REPOST 1/2] can: flexcan: fix deadlock when using self wakeup
From: Sean Nyekjaer @ 2019-08-20 10:25 UTC (permalink / raw)
  To: Joakim Zhang, mkl@pengutronix.de, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org, dl-linux-imx,
	Martin Hundebøll
In-Reply-To: <20190816081749.19300-2-qiangqing.zhang@nxp.com>



On 16/08/2019 10.20, Joakim Zhang wrote:
> As reproted by Sean Nyekjaer below:
> When suspending, when there is still can traffic on the interfaces the
> flexcan immediately wakes the platform again. As it should :-). But it
> throws this error msg:
> [ 3169.378661] PM: noirq suspend of devices failed
> 
> On the way down to suspend the interface that throws the error message does
> call flexcan_suspend but fails to call flexcan_noirq_suspend. That means the
> flexcan_enter_stop_mode is called, but on the way out of suspend the driver
> only calls flexcan_resume and skips flexcan_noirq_resume, thus it doesn't call
> flexcan_exit_stop_mode. This leaves the flexcan in stop mode, and with the
> current driver it can't recover from this even with a soft reboot, it requires
> a hard reboot.
> 
> The best way to exit stop mode is in Wake Up interrupt context, and then
> suspend() and resume() functions can be symmetric. However, stop mode
> request and ack will be controlled by SCU(System Control Unit) firmware(manage
> clock,power,stop mode, etc. by Cortex-M4 core) in coming i.MX8(QM/QXP). And SCU
> firmware interface can't be available in interrupt context.
> 
> For compatibillity, the wake up mechanism can't be symmetric, so we need
> in_stop_mode hack.
> 
> Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
> Reported-by: Sean Nyekjaer <sean@geanix.com>
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> 

Unfortunatly it's still possible to reproduce the deadlock with this 
patch...

[  689.921717] flexcan: probe of 2094000.flexcan failed with error -110

My test setup:
PC with CAN-USB dongle connected to can0 and can1.

PC:
$ while true; do cansend can0 '123#DEADBEEF'; done

iMX6ull:
root@iwg26:~# systemctl suspend 
 

[  365.858054] systemd[1]: Reached target Sleep.
root@iwg26:~# [  365.939826] systemd[1]: Starting Suspend...
[  366.115839] systemd-sleep[248]: Suspending system...
[  366.517949] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
[  366.518249] PM: Device 2094000.flexcan failed to suspend: error -110
[  366.518406] PM: Some devices failed to suspend, or early wake event 
detected
[  366.732162] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
[  366.732285] PM: Device 2090000.flexcan failed to suspend: error -110
[  366.732330] PM: Some devices failed to suspend, or early wake event 
detected
[  366.890637] systemd-sleep[248]: System resumed.
[  366.923062] systemd[1]: Started Suspend.
[  366.942819] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
[  366.954791] systemd[1]: Stopped target Sleep.
[  366.962402] systemd[1]: Reached target Suspend.
[  366.977546] systemd-logind[135]: Operation 'sleep' finished.
[  366.979194] systemd[1]: suspend.target: Unit not needed anymore. 
Stopping.
[  366.993831] systemd[1]: Stopped target Suspend.
[  367.139972] systemd-networkd[220]: usb0: Lost carrier
[  367.294077] systemd-networkd[220]: usb0: Gained carrier

root@iwg26:~# candump can0 | head -n 2 

   can0  123   [4]  DE AD BE EF
   can0  123   [4]  DE AD BE EF
root@iwg26:~# candump can1 | head -n 2 

   can1  123   [4]  DE AD BE EF
   can1  123   [4]  DE AD BE EF
root@iwg26:~# systemctl suspend 

root@iwg26:~# [  385.106658] systemd[1]: Reached target Sleep.
[  385.147602] systemd[1]: Starting Suspend...
[  385.246421] systemd-sleep[260]: Suspending system...
[  385.634733] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
[  385.634855] PM: Device 2090000.flexcan failed to suspend: error -110
[  385.634897] PM: Some devices failed to suspend, or early wake event 
detected
[  385.856251] PM: noirq suspend of devices failed
[  385.998364] systemd-sleep[260]: System resumed.
[  386.023390] systemd[1]: Started Suspend.
[  386.031570] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
[  386.055886] systemd[1]: Stopped target Sleep.
[  386.061430] systemd[1]: Reached target Suspend.
[  386.066142] systemd[1]: suspend.target: Unit not needed anymore. 
Stopping.
[  386.112575] systemd-networkd[220]: usb0: Lost carrier
[  386.116797] systemd-logind[135]: Operation 'sleep' finished.
[  386.146161] systemd[1]: Stopped target Suspend.
[  386.260866] systemd-networkd[220]: usb0: Gained carrier
root@iwg26:~# candump can0 | head -n 2
   can0  123   [4]  DE AD BE EF
   can0  123   [4]  DE AD BE EF
root@iwg26:~# candump can1 | head -n 2 

   can1  123   [4]  DE AD BE EF
   can1  123   [4]  DE AD BE EF
root@iwg26:~# systemctl suspend 

[  396.919303] systemd[1]: Reached target Sleep.
root@iwg26:~# [  396.964722] systemd[1]: Starting Suspend...
[  397.067336] systemd-sleep[268]: Suspending system...
[  397.574571] PM: noirq suspend of devices failed
[  397.834731] PM: noirq suspend of devices failed
[  397.807996] systemd-networkd[220]: usb0: Lost carrier
[  398.156295] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
[  398.156339] PM: Device 2094000.flexcan failed to suspend: error -110
[  398.156509] PM: Some devices failed to suspend, or early wake event 
detected
[  398.053555] systemd-sleep[268]: Failed to write /sys/power/state: 
Device or resource busy
[  398.074751] systemd[1]: systemd-suspend.service: Main process exited, 
code=exited, status=1/FAILURE
[  398.076779] systemd[1]: systemd-suspend.service: Failed with result 
'exit-code'.
[  398.109255] systemd[1]: Failed to start Suspend.
[  398.118704] systemd[1]: Dependency failed for Suspend.
[  398.136283] systemd-logind[135]: Operation 'sleep' finished.
[  398.137770] systemd[1]: suspend.target: Job suspend.target/start 
failed with result 'dependency'.
[  398.139105] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
[  398.167590] systemd[1]: Stopped target Sleep.
[  398.201558] systemd-networkd[220]: usb0: Gained carrier

root@iwg26:~# candump can0 | head -n 2
   can0  123   [4]  DE AD BE EF
   can0  123   [4]  DE AD BE EF
root@iwg26:~# candump can1 | head -n 2

nothing on can1 anymore :-(

root@iwg26:~# rmmod flexcan
[  622.884746] systemd-networkd[220]: can1: Lost carrier
[  623.046766] systemd-networkd[220]: can0: Lost carrier
root@iwg26:~# insmod /mnt/flexcan.ko
[  628.323981] flexcan 2094000.flexcan: registering netdev failed

and can1 fails to register with:
[  628.347485] flexcan: probe of 2094000.flexcan failed with error -110

/Sean

^ permalink raw reply

* Capturing syscall arguments: `kprobe`, `kretprobe` or `tracepoint` or `raw_tracepoint` type programs
From: Tahsin Rahman @ 2019-08-20 10:26 UTC (permalink / raw)
  To: netdev

Hi,
I'm new to ebpf. I want to write an ebpf program that can trace the
syscall arguments and return values.
According to my research, I can do this using `kprobe`, `kretprobe` or
`tracepoint` or `raw_tracepoint` type of bpf programs.

- What factors should I consider when choosing one type of program over another?

- Is the main difference among them is performance benefits? I'd be
great help if one can point me to any documentations about the
performance difference among different types of ebpf programs.

- How can I benchmark these programs?

Thanks!

^ permalink raw reply

* Re: [PATCH net-next 6/6] net: dsa: tag_8021q: Restore bridge pvid when enabling vlan_filtering
From: Vladimir Oltean @ 2019-08-20 10:28 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vivien Didelot, Andrew Lunn, Ido Schimmel, Roopa Prabhu, nikolay,
	David S. Miller, netdev
In-Reply-To: <bf0c064e-6304-ba31-8f45-3a6226ed8939@gmail.com>

Hi Florian,

On Tue, 20 Aug 2019 at 06:33, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 8/19/2019 5:00 PM, Vladimir Oltean wrote:
> > The bridge core assumes that enabling/disabling vlan_filtering will
> > translate into the simple toggling of a flag for switchdev drivers.
> >
> > That is clearly not the case for sja1105, which alters the VLAN table
> > and the pvids in order to obtain port separation in standalone mode.
> >
> > So, since the bridge will not call any vlan operation through switchdev
> > after enabling vlan_filtering, we need to ensure we're in a functional
> > state ourselves.
> >
> > Hence read the pvid that the bridge is aware of, and program that into
> > our ports.
>
> That is arguably applicable with DSA at large and not just specifically
> for tag_8021q.c no? Is there a reason why you are not seeking to solve
> this on a more global scale?
>

Perhaps because I don't have a good feel for what are other DSA
drivers' struggles with restoring the pvid, even after re-reading the
"What to do when a bridge port gets its pvid deleted?" thread.
I understand b53 has a similar need, and for that purpose maybe you
can EXPORT_SYMBOL_GPL(dsa_port_restore_pvid) and use it, but
otherwise, what is the more global scale?

> >
> > Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> > ---
> >  net/dsa/tag_8021q.c | 32 +++++++++++++++++++++++++++++++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/dsa/tag_8021q.c b/net/dsa/tag_8021q.c
> > index 67a1bc635a7b..6423beb1efcd 100644
> > --- a/net/dsa/tag_8021q.c
> > +++ b/net/dsa/tag_8021q.c
> > @@ -93,6 +93,33 @@ int dsa_8021q_rx_source_port(u16 vid)
> >  }
> >  EXPORT_SYMBOL_GPL(dsa_8021q_rx_source_port);
> >
> > +static int dsa_port_restore_pvid(struct dsa_switch *ds, int port)
> > +{
> > +     struct bridge_vlan_info vinfo;
> > +     struct net_device *slave;
> > +     u16 pvid;
> > +     int err;
> > +
> > +     if (!dsa_is_user_port(ds, port))
> > +             return 0;
> > +
> > +     slave = ds->ports[port].slave;
> > +
> > +     err = br_vlan_get_pvid(slave, &pvid);
> > +     if (err < 0) {
> > +             dev_err(ds->dev, "Couldn't determine bridge PVID\n");
> > +             return err;
> > +     }
> > +
> > +     err = br_vlan_get_info(slave, pvid, &vinfo);
> > +     if (err < 0) {
> > +             dev_err(ds->dev, "Couldn't determine PVID attributes\n");
> > +             return err;
> > +     }
> > +
> > +     return dsa_port_vid_add(&ds->ports[port], pvid, vinfo.flags);
> > +}
> > +
> >  /* RX VLAN tagging (left) and TX VLAN tagging (right) setup shown for a single
> >   * front-panel switch port (here swp0).
> >   *
> > @@ -223,7 +250,10 @@ int dsa_port_setup_8021q_tagging(struct dsa_switch *ds, int port, bool enabled)
> >               return err;
> >       }
> >
> > -     return 0;
> > +     if (!enabled)
> > +             err = dsa_port_restore_pvid(ds, port);
> > +
> > +     return err;
> >  }
> >  EXPORT_SYMBOL_GPL(dsa_port_setup_8021q_tagging);
> >
> >
>
> --
> Florian

Regards,
-Vladimir

^ permalink raw reply

* RE: Help needed - Kernel lockup while running ipsec
From: Vakul Garg @ 2019-08-20 10:38 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev@vger.kernel.org
In-Reply-To: <DB7PR04MB46204E237BB1E495FC799E588BAB0@DB7PR04MB4620.eurprd04.prod.outlook.com>



> 
> > -----Original Message-----
> > From: Florian Westphal <fw@strlen.de>
> > Sent: Tuesday, August 20, 2019 3:08 PM
> > To: Vakul Garg <vakul.garg@nxp.com>
> > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > Subject: Re: Help needed - Kernel lockup while running ipsec
> >
> > Vakul Garg <vakul.garg@nxp.com> wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Florian Westphal <fw@strlen.de>
> > > > Sent: Tuesday, August 20, 2019 2:53 PM
> > > > To: Vakul Garg <vakul.garg@nxp.com>
> > > > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > > >
> > > > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > > > running single
> > > > > > static ipsec tunnel.
> > > > > > > The problem reproduces mostly after running 8-10 hours of
> > > > > > > ipsec encap
> > > > > > test (on my dual core arm board).
> > > > > > >
> > > > > > > I found that in function xfrm_policy_lookup_bytype(), the
> > > > > > > policy in variable
> > > > > > 'ret' shows refcnt=0 under problem situation.
> > > > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype()
> > > > > > > and hence the
> > > > > > lockup.
> > > > > > >
> > > > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > > > Is it legitimate for 'refcnt' to become '0'? Under what
> > > > > > > condition can it
> > > > > > become '0'?
> > > > > >
> > > > > > Yes, when policy is destroyed and the last user calls
> > > > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > > > >
> > > > > It seems that policy reference count never gets decremented during
> > > > > packet
> > > > ipsec encap.
> > > > > It is getting incremented for every frame that hits the policy.
> > > > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > > >
> > > > Thats a bug.  Does this affect 4.14 only or does this happen on
> > > > current tree as well?
> > >
> > > I am yet to try it on 4.19.
> > > Can you help me with the right fix? Which part of code should it get
> > decremented?
> > > I am not conversant with xfrm code.
> >
> > Normally policy reference counts get decremented when the skb is free'd,
> via
> > dst destruction (xfrm_dst_destroy()).
> >
> > Do you see a dst leak as well?
> 
> Can you please guide me how to detect it?
> 
> (I am checking refcount on recent kernel and will let you know.)

Policy refcount is decreasing properly on 4.19.
Same should be on the latest kernel too.



^ permalink raw reply

* [PATCH net-next 0/2] netfilter: payload mangling offload support
From: Pablo Neira Ayuso @ 2019-08-20 10:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jakub.kicinski, jiri, vladbu

Hi,

This patchset adds payload mangling offload support for Netfilter:

1) Adapt existing drivers to allow for mangling up to four 32-bit words
   with one single flow_rule action. Hence, once single action can be
   used to mangle an IPv6 address.

2) Add support for netfilter packet mangling.

Please, apply.

Pablo Neira Ayuso (2):
  net: flow_offload: mangle 128-bit packet field with one action
  netfilter: nft_payload: packet mangling offload support

 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 44 ++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 50 +++++++++----
 drivers/net/ethernet/netronome/nfp/flower/action.c | 69 ++++++++++++------
 include/net/flow_offload.h                         |  9 ++-
 net/netfilter/nft_payload.c                        | 82 ++++++++++++++++++++++
 net/sched/cls_api.c                                |  7 +-
 6 files changed, 207 insertions(+), 54 deletions(-)

-- 
2.11.0



^ permalink raw reply

* [PATCH net-next 1/2] net: flow_offload: mangle 128-bit packet field with one action
From: Pablo Neira Ayuso @ 2019-08-20 10:52 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jakub.kicinski, jiri, vladbu

The existing infrastructure needs the front-end to generate up to four
actions (one for each 32-bit word) to mangle an IPv6 address. This patch
allows you to mangle fields than are longer than 4-bytes with one single
action. Drivers have been adapted to this new representation following a
simple approach, that is, iterate over the array of words and configure
the hardware IR to make the packet mangling. FLOW_ACTION_MANGLE_MAX_WORDS
defines the maximum number of words from one given offset (currently 4
words).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 44 ++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 50 +++++++++++-----
 drivers/net/ethernet/netronome/nfp/flower/action.c | 69 ++++++++++++++--------
 include/net/flow_offload.h                         |  9 ++-
 net/sched/cls_api.c                                |  7 ++-
 5 files changed, 125 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index e447976bdd3e..6a961e29a904 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -428,13 +428,17 @@ static void cxgb4_process_flow_actions(struct net_device *in,
 		case FLOW_ACTION_MANGLE: {
 			u32 mask, val, offset;
 			u8 htype;
+			int i;
 
 			htype = act->mangle.htype;
-			mask = act->mangle.mask;
-			val = act->mangle.val;
 			offset = act->mangle.offset;
 
-			process_pedit_field(fs, val, mask, offset, htype);
+			for (i = 0; i < act->mangle.words; i++) {
+				mask = act->mangle.data[i].mask;
+				val = act->mangle.data[i].val;
+				process_pedit_field(fs, val, mask, offset, htype);
+				offset += sizeof(u32);
+			}
 			}
 			break;
 		default:
@@ -456,16 +460,9 @@ static bool valid_l4_mask(u32 mask)
 	return hi && lo ? false : true;
 }
 
-static bool valid_pedit_action(struct net_device *dev,
-			       const struct flow_action_entry *act)
+static bool __valid_pedit_action(struct net_device *dev, u8 htype,
+				 __be32 mask, __be32 offset)
 {
-	u32 mask, offset;
-	u8 htype;
-
-	htype = act->mangle.htype;
-	mask = act->mangle.mask;
-	offset = act->mangle.offset;
-
 	switch (htype) {
 	case FLOW_ACT_MANGLE_HDR_TYPE_ETH:
 		switch (offset) {
@@ -541,6 +538,29 @@ static bool valid_pedit_action(struct net_device *dev,
 		netdev_err(dev, "%s: Unsupported pedit type\n", __func__);
 		return false;
 	}
+
+	return true;
+}
+
+static bool valid_pedit_action(struct net_device *dev,
+			       const struct flow_action_entry *act)
+{
+	u32 mask, offset;
+	u8 htype;
+	int i;
+
+	htype = act->mangle.htype;
+	offset = act->mangle.offset;
+
+	for (i = 0; i < act->mangle.words; i++) {
+		mask = act->mangle.data[i].mask;
+
+		if (!__valid_pedit_action(dev, htype, mask, offset))
+			return false;
+
+		offset += sizeof(u32);
+	}
+
 	return true;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index c57f7533a6d0..bb24616ee27f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2411,6 +2411,7 @@ static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 	int err = -EOPNOTSUPP;
 	u32 mask, val, offset;
 	u8 htype;
+	int i;
 
 	htype = act->mangle.htype;
 	err = -EOPNOTSUPP; /* can't be all optimistic */
@@ -2426,15 +2427,19 @@ static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 		goto out_err;
 	}
 
-	mask = act->mangle.mask;
-	val = act->mangle.val;
 	offset = act->mangle.offset;
 
-	err = set_pedit_val(htype, ~mask, val, offset, &hdrs[cmd]);
-	if (err)
-		goto out_err;
+	for (i = 0; i < act->mangle.words; i++) {
+		val = act->mangle.data[i].val;
+		mask = act->mangle.data[i].mask;
 
-	hdrs[cmd].pedits++;
+		err = set_pedit_val(htype, ~mask, val, offset, &hdrs[cmd]);
+		if (err)
+			goto out_err;
+
+		offset += sizeof(u32);
+		hdrs[cmd].pedits++;
+	}
 
 	return 0;
 out_err:
@@ -2523,14 +2528,8 @@ struct ipv6_hoplimit_word {
 	__u8	hop_limit;
 };
 
-static bool is_action_keys_supported(const struct flow_action_entry *act)
+static bool __is_action_keys_supported(u8 htype, u32 offset, u32 mask)
 {
-	u32 mask, offset;
-	u8 htype;
-
-	htype = act->mangle.htype;
-	offset = act->mangle.offset;
-	mask = ~act->mangle.mask;
 	/* For IPv4 & IPv6 header check 4 byte word,
 	 * to determine that modified fields
 	 * are NOT ttl & hop_limit only.
@@ -2557,6 +2556,26 @@ static bool is_action_keys_supported(const struct flow_action_entry *act)
 	return false;
 }
 
+static bool is_action_keys_supported(const struct flow_action_entry *act)
+{
+	u32 mask, offset;
+	u8 htype;
+	int i;
+
+	htype = act->mangle.htype;
+	offset = act->mangle.offset;
+
+	for (i = 0; i < act->mangle.words; i++) {
+		mask = ~act->mangle.data[i].mask;
+		if (!__is_action_keys_supported(htype, offset, mask))
+			return false;
+
+		offset += sizeof(u32);
+	}
+
+	return true;
+}
+
 static bool modify_header_match_supported(struct mlx5_flow_spec *spec,
 					  struct flow_action *flow_action,
 					  u32 actions,
@@ -2654,8 +2673,9 @@ static int add_vlan_rewrite_action(struct mlx5e_priv *priv, int namespace,
 		.id = FLOW_ACTION_MANGLE,
 		.mangle.htype = FLOW_ACT_MANGLE_HDR_TYPE_ETH,
 		.mangle.offset = offsetof(struct vlan_ethhdr, h_vlan_TCI),
-		.mangle.mask = ~(u32)be16_to_cpu(*(__be16 *)&mask16),
-		.mangle.val = (u32)be16_to_cpu(*(__be16 *)&val16),
+		.mangle.data[0].mask = ~(u32)be16_to_cpu(*(__be16 *)&mask16),
+		.mangle.data[0].val = (u32)be16_to_cpu(*(__be16 *)&val16),
+		.mangle.words = 1,
 	};
 	u8 match_prio_mask, match_prio_val;
 	void *headers_c, *headers_v;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 1b019fdfcd97..15bace2354dc 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -485,7 +485,7 @@ static void nfp_fl_set_helper32(u32 value, u32 mask, u8 *p_exact, u8 *p_mask)
 }
 
 static int
-nfp_fl_set_eth(const struct flow_action_entry *act, u32 off,
+nfp_fl_set_eth(const struct flow_action_entry *act, u32 idx, u32 off,
 	       struct nfp_fl_set_eth *set_eth, struct netlink_ext_ack *extack)
 {
 	u32 exact, mask;
@@ -495,8 +495,8 @@ nfp_fl_set_eth(const struct flow_action_entry *act, u32 off,
 		return -EOPNOTSUPP;
 	}
 
-	mask = ~act->mangle.mask;
-	exact = act->mangle.val;
+	mask = ~act->mangle.data[idx].mask;
+	exact = act->mangle.data[idx].val;
 
 	if (exact & ~mask) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit ethernet action");
@@ -520,7 +520,7 @@ struct ipv4_ttl_word {
 };
 
 static int
-nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
+nfp_fl_set_ip4(const struct flow_action_entry *act, u32 idx, u32 off,
 	       struct nfp_fl_set_ip4_addrs *set_ip_addr,
 	       struct nfp_fl_set_ip4_ttl_tos *set_ip_ttl_tos,
 	       struct netlink_ext_ack *extack)
@@ -532,8 +532,8 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	__be32 exact, mask;
 
 	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)~act->mangle.mask;
-	exact = (__force __be32)act->mangle.val;
+	mask = (__force __be32)~act->mangle.data[idx].mask;
+	exact = (__force __be32)act->mangle.data[idx].val;
 
 	if (exact & ~mask) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv4 action");
@@ -662,7 +662,7 @@ nfp_fl_set_ip6_hop_limit_flow_label(u32 off, __be32 exact, __be32 mask,
 }
 
 static int
-nfp_fl_set_ip6(const struct flow_action_entry *act, u32 off,
+nfp_fl_set_ip6(const struct flow_action_entry *act, u32 idx, u32 off,
 	       struct nfp_fl_set_ipv6_addr *ip_dst,
 	       struct nfp_fl_set_ipv6_addr *ip_src,
 	       struct nfp_fl_set_ipv6_tc_hl_fl *ip_hl_fl,
@@ -673,8 +673,8 @@ nfp_fl_set_ip6(const struct flow_action_entry *act, u32 off,
 	u8 word;
 
 	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)~act->mangle.mask;
-	exact = (__force __be32)act->mangle.val;
+	mask = (__force __be32)~act->mangle.data[idx].mask;
+	exact = (__force __be32)act->mangle.data[idx].val;
 
 	if (exact & ~mask) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 action");
@@ -702,7 +702,7 @@ nfp_fl_set_ip6(const struct flow_action_entry *act, u32 off,
 }
 
 static int
-nfp_fl_set_tport(const struct flow_action_entry *act, u32 off,
+nfp_fl_set_tport(const struct flow_action_entry *act, u32 idx, u32 off,
 		 struct nfp_fl_set_tport *set_tport, int opcode,
 		 struct netlink_ext_ack *extack)
 {
@@ -713,8 +713,8 @@ nfp_fl_set_tport(const struct flow_action_entry *act, u32 off,
 		return -EOPNOTSUPP;
 	}
 
-	mask = ~act->mangle.mask;
-	exact = act->mangle.val;
+	mask = ~act->mangle.data[idx].mask;
+	exact = act->mangle.data[idx].val;
 
 	if (exact & ~mask) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit L4 action");
@@ -860,32 +860,31 @@ nfp_fl_commit_mangle(struct flow_cls_offload *flow, char *nfp_action,
 }
 
 static int
-nfp_fl_pedit(const struct flow_action_entry *act,
-	     struct flow_cls_offload *flow, char *nfp_action, int *a_len,
-	     u32 *csum_updated, struct nfp_flower_pedit_acts *set_act,
-	     struct netlink_ext_ack *extack)
+__nfp_fl_pedit(const struct flow_action_entry *act, u32 idx, u32 offset,
+	       struct flow_cls_offload *flow, char *nfp_action, int *a_len,
+	       u32 *csum_updated, struct nfp_flower_pedit_acts *set_act,
+	       struct netlink_ext_ack *extack)
 {
 	enum flow_action_mangle_base htype;
-	u32 offset;
 
 	htype = act->mangle.htype;
-	offset = act->mangle.offset;
 
 	switch (htype) {
 	case TCA_PEDIT_KEY_EX_HDR_TYPE_ETH:
-		return nfp_fl_set_eth(act, offset, &set_act->set_eth, extack);
+		return nfp_fl_set_eth(act, idx, offset, &set_act->set_eth,
+				      extack);
 	case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
-		return nfp_fl_set_ip4(act, offset, &set_act->set_ip_addr,
+		return nfp_fl_set_ip4(act, idx, offset, &set_act->set_ip_addr,
 				      &set_act->set_ip_ttl_tos, extack);
 	case TCA_PEDIT_KEY_EX_HDR_TYPE_IP6:
-		return nfp_fl_set_ip6(act, offset, &set_act->set_ip6_dst,
+		return nfp_fl_set_ip6(act, idx, offset, &set_act->set_ip6_dst,
 				      &set_act->set_ip6_src,
 				      &set_act->set_ip6_tc_hl_fl, extack);
 	case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP:
-		return nfp_fl_set_tport(act, offset, &set_act->set_tport,
+		return nfp_fl_set_tport(act, idx, offset, &set_act->set_tport,
 					NFP_FL_ACTION_OPCODE_SET_TCP, extack);
 	case TCA_PEDIT_KEY_EX_HDR_TYPE_UDP:
-		return nfp_fl_set_tport(act, offset, &set_act->set_tport,
+		return nfp_fl_set_tport(act, idx, offset, &set_act->set_tport,
 					NFP_FL_ACTION_OPCODE_SET_UDP, extack);
 	default:
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: pedit on unsupported header");
@@ -894,6 +893,30 @@ nfp_fl_pedit(const struct flow_action_entry *act,
 }
 
 static int
+nfp_fl_pedit(const struct flow_action_entry *act,
+	     struct flow_cls_offload *flow, char *nfp_action, int *a_len,
+	     u32 *csum_updated, struct nfp_flower_pedit_acts *set_act,
+	     struct netlink_ext_ack *extack)
+{
+	u32 offset, idx;
+	int err;
+
+	offset = act->mangle.offset;
+
+	for (idx = 0; idx < act->mangle.words; idx++) {
+		err = __nfp_fl_pedit(act, idx, offset, flow,
+				     nfp_action, a_len, csum_updated, set_act,
+				     extack);
+		if (err < 0)
+			return err;
+
+		offset += sizeof(u32);
+	}
+
+	return 0;
+}
+
+static int
 nfp_flower_output_action(struct nfp_app *app,
 			 const struct flow_action_entry *act,
 			 struct nfp_fl_payload *nfp_fl, int *a_len,
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index e8069b6c474c..d3fc6b7dcd6a 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -153,6 +153,8 @@ enum flow_action_mangle_base {
 	FLOW_ACT_MANGLE_HDR_TYPE_UDP,
 };
 
+#define FLOW_ACTION_MANGLE_MAX_WORDS	4
+
 struct flow_action_entry {
 	enum flow_action_id		id;
 	union {
@@ -166,8 +168,11 @@ struct flow_action_entry {
 		struct {				/* FLOW_ACTION_PACKET_EDIT */
 			enum flow_action_mangle_base htype;
 			u32		offset;
-			u32		mask;
-			u32		val;
+			struct {
+				u32	mask;
+				u32	val;
+			} data[FLOW_ACTION_MANGLE_MAX_WORDS];
+			u32		words;
 		} mangle;
 		const struct ip_tunnel_info *tunnel;	/* FLOW_ACTION_TUNNEL_ENCAP */
 		u32			csum_flags;	/* FLOW_ACTION_CSUM */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e0d8b456e9f5..041cd4000389 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3077,9 +3077,12 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 					goto err_out;
 				}
 				entry->mangle.htype = tcf_pedit_htype(act, k);
-				entry->mangle.mask = tcf_pedit_mask(act, k);
-				entry->mangle.val = tcf_pedit_val(act, k);
+				entry->mangle.data[0].mask =
+					tcf_pedit_mask(act, k);
+				entry->mangle.data[0].val =
+					tcf_pedit_val(act, k);
 				entry->mangle.offset = tcf_pedit_offset(act, k);
+				entry->mangle.words = 1;
 				entry = &flow_action->entries[++j];
 			}
 		} else if (is_tcf_csum(act)) {
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next 2/2] netfilter: nft_payload: packet mangling offload support
From: Pablo Neira Ayuso @ 2019-08-20 10:52 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jakub.kicinski, jiri, vladbu
In-Reply-To: <20190820105225.13943-1-pablo@netfilter.org>

This patch allows for mangling packet fields using hardware offload
infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_payload.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 22a80eb60222..d758c8900835 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -562,12 +562,94 @@ static int nft_payload_set_dump(struct sk_buff *skb, const struct nft_expr *expr
 	return -1;
 }
 
+static int nft_payload_offload_set_nh(struct nft_offload_ctx *ctx,
+				      struct nft_flow_rule *flow,
+				      const struct nft_payload_set *priv)
+{
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+
+	switch (ctx->dep.l3num) {
+	case htons(ETH_P_IP):
+		type = FLOW_ACT_MANGLE_HDR_TYPE_IP4;
+		break;
+	case htons(ETH_P_IPV6):
+		type = FLOW_ACT_MANGLE_HDR_TYPE_IP6;
+		break;
+	}
+
+	return type;
+}
+
+static int nft_payload_offload_set_th(struct nft_offload_ctx *ctx,
+				      struct nft_flow_rule *flow,
+				      const struct nft_payload_set *priv)
+{
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+
+	switch (ctx->dep.protonum) {
+	case IPPROTO_TCP:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_TCP;
+		break;
+	case IPPROTO_UDP:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_UDP;
+		break;
+	}
+
+	return type;
+}
+
+static inline u32 nft_payload_mask(int len)
+{
+	return (1 << (len * BITS_PER_BYTE)) - 1;
+}
+
+static int nft_payload_set_offload(struct nft_offload_ctx *ctx,
+				   struct nft_flow_rule *flow,
+				   const struct nft_expr *expr)
+{
+	const struct nft_payload_set *priv = nft_expr_priv(expr);
+	struct nft_offload_reg *sreg = &ctx->regs[priv->sreg];
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+	struct flow_action_entry *entry;
+	u32 words;
+	int i;
+
+	switch (priv->base) {
+	case NFT_PAYLOAD_LL_HEADER:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_ETH;
+		break;
+	case NFT_PAYLOAD_NETWORK_HEADER:
+		type = nft_payload_offload_set_nh(ctx, flow, priv);
+		break;
+	case NFT_PAYLOAD_TRANSPORT_HEADER:
+		type = nft_payload_offload_set_th(ctx, flow, priv);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+	words = round_up(priv->len, sizeof(u32)) / sizeof(u32);
+
+	entry = &flow->rule->action.entries[ctx->num_actions++];
+	entry->mangle.htype	= type;
+	entry->mangle.offset	= priv->offset;
+	for (i = 0; i < words; i++) {
+		entry->mangle.data[i].mask =
+			~htonl(nft_payload_mask(priv->len));
+		entry->mangle.data[i].val = sreg->data.data[i];
+	}
+	entry->mangle.words	= words;
+
+	return type != FLOW_ACT_MANGLE_UNSPEC ? 0 : -EOPNOTSUPP;
+}
+
 static const struct nft_expr_ops nft_payload_set_ops = {
 	.type		= &nft_payload_type,
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_payload_set)),
 	.eval		= nft_payload_set_eval,
 	.init		= nft_payload_set_init,
 	.dump		= nft_payload_set_dump,
+	.offload	= nft_payload_set_offload,
 };
 
 static const struct nft_expr_ops *
-- 
2.11.0



^ permalink raw reply related

* RE: [PATCH REPOST 1/2] can: flexcan: fix deadlock when using self wakeup
From: Joakim Zhang @ 2019-08-20 11:24 UTC (permalink / raw)
  To: Sean Nyekjaer, mkl@pengutronix.de, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org, dl-linux-imx,
	Martin Hundebøll
In-Reply-To: <dd8f5269-8403-702b-b054-e031423ffc73@geanix.com>


> -----Original Message-----
> From: Sean Nyekjaer <sean@geanix.com>
> Sent: 2019年8月20日 18:25
> To: Joakim Zhang <qiangqing.zhang@nxp.com>; mkl@pengutronix.de;
> linux-can@vger.kernel.org
> Cc: wg@grandegger.com; netdev@vger.kernel.org; dl-linux-imx
> <linux-imx@nxp.com>; Martin Hundebøll <martin@geanix.com>
> Subject: Re: [PATCH REPOST 1/2] can: flexcan: fix deadlock when using self
> wakeup
> 
> 
> 
> On 16/08/2019 10.20, Joakim Zhang wrote:
> > As reproted by Sean Nyekjaer below:
> > When suspending, when there is still can traffic on the interfaces the
> > flexcan immediately wakes the platform again. As it should :-). But it
> > throws this error msg:
> > [ 3169.378661] PM: noirq suspend of devices failed
> >
> > On the way down to suspend the interface that throws the error message
> > does call flexcan_suspend but fails to call flexcan_noirq_suspend.
> > That means the flexcan_enter_stop_mode is called, but on the way out
> > of suspend the driver only calls flexcan_resume and skips
> > flexcan_noirq_resume, thus it doesn't call flexcan_exit_stop_mode.
> > This leaves the flexcan in stop mode, and with the current driver it
> > can't recover from this even with a soft reboot, it requires a hard reboot.
> >
> > The best way to exit stop mode is in Wake Up interrupt context, and
> > then
> > suspend() and resume() functions can be symmetric. However, stop mode
> > request and ack will be controlled by SCU(System Control Unit)
> > firmware(manage clock,power,stop mode, etc. by Cortex-M4 core) in
> > coming i.MX8(QM/QXP). And SCU firmware interface can't be available in
> interrupt context.
> >
> > For compatibillity, the wake up mechanism can't be symmetric, so we
> > need in_stop_mode hack.
> >
> > Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
> > Reported-by: Sean Nyekjaer <sean@geanix.com>
> > Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> >
> 
> Unfortunatly it's still possible to reproduce the deadlock with this patch...
> 
> [  689.921717] flexcan: probe of 2094000.flexcan failed with error -110
> 
> My test setup:
> PC with CAN-USB dongle connected to can0 and can1.
> 
> PC:
> $ while true; do cansend can0 '123#DEADBEEF'; done
> 
> iMX6ull:
> root@iwg26:~# systemctl suspend
> 
> 
> [  365.858054] systemd[1]: Reached target Sleep.
> root@iwg26:~# [  365.939826] systemd[1]: Starting Suspend...
> [  366.115839] systemd-sleep[248]: Suspending system...
> [  366.517949] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns
> -110 [  366.518249] PM: Device 2094000.flexcan failed to suspend: error -110
> [  366.518406] PM: Some devices failed to suspend, or early wake event
> detected [  366.732162] dpm_run_callback():
> platform_pm_suspend+0x0/0x5c returns -110 [  366.732285] PM: Device
> 2090000.flexcan failed to suspend: error -110 [  366.732330] PM: Some
> devices failed to suspend, or early wake event detected [  366.890637]
> systemd-sleep[248]: System resumed.

CAN1, CAN0 suspended failed, then CAN0, CAN1 resumed back, so CAN0/CAN1 can work fine.

> [  366.923062] systemd[1]: Started Suspend.
> [  366.942819] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
> [  366.954791] systemd[1]: Stopped target Sleep.
> [  366.962402] systemd[1]: Reached target Suspend.
> [  366.977546] systemd-logind[135]: Operation 'sleep' finished.
> [  366.979194] systemd[1]: suspend.target: Unit not needed anymore.
> Stopping.
> [  366.993831] systemd[1]: Stopped target Suspend.
> [  367.139972] systemd-networkd[220]: usb0: Lost carrier [  367.294077]
> systemd-networkd[220]: usb0: Gained carrier
> 
> root@iwg26:~# candump can0 | head -n 2
> 
>    can0  123   [4]  DE AD BE EF
>    can0  123   [4]  DE AD BE EF
> root@iwg26:~# candump can1 | head -n 2
> 
>    can1  123   [4]  DE AD BE EF
>    can1  123   [4]  DE AD BE EF
> root@iwg26:~# systemctl suspend
> 
> root@iwg26:~# [  385.106658] systemd[1]: Reached target Sleep.
> [  385.147602] systemd[1]: Starting Suspend...
> [  385.246421] systemd-sleep[260]: Suspending system...
> [  385.634733] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns
> -110 [  385.634855] PM: Device 2090000.flexcan failed to suspend: error -110
> [  385.634897] PM: Some devices failed to suspend, or early wake event
> detected [  385.856251] PM: noirq suspend of devices failed [  385.998364]
> systemd-sleep[260]: System resumed.

CAN0 suspended failed, CAN1 noirq suspended failed, then CAN1, CAN0 resumed back, so CAN0/CAN1 can work fine.

> [  386.023390] systemd[1]: Started Suspend.
> [  386.031570] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
> [  386.055886] systemd[1]: Stopped target Sleep.
> [  386.061430] systemd[1]: Reached target Suspend.
> [  386.066142] systemd[1]: suspend.target: Unit not needed anymore.
> Stopping.
> [  386.112575] systemd-networkd[220]: usb0: Lost carrier [  386.116797]
> systemd-logind[135]: Operation 'sleep' finished.
> [  386.146161] systemd[1]: Stopped target Suspend.
> [  386.260866] systemd-networkd[220]: usb0: Gained carrier root@iwg26:~#
> candump can0 | head -n 2
>    can0  123   [4]  DE AD BE EF
>    can0  123   [4]  DE AD BE EF
> root@iwg26:~# candump can1 | head -n 2
> 
>    can1  123   [4]  DE AD BE EF
>    can1  123   [4]  DE AD BE EF
> root@iwg26:~# systemctl suspend
> 
> [  396.919303] systemd[1]: Reached target Sleep.
> root@iwg26:~# [  396.964722] systemd[1]: Starting Suspend...
> [  397.067336] systemd-sleep[268]: Suspending system...
> [  397.574571] PM: noirq suspend of devices failed [  397.834731] PM: noirq
> suspend of devices failed [  397.807996] systemd-networkd[220]: usb0: Lost
> carrier [  398.156295] dpm_run_callback(): platform_pm_suspend+0x0/0x5c
> returns -110 [  398.156339] PM: Device 2094000.flexcan failed to suspend:
> error -110 [  398.156509] PM: Some devices failed to suspend, or early wake
> event detected [  398.053555] systemd-sleep[268]: Failed to write
> /sys/power/state:
> Device or resource busy

But the log here is very strange and chaotic, it looks like CAN0 suspended failed, then resumed back, so CAN0 can work fine.
CAN1 noirq suspend failed, but have not resumed back, so CAN1 still in stop mode, cannot work. I think this may be other device noirq suspend failed
broke the resume of CAN1.

Could you do more debug to help locate the issue?

> [  398.074751] systemd[1]: systemd-suspend.service: Main process exited,
> code=exited, status=1/FAILURE [  398.076779] systemd[1]:

> systemd-suspend.service: Failed with result 'exit-code'.
> [  398.109255] systemd[1]: Failed to start Suspend.
> [  398.118704] systemd[1]: Dependency failed for Suspend.
> [  398.136283] systemd-logind[135]: Operation 'sleep' finished.
> [  398.137770] systemd[1]: suspend.target: Job suspend.target/start failed
> with result 'dependency'.
> [  398.139105] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
> [  398.167590] systemd[1]: Stopped target Sleep.
> [  398.201558] systemd-networkd[220]: usb0: Gained carrier

Log here also strange.

Best Regards,
Joakim Zhang
> root@iwg26:~# candump can0 | head -n 2
>    can0  123   [4]  DE AD BE EF
>    can0  123   [4]  DE AD BE EF
> root@iwg26:~# candump can1 | head -n 2
> 
> nothing on can1 anymore :-(
> 
> root@iwg26:~# rmmod flexcan
> [  622.884746] systemd-networkd[220]: can1: Lost carrier [  623.046766]
> systemd-networkd[220]: can0: Lost carrier root@iwg26:~# insmod
> /mnt/flexcan.ko [  628.323981] flexcan 2094000.flexcan: registering netdev
> failed
> 
> and can1 fails to register with:
> [  628.347485] flexcan: probe of 2094000.flexcan failed with error -110
> 
> /Sean

^ permalink raw reply

* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-20 11:25 UTC (permalink / raw)
  To: Christophe de Dinechin
  Cc: Alex Williamson, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <m1o90kduow.fsf@dinechin.org>



> -----Original Message-----
> From: Christophe de Dinechin <christophe.de.dinechin@gmail.com>
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> 
> Parav Pandit writes:
> 
> > + Dave.
> >
> > Hi Jiri, Dave, Alex, Kirti, Cornelia,
> >
> > Please provide your feedback on it, how shall we proceed?
> >
> > Hence, I would like to discuss below options.
> >
> > Option-1: mdev index
> > Introduce an optional mdev index/handle as u32 during mdev create time.
> > User passes mdev index/handle as input.
> >
> > phys_port_name=mIndex=m%u
> > mdev_index will be available in sysfs as mdev attribute for udev to name the
> mdev's netdev.
> >
> > example mdev create command:
> > UUID=$(uuidgen)
> > echo $UUID index=10 >
> > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > example netdevs:
> > repnetdev=ens2f0_m10	/*ens2f0 is parent PF's netdevice */
> > mdev_netdev=enm10
> >
> > Pros:
> > 1. mdevctl and any other existing tools are unaffected.
> > 2. netdev stack, ovs and other switching platforms are unaffected.
> > 3. achieves unique phys_port_name for representor netdev 4. achieves
> > unique mdev eth netdev name for the mdev using udev/systemd extension.
> > 5. Aligns well with mdev and netdev subsystem and similar to existing sriov
> bdf's.
> >
> > Option-2: shorter mdev name
> > Extend mdev to have shorter mdev device name in addition to UUID.
> > such as 'foo', 'bar'.
> > Mdev will continue to have UUID.
> > phys_port_name=mdev_name
> >
> > Pros:
> > 1. All same as option-1, except mdevctl needs upgrade for newer usage.
> > It is common practice to upgrade iproute2 package along with the kernel.
> > Similar practice to be done with mdevctl.
> > 2. Newer users of mdevctl who wants to work with non_UUID names, will use
> newer mdevctl/tools.
> > Cons:
> > 1. Dual naming scheme of mdev might affect some of the existing tools.
> > It's unclear how/if it actually affects.
> > mdevctl [2] is very recently developed and can be enhanced for dual naming
> scheme.
> >
> > Option-3: mdev uuid alias
> > Instead of shorter mdev name or mdev index, have alpha-numeric name
> alias.
> > Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
> > example mdev create command:
> > UUID=$(uuidgen)
> > echo $UUID alias=foo >
> > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > example netdevs:
> > examle netdevs:
> > repnetdev = ens2f0_mfoo
> > mdev_netdev=enmfoo
> >
> > Pros:
> > 1. All same as option-1.
> > 2. Doesn't affect existing mdev naming scheme.
> > Cons:
> > 1. Index scheme of option-1 is better which can number large number of
> mdevs with fewer characters, simplifying the management tool.
> 
> I believe that Alex pointed out another "Cons" to all three options, which is that
> it forces user-space to resolve potential race conditions when creating an index
> or short name or alias.
> 
This race condition exists for at least two subsystems that I know of, i.e. netdev and rdma.
If a device with a given name exists, subsystem returns error.
When user space gets error code EEXIST, and it can picks up different identifier(s).

> Also, what happens if `index=10` is not provided on the command-line?
> Does that make the device unusable for your purpose?
Yes, it is unusable to an extent.
Currently we have DEVLINK_PORT_FLAVOUR_PCI_VF in include/uapi/linux/devlink.h
Similar to it, we need to have DEVLINK_PORT_FLAVOUR_MDEV for mdev eswitch ports.
This port flavour needs to generate phys_port_name(). This should be user parameter driven.
Because representor netdevice name is generated based on this parameter.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox