Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v7 net-next 00/13] tcp: BIG TCP implementation
From: patchwork-bot+netdevbpf @ 2022-05-16  9:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, netdev, alexanderduyck, lixiaoyan, edumazet
In-Reply-To: <20220513183408.686447-1-eric.dumazet@gmail.com>

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 13 May 2022 11:33:55 -0700 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> This series implements BIG TCP as presented in netdev 0x15:
> 
> https://netdevconf.info/0x15/session.html?BIG-TCP
> 
> Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/
> 
> [...]

Here is the summary with links:
  - [v7,net-next,01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes
    https://git.kernel.org/netdev/net-next/c/89527be8d8d6
  - [v7,net-next,02/13] net: allow gso_max_size to exceed 65536
    https://git.kernel.org/netdev/net-next/c/7c4e983c4f3c
  - [v7,net-next,03/13] net: limit GSO_MAX_SIZE to 524280 bytes
    https://git.kernel.org/netdev/net-next/c/34b92e8d19da
  - [v7,net-next,04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP
    https://git.kernel.org/netdev/net-next/c/9957b38b5e7a
  - [v7,net-next,05/13] ipv6: add struct hop_jumbo_hdr definition
    https://git.kernel.org/netdev/net-next/c/7c96d8ec96bb
  - [v7,net-next,06/13] ipv6/gso: remove temporary HBH/jumbo header
    https://git.kernel.org/netdev/net-next/c/09f3d1a3a52c
  - [v7,net-next,07/13] ipv6/gro: insert temporary HBH/jumbo header
    https://git.kernel.org/netdev/net-next/c/81fbc812132c
  - [v7,net-next,08/13] net: allow gro_max_size to exceed 65536
    https://git.kernel.org/netdev/net-next/c/0fe79f28bfaf
  - [v7,net-next,09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output
    https://git.kernel.org/netdev/net-next/c/80e425b61342
  - [v7,net-next,10/13] net: loopback: enable BIG TCP packets
    https://git.kernel.org/netdev/net-next/c/d6f938ce52f9
  - [v7,net-next,11/13] veth: enable BIG TCP packets
    https://git.kernel.org/netdev/net-next/c/d406099d6a15
  - [v7,net-next,12/13] mlx4: support BIG TCP packets
    https://git.kernel.org/netdev/net-next/c/1169a64265c4
  - [v7,net-next,13/13] mlx5: support BIG TCP packets
    https://git.kernel.org/netdev/net-next/c/de78960e025f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2] net: phy: marvell: Add errata section 5.1 for Alaska PHY
From: Marek Behún @ 2022-05-16  9:25 UTC (permalink / raw)
  To: Stefan Roese
  Cc: netdev, Leszek Polak, Andrew Lunn, Heiner Kallweit, Russell King,
	David S . Miller
In-Reply-To: <20220516070859.549170-1-sr@denx.de>

On Mon, 16 May 2022 09:08:59 +0200
Stefan Roese <sr@denx.de> wrote:

> From: Leszek Polak <lpolak@arri.de>
> 
> As per Errata Section 5.1, if EEE is intended to be used, some register
> writes must be done once after every hardware reset. This patch now adds
> the necessary register writes as listed in the Marvell errata.
> 
> Without this fix we experience ethernet problems on some of our boards
> equipped with a new version of this ethernet PHY (different supplier).
> 
> The fix applies to Marvell Alaska 88E1510/88E1518/88E1512/88E1514
> Rev. A0.
> 
> Signed-off-by: Leszek Polak <lpolak@arri.de>
> Signed-off-by: Stefan Roese <sr@denx.de>
> Cc: Marek Behún <kabel@kernel.org>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: David S. Miller <davem@davemloft.net>
> ---
> v2:
> - Implement struct with errata reg values and loop over this
>   struct instead of using single phy_write() call for each PHY
>   reg value, as suggested by Marek
> 
>  drivers/net/phy/marvell.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index 2702faf7b0f6..41353f615a66 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -1177,7 +1177,44 @@ static int m88e1318_config_init(struct phy_device *phydev)
>  
>  static int m88e1510_config_init(struct phy_device *phydev)
>  {
> +	static const struct {
> +		u16 reg17, reg16;
> +	} errata_vals[] = {
> +		{ 0x214b, 0x2144 },
> +		{ 0x0c28, 0x2146 },
> +		{ 0xb233, 0x214d },
> +		{ 0xcc0c, 0x2159 },
> +	};
>  	int err;
> +	int i;
> +
> +	/* As per Marvell Release Notes - Alaska 88E1510/88E1518/88E1512/
> +	 * 88E1514 Rev A0, Errata Section 5.1:
> +	 * If EEE is intended to be used, the following register writes
> +	 * must be done once after every hardware reset.
> +	 */
> +	err = marvell_set_page(phydev, 0x00FF);
> +	if (err < 0)
> +		return err;
> +
> +	for (i = 0; i < ARRAY_SIZE(errata_vals); ++i) {
> +		err = phy_write(phydev, 17, errata_vals[i].reg17);
> +		if (err)
> +			return err;
> +		err = phy_write(phydev, 16, errata_vals[i].reg16);
> +		if (err)
> +			return err;
> +	}
> +
> +	err = marvell_set_page(phydev, 0x00FB);
> +	if (err < 0)
> +		return err;
> +	err = phy_write(phydev, 07, 0xC00D);
> +	if (err < 0)
> +		return err;
> +	err = marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
> +	if (err < 0)
> +		return err;
>  
>  	/* SGMII-to-Copper mode initialization */
>  	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {

Reviewed-by: Marek Behún <kabel@kernel.org>

^ permalink raw reply

* [PATCH -next] octeontx2-pf: Use memset_startat() helper in otx2_stop()
From: Xiu Jianfeng @ 2022-05-16  9:23 UTC (permalink / raw)
  To: sgoutham, gakula, sbhatta, hkelam, davem, edumazet, kuba, pabeni,
	xiujianfeng
  Cc: netdev, linux-kernel

Use memset_startat() helper to simplify the code, there is no functional
change in this patch.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
---
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index 2b204c1c23de..208c98664ade 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1796,8 +1796,7 @@ int otx2_stop(struct net_device *netdev)
 	kfree(qset->rq);
 	kfree(qset->napi);
 	/* Do not clear RQ/SQ ringsize settings */
-	memset((void *)qset + offsetof(struct otx2_qset, sqe_cnt), 0,
-	       sizeof(*qset) - offsetof(struct otx2_qset, sqe_cnt));
+	memset_startat(qset, 0, sqe_cnt);
 	return 0;
 }
 EXPORT_SYMBOL(otx2_stop);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v4 0/5] Add Renesas RZ/V2M Ethernet support
From: patchwork-bot+netdevbpf @ 2022-05-16  9:20 UTC (permalink / raw)
  To: Phil Edworthy
  Cc: mturquette, sboyd, robh+dt, krzysztof.kozlowski+dt, davem,
	edumazet, kuba, pabeni, phil.edworthy, geert+renesas, s.shtylyov,
	sergei.shtylyov, biju.das.jz, prabhakar.mahadev-lad.rj,
	Chris.Paterson2, magnus.damm, linux-clk, netdev, devicetree,
	linux-renesas-soc
In-Reply-To: <20220512114722.35965-1-phil.edworthy@renesas.com>

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 12 May 2022 12:47:17 +0100 you wrote:
> The RZ/V2M Ethernet is very similar to R-Car Gen3 Ethernet-AVB, though
> some small parts are the same as R-Car Gen2.
> Other differences are:
> * It has separate data (DI), error (Line 1) and management (Line 2) irqs
>   rather than one irq for all three.
> * Instead of using the High-speed peripheral bus clock for gPTP, it has
>   a separate gPTP reference clock.
> 
> [...]

Here is the summary with links:
  - [v4,1/5] dt-bindings: net: renesas,etheravb: Document RZ/V2M SoC
    https://git.kernel.org/netdev/net-next/c/a7931ac16128
  - [v4,2/5] ravb: Separate handling of irq enable/disable regs into feature
    https://git.kernel.org/netdev/net-next/c/cb99badde146
  - [v4,3/5] ravb: Support separate Line0 (Desc), Line1 (Err) and Line2 (Mgmt) irqs
    https://git.kernel.org/netdev/net-next/c/b0265dcba3d6
  - [v4,4/5] ravb: Use separate clock for gPTP
    https://git.kernel.org/netdev/net-next/c/72069a7b2821
  - [v4,5/5] ravb: Add support for RZ/V2M
    https://git.kernel.org/netdev/net-next/c/e1154be73153

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next 01/17] netfilter: ecache: use dedicated list for event redelivery
From: patchwork-bot+netdevbpf @ 2022-05-16  9:20 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev, kuba, pabeni
In-Reply-To: <20220513214329.1136459-2-pablo@netfilter.org>

Hello:

This series was applied to netdev/net-next.git (master)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Fri, 13 May 2022 23:43:13 +0200 you wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> This disentangles event redelivery and the percpu dying list.
> 
> Because entries are now stored on a dedicated list, all
> entries are in NFCT_ECACHE_DESTROY_FAIL state and all entries
> still have confirmed bit set -- the reference count is at least 1.
> 
> [...]

Here is the summary with links:
  - [net-next,01/17] netfilter: ecache: use dedicated list for event redelivery
    https://git.kernel.org/netdev/net-next/c/2ed3bf188b33
  - [net-next,02/17] netfilter: conntrack: include ecache dying list in dumps
    https://git.kernel.org/netdev/net-next/c/0d3cc504ba9c
  - [net-next,03/17] netfilter: conntrack: remove the percpu dying list
    https://git.kernel.org/netdev/net-next/c/1397af5bfd7d
  - [net-next,04/17] netfilter: cttimeout: decouple unlink and free on netns destruction
    https://git.kernel.org/netdev/net-next/c/78222bacfca9
  - [net-next,05/17] netfilter: remove nf_ct_unconfirmed_destroy helper
    https://git.kernel.org/netdev/net-next/c/17438b42ce14
  - [net-next,06/17] netfilter: extensions: introduce extension genid count
    https://git.kernel.org/netdev/net-next/c/c56716c69ce1
  - [net-next,07/17] netfilter: cttimeout: decouple unlink and free on netns destruction
    https://git.kernel.org/netdev/net-next/c/42df4fb9b1be
  - [net-next,08/17] netfilter: conntrack: remove __nf_ct_unconfirmed_destroy
    https://git.kernel.org/netdev/net-next/c/ace53fdc262f
  - [net-next,09/17] netfilter: conntrack: remove unconfirmed list
    https://git.kernel.org/netdev/net-next/c/8a75a2c17410
  - [net-next,10/17] netfilter: conntrack: avoid unconditional local_bh_disable
    https://git.kernel.org/netdev/net-next/c/0bcfbafbcd34
  - [net-next,11/17] netfilter: conntrack: add nf_ct_iter_data object for nf_ct_iterate_cleanup*()
    https://git.kernel.org/netdev/net-next/c/8169ff584003
  - [net-next,12/17] netfilter: nfnetlink: allow to detect if ctnetlink listeners exist
    https://git.kernel.org/netdev/net-next/c/2794cdb0b97b
  - [net-next,13/17] netfilter: conntrack: un-inline nf_ct_ecache_ext_add
    https://git.kernel.org/netdev/net-next/c/b0a7ab4a7765
  - [net-next,14/17] netfilter: conntrack: add nf_conntrack_events autodetect mode
    https://git.kernel.org/netdev/net-next/c/90d1daa45849
  - [net-next,15/17] netfilter: prefer extension check to pointer check
    https://git.kernel.org/netdev/net-next/c/8edc81311100
  - [net-next,16/17] netfilter: flowtable: nft_flow_route use more data for reverse route
    https://git.kernel.org/netdev/net-next/c/3412e1641828
  - [net-next,17/17] netfilter: conntrack: skip verification of zero UDP checksum
    https://git.kernel.org/netdev/net-next/c/4f9bd53084d1

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 1/2] Revert "can: m_can: pci: use custom bit timings for Elkhart Lake"
From: patchwork-bot+netdevbpf @ 2022-05-16  9:10 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: netdev, davem, kuba, linux-can, kernel, jarkko.nikula,
	chee.houx.ong, aman.kumar, kumari.pallavi, stable
In-Reply-To: <20220514185742.407230-2-mkl@pengutronix.de>

Hello:

This series was applied to netdev/net.git (master)
by Marc Kleine-Budde <mkl@pengutronix.de>:

On Sat, 14 May 2022 20:57:41 +0200 you wrote:
> From: Jarkko Nikula <jarkko.nikula@linux.intel.com>
> 
> This reverts commit 0e8ffdf3b86dfd44b651f91b12fcae76c25c453b.
> 
> Commit 0e8ffdf3b86d ("can: m_can: pci: use custom bit timings for
> Elkhart Lake") broke the test case using bitrate switching.
> 
> [...]

Here is the summary with links:
  - [net,1/2] Revert "can: m_can: pci: use custom bit timings for Elkhart Lake"
    https://git.kernel.org/netdev/net/c/14ea4a470494
  - [net,2/2] can: m_can: remove support for custom bit timing, take #2
    https://git.kernel.org/netdev/net/c/d6da7881020f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [GIT PULL] virtio: last minute fixup
From: Michael S. Tsirkin @ 2022-05-16  9:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Ellerman, Konstantin Ryabitsev, KVM list, virtualization,
	Netdev, Linux Kernel Mailing List, mie
In-Reply-To: <CAHk-=wgnYGY=10sRDzXCC2bmappjBTRNNbr8owvGLEW-xuV7Vw@mail.gmail.com>

On Thu, May 12, 2022 at 10:10:34AM -0700, Linus Torvalds wrote:
> On Thu, May 12, 2022 at 6:30 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >
> > Links to other random places don't serve that function.
> 
> What "function"?
> 
> This is my argument. Those Link: things need to have a *reason*.
> 
> Saying "they are a change ID" is not a reason. That's just a random
> word-salad. You need to have an active reason that you can explain,
> not just say "look, I want to add a message ID to every commit".

So I want to go to my inbox and compare the patch as received with what
is in my tree.  What did I change? And I tweak both the patch content
when applying and the subject so these are not good indicators.  Is this
at all convincing?

-- 
MST


^ permalink raw reply

* Re: [PATCH] vhost_net: fix double fget()
From: Michael S. Tsirkin @ 2022-05-16  8:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: viro, kvm, virtualization, netdev, linux-kernel, linux-fsdevel,
	ebiggers, davem
In-Reply-To: <20220516084213.26854-1-jasowang@redhat.com>

On Mon, May 16, 2022 at 04:42:13PM +0800, Jason Wang wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Here's another piece of code assuming that repeated fget() will yield the
> same opened file: in vhost_net_set_backend() we have
> 
>         sock = get_socket(fd);
>         if (IS_ERR(sock)) {
>                 r = PTR_ERR(sock);
>                 goto err_vq;
>         }
> 
>         /* start polling new socket */
>         oldsock = vhost_vq_get_backend(vq);
>         if (sock != oldsock) {
> ...
>                 vhost_vq_set_backend(vq, sock);
> ...
>                 if (index == VHOST_NET_VQ_RX)
>                         nvq->rx_ring = get_tap_ptr_ring(fd);
> 
> with
> static struct socket *get_socket(int fd)
> {
>         struct socket *sock;
> 
>         /* special case to disable backend */
>         if (fd == -1)
>                 return NULL;
>         sock = get_raw_socket(fd);
>         if (!IS_ERR(sock))
>                 return sock;
>         sock = get_tap_socket(fd);
>         if (!IS_ERR(sock))
>                 return sock;
>         return ERR_PTR(-ENOTSOCK);
> }
> and
> static struct ptr_ring *get_tap_ptr_ring(int fd)
> {
>         struct ptr_ring *ring;
>         struct file *file = fget(fd);
> 
>         if (!file)
>                 return NULL;
>         ring = tun_get_tx_ring(file);
>         if (!IS_ERR(ring))
>                 goto out;
>         ring = tap_get_ptr_ring(file);
>         if (!IS_ERR(ring))
>                 goto out;
>         ring = NULL;
> out:
>         fput(file);
>         return ring;
> }
> 
> Again, there is no promise that fd will resolve to the same thing for
> lookups in get_socket() and in get_tap_ptr_ring().  I'm not familiar
> enough with the guts of drivers/vhost to tell how easy it is to turn
> into attack, but it looks like trouble.  If nothing else, the pointer
> returned by tun_get_tx_ring() is not guaranteed to be pinned down by
> anything - the reference to sock will _usually_ suffice, but that
> doesn't help any if we get a different socket on that second fget().
> 
> One possible way to fix it would be the patch below; objections?
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

and this is stable material I guess.

> ---
>  drivers/vhost/net.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 28ef323882fb..0bd7d91de792 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1449,13 +1449,9 @@ static struct socket *get_raw_socket(int fd)
>  	return ERR_PTR(r);
>  }
>  
> -static struct ptr_ring *get_tap_ptr_ring(int fd)
> +static struct ptr_ring *get_tap_ptr_ring(struct file *file)
>  {
>  	struct ptr_ring *ring;
> -	struct file *file = fget(fd);
> -
> -	if (!file)
> -		return NULL;
>  	ring = tun_get_tx_ring(file);
>  	if (!IS_ERR(ring))
>  		goto out;
> @@ -1464,7 +1460,6 @@ static struct ptr_ring *get_tap_ptr_ring(int fd)
>  		goto out;
>  	ring = NULL;
>  out:
> -	fput(file);
>  	return ring;
>  }
>  
> @@ -1551,8 +1546,12 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>  		r = vhost_net_enable_vq(n, vq);
>  		if (r)
>  			goto err_used;
> -		if (index == VHOST_NET_VQ_RX)
> -			nvq->rx_ring = get_tap_ptr_ring(fd);
> +		if (index == VHOST_NET_VQ_RX) {
> +			if (sock)
> +				nvq->rx_ring = get_tap_ptr_ring(sock->file);
> +			else
> +				nvq->rx_ring = NULL;
> +		}
>  
>  		oldubufs = nvq->ubufs;
>  		nvq->ubufs = ubufs;
> -- 
> 2.25.1


^ permalink raw reply

* [PATCH] vhost_net: fix double fget()
From: Jason Wang @ 2022-05-16  8:42 UTC (permalink / raw)
  To: viro, mst, jasowang, kvm
  Cc: virtualization, netdev, linux-kernel, linux-fsdevel, ebiggers,
	davem

From: Al Viro <viro@zeniv.linux.org.uk>

Here's another piece of code assuming that repeated fget() will yield the
same opened file: in vhost_net_set_backend() we have

        sock = get_socket(fd);
        if (IS_ERR(sock)) {
                r = PTR_ERR(sock);
                goto err_vq;
        }

        /* start polling new socket */
        oldsock = vhost_vq_get_backend(vq);
        if (sock != oldsock) {
...
                vhost_vq_set_backend(vq, sock);
...
                if (index == VHOST_NET_VQ_RX)
                        nvq->rx_ring = get_tap_ptr_ring(fd);

with
static struct socket *get_socket(int fd)
{
        struct socket *sock;

        /* special case to disable backend */
        if (fd == -1)
                return NULL;
        sock = get_raw_socket(fd);
        if (!IS_ERR(sock))
                return sock;
        sock = get_tap_socket(fd);
        if (!IS_ERR(sock))
                return sock;
        return ERR_PTR(-ENOTSOCK);
}
and
static struct ptr_ring *get_tap_ptr_ring(int fd)
{
        struct ptr_ring *ring;
        struct file *file = fget(fd);

        if (!file)
                return NULL;
        ring = tun_get_tx_ring(file);
        if (!IS_ERR(ring))
                goto out;
        ring = tap_get_ptr_ring(file);
        if (!IS_ERR(ring))
                goto out;
        ring = NULL;
out:
        fput(file);
        return ring;
}

Again, there is no promise that fd will resolve to the same thing for
lookups in get_socket() and in get_tap_ptr_ring().  I'm not familiar
enough with the guts of drivers/vhost to tell how easy it is to turn
into attack, but it looks like trouble.  If nothing else, the pointer
returned by tun_get_tx_ring() is not guaranteed to be pinned down by
anything - the reference to sock will _usually_ suffice, but that
doesn't help any if we get a different socket on that second fget().

One possible way to fix it would be the patch below; objections?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 28ef323882fb..0bd7d91de792 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1449,13 +1449,9 @@ static struct socket *get_raw_socket(int fd)
 	return ERR_PTR(r);
 }
 
-static struct ptr_ring *get_tap_ptr_ring(int fd)
+static struct ptr_ring *get_tap_ptr_ring(struct file *file)
 {
 	struct ptr_ring *ring;
-	struct file *file = fget(fd);
-
-	if (!file)
-		return NULL;
 	ring = tun_get_tx_ring(file);
 	if (!IS_ERR(ring))
 		goto out;
@@ -1464,7 +1460,6 @@ static struct ptr_ring *get_tap_ptr_ring(int fd)
 		goto out;
 	ring = NULL;
 out:
-	fput(file);
 	return ring;
 }
 
@@ -1551,8 +1546,12 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		r = vhost_net_enable_vq(n, vq);
 		if (r)
 			goto err_used;
-		if (index == VHOST_NET_VQ_RX)
-			nvq->rx_ring = get_tap_ptr_ring(fd);
+		if (index == VHOST_NET_VQ_RX) {
+			if (sock)
+				nvq->rx_ring = get_tap_ptr_ring(sock->file);
+			else
+				nvq->rx_ring = NULL;
+		}
 
 		oldubufs = nvq->ubufs;
 		nvq->ubufs = ubufs;
-- 
2.25.1


^ permalink raw reply related

* [PATCH] net: qede: Remove unnecessary synchronize_irq() before free_irq()
From: cgel.zte @ 2022-05-16  8:22 UTC (permalink / raw)
  To: aelior
  Cc: manishc, davem, edumazet, kuba, pabeni, netdev, linux-kernel,
	Minghao Chi, Zeal Robot

From: Minghao Chi <chi.minghao@zte.com.cn>

Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index b4e5a15e308b..f56b679adb4b 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1916,7 +1916,6 @@ static void qede_sync_free_irqs(struct qede_dev *edev)
 
 	for (i = 0; i < edev->int_info.used_cnt; i++) {
 		if (edev->int_info.msix_cnt) {
-			synchronize_irq(edev->int_info.msix[i].vector);
 			free_irq(edev->int_info.msix[i].vector,
 				 &edev->fp_array[i]);
 		} else {
-- 
2.25.1



^ permalink raw reply related

* [PATCH] net: vxge: Remove unnecessary synchronize_irq() before free_irq()
From: cgel.zte @ 2022-05-16  8:19 UTC (permalink / raw)
  To: jdmason
  Cc: davem, edumazet, kuba, pabeni, netdev, linux-kernel, Minghao Chi,
	Zeal Robot

From: Minghao Chi <chi.minghao@zte.com.cn>

Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
---
 drivers/net/ethernet/neterion/vxge/vxge-main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index d2de8ac44f72..fa5d4ddf429b 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -2405,7 +2405,6 @@ static void vxge_rem_msix_isr(struct vxgedev *vdev)
 	for (intr_cnt = 0; intr_cnt < (vdev->no_of_vpath * 2 + 1);
 		intr_cnt++) {
 		if (vdev->vxge_entries[intr_cnt].in_use) {
-			synchronize_irq(vdev->entries[intr_cnt].vector);
 			free_irq(vdev->entries[intr_cnt].vector,
 				vdev->vxge_entries[intr_cnt].arg);
 			vdev->vxge_entries[intr_cnt].in_use = 0;
@@ -2427,7 +2426,6 @@ static void vxge_rem_isr(struct vxgedev *vdev)
 	    vdev->config.intr_type == MSI_X) {
 		vxge_rem_msix_isr(vdev);
 	} else if (vdev->config.intr_type == INTA) {
-			synchronize_irq(vdev->pdev->irq);
 			free_irq(vdev->pdev->irq, vdev);
 	}
 }
-- 
2.25.1



^ permalink raw reply related

* Re: [PATCH ipsec,v2] xfrm: fix "disable_policy" flag use when arriving from different devices
From: Nicolas Dichtel @ 2022-05-16  8:09 UTC (permalink / raw)
  To: Eyal Birger, davem, yoshfuji, dsahern, edumazet, kuba, pabeni,
	steffen.klassert, herbert
  Cc: netdev, Shmulik Ladkani
In-Reply-To: <20220513203402.1290131-1-eyal.birger@gmail.com>


Le 13/05/2022 à 22:34, Eyal Birger a écrit :
> In IPv4 setting the "disable_policy" flag on a device means no policy
> should be enforced for traffic originating from the device. This was
> implemented by seting the DST_NOPOLICY flag in the dst based on the
> originating device.
> 
> However, dsts are cached in nexthops regardless of the originating
> devices, in which case, the DST_NOPOLICY flag value may be incorrect.
> 
> Consider the following setup:
> 
>                      +------------------------------+
>                      | ROUTER                       |
>   +-------------+    | +-----------------+          |
>   | ipsec src   |----|-|ipsec0           |          |
>   +-------------+    | |disable_policy=0 |   +----+ |
>                      | +-----------------+   |eth1|-|-----
>   +-------------+    | +-----------------+   +----+ |
>   | noipsec src |----|-|eth0             |          |
>   +-------------+    | |disable_policy=1 |          |
>                      | +-----------------+          |
>                      +------------------------------+
> 
> Where ROUTER has a default route towards eth1.
> 
> dst entries for traffic arriving from eth0 would have DST_NOPOLICY
> and would be cached and therefore can be reused by traffic originating
> from ipsec0, skipping policy check.
> 
> Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead
> of the DST in IN/FWD IPv4 policy checks.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

^ permalink raw reply

* Re: [PATCH bpf-next v3 4/7] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
From: Xu Kuohai @ 2022-05-16  7:58 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
	Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <YoH6yAtmzPQtWiFM@FVFF77S0Q05N>

On 5/16/2022 3:18 PM, Mark Rutland wrote:
> On Mon, May 16, 2022 at 02:55:46PM +0800, Xu Kuohai wrote:
>> On 5/13/2022 10:59 PM, Mark Rutland wrote:
>>> On Sun, Apr 24, 2022 at 11:40:25AM -0400, Xu Kuohai wrote:
>>>> Impelment bpf_arch_text_poke() for arm64, so bpf trampoline code can use
>>>> it to replace nop with jump, or replace jump with nop.
>>>>
>>>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
>>>> Acked-by: Song Liu <songliubraving@fb.com>
>>>> ---
>>>>  arch/arm64/net/bpf_jit_comp.c | 63 +++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 63 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>>>> index 8ab4035dea27..3f9bdfec54c4 100644
>>>> --- a/arch/arm64/net/bpf_jit_comp.c
>>>> +++ b/arch/arm64/net/bpf_jit_comp.c
>>>> @@ -9,6 +9,7 @@
>>>>  
>>>>  #include <linux/bitfield.h>
>>>>  #include <linux/bpf.h>
>>>> +#include <linux/memory.h>
>>>>  #include <linux/filter.h>
>>>>  #include <linux/printk.h>
>>>>  #include <linux/slab.h>
>>>> @@ -18,6 +19,7 @@
>>>>  #include <asm/cacheflush.h>
>>>>  #include <asm/debug-monitors.h>
>>>>  #include <asm/insn.h>
>>>> +#include <asm/patching.h>
>>>>  #include <asm/set_memory.h>
>>>>  
>>>>  #include "bpf_jit.h"
>>>> @@ -1529,3 +1531,64 @@ void bpf_jit_free_exec(void *addr)
>>>>  {
>>>>  	return vfree(addr);
>>>>  }
>>>> +
>>>> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
>>>> +			     void *addr, u32 *insn)
>>>> +{
>>>> +	if (!addr)
>>>> +		*insn = aarch64_insn_gen_nop();
>>>> +	else
>>>> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
>>>> +						    (unsigned long)addr,
>>>> +						    type);
>>>> +
>>>> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
>>>> +}
>>>> +
>>>> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
>>>> +		       void *old_addr, void *new_addr)
>>>> +{
>>>> +	int ret;
>>>> +	u32 old_insn;
>>>> +	u32 new_insn;
>>>> +	u32 replaced;
>>>> +	enum aarch64_insn_branch_type branch_type;
>>>> +
>>>> +	if (!is_bpf_text_address((long)ip))
>>>> +		/* Only poking bpf text is supported. Since kernel function
>>>> +		 * entry is set up by ftrace, we reply on ftrace to poke kernel
>>>> +		 * functions. For kernel funcitons, bpf_arch_text_poke() is only
>>>> +		 * called after a failed poke with ftrace. In this case, there
>>>> +		 * is probably something wrong with fentry, so there is nothing
>>>> +		 * we can do here. See register_fentry, unregister_fentry and
>>>> +		 * modify_fentry for details.
>>>> +		 */
>>>> +		return -EINVAL;
>>>
>>> If you rely on ftrace to poke functions, why do you need to patch text
>>> at all? Why does the rest of this function exist?
>>>
>>> I really don't like having another piece of code outside of ftrace
>>> patching the ftrace patch-site; this needs a much better explanation.
>>>
>>
>> Sorry for the incorrect explaination in the comment. I don't think it's
>> reasonable to patch ftrace patch-site without ftrace code either.
>>
>> The patching logic in register_fentry, unregister_fentry and
>> modify_fentry is as follows:
>>
>> if (tr->func.ftrace_managed)
>>         ret = register_ftrace_direct((long)ip, (long)new_addr);
>> else
>>         ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr,
>>                                  true);
>>
>> ftrace patch-site is patched by ftrace code. bpf_arch_text_poke() is
>> only used to patch bpf prog and bpf trampoline, which are not managed by
>> ftrace.
> 
> Sorry, I had misunderstood. Thanks for the correction!
> 
> I'll have another look with that in mind.
>>>>> +
>>>> +	if (poke_type == BPF_MOD_CALL)
>>>> +		branch_type = AARCH64_INSN_BRANCH_LINK;
>>>> +	else
>>>> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;
>>>> +
>>>> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
>>>> +		return -EFAULT;
>>>> +
>>>> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
>>>> +		return -EFAULT;
>>>> +
>>>> +	mutex_lock(&text_mutex);
>>>> +	if (aarch64_insn_read(ip, &replaced)) {
>>>> +		ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	if (replaced != old_insn) {
>>>> +		ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	ret = aarch64_insn_patch_text_nosync((void *)ip, new_insn);
>>>
>>> ... and where does the actual synchronization come from in this case?
>>
>> aarch64_insn_patch_text_nosync() replaces an instruction atomically, so
>> no other CPUs will fetch a half-new and half-old instruction.
>>
>> The scenario here is that there is a chance that another CPU fetches the
>> old instruction after bpf_arch_text_poke() finishes, that is, different
>> CPUs may execute different versions of instructions at the same time.
>>
>> 1. When a new trampoline is attached, it doesn't seem to be an issue for
>> different CPUs to jump to different trampolines temporarily.
>>
>> 2. When an old trampoline is freed, we should wait for all other CPUs to
>> exit the trampoline and make sure the trampoline is no longer reachable,
>> IIUC, bpf_tramp_image_put() function already uses percpu_ref and rcu
>> tasks to do this.
> 
> It would be good to have a comment for these points>

will add a comment for this in v4, thanks!

> Thanks,
> Mark.
> .


^ permalink raw reply

* [PATCH] i40e: Remove unnecessary synchronize_irq() before free_irq()
From: cgel.zte @ 2022-05-16  7:27 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: anthony.l.nguyen, davem, edumazet, kuba, pabeni, intel-wired-lan,
	netdev, linux-kernel, Minghao Chi, Zeal Robot

From: Minghao Chi <chi.minghao@zte.com.cn>

Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 332a608dbaa6..2b654a53a3f0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4037,7 +4037,6 @@ static void i40e_free_misc_vector(struct i40e_pf *pf)
 	i40e_flush(&pf->hw);
 
 	if (pf->flags & I40E_FLAG_MSIX_ENABLED && pf->msix_entries) {
-		synchronize_irq(pf->msix_entries[0].vector);
 		free_irq(pf->msix_entries[0].vector, pf);
 		clear_bit(__I40E_MISC_IRQ_REQUESTED, pf->state);
 	}
@@ -4776,7 +4775,6 @@ static void i40e_vsi_free_irq(struct i40e_vsi *vsi)
 			irq_set_affinity_notifier(irq_num, NULL);
 			/* remove our suggested affinity mask for this IRQ */
 			irq_update_affinity_hint(irq_num, NULL);
-			synchronize_irq(irq_num);
 			free_irq(irq_num, vsi->q_vectors[i]);
 
 			/* Tear down the interrupt queue link list
-- 
2.25.1



^ permalink raw reply related

* [PATCH] qed: Remove unnecessary synchronize_irq() before free_irq()
From: cgel.zte @ 2022-05-16  7:26 UTC (permalink / raw)
  To: aelior
  Cc: manishc, davem, edumazet, kuba, pabeni, netdev, linux-kernel,
	Minghao Chi, Zeal Robot

From: Minghao Chi <chi.minghao@zte.com.cn>

Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
---
 drivers/net/ethernet/qlogic/qed/qed_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index c5003fa1a25e..c91898be7c03 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -823,7 +823,6 @@ static void qed_slowpath_irq_free(struct qed_dev *cdev)
 		for_each_hwfn(cdev, i) {
 			if (!cdev->hwfns[i].b_int_requested)
 				break;
-			synchronize_irq(cdev->int_params.msix_table[i].vector);
 			free_irq(cdev->int_params.msix_table[i].vector,
 				 &cdev->hwfns[i].sp_dpc);
 		}
--
2.25.1



^ permalink raw reply related

* Re: [PATCH bpf-next v3 4/7] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
From: Mark Rutland @ 2022-05-16  7:18 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
	Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <264ecbe1-4514-d6c8-182b-3af4babb457e@huawei.com>

On Mon, May 16, 2022 at 02:55:46PM +0800, Xu Kuohai wrote:
> On 5/13/2022 10:59 PM, Mark Rutland wrote:
> > On Sun, Apr 24, 2022 at 11:40:25AM -0400, Xu Kuohai wrote:
> >> Impelment bpf_arch_text_poke() for arm64, so bpf trampoline code can use
> >> it to replace nop with jump, or replace jump with nop.
> >>
> >> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> >> Acked-by: Song Liu <songliubraving@fb.com>
> >> ---
> >>  arch/arm64/net/bpf_jit_comp.c | 63 +++++++++++++++++++++++++++++++++++
> >>  1 file changed, 63 insertions(+)
> >>
> >> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> >> index 8ab4035dea27..3f9bdfec54c4 100644
> >> --- a/arch/arm64/net/bpf_jit_comp.c
> >> +++ b/arch/arm64/net/bpf_jit_comp.c
> >> @@ -9,6 +9,7 @@
> >>  
> >>  #include <linux/bitfield.h>
> >>  #include <linux/bpf.h>
> >> +#include <linux/memory.h>
> >>  #include <linux/filter.h>
> >>  #include <linux/printk.h>
> >>  #include <linux/slab.h>
> >> @@ -18,6 +19,7 @@
> >>  #include <asm/cacheflush.h>
> >>  #include <asm/debug-monitors.h>
> >>  #include <asm/insn.h>
> >> +#include <asm/patching.h>
> >>  #include <asm/set_memory.h>
> >>  
> >>  #include "bpf_jit.h"
> >> @@ -1529,3 +1531,64 @@ void bpf_jit_free_exec(void *addr)
> >>  {
> >>  	return vfree(addr);
> >>  }
> >> +
> >> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
> >> +			     void *addr, u32 *insn)
> >> +{
> >> +	if (!addr)
> >> +		*insn = aarch64_insn_gen_nop();
> >> +	else
> >> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
> >> +						    (unsigned long)addr,
> >> +						    type);
> >> +
> >> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
> >> +}
> >> +
> >> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
> >> +		       void *old_addr, void *new_addr)
> >> +{
> >> +	int ret;
> >> +	u32 old_insn;
> >> +	u32 new_insn;
> >> +	u32 replaced;
> >> +	enum aarch64_insn_branch_type branch_type;
> >> +
> >> +	if (!is_bpf_text_address((long)ip))
> >> +		/* Only poking bpf text is supported. Since kernel function
> >> +		 * entry is set up by ftrace, we reply on ftrace to poke kernel
> >> +		 * functions. For kernel funcitons, bpf_arch_text_poke() is only
> >> +		 * called after a failed poke with ftrace. In this case, there
> >> +		 * is probably something wrong with fentry, so there is nothing
> >> +		 * we can do here. See register_fentry, unregister_fentry and
> >> +		 * modify_fentry for details.
> >> +		 */
> >> +		return -EINVAL;
> > 
> > If you rely on ftrace to poke functions, why do you need to patch text
> > at all? Why does the rest of this function exist?
> > 
> > I really don't like having another piece of code outside of ftrace
> > patching the ftrace patch-site; this needs a much better explanation.
> > 
> 
> Sorry for the incorrect explaination in the comment. I don't think it's
> reasonable to patch ftrace patch-site without ftrace code either.
> 
> The patching logic in register_fentry, unregister_fentry and
> modify_fentry is as follows:
> 
> if (tr->func.ftrace_managed)
>         ret = register_ftrace_direct((long)ip, (long)new_addr);
> else
>         ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr,
>                                  true);
> 
> ftrace patch-site is patched by ftrace code. bpf_arch_text_poke() is
> only used to patch bpf prog and bpf trampoline, which are not managed by
> ftrace.

Sorry, I had misunderstood. Thanks for the correction!

I'll have another look with that in mind.

> >> +
> >> +	if (poke_type == BPF_MOD_CALL)
> >> +		branch_type = AARCH64_INSN_BRANCH_LINK;
> >> +	else
> >> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;
> >> +
> >> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
> >> +		return -EFAULT;
> >> +
> >> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
> >> +		return -EFAULT;
> >> +
> >> +	mutex_lock(&text_mutex);
> >> +	if (aarch64_insn_read(ip, &replaced)) {
> >> +		ret = -EFAULT;
> >> +		goto out;
> >> +	}
> >> +
> >> +	if (replaced != old_insn) {
> >> +		ret = -EFAULT;
> >> +		goto out;
> >> +	}
> >> +
> >> +	ret = aarch64_insn_patch_text_nosync((void *)ip, new_insn);
> > 
> > ... and where does the actual synchronization come from in this case?
> 
> aarch64_insn_patch_text_nosync() replaces an instruction atomically, so
> no other CPUs will fetch a half-new and half-old instruction.
> 
> The scenario here is that there is a chance that another CPU fetches the
> old instruction after bpf_arch_text_poke() finishes, that is, different
> CPUs may execute different versions of instructions at the same time.
> 
> 1. When a new trampoline is attached, it doesn't seem to be an issue for
> different CPUs to jump to different trampolines temporarily.
>
> 2. When an old trampoline is freed, we should wait for all other CPUs to
> exit the trampoline and make sure the trampoline is no longer reachable,
> IIUC, bpf_tramp_image_put() function already uses percpu_ref and rcu
> tasks to do this.

It would be good to have a comment for these points.

Thanks,
Mark.

^ permalink raw reply

* [PATCH v2] net: phy: marvell: Add errata section 5.1 for Alaska PHY
From: Stefan Roese @ 2022-05-16  7:08 UTC (permalink / raw)
  To: netdev
  Cc: Leszek Polak, Marek Behún, Andrew Lunn, Heiner Kallweit,
	Russell King, David S . Miller

From: Leszek Polak <lpolak@arri.de>

As per Errata Section 5.1, if EEE is intended to be used, some register
writes must be done once after every hardware reset. This patch now adds
the necessary register writes as listed in the Marvell errata.

Without this fix we experience ethernet problems on some of our boards
equipped with a new version of this ethernet PHY (different supplier).

The fix applies to Marvell Alaska 88E1510/88E1518/88E1512/88E1514
Rev. A0.

Signed-off-by: Leszek Polak <lpolak@arri.de>
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Marek Behún <kabel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
---
v2:
- Implement struct with errata reg values and loop over this
  struct instead of using single phy_write() call for each PHY
  reg value, as suggested by Marek

 drivers/net/phy/marvell.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 2702faf7b0f6..41353f615a66 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -1177,7 +1177,44 @@ static int m88e1318_config_init(struct phy_device *phydev)
 
 static int m88e1510_config_init(struct phy_device *phydev)
 {
+	static const struct {
+		u16 reg17, reg16;
+	} errata_vals[] = {
+		{ 0x214b, 0x2144 },
+		{ 0x0c28, 0x2146 },
+		{ 0xb233, 0x214d },
+		{ 0xcc0c, 0x2159 },
+	};
 	int err;
+	int i;
+
+	/* As per Marvell Release Notes - Alaska 88E1510/88E1518/88E1512/
+	 * 88E1514 Rev A0, Errata Section 5.1:
+	 * If EEE is intended to be used, the following register writes
+	 * must be done once after every hardware reset.
+	 */
+	err = marvell_set_page(phydev, 0x00FF);
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < ARRAY_SIZE(errata_vals); ++i) {
+		err = phy_write(phydev, 17, errata_vals[i].reg17);
+		if (err)
+			return err;
+		err = phy_write(phydev, 16, errata_vals[i].reg16);
+		if (err)
+			return err;
+	}
+
+	err = marvell_set_page(phydev, 0x00FB);
+	if (err < 0)
+		return err;
+	err = phy_write(phydev, 07, 0xC00D);
+	if (err < 0)
+		return err;
+	err = marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
+	if (err < 0)
+		return err;
 
 	/* SGMII-to-Copper mode initialization */
 	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
-- 
2.36.1


^ permalink raw reply related

* Re: [PATCH bpf-next v3 4/7] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
From: Xu Kuohai @ 2022-05-16  6:55 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
	Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <Yn5yb9F4uYkio4Xe@lakrids>

On 5/13/2022 10:59 PM, Mark Rutland wrote:
> On Sun, Apr 24, 2022 at 11:40:25AM -0400, Xu Kuohai wrote:
>> Impelment bpf_arch_text_poke() for arm64, so bpf trampoline code can use
>> it to replace nop with jump, or replace jump with nop.
>>
>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
>> Acked-by: Song Liu <songliubraving@fb.com>
>> ---
>>  arch/arm64/net/bpf_jit_comp.c | 63 +++++++++++++++++++++++++++++++++++
>>  1 file changed, 63 insertions(+)
>>
>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>> index 8ab4035dea27..3f9bdfec54c4 100644
>> --- a/arch/arm64/net/bpf_jit_comp.c
>> +++ b/arch/arm64/net/bpf_jit_comp.c
>> @@ -9,6 +9,7 @@
>>  
>>  #include <linux/bitfield.h>
>>  #include <linux/bpf.h>
>> +#include <linux/memory.h>
>>  #include <linux/filter.h>
>>  #include <linux/printk.h>
>>  #include <linux/slab.h>
>> @@ -18,6 +19,7 @@
>>  #include <asm/cacheflush.h>
>>  #include <asm/debug-monitors.h>
>>  #include <asm/insn.h>
>> +#include <asm/patching.h>
>>  #include <asm/set_memory.h>
>>  
>>  #include "bpf_jit.h"
>> @@ -1529,3 +1531,64 @@ void bpf_jit_free_exec(void *addr)
>>  {
>>  	return vfree(addr);
>>  }
>> +
>> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
>> +			     void *addr, u32 *insn)
>> +{
>> +	if (!addr)
>> +		*insn = aarch64_insn_gen_nop();
>> +	else
>> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
>> +						    (unsigned long)addr,
>> +						    type);
>> +
>> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
>> +}
>> +
>> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
>> +		       void *old_addr, void *new_addr)
>> +{
>> +	int ret;
>> +	u32 old_insn;
>> +	u32 new_insn;
>> +	u32 replaced;
>> +	enum aarch64_insn_branch_type branch_type;
>> +
>> +	if (!is_bpf_text_address((long)ip))
>> +		/* Only poking bpf text is supported. Since kernel function
>> +		 * entry is set up by ftrace, we reply on ftrace to poke kernel
>> +		 * functions. For kernel funcitons, bpf_arch_text_poke() is only
>> +		 * called after a failed poke with ftrace. In this case, there
>> +		 * is probably something wrong with fentry, so there is nothing
>> +		 * we can do here. See register_fentry, unregister_fentry and
>> +		 * modify_fentry for details.
>> +		 */
>> +		return -EINVAL;
> 
> If you rely on ftrace to poke functions, why do you need to patch text
> at all? Why does the rest of this function exist?
> 
> I really don't like having another piece of code outside of ftrace
> patching the ftrace patch-site; this needs a much better explanation.
> 

Sorry for the incorrect explaination in the comment. I don't think it's
reasonable to patch ftrace patch-site without ftrace code either.

The patching logic in register_fentry, unregister_fentry and
modify_fentry is as follows:

if (tr->func.ftrace_managed)
        ret = register_ftrace_direct((long)ip, (long)new_addr);
else
        ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr,
                                 true);

ftrace patch-site is patched by ftrace code. bpf_arch_text_poke() is
only used to patch bpf prog and bpf trampoline, which are not managed by
ftrace.

>> +
>> +	if (poke_type == BPF_MOD_CALL)
>> +		branch_type = AARCH64_INSN_BRANCH_LINK;
>> +	else
>> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;
>> +
>> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
>> +		return -EFAULT;
>> +
>> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
>> +		return -EFAULT;
>> +
>> +	mutex_lock(&text_mutex);
>> +	if (aarch64_insn_read(ip, &replaced)) {
>> +		ret = -EFAULT;
>> +		goto out;
>> +	}
>> +
>> +	if (replaced != old_insn) {
>> +		ret = -EFAULT;
>> +		goto out;
>> +	}
>> +
>> +	ret = aarch64_insn_patch_text_nosync((void *)ip, new_insn);
> 
> ... and where does the actual synchronization come from in this case?
> 

aarch64_insn_patch_text_nosync() replaces an instruction atomically, so
no other CPUs will fetch a half-new and half-old instruction.

The scenario here is that there is a chance that another CPU fetches the
old instruction after bpf_arch_text_poke() finishes, that is, different
CPUs may execute different versions of instructions at the same time.

1. When a new trampoline is attached, it doesn't seem to be an issue for
different CPUs to jump to different trampolines temporarily.

2. When an old trampoline is freed, we should wait for all other CPUs to
exit the trampoline and make sure the trampoline is no longer reachable,
IIUC, bpf_tramp_image_put() function already uses percpu_ref and rcu
tasks to do this.

> Thanks,
> Mark.
> 
>> +out:
>> +	mutex_unlock(&text_mutex);
>> +	return ret;
>> +}
>> -- 
>> 2.30.2
>>
> .


^ permalink raw reply

* Re: [PATCH net] NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx
From: Krzysztof Kozlowski @ 2022-05-16  6:44 UTC (permalink / raw)
  To: Duoming Zhou, linux-kernel
  Cc: davem, edumazet, kuba, pabeni, gregkh, alexander.deucher, broonie,
	netdev
In-Reply-To: <20220516021028.54063-1-duoming@zju.edu.cn>

On 16/05/2022 04:10, Duoming Zhou wrote:
> There are sleep in atomic context bugs when the request to secure
> element of st21nfca is timeout. The root cause is that kzalloc and
> alloc_skb with GFP_KERNEL parameter is called in st21nfca_se_wt_timeout
> which is a timer handler. The call tree shows the execution paths that
> could lead to bugs:
> 
>    (Interrupt context)
> st21nfca_se_wt_timeout
>   nfc_hci_send_event
>     nfc_hci_hcp_message_tx
>       kzalloc(..., GFP_KERNEL) //may sleep
>       alloc_skb(..., GFP_KERNEL) //may sleep
> 
> This patch changes allocation mode of kzalloc and alloc_skb from
> GFP_KERNEL to GFP_ATOMIC in order to prevent atomic context from
> sleeping. The GFP_ATOMIC flag makes memory allocation operation
> could be used in atomic context.
> 
> Fixes: 8b8d2e08bf0d ("NFC: HCI support")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>  net/nfc/hci/hcp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/nfc/hci/hcp.c b/net/nfc/hci/hcp.c
> index 05c60988f59..1caf9c2086f 100644
> --- a/net/nfc/hci/hcp.c
> +++ b/net/nfc/hci/hcp.c
> @@ -30,7 +30,7 @@ int nfc_hci_hcp_message_tx(struct nfc_hci_dev *hdev, u8 pipe,
>  	int hci_len, err;
>  	bool firstfrag = true;
>  
> -	cmd = kzalloc(sizeof(struct hci_msg), GFP_KERNEL);
> +	cmd = kzalloc(sizeof(*cmd), GFP_ATOMIC);

No, this does not look correct. This function can sleep, so it can use
GFP_KERNEL. Please just look at the function before replacing any flags...



Best regards,
Krzysztof

^ permalink raw reply

* Re: [PATCH net-next 7/9] net: tcp: add skb drop reasons to tcp connect requesting
From: kernel test robot @ 2022-05-16  6:34 UTC (permalink / raw)
  To: menglong8.dong, edumazet
  Cc: kbuild-all, rostedt, mingo, davem, yoshfuji, dsahern, kuba,
	pabeni, imagedong, kafai, talalahmad, keescook, dongli.zhang,
	linux-kernel, netdev
In-Reply-To: <20220516034519.184876-8-imagedong@tencent.com>

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/menglong8-dong-gmail-com/net-tcp-add-skb-drop-reasons-to-tcp-state-change/20220516-114934
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git d9713088158b23973266e07fdc85ff7d68791a8c
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20220516/202205161441.bl8a0hGC-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/d93679590223760e685126e344dfddd7d7c08cc3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review menglong8-dong-gmail-com/net-tcp-add-skb-drop-reasons-to-tcp-state-change/20220516-114934
        git checkout d93679590223760e685126e344dfddd7d7c08cc3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> net/dccp/ipv4.c:584:5: error: conflicting types for 'dccp_v4_conn_request'; have 'int(struct sock *, struct sk_buff *, enum skb_drop_reason *)'
     584 | int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb,
         |     ^~~~~~~~~~~~~~~~~~~~
   In file included from net/dccp/ipv4.c:30:
   net/dccp/dccp.h:258:5: note: previous declaration of 'dccp_v4_conn_request' with type 'int(struct sock *, struct sk_buff *)'
     258 | int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
         |     ^~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/linkage.h:7,
                    from include/linux/kernel.h:17,
                    from include/linux/uio.h:8,
                    from include/linux/socket.h:8,
                    from include/uapi/linux/in.h:24,
                    from include/linux/in.h:19,
                    from include/linux/dccp.h:6,
                    from net/dccp/ipv4.c:9:
   net/dccp/ipv4.c:660:19: error: conflicting types for 'dccp_v4_conn_request'; have 'int(struct sock *, struct sk_buff *, enum skb_drop_reason *)'
     660 | EXPORT_SYMBOL_GPL(dccp_v4_conn_request);
         |                   ^~~~~~~~~~~~~~~~~~~~
   include/linux/export.h:98:28: note: in definition of macro '___EXPORT_SYMBOL'
      98 |         extern typeof(sym) sym;                                                 \
         |                            ^~~
   include/linux/export.h:160:41: note: in expansion of macro '__EXPORT_SYMBOL'
     160 | #define _EXPORT_SYMBOL(sym, sec)        __EXPORT_SYMBOL(sym, sec, "")
         |                                         ^~~~~~~~~~~~~~~
   include/linux/export.h:164:41: note: in expansion of macro '_EXPORT_SYMBOL'
     164 | #define EXPORT_SYMBOL_GPL(sym)          _EXPORT_SYMBOL(sym, "_gpl")
         |                                         ^~~~~~~~~~~~~~
   net/dccp/ipv4.c:660:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
     660 | EXPORT_SYMBOL_GPL(dccp_v4_conn_request);
         | ^~~~~~~~~~~~~~~~~
   In file included from net/dccp/ipv4.c:30:
   net/dccp/dccp.h:258:5: note: previous declaration of 'dccp_v4_conn_request' with type 'int(struct sock *, struct sk_buff *)'
     258 | int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
         |     ^~~~~~~~~~~~~~~~~~~~
--
   net/mptcp/subflow.c: In function 'subflow_v6_conn_request':
>> net/mptcp/subflow.c:568:24: error: too few arguments to function 'subflow_v4_conn_request'
     568 |                 return subflow_v4_conn_request(sk, skb);
         |                        ^~~~~~~~~~~~~~~~~~~~~~~
   net/mptcp/subflow.c:535:12: note: declared here
     535 | static int subflow_v4_conn_request(struct sock *sk, struct sk_buff *skb,
         |            ^~~~~~~~~~~~~~~~~~~~~~~


vim +584 net/dccp/ipv4.c

   583	
 > 584	int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb,
   585				 enum skb_drop_reason *reason)
   586	{
   587		struct inet_request_sock *ireq;
   588		struct request_sock *req;
   589		struct dccp_request_sock *dreq;
   590		const __be32 service = dccp_hdr_request(skb)->dccph_req_service;
   591		struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
   592	
   593		/* Never answer to DCCP_PKT_REQUESTs send to broadcast or multicast */
   594		if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
   595			return 0;	/* discard, don't send a reset here */
   596	
   597		if (dccp_bad_service_code(sk, service)) {
   598			dcb->dccpd_reset_code = DCCP_RESET_CODE_BAD_SERVICE_CODE;
   599			goto drop;
   600		}
   601		/*
   602		 * TW buckets are converted to open requests without
   603		 * limitations, they conserve resources and peer is
   604		 * evidently real one.
   605		 */
   606		dcb->dccpd_reset_code = DCCP_RESET_CODE_TOO_BUSY;
   607		if (inet_csk_reqsk_queue_is_full(sk))
   608			goto drop;
   609	
   610		if (sk_acceptq_is_full(sk))
   611			goto drop;
   612	
   613		req = inet_reqsk_alloc(&dccp_request_sock_ops, sk, true);
   614		if (req == NULL)
   615			goto drop;
   616	
   617		if (dccp_reqsk_init(req, dccp_sk(sk), skb))
   618			goto drop_and_free;
   619	
   620		dreq = dccp_rsk(req);
   621		if (dccp_parse_options(sk, dreq, skb))
   622			goto drop_and_free;
   623	
   624		if (security_inet_conn_request(sk, skb, req))
   625			goto drop_and_free;
   626	
   627		ireq = inet_rsk(req);
   628		sk_rcv_saddr_set(req_to_sk(req), ip_hdr(skb)->daddr);
   629		sk_daddr_set(req_to_sk(req), ip_hdr(skb)->saddr);
   630		ireq->ir_mark = inet_request_mark(sk, skb);
   631		ireq->ireq_family = AF_INET;
   632		ireq->ir_iif = sk->sk_bound_dev_if;
   633	
   634		/*
   635		 * Step 3: Process LISTEN state
   636		 *
   637		 * Set S.ISR, S.GSR, S.SWL, S.SWH from packet or Init Cookie
   638		 *
   639		 * Setting S.SWL/S.SWH to is deferred to dccp_create_openreq_child().
   640		 */
   641		dreq->dreq_isr	   = dcb->dccpd_seq;
   642		dreq->dreq_gsr	   = dreq->dreq_isr;
   643		dreq->dreq_iss	   = dccp_v4_init_sequence(skb);
   644		dreq->dreq_gss     = dreq->dreq_iss;
   645		dreq->dreq_service = service;
   646	
   647		if (dccp_v4_send_response(sk, req))
   648			goto drop_and_free;
   649	
   650		inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
   651		reqsk_put(req);
   652		return 0;
   653	
   654	drop_and_free:
   655		reqsk_free(req);
   656	drop:
   657		__DCCP_INC_STATS(DCCP_MIB_ATTEMPTFAILS);
   658		return -1;
   659	}
   660	EXPORT_SYMBOL_GPL(dccp_v4_conn_request);
   661	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply

* [PATCH net-next] ax25: merge repeat codes in ax25_dev_device_down()
From: Lu Wei @ 2022-05-16  6:28 UTC (permalink / raw)
  To: jreuter, ralf, davem, edumazet, kuba, pabeni, linux-hams, netdev,
	linux-kernel

Merge repeat codes to reduce the duplication.

Signed-off-by: Lu Wei <luwei32@huawei.com>
---
 net/ax25/ax25_dev.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index d2a244e1c260..b80fccbac62a 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -115,23 +115,13 @@ void ax25_dev_device_down(struct net_device *dev)
 
 	if ((s = ax25_dev_list) == ax25_dev) {
 		ax25_dev_list = s->next;
-		spin_unlock_bh(&ax25_dev_lock);
-		ax25_dev_put(ax25_dev);
-		dev->ax25_ptr = NULL;
-		dev_put_track(dev, &ax25_dev->dev_tracker);
-		ax25_dev_put(ax25_dev);
-		return;
+		goto unlock_put;
 	}
 
 	while (s != NULL && s->next != NULL) {
 		if (s->next == ax25_dev) {
 			s->next = ax25_dev->next;
-			spin_unlock_bh(&ax25_dev_lock);
-			ax25_dev_put(ax25_dev);
-			dev->ax25_ptr = NULL;
-			dev_put_track(dev, &ax25_dev->dev_tracker);
-			ax25_dev_put(ax25_dev);
-			return;
+			goto unlock_put;
 		}
 
 		s = s->next;
@@ -139,6 +129,14 @@ void ax25_dev_device_down(struct net_device *dev)
 	spin_unlock_bh(&ax25_dev_lock);
 	dev->ax25_ptr = NULL;
 	ax25_dev_put(ax25_dev);
+	return;
+
+unlock_put:
+	spin_unlock_bh(&ax25_dev_lock);
+	ax25_dev_put(ax25_dev);
+	dev->ax25_ptr = NULL;
+	dev_put_track(dev, &ax25_dev->dev_tracker);
+	ax25_dev_put(ax25_dev);
 }
 
 int ax25_fwd_ioctl(unsigned int cmd, struct ax25_fwd_struct *fwd)
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net] NFC: nci: fix sleep in atomic context bugs caused by nci_skb_alloc
From: Krzysztof Kozlowski @ 2022-05-16  6:15 UTC (permalink / raw)
  To: Duoming Zhou, netdev
  Cc: linux-kernel, davem, edumazet, kuba, pabeni, gregkh,
	alexander.deucher, broonie
In-Reply-To: <20220513133355.113222-1-duoming@zju.edu.cn>

On 13/05/2022 15:33, Duoming Zhou wrote:
> There are sleep in atomic context bugs when the request to secure
> element of st-nci is timeout. The root cause is that nci_skb_alloc
> with GFP_KERNEL parameter is called in st_nci_se_wt_timeout which is
> a timer handler. The call paths that could trigger bugs are shown below:
> 
>     (interrupt context 1)
> st_nci_se_wt_timeout
>   nci_hci_send_event
>     nci_hci_send_data
>       nci_skb_alloc(..., GFP_KERNEL) //may sleep
> 
>    (interrupt context 2)
> st_nci_se_wt_timeout
>   nci_hci_send_event
>     nci_hci_send_data
>       nci_send_data
>         nci_queue_tx_data_frags
>           nci_skb_alloc(..., GFP_KERNEL) //may sleep
> 
> This patch changes allocation mode of nci_skb_alloc from GFP_KERNEL to
> GFP_ATOMIC in order to prevent atomic context sleeping. The GFP_ATOMIC
> flag makes memory allocation operation could be used in atomic context.
> 
> Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation ")
> Fixes: 11f54f228643 ("NFC: nci: Add HCI over NCI protocol support")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>


Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>


Best regards,
Krzysztof

^ permalink raw reply

* [PATCH net-next v3 2/2] net/smc: rdma write inline if qp has sufficient inline space
From: Guangguan Wang @ 2022-05-16  5:51 UTC (permalink / raw)
  To: kgraul, davem, edumazet, kuba, pabeni, leon, tonylu
  Cc: linux-s390, netdev, linux-kernel, kernel test robot
In-Reply-To: <20220516055137.51873-1-guangguan.wang@linux.alibaba.com>

Rdma write with inline flag when sending small packages,
whose length is shorter than the qp's max_inline_data, can
help reducing latency.

In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.5us-0.7us improvement in latency.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results shown below:
msgsize     before       after
1B          11.2 us      10.6 us (-0.6 us)
2B          11.2 us      10.7 us (-0.5 us)
4B          11.3 us      10.7 us (-0.6 us)
8B          11.2 us      10.6 us (-0.6 us)
16B         11.3 us      10.7 us (-0.6 us)
32B         11.3 us      10.6 us (-0.7 us)
64B         11.2 us      11.2 us (0 us)
128B        11.2 us      11.2 us (0 us)
256B        11.2 us      11.2 us (0 us)
512B        11.4 us      11.3 us (-0.1 us)
1KB         11.4 us      11.5 us (0.1 us)
2KB         11.5 us      11.5 us (0 us)

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
---
 net/smc/smc_tx.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 98ca9229fe87..805a546e8c04 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -391,12 +391,20 @@ static int smcr_tx_rdma_writes(struct smc_connection *conn, size_t len,
 	int rc;
 
 	for (dstchunk = 0; dstchunk < 2; dstchunk++) {
-		struct ib_sge *sge =
-			wr_rdma_buf->wr_tx_rdma[dstchunk].wr.sg_list;
+		struct ib_rdma_wr *wr = &wr_rdma_buf->wr_tx_rdma[dstchunk];
+		struct ib_sge *sge = wr->wr.sg_list;
+		u64 base_addr = dma_addr;
+
+		if (dst_len < link->qp_attr.cap.max_inline_data) {
+			base_addr = (uintptr_t)conn->sndbuf_desc->cpu_addr;
+			wr->wr.send_flags |= IB_SEND_INLINE;
+		} else {
+			wr->wr.send_flags &= ~IB_SEND_INLINE;
+		}
 
 		num_sges = 0;
 		for (srcchunk = 0; srcchunk < 2; srcchunk++) {
-			sge[srcchunk].addr = dma_addr + src_off;
+			sge[srcchunk].addr = base_addr + src_off;
 			sge[srcchunk].length = src_len;
 			num_sges++;
 
@@ -410,8 +418,7 @@ static int smcr_tx_rdma_writes(struct smc_connection *conn, size_t len,
 			src_len = dst_len - src_len; /* remainder */
 			src_len_sum += src_len;
 		}
-		rc = smc_tx_rdma_write(conn, dst_off, num_sges,
-				       &wr_rdma_buf->wr_tx_rdma[dstchunk]);
+		rc = smc_tx_rdma_write(conn, dst_off, num_sges, wr);
 		if (rc)
 			return rc;
 		if (dst_len_sum == len)
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply related

* [PATCH net-next v3 1/2] net/smc: send cdc msg inline if qp has sufficient inline space
From: Guangguan Wang @ 2022-05-16  5:51 UTC (permalink / raw)
  To: kgraul, davem, edumazet, kuba, pabeni, leon, tonylu
  Cc: linux-s390, netdev, linux-kernel, kernel test robot
In-Reply-To: <20220516055137.51873-1-guangguan.wang@linux.alibaba.com>

As cdc msg's length is 44B, cdc msgs can be sent inline in
most rdma devices, which can help reducing sending latency.

In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.4us-0.7us improvement in latency.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results shown below:
msgsize     before       after
1B          11.9 us      11.2 us (-0.7 us)
2B          11.7 us      11.2 us (-0.5 us)
4B          11.7 us      11.3 us (-0.4 us)
8B          11.6 us      11.2 us (-0.4 us)
16B         11.7 us      11.3 us (-0.4 us)
32B         11.7 us      11.3 us (-0.4 us)
64B         11.7 us      11.2 us (-0.5 us)
128B        11.6 us      11.2 us (-0.4 us)
256B        11.8 us      11.2 us (-0.6 us)
512B        11.8 us      11.4 us (-0.4 us)
1KB         11.9 us      11.4 us (-0.5 us)
2KB         12.1 us      11.5 us (-0.6 us)

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
---
 net/smc/smc_ib.c | 1 +
 net/smc/smc_wr.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index a3e2d3b89568..dcda4165d107 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -671,6 +671,7 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 			.max_recv_wr = SMC_WR_BUF_CNT * 3,
 			.max_send_sge = SMC_IB_MAX_SEND_SGE,
 			.max_recv_sge = sges_per_buf,
+			.max_inline_data = 0,
 		},
 		.sq_sig_type = IB_SIGNAL_REQ_WR,
 		.qp_type = IB_QPT_RC,
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 24be1d03fef9..26f8f240d9e8 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -554,10 +554,11 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
 static void smc_wr_init_sge(struct smc_link *lnk)
 {
 	int sges_per_buf = (lnk->lgr->smc_version == SMC_V2) ? 2 : 1;
+	bool send_inline = (lnk->qp_attr.cap.max_inline_data > SMC_WR_TX_SIZE);
 	u32 i;
 
 	for (i = 0; i < lnk->wr_tx_cnt; i++) {
-		lnk->wr_tx_sges[i].addr =
+		lnk->wr_tx_sges[i].addr = send_inline ? (uintptr_t)(&lnk->wr_tx_bufs[i]) :
 			lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
 		lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE;
 		lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey;
@@ -575,6 +576,8 @@ static void smc_wr_init_sge(struct smc_link *lnk)
 		lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
 		lnk->wr_tx_ibs[i].send_flags =
 			IB_SEND_SIGNALED | IB_SEND_SOLICITED;
+		if (send_inline)
+			lnk->wr_tx_ibs[i].send_flags |= IB_SEND_INLINE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.opcode = IB_WR_RDMA_WRITE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[1].wr.opcode = IB_WR_RDMA_WRITE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.sg_list =
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply related

* [PATCH net-next v3 0/2] net/smc: send and write inline optimization for smc
From: Guangguan Wang @ 2022-05-16  5:51 UTC (permalink / raw)
  To: kgraul, davem, edumazet, kuba, pabeni, leon, tonylu
  Cc: linux-s390, netdev, linux-kernel

Send cdc msgs and write data inline if qp has sufficent inline
space, helps latency reducing. 

In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.4us-1.3us improvement in latency.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results shown below:
msgsize     before       after
1B          11.9 us      10.6 us (-1.3 us)
2B          11.7 us      10.7 us (-1.0 us)
4B          11.7 us      10.7 us (-1.0 us)
8B          11.6 us      10.6 us (-1.0 us)
16B         11.7 us      10.7 us (-1.0 us)
32B         11.7 us      10.6 us (-1.1 us)
64B         11.7 us      11.2 us (-0.5 us)
128B        11.6 us      11.2 us (-0.4 us)
256B        11.8 us      11.2 us (-0.6 us)
512B        11.8 us      11.3 us (-0.5 us)
1KB         11.9 us      11.5 us (-0.4 us)
2KB         12.1 us      11.5 us (-0.6 us)

Guangguan Wang (2):
  net/smc: send cdc msg inline if qp has sufficient inline space
  net/smc: rdma write inline if qp has sufficient inline space

 net/smc/smc_ib.c |  1 +
 net/smc/smc_tx.c | 17 ++++++++++++-----
 net/smc/smc_wr.c |  5 ++++-
 3 files changed, 17 insertions(+), 6 deletions(-)

-- 
2.24.3 (Apple Git-128)


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox