Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] net: fec: Coverity issue: Dereference null return value
From: Paolo Abeni @ 2022-12-20 14:46 UTC (permalink / raw)
  To: wei.fang, davem, edumazet, kuba, xiaoning.wang, shenwei.wang,
	linux-imx
  Cc: netdev, linux-kernel
In-Reply-To: <20221215091149.936369-1-wei.fang@nxp.com>

Hello,

On Thu, 2022-12-15 at 17:11 +0800, wei.fang@nxp.com wrote:
> From: Wei Fang <wei.fang@nxp.com>
> 
> The build_skb might return a null pointer but there is no check on the
> return value in the fec_enet_rx_queue(). So a null pointer dereference
> might occur. To avoid this, we check the return value of build_skb. If
> the return value is a null pointer, the driver will recycle the page and
> update the statistic of ndev. Then jump to rx_processing_done to clear
> the status flags of the BD so that the hardware can recycle the BD.
> 
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com>

You need to include a suitable fixes tag here. Please repost adding it
and retaining  Alex's reviwed-by tag, thanks!

Paolo


^ permalink raw reply

* Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Heiner Kallweit @ 2022-12-20 14:40 UTC (permalink / raw)
  To: Enguerrand de Ribaucourt, netdev
  Cc: pabeni, woojung.huh, davem, UNGLinuxDriver, Andrew Lunn,
	Russell King - ARM Linux
In-Reply-To: <20221220131921.806365-2-enguerrand.de-ribaucourt@savoirfairelinux.com>

On 20.12.2022 14:19, Enguerrand de Ribaucourt wrote:
> It seems EXPORT_SYMBOL was forgotten when phy_disable_interrupts() was
> made non static. For consistency with the other exported functions in
> this file, EXPORT_SYMBOL should be used.
> 
No, it wasn't forgotten. It's intentional. The function is supposed to
be used within phylib only.

None of the phylib maintainers was on the addressee list of your patch.
Seems you didn't check with get_maintainers.pl.

You should explain your use case to the phylib maintainers. Maybe lan78xx
uses phylib in a wrong way, maybe an extension to phylib is needed.
Best start with explaining why lan78xx_link_status_change() needs to
fiddle with the PHY interrupt. It would help be helpful to understand
what "chip" refers to in the comment. The MAC, or the PHY?
Does the lan78xx code assume that a specific PHY is used, and the
functionality would actually belong to the respective PHY driver?

> Fixes: 3dd4ef1bdbac ("net: phy: make phy_disable_interrupts() non-static")
> Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
> ---
>  drivers/net/phy/phy.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index e5b6cb1a77f9..33250da76466 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -992,6 +992,7 @@ int phy_disable_interrupts(struct phy_device *phydev)
>  	/* Disable PHY interrupts */
>  	return phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED);
>  }
> +EXPORT_SYMBOL(phy_disable_interrupts);
>  
>  /**
>   * phy_interrupt - PHY interrupt handler


^ permalink raw reply

* Re: [PATCH net-next v3 2/3] net: qualcomm: rmnet: add tx packets aggregation
From: Daniele Palmas @ 2022-12-20 14:28 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: David Miller, Jakub Kicinski, Eric Dumazet,
	Subash Abhinov Kasiviswanathan, Sean Tranchetti, Jonathan Corbet,
	Alexander Lobakin, Gal Pressman, Bjørn Mork,
	Greg Kroah-Hartman, Dave Taht, netdev
In-Reply-To: <CAGRyCJFL5VmeserfoTMY4bR+EWKSEWrdhSTSY8UQsAZphg8PWw@mail.gmail.com>

Hello Paolo,

Il giorno ven 9 dic 2022 alle ore 08:38 Daniele Palmas
<dnlplm@gmail.com> ha scritto:
>
> Il giorno mer 7 dic 2022 alle ore 13:46 Paolo Abeni
> <pabeni@redhat.com> ha scritto:
> > > +static void rmnet_map_flush_tx_packet_work(struct work_struct *work)
> > > +{
> > > +     struct sk_buff *skb = NULL;
> > > +     struct rmnet_port *port;
> > > +
> > > +     port = container_of(work, struct rmnet_port, agg_wq);
> > > +
> > > +     spin_lock_bh(&port->agg_lock);
> > > +     if (likely(port->agg_state == -EINPROGRESS)) {
> > > +             /* Buffer may have already been shipped out */
> > > +             if (likely(port->skbagg_head)) {
> > > +                     skb = port->skbagg_head;
> > > +                     reset_aggr_params(port);
> > > +             }
> > > +             port->agg_state = 0;
> > > +     }
> > > +
> > > +     spin_unlock_bh(&port->agg_lock);
> > > +     if (skb)
> > > +             rmnet_send_skb(port, skb);
> > > +}
> > > +
> > > +static enum hrtimer_restart rmnet_map_flush_tx_packet_queue(struct hrtimer *t)
> > > +{
> > > +     struct rmnet_port *port;
> > > +
> > > +     port = container_of(t, struct rmnet_port, hrtimer);
> > > +
> > > +     schedule_work(&port->agg_wq);
> >
> > Why you need to schedule a work and you can't instead call the core of
> > rmnet_map_flush_tx_packet_work() here? it looks like the latter does
> > not need process context...
> >
>
> Ack.
>

looks like removing the work is not as straightforward as I thought.

Now the timer cb has become:

static enum hrtimer_restart rmnet_map_flush_tx_packet_cb(struct hrtimer *t)
{
    struct sk_buff *skb = NULL;
    struct rmnet_port *port;

    port = container_of(t, struct rmnet_port, hrtimer);

    spin_lock_bh(&port->agg_lock);
    if (likely(port->agg_state == -EINPROGRESS)) {
        /* Buffer may have already been shipped out */
        if (likely(port->skbagg_head)) {
            skb = port->skbagg_head;
            reset_aggr_params(port);
        }
        port->agg_state = 0;
    }
    spin_unlock_bh(&port->agg_lock);

    if (skb)
        rmnet_send_skb(port, skb);

    return HRTIMER_NORESTART;
}

but this is causing the following warning:

[ 3106.701296] WARNING: CPU: 15 PID: 0 at kernel/softirq.c:375
__local_bh_enable_ip+0x54/0x70
...
[ 3106.701537] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G           OE
     6.1.0-rc5-rmnet-v4-warn #1
[ 3106.701543] Hardware name: LENOVO 30DH00H2IX/1048, BIOS S08KT40A 08/23/2021
[ 3106.701546] RIP: 0010:__local_bh_enable_ip+0x54/0x70
[ 3106.701554] Code: a9 00 ff ff 00 74 27 65 ff 0d 08 bb 75 61 65 8b
05 01 bb 75 61 85 c0 74 06 5d c3 cc cc cc cc 0f 1f 44 00 00 5d c3 cc
cc cc cc <0f> 0b eb bf 65 66 8b 05 e0 ca 76 61 66 85 c0 74 cc e8 e6 fd
ff ff
[ 3106.701559] RSP: 0018:ffffb8aa80510ec8 EFLAGS: 00010006
[ 3106.701564] RAX: 0000000080010202 RBX: ffff932d7b687868 RCX: 0000000000000000
[ 3106.701569] RDX: 0000000000000001 RSI: 0000000000000201 RDI: ffffffffc0bd5f7c
[ 3106.701573] RBP: ffffb8aa80510ec8 R08: ffff933bdc3e31a0 R09: 000002d355c2f99d
[ 3106.701576] R10: 0000000000000000 R11: ffffb8aa80510ff8 R12: ffff932d7b687828
[ 3106.701580] R13: ffff932d7b687000 R14: ffff932cc1a76400 R15: ffff933bdc3e3180
[ 3106.701584] FS:  0000000000000000(0000) GS:ffff933bdc3c0000(0000)
knlGS:0000000000000000
[ 3106.701589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3106.701593] CR2: 00007ffc26dae080 CR3: 0000000209f04003 CR4: 00000000007706e0
[ 3106.701597] PKRU: 55555554
[ 3106.701599] Call Trace:
[ 3106.701602]  <IRQ>
[ 3106.701608]  _raw_spin_unlock_bh+0x1d/0x30
[ 3106.701623]  rmnet_map_flush_tx_packet_cb+0x4c/0x90 [rmnet]
[ 3106.701640]  ? rmnet_send_skb+0x90/0x90 [rmnet]
[ 3106.701655]  __hrtimer_run_queues+0x106/0x260
[ 3106.701664]  hrtimer_interrupt+0x101/0x220
[ 3106.701671]  __sysvec_apic_timer_interrupt+0x61/0x110
[ 3106.701677]  sysvec_apic_timer_interrupt+0x7b/0x90
[ 3106.701685]  </IRQ>
[ 3106.701687]  <TASK>
[ 3106.701689]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 3106.701694] RIP: 0010:cpuidle_enter_state+0xde/0x6e0
[ 3106.701704] Code: eb d6 60 e8 74 01 68 ff 8b 53 04 49 89 c7 0f 1f
44 00 00 31 ff e8 b2 23 67 ff 80 7d d0 00 0f 85 da 00 00 00 fb 0f 1f
44 00 00 <45> 85 f6 0f 88 01 02 00 00 4d 63 ee 49 83 fd 09 0f 87 b6 04
00 00
[ 3106.701709] RSP: 0018:ffffb8aa801dbe38 EFLAGS: 00000246
[ 3106.701713] RAX: ffff933bdc3f1380 RBX: ffffd8aa7fbc0700 RCX: 000000000000001f
[ 3106.701717] RDX: 000000000000000f RSI: 0000000000000002 RDI: 0000000000000000
[ 3106.701720] RBP: ffffb8aa801dbe88 R08: 000002d355d34146 R09: 000000000006e988
[ 3106.701723] R10: 0000000000000004 R11: 071c71c71c71c71c R12: ffffffffa04ba5c0
[ 3106.701727] R13: 0000000000000001 R14: 0000000000000001 R15: 000002d355d34146
[ 3106.701735]  ? cpuidle_enter_state+0xce/0x6e0
[ 3106.701744]  cpuidle_enter+0x2e/0x50
[ 3106.701751]  do_idle+0x204/0x290
[ 3106.701758]  cpu_startup_entry+0x20/0x30
[ 3106.701763]  start_secondary+0x122/0x160
[ 3106.701773]  secondary_startup_64_no_verify+0xe5/0xeb
[ 3106.701784]  </TASK>

The reason is not obvious to me, so I need to dig further...

Regards,
Daniele

^ permalink raw reply

* Re: [PATCH v2 6/9] net: phy: motorcomm: Add YT8531 phy support
From: Andrew Lunn @ 2022-12-20 14:20 UTC (permalink / raw)
  To: Yanhong Wang
  Cc: linux-riscv, netdev, devicetree, linux-kernel, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Emil Renner Berthing, Richard Cochran,
	Heiner Kallweit, Peter Geis
In-Reply-To: <20221216070632.11444-7-yanhong.wang@starfivetech.com>

> Signed-off-by: Yanhong Wang <yanhong.wang@starfivetech.com>
> ---
>  drivers/net/phy/Kconfig     |   3 +-
>  drivers/net/phy/motorcomm.c | 202 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 204 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index c57a0262fb64..86399254d9ff 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -258,9 +258,10 @@ config MICROSEMI_PHY
>  
>  config MOTORCOMM_PHY
>  	tristate "Motorcomm PHYs"
> +	default SOC_STARFIVE

Please don't do this. Look at the other PHY drivers. How many have a
default like this?

Generally, if you are doing something which no other driver does, you
are doing something wrong.

> --- a/drivers/net/phy/motorcomm.c
> +++ b/drivers/net/phy/motorcomm.c
> @@ -3,13 +3,17 @@
>   * Driver for Motorcomm PHYs
>   *
>   * Author: Peter Geis <pgwipeout@gmail.com>
> + *
>   */

Please avoid changes like this. It distracts from the real code
changes.

>  
> +#include <linux/bitops.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/of.h>
>  #include <linux/phy.h>
>  
>  #define PHY_ID_YT8511		0x0000010a
> +#define PHY_ID_YT8531		0x4f51e91b
>  
>  #define YT8511_PAGE_SELECT	0x1e
>  #define YT8511_PAGE		0x1f
> @@ -17,6 +21,10 @@
>  #define YT8511_EXT_DELAY_DRIVE	0x0d
>  #define YT8511_EXT_SLEEP_CTRL	0x27
>  
> +#define YTPHY_EXT_SMI_SDS_PHY		0xa000
> +#define YTPHY_EXT_CHIP_CONFIG		0xa001
> +#define YTPHY_EXT_RGMII_CONFIG1	0xa003
> +
>  /* 2b00 25m from pll
>   * 2b01 25m from xtl *default*
>   * 2b10 62.m from pll
> @@ -38,6 +46,51 @@
>  #define YT8511_DELAY_FE_TX_EN	(0xf << 12)
>  #define YT8511_DELAY_FE_TX_DIS	(0x2 << 12)
>  
> +struct ytphy_reg_field {
> +	char *name;
> +	u32 mask;
> +	u8	dflt;	/* Default value */
> +};

This should be next to where it is used. But once you have delay in
ps, i suspect you will throw this away.

>  
> +static int ytphy_config_init(struct phy_device *phydev)

Is this specific to the 8531? If so, put 8531 in the function name?

Do any of these delay configurations also apply to the 8511?

> +{
> +	struct device_node *of_node;
> +	u32 val;
> +	u32 mask;
> +	u32 cfg;
> +	int ret;
> +	int i = 0;

Reverse Christmas tree. i needs to be earlier, val needs to be later.

> +	of_node = phydev->mdio.dev.of_node;
> +	if (of_node) {
> +		ret = of_property_read_u32(of_node, ytphy_rxden_grp[0].name, &cfg);
> +		if (!ret) {
> +			mask = ytphy_rxden_grp[0].mask;
> +			val = ytphy_read_ext(phydev, YTPHY_EXT_CHIP_CONFIG);
> +
> +			/* check the cfg overflow or not */
> +			cfg = cfg > mask >> (ffs(mask) - 1) ? mask : cfg;
> +
> +			val &= ~mask;
> +			val |= FIELD_PREP(mask, cfg);
> +			ytphy_write_ext(phydev, YTPHY_EXT_CHIP_CONFIG, val);
> +		}
> +
> +		val = ytphy_read_ext(phydev, YTPHY_EXT_RGMII_CONFIG1);
> +		for (i = 0; i < ARRAY_SIZE(ytphy_rxtxd_grp); i++) {
> +			ret = of_property_read_u32(of_node, ytphy_rxtxd_grp[i].name, &cfg);
> +			if (!ret) {
> +				mask = ytphy_rxtxd_grp[i].mask;
> +
> +				/* check the cfg overflow or not */
> +				cfg = cfg > mask >> (ffs(mask) - 1) ? mask : cfg;
> +
> +				val &= ~mask;
> +				val |= cfg << (ffs(mask) - 1);
> +			}
> +		}
> +		return ytphy_write_ext(phydev, YTPHY_EXT_RGMII_CONFIG1, val);
> +	}
> +
> +	phydev_err(phydev, "Get of node fail\n");

What about the case of the PHY is used on a board without device tree?
ACPI could be used, or there could be nothing because it is on a USB
dongle etc.

You should of defined in your device tree binding what the defaults
are when properties are missing. So apply them when there is no DT
available. Then we at least have the PHY in a known state.

       Andrew

^ permalink raw reply

* Re: [PATCH] nfc:  Fix potential resource leaks
From: Michal Swiatkowski @ 2022-12-20 14:18 UTC (permalink / raw)
  To: Miaoqian Lin
  Cc: Krzysztof Kozlowski, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Samuel Ortiz, Christophe Ricard,
	netdev, linux-kernel
In-Reply-To: <20221220134623.2084443-1-linmq006@gmail.com>

On Tue, Dec 20, 2022 at 05:46:23PM +0400, Miaoqian Lin wrote:
> nfc_get_device() take reference for the device, add missing
> nfc_put_device() to release it when not need anymore.
> Also fix the style warnning by use error EOPNOTSUPP instead of
> ENOTSUPP.
> 
> Fixes: 5ce3f32b5264 ("NFC: netlink: SE API implementation")
> Fixes: 29e76924cf08 ("nfc: netlink: Add capability to reply to vendor_cmd with data")
> Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
> ---
Nice catch
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> -- 
> 2.25.1
> 

^ permalink raw reply

* [PATCH v2 9/9] virtio_net: support multi-buffer xdp
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

Driver can pass the skb to stack by build_skb_from_xdp_buff().

Driver forwards multi-buffer packets using the send queue
when XDP_TX and XDP_REDIRECT, and clears the reference of multi
pages when XDP_DROP.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 65 ++++++----------------------------------
 1 file changed, 9 insertions(+), 56 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 398ffe2a5084..daa380b9d1cc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1074,7 +1074,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	unsigned int metasize = 0;
 	unsigned int frame_sz;
 	int err;
 
@@ -1165,63 +1164,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 
 		switch (act) {
 		case XDP_PASS:
-			metasize = xdp.data - xdp.data_meta;
-
-			/* recalculate offset to account for any header
-			 * adjustments and minus the metasize to copy the
-			 * metadata in page_to_skb(). Note other cases do not
-			 * build an skb and avoid using offset
-			 */
-			offset = xdp.data - page_address(xdp_page) -
-				 vi->hdr_len - metasize;
-
-			/* recalculate len if xdp.data, xdp.data_end or
-			 * xdp.data_meta were adjusted
-			 */
-			len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
-
-			/* recalculate headroom if xdp.data or xdp_data_meta
-			 * were adjusted, note that offset should always point
-			 * to the start of the reserved bytes for virtio_net
-			 * header which are followed by xdp.data, that means
-			 * that offset is equal to the headroom (when buf is
-			 * starting at the beginning of the page, otherwise
-			 * there is a base offset inside the page) but it's used
-			 * with a different starting point (buf start) than
-			 * xdp.data (buf start + vnet hdr size). If xdp.data or
-			 * data_meta were adjusted by the xdp prog then the
-			 * headroom size has changed and so has the offset, we
-			 * can use data_hard_start, which points at buf start +
-			 * vnet hdr size, to calculate the new headroom and use
-			 * it later to compute buf start in page_to_skb()
-			 */
-			headroom = xdp.data - xdp.data_hard_start - metasize;
-
-			/* We can only create skb based on xdp_page. */
-			if (unlikely(xdp_page != page)) {
-				rcu_read_unlock();
-				put_page(page);
-				head_skb = page_to_skb(vi, rq, xdp_page, offset,
-						       len, PAGE_SIZE);
-				return head_skb;
-			}
-			break;
+			head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
+			rcu_read_unlock();
+			return head_skb;
 		case XDP_TX:
 			stats->xdp_tx++;
 			xdpf = xdp_convert_buff_to_frame(&xdp);
 			if (unlikely(!xdpf)) {
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
+				netdev_dbg(dev, "convert buff to frame failed for xdp\n");
+				goto err_xdp_frags;
 			}
 			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
 			if (unlikely(!err)) {
 				xdp_return_frame_rx_napi(xdpf);
 			} else if (unlikely(err < 0)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
+				goto err_xdp_frags;
 			}
 			*xdp_xmit |= VIRTIO_XDP_TX;
 			if (unlikely(xdp_page != page))
@@ -1231,11 +1189,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		case XDP_REDIRECT:
 			stats->xdp_redirects++;
 			err = xdp_do_redirect(dev, &xdp, xdp_prog);
-			if (err) {
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
-			}
+			if (err)
+				goto err_xdp_frags;
 			*xdp_xmit |= VIRTIO_XDP_REDIR;
 			if (unlikely(xdp_page != page))
 				put_page(page);
@@ -1248,9 +1203,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			trace_xdp_exception(vi->dev, xdp_prog, act);
 			fallthrough;
 		case XDP_DROP:
-			if (unlikely(xdp_page != page))
-				__free_pages(xdp_page, 0);
-			goto err_xdp;
+			goto err_xdp_frags;
 		}
 err_xdp_frags:
 		shinfo = xdp_get_shared_info_from_buff(&xdp);
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 8/9] virtio_net: remove xdp related info from page_to_skb()
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

For the clear construction of xdp_buff, we remove the xdp processing
interleaved with page_to_skb(). Now, the logic of xdp and building
skb from xdp are separate and independent.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 41 +++++++++-------------------------------
 1 file changed, 9 insertions(+), 32 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4e12196fcfd4..398ffe2a5084 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -439,9 +439,7 @@ static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
 static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 				   struct receive_queue *rq,
 				   struct page *page, unsigned int offset,
-				   unsigned int len, unsigned int truesize,
-				   bool hdr_valid, unsigned int metasize,
-				   unsigned int headroom)
+				   unsigned int len, unsigned int truesize)
 {
 	struct sk_buff *skb;
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -459,21 +457,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	else
 		hdr_padded_len = sizeof(struct padded_vnet_hdr);
 
-	/* If headroom is not 0, there is an offset between the beginning of the
-	 * data and the allocated space, otherwise the data and the allocated
-	 * space are aligned.
-	 *
-	 * Buffers with headroom use PAGE_SIZE as alloc size, see
-	 * add_recvbuf_mergeable() + get_mergeable_buf_len()
-	 */
-	truesize = headroom ? PAGE_SIZE : truesize;
-	tailroom = truesize - headroom;
-	buf = p - headroom;
-
+	buf = p;
 	len -= hdr_len;
 	offset += hdr_padded_len;
 	p += hdr_padded_len;
-	tailroom -= hdr_padded_len + len;
+	tailroom = truesize - hdr_padded_len - len;
 
 	shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
@@ -503,7 +491,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	if (len <= skb_tailroom(skb))
 		copy = len;
 	else
-		copy = ETH_HLEN + metasize;
+		copy = ETH_HLEN;
 	skb_put_data(skb, p, copy);
 
 	len -= copy;
@@ -542,19 +530,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		give_pages(rq, page);
 
 ok:
-	/* hdr_valid means no XDP, so we can copy the vnet header */
-	if (hdr_valid) {
-		hdr = skb_vnet_hdr(skb);
-		memcpy(hdr, hdr_p, hdr_len);
-	}
+	hdr = skb_vnet_hdr(skb);
+	memcpy(hdr, hdr_p, hdr_len);
 	if (page_to_free)
 		put_page(page_to_free);
 
-	if (metasize) {
-		__skb_pull(skb, metasize);
-		skb_metadata_set(skb, metasize);
-	}
-
 	return skb;
 }
 
@@ -934,7 +914,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
 {
 	struct page *page = buf;
 	struct sk_buff *skb =
-		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0, 0);
+		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
 
 	stats->bytes += len - vi->hdr_len;
 	if (unlikely(!skb))
@@ -1222,9 +1202,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 				rcu_read_unlock();
 				put_page(page);
 				head_skb = page_to_skb(vi, rq, xdp_page, offset,
-						       len, PAGE_SIZE, false,
-						       metasize,
-						       headroom);
+						       len, PAGE_SIZE);
 				return head_skb;
 			}
 			break;
@@ -1289,8 +1267,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_unlock();
 
 skip_xdp:
-	head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog,
-			       metasize, headroom);
+	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
 	curr_skb = head_skb;
 
 	if (unlikely(!curr_skb))
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 7/9] virtio_net: build skb from multi-buffer xdp
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

This converts the xdp_buff directly to a skb, including
multi-buffer and single buffer xdp. We'll isolate the
construction of skb based on xdp from page_to_skb().

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 50 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9f31bfa7f9a6..4e12196fcfd4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -948,6 +948,56 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+/* Why not use xdp_build_skb_from_frame() ?
+ * XDP core assumes that xdp frags are PAGE_SIZE in length, while in
+ * virtio-net there are 2 points that do not match its requirements:
+ *  1. The size of the prefilled buffer is not fixed before xdp is set.
+ *  2. When xdp is loaded, virtio-net has a hole mechanism (refer to
+ *     add_recvbuf_mergeable()), which will make the size of a buffer
+ *     exceed PAGE_SIZE.
+ */
+static struct sk_buff *build_skb_from_xdp_buff(struct net_device *dev,
+					       struct virtnet_info *vi,
+					       struct xdp_buff *xdp,
+					       unsigned int xdp_frags_truesz)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	unsigned int headroom, data_len;
+	struct sk_buff *skb;
+	int metasize;
+	u8 nr_frags;
+
+	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
+		pr_debug("Error building skb as missing reserved tailroom for xdp");
+		return NULL;
+	}
+
+	if (unlikely(xdp_buff_has_frags(xdp)))
+		nr_frags = sinfo->nr_frags;
+
+	skb = build_skb(xdp->data_hard_start, xdp->frame_sz);
+	if (unlikely(!skb))
+		return NULL;
+
+	headroom = xdp->data - xdp->data_hard_start;
+	data_len = xdp->data_end - xdp->data;
+	skb_reserve(skb, headroom);
+	__skb_put(skb, data_len);
+
+	metasize = xdp->data - xdp->data_meta;
+	metasize = metasize > 0 ? metasize : 0;
+	if (metasize)
+		skb_metadata_set(skb, metasize);
+
+	if (unlikely(xdp_buff_has_frags(xdp)))
+		xdp_update_skb_shared_info(skb, nr_frags,
+					   sinfo->xdp_frags_size,
+					   xdp_frags_truesz,
+					   xdp_buff_is_frag_pfmemalloc(xdp));
+
+	return skb;
+}
+
 /* TODO: build xdp in big mode */
 static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 				      struct virtnet_info *vi,
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().

For the prefilled buffer before xdp is set, we will probably use
vq reset in the future. At the same time, virtio net currently
uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate
the tailroom of the last frag, which will involve the offset of the
corresponding page and cause a negative value, so we disable tail
increase by not setting xdp_rxq->frag_size.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 60 +++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8fc3b1841d92..40bc58fa57f5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1018,6 +1018,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 unsigned int *xdp_xmit,
 					 struct virtnet_rq_stats *stats)
 {
+	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
 	struct page *page = virt_to_head_page(buf);
@@ -1048,11 +1049,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
+		unsigned int xdp_frags_truesz = 0;
+		struct skb_shared_info *shinfo;
 		struct xdp_frame *xdpf;
 		struct page *xdp_page;
 		struct xdp_buff xdp;
 		void *data;
 		u32 act;
+		int i;
 
 		/* Transient failure which in theory could occur if
 		 * in-flight packets from before XDP was enabled reach
@@ -1061,19 +1065,23 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
-		/* Buffers with headroom use PAGE_SIZE as alloc size,
-		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
+		/* Now XDP core assumes frag size is PAGE_SIZE, but buffers
+		 * with headroom may add hole in truesize, which
+		 * make their length exceed PAGE_SIZE. So we disabled the
+		 * hole mechanism for xdp. See add_recvbuf_mergeable().
 		 */
 		frame_sz = headroom ? PAGE_SIZE : truesize;
 
-		/* This happens when rx buffer size is underestimated
-		 * or headroom is not enough because of the buffer
-		 * was refilled before XDP is set. This should only
-		 * happen for the first several packets, so we don't
-		 * care much about its performance.
+		/* This happens when headroom is not enough because
+		 * of the buffer was prefilled before XDP is set.
+		 * This should only happen for the first several packets.
+		 * In fact, vq reset can be used here to help us clean up
+		 * the prefilled buffers, but many existing devices do not
+		 * support it, and we don't want to bother users who are
+		 * using xdp normally.
 		 */
-		if (unlikely(num_buf > 1 ||
-			     headroom < virtnet_get_headroom(vi))) {
+		if (!xdp_prog->aux->xdp_has_frags &&
+		    (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
 			/* linearize data for XDP */
 			xdp_page = xdp_linearize_page(rq, &num_buf,
 						      page, offset,
@@ -1084,17 +1092,26 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			if (!xdp_page)
 				goto err_xdp;
 			offset = VIRTIO_XDP_HEADROOM;
+		} else if (unlikely(headroom < virtnet_get_headroom(vi))) {
+			if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
+				goto err_xdp;
+
+			xdp_page = alloc_page(GFP_ATOMIC);
+			if (!xdp_page)
+				goto err_xdp;
+
+			memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
+			       page_address(page) + offset, len);
+			frame_sz = PAGE_SIZE;
+			offset = VIRTIO_XDP_HEADROOM;
 		} else {
 			xdp_page = page;
 		}
-
-		/* Allow consuming headroom but reserve enough space to push
-		 * the descriptor on if we get an XDP_TX return code.
-		 */
 		data = page_address(xdp_page) + offset;
-		xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
-		xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
-				 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
+		err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
+						 &num_buf, &xdp_frags_truesz, stats);
+		if (unlikely(err))
+			goto err_xdp_frags;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -1190,6 +1207,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
+err_xdp_frags:
+		shinfo = xdp_get_shared_info_from_buff(&xdp);
+
+		if (unlikely(xdp_page != page))
+			__free_pages(xdp_page, 0);
+
+		for (i = 0; i < shinfo->nr_frags; i++) {
+			xdp_page = skb_frag_page(&shinfo->frags[i]);
+			put_page(xdp_page);
+		}
+		goto err_xdp;
 	}
 	rcu_read_unlock();
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

This serves as the basis for XDP_TX and XDP_REDIRECT
to send a multi-buffer xdp_frame.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 40bc58fa57f5..9f31bfa7f9a6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -563,22 +563,39 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 				   struct xdp_frame *xdpf)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
-	int err;
+	struct skb_shared_info *shinfo;
+	u8 nr_frags = 0;
+	int err, i;
 
 	if (unlikely(xdpf->headroom < vi->hdr_len))
 		return -EOVERFLOW;
 
-	/* Make room for virtqueue hdr (also change xdpf->headroom?) */
+	if (unlikely(xdp_frame_has_frags(xdpf))) {
+		shinfo = xdp_get_shared_info_from_frame(xdpf);
+		nr_frags = shinfo->nr_frags;
+	}
+
+	/* Need to adjust this to calculate the correct postion
+	 * for shinfo of the xdpf.
+	 */
+	xdpf->headroom -= vi->hdr_len;
 	xdpf->data -= vi->hdr_len;
 	/* Zero header and leave csum up to XDP layers */
 	hdr = xdpf->data;
 	memset(hdr, 0, vi->hdr_len);
 	xdpf->len   += vi->hdr_len;
 
-	sg_init_one(sq->sg, xdpf->data, xdpf->len);
+	sg_init_table(sq->sg, nr_frags + 1);
+	sg_set_buf(sq->sg, xdpf->data, xdpf->len);
+	for (i = 0; i < nr_frags; i++) {
+		skb_frag_t *frag = &shinfo->frags[i];
+
+		sg_set_page(&sq->sg[i + 1], skb_frag_page(frag),
+			    skb_frag_size(frag), skb_frag_off(frag));
+	}
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
-				   GFP_ATOMIC);
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
+				   xdp_to_ptr(xdpf), GFP_ATOMIC);
 	if (unlikely(err))
 		return -ENOSPC; /* Caller handle free/refcnt */
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

Support xdp for multi buffer packets in mergeable mode.

Putting the first buffer as the linear part for xdp_buff,
and the rest of the buffers as non-linear fragments to struct
skb_shared_info in the tailroom belonging to xdp_buff.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 78 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 08f209d7b0bf..8fc3b1841d92 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -931,6 +931,84 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+/* TODO: build xdp in big mode */
+static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
+				      struct virtnet_info *vi,
+				      struct receive_queue *rq,
+				      struct xdp_buff *xdp,
+				      void *buf,
+				      unsigned int len,
+				      unsigned int frame_sz,
+				      u16 *num_buf,
+				      unsigned int *xdp_frags_truesize,
+				      struct virtnet_rq_stats *stats)
+{
+	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+	unsigned int truesize, headroom;
+	struct skb_shared_info *shinfo;
+	unsigned int xdp_frags_truesz = 0;
+	unsigned int cur_frag_size;
+	struct page *page;
+	skb_frag_t *frag;
+	int offset;
+	void *ctx;
+
+	xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
+	xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
+			 VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);
+
+	if (*num_buf > 1) {
+		shinfo = xdp_get_shared_info_from_buff(xdp);
+		shinfo->nr_frags = 0;
+		shinfo->xdp_frags_size = 0;
+	}
+
+	if ((*num_buf - 1) > MAX_SKB_FRAGS)
+		return -EINVAL;
+
+	while ((--*num_buf) >= 1) {
+		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+		if (unlikely(!buf)) {
+			pr_debug("%s: rx error: %d buffers out of %d missing\n",
+				 dev->name, *num_buf,
+				 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
+			dev->stats.rx_length_errors++;
+			return -EINVAL;
+		}
+
+		if (!xdp_buff_has_frags(xdp))
+			xdp_buff_set_frags_flag(xdp);
+
+		stats->bytes += len;
+		page = virt_to_head_page(buf);
+		offset = buf - page_address(page);
+		truesize = mergeable_ctx_to_truesize(ctx);
+		headroom = mergeable_ctx_to_headroom(ctx);
+
+		cur_frag_size = truesize + (headroom ? (headroom + tailroom) : 0);
+		xdp_frags_truesz += cur_frag_size;
+		if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
+			pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
+				 dev->name, len, (unsigned long)ctx);
+			dev->stats.rx_length_errors++;
+			return -EINVAL;
+		}
+
+		frag = &shinfo->frags[shinfo->nr_frags++];
+		__skb_frag_set_page(frag, page);
+		skb_frag_off_set(frag, offset);
+		skb_frag_size_set(frag, len);
+		if (page_is_pfmemalloc(page))
+			xdp_buff_set_frag_pfmemalloc(xdp);
+
+		shinfo->xdp_frags_size += len;
+	}
+
+	*xdp_frags_truesize = xdp_frags_truesz;
+	return 0;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 3/9] virtio_net: update bytes calculation for xdp_frame
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

Update relative record value for xdp_frame as basis
for multi-buffer xdp transmission.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c5c4e9db4ed3..08f209d7b0bf 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -658,7 +658,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 		if (likely(is_xdp_frame(ptr))) {
 			struct xdp_frame *frame = ptr_to_xdp(ptr);
 
-			bytes += frame->len;
+			bytes += xdp_get_frame_len(frame);
 			xdp_return_frame(frame);
 		} else {
 			struct sk_buff *skb = ptr;
@@ -1604,7 +1604,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 		} else {
 			struct xdp_frame *frame = ptr_to_xdp(ptr);
 
-			bytes += frame->len;
+			bytes += xdp_get_frame_len(frame);
 			xdp_return_frame(frame);
 		}
 		packets++;
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

XDP core assumes that the frame_size of xdp_buff and the length of
the frag are PAGE_SIZE. The hole may cause the processing of xdp to
fail, so we disable the hole mechanism when xdp is set.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9cce7dec7366..443aa7b8f0ad 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
 		 * the current buffer.
+		 * XDP core assumes that frame_size of xdp_buff and the length
+		 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
 		 */
-		len += hole;
+		if (!headroom)
+			len += hole;
 		alloc_frag->offset += hole;
 	}
 
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo
In-Reply-To: <20221220141449.115918-1-hengqi@linux.alibaba.com>

When the xdp program sets xdp.frags, which means it can process
multi-buffer packets over larger MTU, so we continue to support xdp.
But for single-buffer xdp, we should keep checking for MTU.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 443aa7b8f0ad..c5c4e9db4ed3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3095,8 +3095,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 		return -EINVAL;
 	}
 
-	if (dev->mtu > max_sz) {
-		NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
+	if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {
+		NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP without frags");
 		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
 		return -EINVAL;
 	}
-- 
2.19.1.6.gb485710b


^ permalink raw reply related

* [PATCH v2 0/9] virtio_net: support multi buffer xdp
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

Changes since RFC:
- Using headroom instead of vi->xdp_enabled to avoid re-reading
  in add_recvbuf_mergeable();
- Disable GRO_HW and keep linearization for single buffer xdp;
- Renamed to virtnet_build_xdp_buff_mrg();
- pr_debug() to netdev_dbg();
- Adjusted the order of the patch series.

Currently, virtio net only supports xdp for single-buffer packets
or linearized multi-buffer packets. This patchset supports xdp for
multi-buffer packets, then larger MTU can be used if xdp sets the
xdp.frags. This does not affect single buffer handling.

In order to build multi-buffer xdp neatly, we integrated the code
into virtnet_build_xdp_buff_mrg() for xdp. The first buffer is used
for prepared xdp buff, and the rest of the buffers are added to
its skb_shared_info structure. This structure can also be
conveniently converted during XDP_PASS to get the corresponding skb.

Since virtio net uses comp pages, and bpf_xdp_frags_increase_tail()
is based on the assumption of the page pool,
(rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
is negative in most cases. So we didn't set xdp_rxq->frag_size in
virtnet_open() to disable the tail increase.

Heng Qi (9):
  virtio_net: disable the hole mechanism for xdp
  virtio_net: set up xdp for multi buffer packets
  virtio_net: update bytes calculation for xdp_frame
  virtio_net: build xdp_buff with multi buffers
  virtio_net: construct multi-buffer xdp in mergeable
  virtio_net: transmit the multi-buffer xdp
  virtio_net: build skb from multi-buffer xdp
  virtio_net: remove xdp related info from page_to_skb()
  virtio_net: support multi-buffer xdp

 drivers/net/virtio_net.c | 332 ++++++++++++++++++++++++++-------------
 1 file changed, 219 insertions(+), 113 deletions(-)

-- 
2.19.1.6.gb485710b


^ permalink raw reply

* [PATCH] vhost_vdpa: fix the compile issue in commit 881ac7d2314f
From: Cindy Lu @ 2022-12-20 14:02 UTC (permalink / raw)
  To: lulu, jasowang, mst, kvm, linux-kernel, virtualization, netdev; +Cc: stable

The input of  vhost_vdpa_iotlb_unmap() was changed in 881ac7d2314f,
But some function was not changed while calling this function.
Add this change

Cc: stable@vger.kernel.org
Fixes: 881ac7d2314f ("vhost_vdpa: fix the crash in unmap a large memory")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/vhost/vdpa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 46ce35bea705..ec32f785dfde 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -66,8 +66,8 @@ static DEFINE_IDA(vhost_vdpa_ida);
 static dev_t vhost_vdpa_major;
 
 static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,
-				   struct vhost_iotlb *iotlb,
-				   u64 start, u64 last);
+				   struct vhost_iotlb *iotlb, u64 start,
+				   u64 last, u32 asid);
 
 static inline u32 iotlb_to_asid(struct vhost_iotlb *iotlb)
 {
@@ -139,7 +139,7 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
 		return -EINVAL;
 
 	hlist_del(&as->hash_link);
-	vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1);
+	vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
 	kfree(as);
 
 	return 0;
-- 
2.34.3


^ permalink raw reply related

* Re: [RFC PATCH net-next v2 0/5] net/smc:Introduce SMC-D based loopback acceleration
From: Niklas Schnelle @ 2022-12-20 14:02 UTC (permalink / raw)
  To: Wen Gu, kgraul, wenjia, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel
In-Reply-To: <1671506505-104676-1-git-send-email-guwen@linux.alibaba.com>

On Tue, 2022-12-20 at 11:21 +0800, Wen Gu wrote:
> Hi, all
> 
> # Background
> 
> As previously mentioned in [1], we (Alibaba Cloud) are trying to use SMC
> to accelerate TCP applications in cloud environment, improving inter-host
> or inter-VM communication.
> 
> In addition of these, we also found the value of SMC-D in scenario of local
> inter-process communication, such as accelerate communication between containers
> within the same host. So this RFC tries to provide a SMC-D loopback solution
> in such scenario, to bring a significant improvement in latency and throughput
> compared to TCP loopback.
> 
> # Design
> 
> This patch set provides a kind of SMC-D loopback solution.
> 
> Patch #1/5 and #2/5 provide an SMC-D based dummy device, preparing for the
> inter-process communication acceleration. Except for loopback acceleration,
> the dummy device can also meet the requirements mentioned in [2], which is
> providing a way to test SMC-D logic for broad community without ISM device.
> 
>  +------------------------------------------+
>  |  +-----------+           +-----------+   |
>  |  | process A |           | process B |   |
>  |  +-----------+           +-----------+   |
>  |       ^                        ^         |
>  |       |    +---------------+   |         |
>  |       |    |   SMC stack   |   |         |
>  |       +--->| +-----------+ |<--|         |
>  |            | |   dummy   | |             |
>  |            | |   device  | |             |
>  |            +-+-----------+-+             |
>  |                   VM                     |
>  +------------------------------------------+
> 
> Patch #3/5, #4/5, #5/5 provides a way to avoid data copy from sndbuf to RMB
> and improve SMC-D loopback performance. Through extending smcd_ops with two
> new semantic: attach_dmb and detach_dmb, sender's sndbuf shares the same
> physical memory region with receiver's RMB. The data copied from userspace
> to sender's sndbuf directly reaches the receiver's RMB without unnecessary
> memory copy in the same kernel.
> 
>  +----------+                     +----------+
>  | socket A |                     | socket B |
>  +----------+                     +----------+
>        |                               ^
>        |         +---------+           |
>   regard as      |         | ----------|
>   local sndbuf   |  B's    |     regard as
>        |         |  RMB    |     local RMB
>        |-------> |         |
>                  +---------+

Hi Wen Gu,

I maintain the s390 specific PCI support in Linux and would like to
provide a bit of background on this. You're surely wondering why we
even have a copy in there for our ISM virtual PCI device. To understand
why this copy operation exists and why we need to keep it working, one
needs a bit of s390 aka mainframe background.

On s390 all (currently supported) native machines have a mandatory
machine level hypervisor. All OSs whether z/OS or Linux run either on
this machine level hypervisor as so called Logical Partitions (LPARs)
or as second/third/… level guests on e.g. a KVM or z/VM hypervisor that
in turn runs in an LPAR. Now, in terms of memory this machine level
hypervisor sometimes called PR/SM unlike KVM, z/VM, or VMWare is a
partitioning hypervisor without paging. This is one of the main reasons
for the very-near-native performance of the machine hypervisor as the
memory of its guests acts just like native RAM on other systems. It is
never paged out and always accessible to IOMMU translated DMA from
devices without the need for pinning pages and besides a trivial
offset/limit adjustment an LPAR's MMU does the same amount of work as
an MMU on a bare metal x86_64/ARM64 box.

It also means however that when SMC-D is used to communicate between
LPARs via an ISM device there is  no way of mapping the DMBs to the
same physical memory as there exists no MMU-like layer spanning
partitions that could do such a mapping. Meanwhile for machine level
firmware including the ISM virtual PCI device it is still possible to
_copy_ memory between different memory partitions. So yeah while I do
see the appeal of skipping the memcpy() for loopback or even between
guests of a paging hypervisor such as KVM, which can map the DMBs on
the same physical memory, we must keep in mind this original use case
requiring a copy operation.

Thanks,
Niklas

> 
> # Benchmark Test
> 
>  * Test environments:
>       - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
>       - SMC sndbuf/RMB size 1MB.
> 
>  * Test object:
>       - TCP: run on TCP loopback.
>       - domain: run on UNIX domain.
>       - SMC lo: run on SMC loopback device with patch #1/5 ~ #2/5.
>       - SMC lo-nocpy: run on SMC loopback device with patch #1/5 ~ #5/5.
> 
> 1. ipc-benchmark (see [3])
> 
>  - ./<foo> -c 1000000 -s 100
> 
>                        TCP              domain              SMC-lo             SMC-lo-nocpy
> Message
> rate (msg/s)         75140      129548(+72.41)    152266(+102.64%)         151914(+102.17%)

Interesting that it does beat UNIX domain sockets. Also, see my below
comment for nginx/wrk as this seems very similar.

> 
> 2. sockperf
> 
>  - serv: <smc_run> taskset -c <cpu> sockperf sr --tcp
>  - clnt: <smc_run> taskset -c <cpu> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
> 
>                        TCP                  SMC-lo             SMC-lo-nocpy
> Bandwidth(MBps)   4943.359        4936.096(-0.15%)        8239.624(+66.68%)
> Latency(us)          6.372          3.359(-47.28%)            3.25(-49.00%)
> 
> 3. iperf3
> 
>  - serv: <smc_run> taskset -c <cpu> iperf3 -s
>  - clnt: <smc_run> taskset -c <cpu> iperf3 -c 127.0.0.1 -t 15
> 
>                        TCP                  SMC-lo             SMC-lo-nocpy
> Bitrate(Gb/s)         40.5            41.4(+2.22%)            76.4(+88.64%)
> 
> 4. nginx/wrk
> 
>  - serv: <smc_run> nginx
>  - clnt: <smc_run> wrk -t 8 -c 500 -d 30 http://127.0.0.1:80
> 
>                        TCP                  SMC-lo             SMC-lo-nocpy
> Requests/s       154643.22      220894.03(+42.84%)        226754.3(+46.63%)


This result is very interesting indeed. So with the much more realistic
nginx/wrk workload it seems to copy hurts much less than the
iperf3/sockperf would suggest while SMC-D itself seems to help more.
I'd hope that this translates to actual applications as well. Maybe
this makes SMC-D based loopback interesting even while keeping the
copy, at least until we can come up with a sane way to work a no-copy
variant into SMC-D?

> 
> 
> # Discussion
> 
> 1. API between SMC-D and ISM device
> 
> As Jan mentioned in [2], IBM are working on placing an API between SMC-D
> and the ISM device for easier use of different "devices" for SMC-D.
> 
> So, considering that the introduction of attach_dmb or detach_dmb can
> effectively avoid data copying from sndbuf to RMB and brings obvious
> throughput advantages in inter-VM or inter-process scenarios, can the
> attach/detach semantics be taken into consideration when designing the
> API to make it a standard ISM device behavior?

Due to the reasons explained above this behavior can't be emulated by
ISM devices at least not when crossing partitions. Not sure if we can
still incorporate it in the API and allow for both copying and
remapping SMC-D like devices, it definitely needs careful consideration
and I think also a better understanding of the benefit for real world
workloads.

> 
> Maybe our RFC of SMC-D based inter-process acceleration (this one) and
> inter-VM acceleration (will coming soon, which is the update of [1])
> can provide some examples for new API design. And we are very glad to
> discuss this on the mail list.
> 
> 2. Way to select different ISM-like devices
> 
> With the proposal of SMC-D loopback 'device' (this RFC) and incoming
> device used for inter-VM acceleration as update of [1], SMC-D has more
> options to choose from. So we need to consider that how to indicate
> supported devices, how to determine which one to use, and their priority...

Agree on this part, though it is for the SMC maintainers to decide, I
think we would definitely want to be able to use any upcoming inter-VM
devices on s390 possibly also in conjunction with ISM devices for
communication across partitions.

> 
> IMHO, this may require an update of CLC message and negotiation mechanism.
> Again, we are very glad to discuss this with you on the mailing list.
> 
> [1] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/
> [2] https://lore.kernel.org/netdev/35d14144-28f7-6129-d6d3-ba16dae7a646@linux.ibm.com/
> [3] https://github.com/goldsborough/ipc-bench
> 
> v1->v2
>  1. Fix some build WARNINGs complained by kernel test rebot
>     Reported-by: kernel test robot <lkp@intel.com>
>  2. Add iperf3 test data.
> 
> Wen Gu (5):
>   net/smc: introduce SMC-D loopback device
>   net/smc: choose loopback device in SMC-D communication
>   net/smc: add dmb attach and detach interface
>   net/smc: avoid data copy from sndbuf to peer RMB in SMC-D loopback
>   net/smc: logic of cursors update in SMC-D loopback connections
> 
>  include/net/smc.h      |   3 +
>  net/smc/Makefile       |   2 +-
>  net/smc/af_smc.c       |  88 +++++++++++-
>  net/smc/smc_cdc.c      |  59 ++++++--
>  net/smc/smc_cdc.h      |   1 +
>  net/smc/smc_clc.c      |   4 +-
>  net/smc/smc_core.c     |  62 +++++++++
>  net/smc/smc_core.h     |   2 +
>  net/smc/smc_ism.c      |  39 +++++-
>  net/smc/smc_ism.h      |   2 +
>  net/smc/smc_loopback.c | 358 +++++++++++++++++++++++++++++++++++++++++++++++++
>  net/smc/smc_loopback.h |  63 +++++++++
>  12 files changed, 662 insertions(+), 21 deletions(-)
>  create mode 100644 net/smc/smc_loopback.c
>  create mode 100644 net/smc/smc_loopback.h
> 


^ permalink raw reply

* [linux-next:master] BUILD REGRESSION e45fb347b630ee76482fe938ba76cf8eab811290
From: kernel test robot @ 2022-12-20 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: virtualization, speakup, netdev, loongarch, linuxppc-dev,
	linux-xfs, linux-omap, linux-mm, linux-media, linux-cxl,
	linux-can, linux-arm-msm, linux-arm-kernel,
	Linux Memory Management List

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: e45fb347b630ee76482fe938ba76cf8eab811290  Add linux-next specific files for 20221220

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202211242120.MzZVGULn-lkp@intel.com
https://lore.kernel.org/oe-kbuild-all/202212020520.0OkMIno3-lkp@intel.com
https://lore.kernel.org/oe-kbuild-all/202212040713.rVney9e8-lkp@intel.com
https://lore.kernel.org/oe-kbuild-all/202212061455.6GE7y0jg-lkp@intel.com
https://lore.kernel.org/oe-kbuild-all/202212090509.NjAl9tbo-lkp@intel.com
https://lore.kernel.org/oe-kbuild-all/202212191708.Xk9yBj52-lkp@intel.com
https://lore.kernel.org/oe-kbuild-all/202212201859.qUGugK1F-lkp@intel.com
https://lore.kernel.org/oe-kbuild-all/202212202020.qL8Aaqu0-lkp@intel.com

Error/Warning: (recently discovered and may have been fixed)

Documentation/gpu/drm-internals:179: ./include/drm/drm_file.h:411: WARNING: undefined label: drm_accel_node (if the link has no caption the label must precede a section header)
Documentation/networking/devlink/etas_es58x.rst: WARNING: document isn't included in any toctree
Warning: tools/power/cpupower/man/cpupower-powercap-info.1 references a file that doesn't exist: Documentation/power/powercap/powercap.txt
arch/arm/kernel/entry-armv.S:485:5: warning: "CONFIG_ARM_THUMB" is not defined, evaluates to 0 [-Wundef]
arch/loongarch/kernel/asm-offsets.c:265:6: warning: no previous prototype for 'output_pbe_defines' [-Wmissing-prototypes]
arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction
arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction
drivers/regulator/tps65219-regulator.c:310:32: warning: parameter 'dev' set but not used [-Wunused-but-set-parameter]
drivers/regulator/tps65219-regulator.c:310:60: warning: parameter 'dev' set but not used [-Wunused-but-set-parameter]
drivers/regulator/tps65219-regulator.c:370:26: warning: ordered comparison of pointer with integer zero [-Wextra]
lib/dhry_run.c:61:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
mm/memfd.c:274:31: warning: unused variable 'ns' [-Wunused-variable]

Unverified Error/Warning (likely false positive, please contact us if interested):

drivers/accessibility/speakup/main.c:1290:26: sparse: sparse: obsolete array initializer, use C99 syntax
drivers/cxl/core/mbox.c:832:18: sparse: sparse: cast from non-scalar
drivers/cxl/core/mbox.c:832:18: sparse: sparse: cast to non-scalar
drivers/i2c/busses/i2c-qcom-geni.c:1028:28: sparse: sparse: symbol 'i2c_master_hub' was not declared. Should it be static?
drivers/media/platform/ti/davinci/vpif.c:483:20: sparse: sparse: cast from non-scalar
drivers/media/platform/ti/davinci/vpif.c:483:20: sparse: sparse: cast to non-scalar
fs/xfs/xfs_iomap.c:86:29: sparse: sparse: symbol 'xfs_iomap_page_ops' was not declared. Should it be static?

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- arc-allyesconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- arm-allyesconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- arm-buildonly-randconfig-r005-20221219
|   `-- arch-arm-kernel-entry-armv.S:warning:CONFIG_ARM_THUMB-is-not-defined-evaluates-to
|-- arm64-allyesconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- i386-allyesconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- i386-buildonly-randconfig-r001-20221219
|   `-- mm-memfd.c:warning:unused-variable-ns
|-- ia64-allmodconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- loongarch-allyesconfig
|   `-- arch-loongarch-kernel-asm-offsets.c:warning:no-previous-prototype-for-output_pbe_defines
|-- loongarch-randconfig-s051-20221218
|   |-- drivers-i2c-busses-i2c-qcom-geni.c:sparse:sparse:symbol-i2c_master_hub-was-not-declared.-Should-it-be-static
|   `-- fs-xfs-xfs_iomap.c:sparse:sparse:symbol-xfs_iomap_page_ops-was-not-declared.-Should-it-be-static
|-- m68k-allmodconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- m68k-allyesconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- mips-allyesconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- powerpc-allmodconfig
|   |-- arch-powerpc-kernel-kvm_emul.o:warning:objtool:kvm_template_end():can-t-find-starting-instruction
|   |-- arch-powerpc-kernel-optprobes_head.o:warning:objtool:optprobe_template_end():can-t-find-starting-instruction
|   |-- drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- riscv-randconfig-s041-20221218
|   |-- drivers-accessibility-speakup-main.c:sparse:sparse:obsolete-array-initializer-use-C99-syntax
|   `-- fs-xfs-xfs_iomap.c:sparse:sparse:symbol-xfs_iomap_page_ops-was-not-declared.-Should-it-be-static
|-- riscv-randconfig-s042-20221218
|   |-- drivers-cxl-core-mbox.c:sparse:sparse:cast-from-non-scalar
|   |-- drivers-cxl-core-mbox.c:sparse:sparse:cast-to-non-scalar
|   |-- drivers-net-thunderbolt.c:sparse:sparse:incorrect-type-in-assignment-(different-base-types)-expected-restricted-__le16-usertype-frame_id-got-unsigned-short-usertype
|   |-- drivers-net-thunderbolt.c:sparse:sparse:incorrect-type-in-assignment-(different-base-types)-expected-restricted-__le16-usertype-frame_index-got-unsigned-short-usertype
|   |-- drivers-net-thunderbolt.c:sparse:sparse:incorrect-type-in-assignment-(different-base-types)-expected-restricted-__le32-usertype-frame_count-got-unsigned-int-usertype
clang_recent_errors
|-- hexagon-allmodconfig
|   |-- drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|   `-- lib-dhry_run.c:warning:variable-ret-is-used-uninitialized-whenever-if-condition-is-false
`-- x86_64-rhel-8.3-rust
    `-- vmlinux.o:warning:objtool:___ksymtab_gpl-_RNvNtCsfATHBUcknU9_6kernel5print16call_printk_cont:data-relocation-to-ENDBR:_RNvNtCsfATHBUcknU9_6kernel5print16call_printk_cont

elapsed time: 726m

configs tested: 66
configs skipped: 2

gcc tested configs:
um                             i386_defconfig
um                           x86_64_defconfig
powerpc                           allnoconfig
arc                                 defconfig
x86_64                    rhel-8.3-kselftests
s390                             allmodconfig
x86_64                          rhel-8.3-func
alpha                               defconfig
i386                                defconfig
s390                                defconfig
arm                                 defconfig
sh                               allmodconfig
s390                             allyesconfig
x86_64               randconfig-a002-20221219
x86_64               randconfig-a003-20221219
alpha                            allyesconfig
x86_64               randconfig-a001-20221219
m68k                             allyesconfig
x86_64               randconfig-a004-20221219
mips                             allyesconfig
m68k                             allmodconfig
powerpc                          allmodconfig
arc                              allyesconfig
x86_64               randconfig-a005-20221219
arc                  randconfig-r043-20221220
x86_64                           rhel-8.3-bpf
x86_64               randconfig-a006-20221219
x86_64                           rhel-8.3-syz
riscv                randconfig-r042-20221220
x86_64                         rhel-8.3-kunit
ia64                             allmodconfig
x86_64                            allnoconfig
arm                              allyesconfig
x86_64                           rhel-8.3-kvm
arm64                            allyesconfig
s390                 randconfig-r044-20221220
i386                             allyesconfig
i386                 randconfig-a001-20221219
i386                 randconfig-a003-20221219
i386                 randconfig-a002-20221219
i386                 randconfig-a006-20221219
i386                 randconfig-a005-20221219
i386                 randconfig-a004-20221219
powerpc                     ep8248e_defconfig
powerpc                     rainier_defconfig
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                           allyesconfig

clang tested configs:
x86_64                          rhel-8.3-rust
hexagon              randconfig-r041-20221220
arm                  randconfig-r046-20221220
i386                 randconfig-a011-20221219
i386                 randconfig-a014-20221219
hexagon              randconfig-r045-20221220
i386                 randconfig-a012-20221219
i386                 randconfig-a013-20221219
i386                 randconfig-a015-20221219
i386                 randconfig-a016-20221219
x86_64               randconfig-a014-20221219
x86_64               randconfig-a015-20221219
x86_64               randconfig-a012-20221219
x86_64               randconfig-a011-20221219
arm                             mxs_defconfig
x86_64               randconfig-a016-20221219
powerpc                     ppa8548_defconfig
x86_64               randconfig-a013-20221219

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply

* Re: [PATCH] wifi: rtl8xxxu: fixing transmisison failure for rtl8192eu
From: Jun ASAKA @ 2022-12-20 13:48 UTC (permalink / raw)
  To: Bitterblue Smith, Ping-Ke Shih, Jes.Sorensen@gmail.com
  Cc: kvalo@kernel.org, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <33b2b585-c5b1-5888-bcee-ca74ce809a44@gmail.com>

On 20/12/2022 21:03, Bitterblue Smith wrote:

> On 20/12/2022 07:44, Ping-Ke Shih wrote:
>>
>>> -----Original Message-----
>>> From: Jun ASAKA <JunASAKA@zzy040330.moe>
>>> Sent: Saturday, December 17, 2022 11:07 AM
>>> To: Jes.Sorensen@gmail.com
>>> Cc: kvalo@kernel.org; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
>>> linux-wireless@vger.kernel.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jun ASAKA
>>> <JunASAKA@zzy040330.moe>
>>> Subject: [PATCH] wifi: rtl8xxxu: fixing transmisison failure for rtl8192eu
>>>
>>> Fixing transmission failure which results in
>>> "authentication with ... timed out". This can be
>>> fixed by disable the REG_TXPAUSE.
>>>
>>> Signed-off-by: Jun ASAKA <JunASAKA@zzy040330.moe>
>>> ---
>>>   drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8192e.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8192e.c
>>> b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8192e.c
>>> index a7d76693c02d..9d0ed6760cb6 100644
>>> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8192e.c
>>> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8192e.c
>>> @@ -1744,6 +1744,11 @@ static void rtl8192e_enable_rf(struct rtl8xxxu_priv *priv)
>>>   	val8 = rtl8xxxu_read8(priv, REG_PAD_CTRL1);
>>>   	val8 &= ~BIT(0);
>>>   	rtl8xxxu_write8(priv, REG_PAD_CTRL1, val8);
>>> +
>>> +	/*
>>> +	 * Fix transmission failure of rtl8192e.
>>> +	 */
>>> +	rtl8xxxu_write8(priv, REG_TXPAUSE, 0x00);
>> I trace when rtl8xxxu set REG_TXPAUSE=0xff that will stop TX.
>> The occasions include RF calibration, LPS mode (called by power off), and
>> going to stop. So, I think RF calibration does TX pause but not restore
>> settings after calibration, and causes TX stuck. As the flow I traced,
>> this patch looks reasonable. But, I wonder why other people don't meet
>> this problem.
>>
> Other people have this problem too:
> https://bugzilla.kernel.org/show_bug.cgi?id=196769
> https://bugzilla.kernel.org/show_bug.cgi?id=216746
Actually, one of the two bug was issued by me. Also, my friend who is 
using a TP-Link rtl8192eu device said that his device doesn't work as well.
>
> The RF calibration does restore REG_TXPAUSE at the end. What happens is
> when you plug in the device, something (mac80211? wpa_supplicant?) calls
> rtl8xxxu_start(), then rtl8xxxu_stop(), then rtl8xxxu_start() again.
> rtl8xxxu_stop() sets REG_TXPAUSE to 0xff and nothing sets it back to 0.

^ permalink raw reply

* [PATCH] nfc:  Fix potential resource leaks
From: Miaoqian Lin @ 2022-12-20 13:46 UTC (permalink / raw)
  To: Krzysztof Kozlowski, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Samuel Ortiz, Christophe Ricard,
	netdev, linux-kernel
  Cc: linmq006

nfc_get_device() take reference for the device, add missing
nfc_put_device() to release it when not need anymore.
Also fix the style warnning by use error EOPNOTSUPP instead of
ENOTSUPP.

Fixes: 5ce3f32b5264 ("NFC: netlink: SE API implementation")
Fixes: 29e76924cf08 ("nfc: netlink: Add capability to reply to vendor_cmd with data")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
---
 net/nfc/netlink.c | 51 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c
index 9d91087b9399..d081beaf4828 100644
--- a/net/nfc/netlink.c
+++ b/net/nfc/netlink.c
@@ -1497,6 +1497,7 @@ static int nfc_genl_se_io(struct sk_buff *skb, struct genl_info *info)
 	u32 dev_idx, se_idx;
 	u8 *apdu;
 	size_t apdu_len;
+	int error;
 
 	if (!info->attrs[NFC_ATTR_DEVICE_INDEX] ||
 	    !info->attrs[NFC_ATTR_SE_INDEX] ||
@@ -1510,25 +1511,37 @@ static int nfc_genl_se_io(struct sk_buff *skb, struct genl_info *info)
 	if (!dev)
 		return -ENODEV;
 
-	if (!dev->ops || !dev->ops->se_io)
-		return -ENOTSUPP;
+	if (!dev->ops || !dev->ops->se_io) {
+		error = -EOPNOTSUPP;
+		goto put_dev;
+	}
 
 	apdu_len = nla_len(info->attrs[NFC_ATTR_SE_APDU]);
-	if (apdu_len == 0)
-		return -EINVAL;
+	if (apdu_len == 0) {
+		error = -EINVAL;
+		goto put_dev;
+	}
 
 	apdu = nla_data(info->attrs[NFC_ATTR_SE_APDU]);
-	if (!apdu)
-		return -EINVAL;
+	if (!apdu) {
+		error = -EINVAL;
+		goto put_dev;
+	}
 
 	ctx = kzalloc(sizeof(struct se_io_ctx), GFP_KERNEL);
-	if (!ctx)
-		return -ENOMEM;
+	if (!ctx) {
+		error = -ENOMEM;
+		goto put_dev;
+	}
 
 	ctx->dev_idx = dev_idx;
 	ctx->se_idx = se_idx;
 
-	return nfc_se_io(dev, se_idx, apdu, apdu_len, se_io_cb, ctx);
+	error = nfc_se_io(dev, se_idx, apdu, apdu_len, se_io_cb, ctx);
+
+put_dev:
+	nfc_put_device(dev);
+	return error;
 }
 
 static int nfc_genl_vendor_cmd(struct sk_buff *skb,
@@ -1551,14 +1564,20 @@ static int nfc_genl_vendor_cmd(struct sk_buff *skb,
 	subcmd = nla_get_u32(info->attrs[NFC_ATTR_VENDOR_SUBCMD]);
 
 	dev = nfc_get_device(dev_idx);
-	if (!dev || !dev->vendor_cmds || !dev->n_vendor_cmds)
+	if (!dev)
 		return -ENODEV;
+	if (!dev->vendor_cmds || !dev->n_vendor_cmds) {
+		err = -ENODEV;
+		goto put_dev;
+	}
 
 	if (info->attrs[NFC_ATTR_VENDOR_DATA]) {
 		data = nla_data(info->attrs[NFC_ATTR_VENDOR_DATA]);
 		data_len = nla_len(info->attrs[NFC_ATTR_VENDOR_DATA]);
-		if (data_len == 0)
-			return -EINVAL;
+		if (data_len == 0) {
+			err = -EINVAL;
+			goto put_dev;
+		}
 	} else {
 		data = NULL;
 		data_len = 0;
@@ -1573,10 +1592,14 @@ static int nfc_genl_vendor_cmd(struct sk_buff *skb,
 		dev->cur_cmd_info = info;
 		err = cmd->doit(dev, data, data_len);
 		dev->cur_cmd_info = NULL;
-		return err;
+		goto put_dev;
 	}
 
-	return -EOPNOTSUPP;
+	err = -EOPNOTSUPP;
+
+put_dev:
+	nfc_put_device(dev);
+	return err;
 }
 
 /* message building helper */
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH iproute2] dcb: unblock mnl_socket_recvfrom if not message received
From: Ido Schimmel @ 2022-12-20 13:34 UTC (permalink / raw)
  To: Hao Lan
  Cc: lipeng321, shenjian15, huangguangbin2, chenjunxin1, netdev,
	dsahern, stephen, davem, edumazet, kuba, pabeni, petrm
In-Reply-To: <20221019012008.11322-1-lanhao@huawei.com>

On Wed, Oct 19, 2022 at 09:20:08AM +0800, Hao Lan wrote:
> From: Junxin Chen <chenjunxin1@huawei.com>
> 
> Currently, the dcb command sinks to the kernel through the netlink
> to obtain information. However, if the kernel fails to obtain infor-
> mation or is not processed, the dcb command is suspended.
> 
> For example, if we don't implement dcbnl_ops->ieee_getpfc in the
> kernel, the command "dcb pfc show dev eth1" will be stuck and subsequent
> commands cannot be executed.
> 
> This patch adds the NLM_F_ACK flag to the netlink in mnlu_msg_prepare
> to ensure that the kernel responds to user requests.

The analysis is not correct: The kernel does reply, but the reply does not
contain the 'DCB_ATTR_IEEE_PFC' attribute, causing the dcb utility to block on
recvmsg(). Since you changed the utility to request an ACK you need to make
sure this ACK is processed before issuing another request. Please test the
following patch. I would like to post it tomorrow.

Thanks

commit 7b545308a2273a7fd26204688fa632ec1b4c0205
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Dec 20 14:27:46 2022 +0200

    dcb: Do not leave ACKs in socket receive buffer
    
    Originally, the dcb utility only stopped receiving messages from a
    socket when it found the attribute it was looking for. Cited commit
    changed that, so that the utility will also stop when seeing an ACK
    (NLMSG_ERROR message), by setting the NLM_F_ACK flag on requests.
    
    This is problematic because it means a successful request will leave an
    ACK in the socket receive buffer, causing the next request to bail
    before reading its response.
    
    Fix that by not stopping when finding the required attribute in a
    response. Instead, stop on the subsequent ACK.
    
    Fixes: 84c036972659 ("dcb: unblock mnl_socket_recvfrom if not message received")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>

diff --git a/dcb/dcb.c b/dcb/dcb.c
index 3ffa91d64d0d..9b996abac529 100644
--- a/dcb/dcb.c
+++ b/dcb/dcb.c
@@ -72,7 +72,7 @@ static int dcb_get_attribute_attr_ieee_cb(const struct nlattr *attr, void *data)
 
 	ga->payload = mnl_attr_get_payload(attr);
 	ga->payload_len = mnl_attr_get_payload_len(attr);
-	return MNL_CB_STOP;
+	return MNL_CB_OK;
 }
 
 static int dcb_get_attribute_attr_cb(const struct nlattr *attr, void *data)
@@ -126,7 +126,7 @@ static int dcb_set_attribute_attr_cb(const struct nlattr *attr, void *data)
 		return MNL_CB_ERROR;
 	}
 
-	return MNL_CB_STOP;
+	return MNL_CB_OK;
 }
 
 static int dcb_set_attribute_cb(const struct nlmsghdr *nlh, void *data)

^ permalink raw reply related

* Re: [PATCH net-next v1 4/4] net: phy: mxl-gpy: disable interrupts on GPY215 by default
From: Andrew Lunn @ 2022-12-20 13:33 UTC (permalink / raw)
  To: Michael Walle
  Cc: Xu Liang, Heiner Kallweit, Russell King, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, netdev, linux-kernel, devicetree
In-Reply-To: <34dc81b01930e594ca4773ddb8c24160@walle.cc>

> > Yes, it is a valid point to do this check, but on its own i don't
> > think it is sufficient.
> 
> Care to elaborate a bit? E.g. what is the difference to the case
> the phy would have an interrupt described but no .config_intr()
> op.
> 
> > > > I think a better place for this test is in gpy_config_intr(), return
> > > > -EOPNOTSUPP. phy_enable_interrupts() failing should then cause
> > > > phy_request_interrupt() to use polling.
> > > 
> > > Which will then print a warning, which might be misleading.
> > > Or we disable the warning if -EOPNOTSUPP is returned?
> > 
> > Disabling the warning is the right thing to do.
> 
> There is more to this. .config_intr() is also used in
> phy_init_hw() and phy_drv_supports_irq(). The latter would
> still return true in our case. I'm not sure that is correct.
> 
> After trying your suggestion, I'm still in favor of somehow
> tell the phy core to force polling mode during probe() of the
> driver.

The problem is that the MAC can set the interrupt number after the PHY
probe has been called. e.g.

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c#L524

The interrupt needs to be set by the time the PHY is connected to the
MAC, which is often in the MAC open method, much later than the PHY
probe.

     Andrew

^ permalink raw reply

* Re: [PATCH v1] qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure
From: Michal Swiatkowski @ 2022-12-20 13:32 UTC (permalink / raw)
  To: Daniil Tatianin
  Cc: Shahed Shaikh, Manish Chopra, GR-Linux-NIC-Dev, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20221220125649.1637829-1-d-tatianin@yandex-team.ru>

On Tue, Dec 20, 2022 at 03:56:49PM +0300, Daniil Tatianin wrote:
> adapter->dcb would get silently freed inside qlcnic_dcb_enable() in
> case qlcnic_dcb_attach() would return an error, which always happens
> under OOM conditions. This would lead to use-after-free because both
> of the existing callers invoke qlcnic_dcb_get_info() on the obtained
> pointer, which is potentially freed at that point.
> 
> Propagate errors from qlcnic_dcb_enable(), and instead free the dcb
> pointer at callsite.
> 
> Found by Linux Verification Center (linuxtesting.org) with the SVACE
> static analysis tool.
> 
> Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>

Please add Fix tag and net as target (net-next is close till the end of
this year)

> ---
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c | 9 ++++++++-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h       | 5 ++---
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c      | 9 ++++++++-
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c
> index dbb800769cb6..465f149d94d4 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c
> @@ -2505,7 +2505,14 @@ int qlcnic_83xx_init(struct qlcnic_adapter *adapter)
>  		goto disable_mbx_intr;
>  
>  	qlcnic_83xx_clear_function_resources(adapter);
> -	qlcnic_dcb_enable(adapter->dcb);
> +
> +	err = qlcnic_dcb_enable(adapter->dcb);
> +	if (err) {
> +		qlcnic_clear_dcb_ops(adapter->dcb);
> +		adapter->dcb = NULL;
> +		goto disable_mbx_intr;
> +	}

Maybe I miss sth but it looks like there can be memory leak.
For example if error in attach happen after allocating of dcb->cfg.
Isn't it better to call qlcnic_dcb_free instead of qlcnic_clear_dcb_ops?

> +
>  	qlcnic_83xx_initialize_nic(adapter, 1);
>  	qlcnic_dcb_get_info(adapter->dcb);
>  
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h
> index 7519773eaca6..e1460f9c38bf 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h
> @@ -112,9 +112,8 @@ static inline void qlcnic_dcb_init_dcbnl_ops(struct qlcnic_dcb *dcb)
>  		dcb->ops->init_dcbnl_ops(dcb);
>  }
>  
> -static inline void qlcnic_dcb_enable(struct qlcnic_dcb *dcb)
> +static inline int qlcnic_dcb_enable(struct qlcnic_dcb *dcb)
>  {
> -	if (dcb && qlcnic_dcb_attach(dcb))
> -		qlcnic_clear_dcb_ops(dcb);
> +	return dcb ? qlcnic_dcb_attach(dcb) : 0;
>  }
>  #endif
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index 28476b982bab..36ba15fc9776 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -2599,7 +2599,14 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  			 "Device does not support MSI interrupts\n");
>  
>  	if (qlcnic_82xx_check(adapter)) {
> -		qlcnic_dcb_enable(adapter->dcb);
> +		err = qlcnic_dcb_enable(adapter->dcb);
> +		if (err) {
> +			qlcnic_clear_dcb_ops(adapter->dcb);
> +			adapter->dcb = NULL;
> +			dev_err(&pdev->dev, "Failed to enable DCB\n");
> +			goto err_out_free_hw;
> +		}
> +
>  		qlcnic_dcb_get_info(adapter->dcb);
>  		err = qlcnic_setup_intr(adapter);
>  
> -- 
> 2.25.1
> 

^ permalink raw reply

* Re: [PATCH net-next v1 3/4] dt-bindings: net: phy: add MaxLinear GPY2xx bindings
From: Andrew Lunn @ 2022-12-20 13:21 UTC (permalink / raw)
  To: Michael Walle
  Cc: Krzysztof Kozlowski, Rob Herring, Xu Liang, Heiner Kallweit,
	Russell King, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Krzysztof Kozlowski, netdev, linux-kernel,
	devicetree
In-Reply-To: <796a528b23aded95c1a647317c277b1f@walle.cc>

>  (2) Krzysztof pointed out that there is still the issue raised by
>      Rob, that the schemas haven't any compatible and cannot be
>      validated. I think that applies to all the network PHY bindings
>      in the tree right now. I don't know how to fix them.

i've been offline for a while, i sabotaged my own mail server...

You can always add an unneeded compatible, using the PHY devices ID:

      - pattern: "^ethernet-phy-id[a-f0-9]{4}\\.[a-f0-9]{4}$"
        description:
          If the PHY reports an incorrect ID (or none at all) then the
          compatible list may contain an entry with the correct PHY ID
          in the above form.
          The first group of digits is the 16 bit Phy Identifier 1
          register, this is the chip vendor OUI bits 3:18. The
          second group of digits is the Phy Identifier 2 register,
          this is the chip vendor OUI bits 19:24, followed by 10
          bits of a vendor specific ID.

It would be fine to do this in the example in the binding, but i would
add a comment something like:

"Compatible generally only needed to make DT lint tools work. Mostly
not needed for real DT descriptions"

Examples often get cut/paste without thinking, and we don't really
want the compatible used unless it is really needed.

This is however a bigger problem than just PHYs. It applies to any
device which can be enumerated on a bus, e.g. USB, PCI. So maybe this
limitation of the DT linting tools should be fixed once at a higher
level?

>  (3) The main problem with the broken interrupt handling of the PHY
>      is that it will disturb other devices on that interrupt line.
>      IOW if the interrupt line is shared the PHY should fall back
>      to polling mode. I haven't found anything in the interrupt
>      subsys to query if a line is shared and I guess it's also
>      conceptually impossible to do such a thing, because there
>      might be any driver probed at a later time which also uses
>      that line.
>      Rob had the idea to walk the device tree and determine if
>      a particular interrupt is used by other devices, too. If
>      feasable, this sounds like a good enough heuristic for our
>      problem. Although there might be some edge cases, like
>      DT overlays loaded at linux runtime (?!).

My humble opinion is that it is not worth the complexity for just one
PHY which should work in polling mode without problems. I think the
boolean property you propose is KISS and does what is needed.

	Andrew


^ permalink raw reply

* [PATCH v3 2/3] net: phy: make phy_enable_interrupts() non-static
From: Enguerrand de Ribaucourt @ 2022-12-20 13:19 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, woojung.huh, davem, UNGLinuxDriver,
	Enguerrand de Ribaucourt
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D408987FF@CHN-SV-EXMX02.mchp-main.com>

Currently, phy_disable_interrupts() allows to disable interrupts without
freeing them. However, the only exported function to re-enable them is
phy_request_interrupt(), which also requests them again. It should be
possible to re-enable interrupts without re-allocating them.

This is required for lan78xx.c where we want to enable/disable the phy
interrupts during lan78xx_link_status_change().

Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
---
 drivers/net/phy/phy.c | 3 ++-
 include/linux/phy.h   | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 33250da76466..4168cf54aa59 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1040,10 +1040,11 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
  * phy_enable_interrupts - Enable the interrupts from the PHY side
  * @phydev: target phy_device struct
  */
-static int phy_enable_interrupts(struct phy_device *phydev)
+int phy_enable_interrupts(struct phy_device *phydev)
 {
 	return phy_config_interrupt(phydev, PHY_INTERRUPT_ENABLED);
 }
+EXPORT_SYMBOL(phy_enable_interrupts);
 
 /**
  * phy_request_interrupt - request and enable interrupt for a PHY device
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 71eeb4e3b1fd..d2350f0dc566 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1715,6 +1715,7 @@ int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd);
 int phy_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd);
 int phy_do_ioctl_running(struct net_device *dev, struct ifreq *ifr, int cmd);
 int phy_disable_interrupts(struct phy_device *phydev);
+int phy_enable_interrupts(struct phy_device *phydev);
 void phy_request_interrupt(struct phy_device *phydev);
 void phy_free_interrupt(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
-- 
2.25.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox