Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net: dsa: mv88e6060: report max mtu 1536
From: Sergei Antonov @ 2022-08-11  8:23 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: netdev, Florian Fainelli
In-Reply-To: <20220810193825.vq7rdgwx7xua5amj@skbuf>

On Wed, 10 Aug 2022 at 22:38, Vladimir Oltean <olteanv@gmail.com> wrote:
> > > The bug seems to have been introduced by commit 0abfd494deef ("net: dsa:
> > > use dedicated CPU port"), because, although before we'd be uselessly
> > > programming the port VLAN for a disabled port, now in doing so, we
> > > dereference a NULL pointer.
> >
> > The suggested fix with dsa_is_unused_port() works. I tested it on the
> > 'netdev/net.git' repo, see below. Should I submit it as a patch
> > (Fixes: 0abfd494deef)?
>
> Yes. See the section that talks about "git log -1 --pretty=fixes" in
> process/submitting-patches.rst for how the Fixes tag should actually
> look like.
>
> I thought about whether dsa_is_unused_port() is sufficient, since
> theoretically dsa_is_dsa_port() is also a possibility which isn't
> covered by the check. But I rechecked and it appears that the Marvell
> 6060 doesn't support cascade ports, so we should be fine with just that.

Great. I submitted a patch.

> > So I tested "dsa_is_unused_port()" and "switch@10" fixes with
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
> > What I did after system boot-up:
> >
> > ~ # dmesg | grep mv88
> > [    7.187296] mv88e6060 92000090.mdio--1-mii:10: switch Marvell 88E6060 (B0) detected
> > [    8.325712] mv88e6060 92000090.mdio--1-mii:10: switch Marvell 88E6060 (B0) detected
> > [    9.190299] mv88e6060 92000090.mdio--1-mii:10 lan2 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY] (irq=POLL)
> >
> > ~ # ip a
> > 1: lo: <LOOPBACK> mtu 65536 qdisc noop qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: eth0: <BROADCAST,MULTICAST> mtu 1504 qdisc noop qlen 1000
> >     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
>
> The DSA master is super odd for starting with an all-zero MAC address.
> What driver handles this part? Normally, drivers are expected to work
> with a MAC address provided by the firmware (of_get_mac_address or
> other, perhaps proprietary, means) and fall back to eth_random_addr()
> if that is missing.

eth0 is handled by the CONFIG_ARM_MOXART_ETHER driver. By the way, I
had to change some code in it to make it work, and I am going to
submit a patch or two later.
The driver does not know its MAC address initially. On my hardware it
is stored in a flash memory chip, so I assign it using "ip link set
..." either manually or from an /etc/init.d script. A solution with
early MAC assignment in the moxart_mac_probe() function is doable. Do
you think I should implement it?

> > 3: lan2@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop qlen 1000
> >     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
>
> Here DSA inherits the MAC address of the master. It does this by default
> in dsa_slave_create() -> eth_hw_addr_inherit(). If the OF node for the
> DSA port has its own MAC address, that will have priority over the MAC
> address of the master.

I quickly tried setting a MAC address in the moxart_mac_probe()
function and DSA did inherit it. Looks like the way to go.

> > ~ # ip link set dev eth0 address 00:90:e8:00:10:03 up
>
> This shouldn't be necessary, neither assigning a MAC address nor putting
> the master up, see Documentation/networking/dsa/configuration.rst, the
> master comes up automatically.

Yes, it indeed does. Thanks.

> > ~ # ip a add 192.168.127.254/24 dev lan2
> >
> > ~ # ip link set dev lan2 address 00:90:e8:00:10:03 up
> > [   56.383801] DSA: failed to set STP state 3 (-95)
>
> errno 95 is EOPNOTSUPP, we shouldn't warn here, I'll submit a patch for
> that.

Great, I'll review it.

> > Is it correct for eth0 and lan2@eth0 to have the same MAC?
>
> It is not wrong, it's a configuration that many deployed DSA systems use.

Is it also not wrong with several lanN@eth0 interfaces? I'm asking it
because I will probably need to support hardware with more than one
port on the 6060.

> > I could not make it work with different MACs.
>
> That is a problem, and I believe it's a problem with the DSA master driver.
> See, the reason it should work is this. Switch ports don't really have a
> MAC address, since they forward everything and not really terminate anything.
> The MAC address of a switch port is a software construct which means
> that software L3 termination interfaces (of which we have one per port)
> should accept packets with some known MAC DA, and drop the rest, and
> everything should be fine.
>
> There are multiple kinds of DSA tags, but 6060 uses a trailer, and this
> will not shift the 'real' MAC DA of packets compared to where the DSA
> master expects to see them. So if the MAC address of the DSA master is A,
> the MAC address of lan2 is B, and you ping lan2 from the outside world,
> the DSA master will see packets with a MAC DA of B.
>
> So the DSA master sees packets with a MAC DA different from its own
> dev->dev_addr (A) and thinks it's within its right to drop them. Except
> that it isn't, because we do the following to prevent it:
>
> (1) in case the DSA master supports IFF_UNICAST_FLT, we call dev_uc_add()
>     from dsa_slave_set_mac_address() and from dsa_slave_open(), and this
>     informs it of our address B.
> (2) in case it doesn't support IFF_UNICAST_FLT, we just call
>     dsa_master_set_promiscuity() and that should keep it promiscuous and
>     it should accept packets regardless of MAC DA (that's the definition).
>
> So you should run some tcpdump and ethtool -S on the DSA master and see
> whether it receives any packets or it drops them. It's possible that
> tcpdump makes packets be accepted, since it puts the interface in
> promiscuous mode.

When I tried to make it work with different MAC addresses, I used
Wireshark and saw that ARP packets did not reach the interface unless
they were broadcast. I might have been a configuration issue rather
than driver issue. Thanks for the explanation! But I will happily
stick to the common MAC address solution if it is not wrong.

^ permalink raw reply

* Re: [PATCH v7 1/4] vdpa: Add suspend operation
From: Michael S. Tsirkin @ 2022-08-11  8:27 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: kvm, linux-kernel, Jason Wang, virtualization, netdev, dinang,
	martinpo, Wu Zongyong, Piotr.Uminski, gautam.dawar, ecree.xilinx,
	martinh, Stefano Garzarella, pabloc, habetsm.xilinx, lvivier,
	Zhu Lingshan, tanuj.kamde, Longpeng, lulu, hanand, Parav Pandit,
	Si-Wei Liu, Eli Cohen, Xie Yongji, Zhang Min, Dan Carpenter,
	Christophe JAILLET
In-Reply-To: <20220810171512.2343333-2-eperezma@redhat.com>

On Wed, Aug 10, 2022 at 07:15:09PM +0200, Eugenio Pérez wrote:
> This operation is optional: It it's not implemented, backend feature bit
> will not be exposed.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> Message-Id: <20220623160738.632852-2-eperezma@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

What is this message id doing here?

> ---
>  include/linux/vdpa.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 7b4a13d3bd91..d282f464d2f1 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -218,6 +218,9 @@ struct vdpa_map_file {
>   * @reset:			Reset device
>   *				@vdev: vdpa device
>   *				Returns integer: success (0) or error (< 0)
> + * @suspend:			Suspend or resume the device (optional)
> + *				@vdev: vdpa device
> + *				Returns integer: success (0) or error (< 0)
>   * @get_config_size:		Get the size of the configuration space includes
>   *				fields that are conditional on feature bits.
>   *				@vdev: vdpa device
> @@ -319,6 +322,7 @@ struct vdpa_config_ops {
>  	u8 (*get_status)(struct vdpa_device *vdev);
>  	void (*set_status)(struct vdpa_device *vdev, u8 status);
>  	int (*reset)(struct vdpa_device *vdev);
> +	int (*suspend)(struct vdpa_device *vdev);
>  	size_t (*get_config_size)(struct vdpa_device *vdev);
>  	void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>  			   void *buf, unsigned int len);
> -- 
> 2.31.1


^ permalink raw reply

* Re: [PATCH] net: usb: qmi_wwan: Add support for Cinterion MV32
From: Bjørn Mork @ 2022-08-11  8:27 UTC (permalink / raw)
  To: Slark Xiao; +Cc: davem, edumazet, kuba, pabeni, netdev, linux-usb, linux-kernel
In-Reply-To: <20220810014521.9383-1-slark_xiao@163.com>

Slark Xiao <slark_xiao@163.com> writes:

> There are 2 models for MV32 serials. MV32-W-A is designed
> based on Qualcomm SDX62 chip, and MV32-W-B is designed based
> on Qualcomm SDX65 chip. So we use 2 different PID to separate it.
>
> Test evidence as below:
> T:  Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#=  3 Spd=480 MxCh= 0
> D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
> P:  Vendor=1e2d ProdID=00f3 Rev=05.04
> S:  Manufacturer=Cinterion
> S:  Product=Cinterion PID 0x00F3 USB Mobile Broadband
> S:  SerialNumber=d7b4be8d
> C:  #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
> I:  If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
> I:  If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
> I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
> I:  If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
>
> T:  Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#= 10 Spd=480 MxCh= 0
> D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
> P:  Vendor=1e2d ProdID=00f4 Rev=05.04
> S:  Manufacturer=Cinterion
> S:  Product=Cinterion PID 0x00F4 USB Mobile Broadband
> S:  SerialNumber=d095087d
> C:  #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
> I:  If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
> I:  If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
> I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
> I:  If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
>
> Signed-off-by: Slark Xiao <slark_xiao@163.com>

Thanks for answering all my questions so fast. This patch looks very
good to me

Acked-by: Bjørn Mork <bjorn@mork.no>


^ permalink raw reply

* Re: [PATCH v7 3/4] vhost-vdpa: uAPI to suspend the device
From: Michael S. Tsirkin @ 2022-08-11  8:29 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: kvm, linux-kernel, Jason Wang, virtualization, netdev, dinang,
	martinpo, Wu Zongyong, Piotr.Uminski, gautam.dawar, ecree.xilinx,
	martinh, Stefano Garzarella, pabloc, habetsm.xilinx, lvivier,
	Zhu Lingshan, tanuj.kamde, Longpeng, lulu, hanand, Parav Pandit,
	Si-Wei Liu, Eli Cohen, Xie Yongji, Zhang Min, Dan Carpenter,
	Christophe JAILLET
In-Reply-To: <20220810171512.2343333-4-eperezma@redhat.com>

On Wed, Aug 10, 2022 at 07:15:11PM +0200, Eugenio Pérez wrote:
> The ioctl adds support for suspending the device from userspace.
> 
> This is a must before getting virtqueue indexes (base) for live migration,
> since the device could modify them after userland gets them. There are
> individual ways to perform that action for some devices
> (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> way to perform it for any vhost device (and, in particular, vhost-vdpa).
> 
> After a successful return of the ioctl call the device must not process
> more virtqueue descriptors. The device can answer to read or writes of
> config fields as if it were not suspended. In particular, writing to
> "queue_enable" with a value of 1 will not make the device start
> processing buffers of the virtqueue.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> Message-Id: <20220623160738.632852-4-eperezma@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

You are not supposed to include upstream maintainer's signoff
like this.

> ---
> v7: Delete argument to ioctl, unused
> ---
>  drivers/vhost/vdpa.c       | 19 +++++++++++++++++++
>  include/uapi/linux/vhost.h |  9 +++++++++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 3d636e192061..7fa671ac4bdf 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -478,6 +478,22 @@ static long vhost_vdpa_get_vqs_count(struct vhost_vdpa *v, u32 __user *argp)
>  	return 0;
>  }
>  
> +/* After a successful return of ioctl the device must not process more
> + * virtqueue descriptors. The device can answer to read or writes of config
> + * fields as if it were not suspended. In particular, writing to "queue_enable"
> + * with a value of 1 will not make the device start processing buffers.
> + */
> +static long vhost_vdpa_suspend(struct vhost_vdpa *v)
> +{
> +	struct vdpa_device *vdpa = v->vdpa;
> +	const struct vdpa_config_ops *ops = vdpa->config;
> +
> +	if (!ops->suspend)
> +		return -EOPNOTSUPP;
> +
> +	return ops->suspend(vdpa);
> +}
> +
>  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>  				   void __user *argp)
>  {
> @@ -654,6 +670,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>  	case VHOST_VDPA_GET_VQS_COUNT:
>  		r = vhost_vdpa_get_vqs_count(v, argp);
>  		break;
> +	case VHOST_VDPA_SUSPEND:
> +		r = vhost_vdpa_suspend(v);
> +		break;
>  	default:
>  		r = vhost_dev_ioctl(&v->vdev, cmd, argp);
>  		if (r == -ENOIOCTLCMD)
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index cab645d4a645..f9f115a7c75b 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -171,4 +171,13 @@
>  #define VHOST_VDPA_SET_GROUP_ASID	_IOW(VHOST_VIRTIO, 0x7C, \
>  					     struct vhost_vring_state)
>  
> +/* Suspend a device so it does not process virtqueue requests anymore
> + *
> + * After the return of ioctl the device must preserve all the necessary state
> + * (the virtqueue vring base plus the possible device specific states) that is
> + * required for restoring in the future. The device must not change its
> + * configuration after that point.
> + */
> +#define VHOST_VDPA_SUSPEND		_IO(VHOST_VIRTIO, 0x7D)
> +
>  #endif
> -- 
> 2.31.1


^ permalink raw reply

* Re: [PATCH] can: rx-offload: Break loop on queue full
From: Marc Kleine-Budde @ 2022-08-11  8:30 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Wolfgang Grandegger, linux-can, netdev, kernel
In-Reply-To: <20220810144536.389237-1-u.kleine-koenig@pengutronix.de>

[-- Attachment #1: Type: text/plain, Size: 1983 bytes --]

On 10.08.2022 16:45:36, Uwe Kleine-König wrote:
> The following happend on an i.MX25 using flexcan with many packets on
> the bus:
> 
> The rx-offload queue reached a length more than skb_queue_len_max. So in
> can_rx_offload_offload_one() the drop variable was set to true which
> made the call to .mailbox_read() (here: flexcan_mailbox_read()) just
> return ERR_PTR(-ENOBUFS) (plus some irrelevant hardware interaction) and
> so can_rx_offload_offload_one() returned ERR_PTR(-ENOBUFS), too.
> 
> Now can_rx_offload_irq_offload_fifo() looks as follows:
> 
> 	while (1) {
> 		skb = can_rx_offload_offload_one(offload, 0);
> 		if (IS_ERR(skb))
> 			continue;
> 		...
> 	}
> 
> As the i.MX25 is a single core CPU while the rx-offload processing is
> active there is no thread to process packets from the offload queue and
> so it doesn't get shorter.
> 
> The result is a tight loop: can_rx_offload_offload_one() does nothing
> relevant and returns an error code and so
> can_rx_offload_irq_offload_fifo() calls can_rx_offload_offload_one()
> again.
> 
> To break that loop don't continue calling can_rx_offload_offload_one()
> after it reported an error.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
> Hello,
> 
> this patch just implements the obvious change to break the loop. I'm not
> 100% certain that there is no corner case where the break is wrong. The
> problem exists at least since v5.6, didn't go back further to check.
> 
> This fixes a hard hang on said i.MX25.

As Uwe suggested in an IRC conversation, the correct fix for the flexcan
driver is to return NULL if there is no CAN frame pending.

I'll send a -v2.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v3] nfp: fix use-after-free in area_cache_get()
From: Simon Horman @ 2022-08-11  8:34 UTC (permalink / raw)
  To: Yinjun Zhang
  Cc: Jialiang Wang, kuba@kernel.org, davem@davemloft.net,
	edumazet@google.com, pabeni@redhat.com, niejianglei2021@163.com,
	oss-drivers, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <DM6PR13MB3705E2B4925F8BA9B1482DB3FC649@DM6PR13MB3705.namprd13.prod.outlook.com>

On Thu, Aug 11, 2022 at 05:10:44AM +0100, Yinjun Zhang wrote:
> On Wed, 10 Aug 2022 15:30:57 +0800 Jialiang Wang wrote:
> > area_cache_get() is used to distribute cache->area and set cache->id,
> >  and if cache->id is not 0 and cache->area->kref refcount is 0, it will
> >  release the cache->area by nfp_cpp_area_release(). area_cache_get()
> >  set cache->id before cpp->op->area_init() and nfp_cpp_area_acquire().
> > 
> > But if area_init() or nfp_cpp_area_acquire() fails, the cache->id is
> >  is already set but the refcount is not increased as expected. At this
> >  time, calling the nfp_cpp_area_release() will cause use-after-free.
> > 
> > To avoid the use-after-free, set cache->id after area_init() and
> >  nfp_cpp_area_acquire() complete successfully.
> > 
> > Note: This vulnerability is triggerable by providing emulated device
> >  equipped with specified configuration.
> > 
> >  BUG: KASAN: use-after-free in nfp6000_area_init (/home/user/Kernel/v5.19
> > /x86_64/src/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:7
> > 60)
> >   Write of size 4 at addr ffff888005b7f4a0 by task swapper/0/1
> > 
> >  Call Trace:
> >   <TASK>
> >  nfp6000_area_init (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > /ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
> >  area_cache_get.constprop.8 (/home/user/Kernel/v5.19/x86_64/src/drivers
> > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:884)
> > 
> >  Allocated by task 1:
> >  nfp_cpp_area_alloc_with_name
> > (/home/user/Kernel/v5.19/x86_64/src/drivers
> > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:303)
> >  nfp_cpp_area_cache_add
> > (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > /ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:802)
> >  nfp6000_init (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > /netronome/nfp/nfpcore/nfp6000_pcie.c:1230)
> >  nfp_cpp_from_operations
> > (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > /ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:1215)
> >  nfp_pci_probe (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > /netronome/nfp/nfp_main.c:744)
> > 
> >  Freed by task 1:
> >  kfree (/home/user/Kernel/v5.19/x86_64/src/mm/slub.c:4562)
> >  area_cache_get.constprop.8 (/home/user/Kernel/v5.19/x86_64/src/drivers
> > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:873)
> >  nfp_cpp_read (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > /netronome/nfp/nfpcore/nfp_cppcore.c:924
> > /home/user/Kernel/v5.19/x86_64
> > /src/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:973)
> >  nfp_cpp_readl (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > /netronome/nfp/nfpcore/nfp_cpplib.c:48)
> > 
> > Signed-off-by: Jialiang Wang <wangjialiang0806@163.com>
> 
> Thanks.
> Reviewed-by: Yinjun Zhang <yinjun.zhang@corigine.com>

Acked-by: Simon Horman <simon.horman@corigine.com>


^ permalink raw reply

* Re: [RFC 4/5] virtio: get desc id in order
From: Guo Zhi @ 2022-08-11  8:49 UTC (permalink / raw)
  To: jasowang
  Cc: eperezma, sgarzare, Michael Tsirkin, netdev, linux-kernel,
	kvm list, virtualization
In-Reply-To: <9d4c24de-f2cc-16a0-818a-16695946f3a3@redhat.com>



----- Original Message -----
> From: "jasowang" <jasowang@redhat.com>
> To: "Guo Zhi" <qtxuning1999@sjtu.edu.cn>, "eperezma" <eperezma@redhat.com>, "sgarzare" <sgarzare@redhat.com>, "Michael
> Tsirkin" <mst@redhat.com>
> Cc: "netdev" <netdev@vger.kernel.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "kvm list" <kvm@vger.kernel.org>,
> "virtualization" <virtualization@lists.linux-foundation.org>
> Sent: Tuesday, July 26, 2022 4:07:46 PM
> Subject: Re: [RFC 4/5] virtio: get desc id in order

> 在 2022/7/21 16:43, Guo Zhi 写道:
>> If in order feature negotiated, we can skip the used ring to get
>> buffer's desc id sequentially.
> 
> 
> Let's rename the patch to something like "in order support for virtio_ring"
> 
> 
>>
>> Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
>> ---
>>   drivers/virtio/virtio_ring.c | 37 ++++++++++++++++++++++++++++--------
>>   1 file changed, 29 insertions(+), 8 deletions(-)
> 
> 
> I don't see packed support in this patch, we need to implement that.
> 
> 
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index a5ec724c0..4d57a4edc 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -144,6 +144,9 @@ struct vring_virtqueue {
>>   			/* DMA address and size information */
>>   			dma_addr_t queue_dma_addr;
>>   			size_t queue_size_in_bytes;
>> +
>> +			/* In order feature batch begin here */
>> +			u16 next_batch_desc_begin;
>>   		} split;
>>   
>>   		/* Available for packed ring */
>> @@ -700,8 +703,10 @@ static void detach_buf_split(struct vring_virtqueue *vq,
>> unsigned int head,
>>   	}
>>   
>>   	vring_unmap_one_split(vq, i);
>> -	vq->split.desc_extra[i].next = vq->free_head;
>> -	vq->free_head = head;
>> +	if (!virtio_has_feature(vq->vq.vdev, VIRTIO_F_IN_ORDER)) {
>> +		vq->split.desc_extra[i].next = vq->free_head;
>> +		vq->free_head = head;
>> +	}
> 
> 
> Let's add a comment to explain why we don't need anything if in order is
> neogitated.
> 
> 
>>   
>>   	/* Plus final descriptor */
>>   	vq->vq.num_free++;
>> @@ -743,7 +748,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue
>> *_vq,
>>   {
>>   	struct vring_virtqueue *vq = to_vvq(_vq);
>>   	void *ret;
>> -	unsigned int i;
>> +	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>> +	unsigned int i, j;
>>   	u16 last_used;
>>   
>>   	START_USE(vq);
>> @@ -762,11 +768,24 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue
>> *_vq,
>>   	/* Only get used array entries after they have been exposed by host. */
>>   	virtio_rmb(vq->weak_barriers);
>>   
>> -	last_used = (vq->last_used_idx & (vq->split.vring.num - 1));
>> -	i = virtio32_to_cpu(_vq->vdev,
>> -			vq->split.vring.used->ring[last_used].id);
>> -	*len = virtio32_to_cpu(_vq->vdev,
>> -			vq->split.vring.used->ring[last_used].len);
>> +	if (virtio_has_feature(_vq->vdev, VIRTIO_F_IN_ORDER)) {
>> +		/* Skip used ring and get used desc in order*/
>> +		i = vq->split.next_batch_desc_begin;
>> +		j = i;
>> +		while (vq->split.vring.desc[j].flags & nextflag)
> 
> 
> Let's don't depend on the descriptor ring which is under the control of
> the malicious hypervisor.
> 
> Let's use desc_extra that is not visible by the hypervisor. More can be
> seen in this commit:
> 
> 72b5e8958738 ("virtio-ring: store DMA metadata in desc_extra for split
> virtqueue")
> 
> 
>> +			j = (j + 1) % vq->split.vring.num;
>> +		/* move to next */
>> +		j = (j + 1) % vq->split.vring.num;
>> +		vq->split.next_batch_desc_begin = j;
> 
> 
> I'm not sure I get the logic here, basically I think we should check
> buffer instead of descriptor here.

I's sorry I don't understand this comment.
In order means device use descriptors in the same order as they been available.
So we should iterate the descriptor table and calculte the next desc which will be used,
because we don't use used ring now.

> 
> So if vring.used->ring[last_used].id != last_used, we know all
> [last_used, vring.used->ring[last_used].id] have been used in a batch?
> 

We don't use used ring for in order feature.
N descriptors in descriptor table from vq->split.next_batch_desc_begin have been used.
N is vq->split.vring.used->idx - vq->last_used_idx (haven't consider ring problem for short).

> 
>> +
>> +		/* TODO: len of buffer */
> 
> 
> So spec said:
> 
> "
> 
> The skipped buffers (for which no used ring entry was written) are
> assumed to have been used (read or written) by the device completely.
> 
> 
> "
> 
> Thanks
> 
> 
>> +	} else {
>> +		last_used = (vq->last_used_idx & (vq->split.vring.num - 1));
>> +		i = virtio32_to_cpu(_vq->vdev,
>> +				    vq->split.vring.used->ring[last_used].id);
>> +		*len = virtio32_to_cpu(_vq->vdev,
>> +				       vq->split.vring.used->ring[last_used].len);
>> +	}
>>   
>>   	if (unlikely(i >= vq->split.vring.num)) {
>>   		BAD_RING(vq, "id %u out of range\n", i);
>> @@ -2234,6 +2253,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int
>> index,
>>   	vq->split.avail_flags_shadow = 0;
>>   	vq->split.avail_idx_shadow = 0;
>>   
>> +	vq->split.next_batch_desc_begin = 0;
>> +
>>   	/* No callback?  Tell other side not to bother us. */
>>   	if (!callback) {
>>   		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;

^ permalink raw reply

* Re: [PATCH v5 3/4] drivers/net/virtio_net: Added RSS hash report.
From: Michael S. Tsirkin @ 2022-08-11  8:53 UTC (permalink / raw)
  To: Andrew Melnychenko
  Cc: netdev, linux-kernel, virtualization, kuba, davem, jasowang, yan,
	yuri.benditovich
In-Reply-To: <20220328175336.10802-4-andrew@daynix.com>

On Mon, Mar 28, 2022 at 08:53:35PM +0300, Andrew Melnychenko wrote:
> +static void virtio_skb_set_hash(const struct virtio_net_hdr_v1_hash *hdr_hash,
> +				struct sk_buff *skb)
> +{
> +	enum pkt_hash_types rss_hash_type;
> +
> +	if (!hdr_hash || !skb)
> +		return;
> +
> +	switch ((int)hdr_hash->hash_report) {
> +	case VIRTIO_NET_HASH_REPORT_TCPv4:
> +	case VIRTIO_NET_HASH_REPORT_UDPv4:
> +	case VIRTIO_NET_HASH_REPORT_TCPv6:
> +	case VIRTIO_NET_HASH_REPORT_UDPv6:
> +	case VIRTIO_NET_HASH_REPORT_TCPv6_EX:
> +	case VIRTIO_NET_HASH_REPORT_UDPv6_EX:
> +		rss_hash_type = PKT_HASH_TYPE_L4;
> +		break;
> +	case VIRTIO_NET_HASH_REPORT_IPv4:
> +	case VIRTIO_NET_HASH_REPORT_IPv6:
> +	case VIRTIO_NET_HASH_REPORT_IPv6_EX:
> +		rss_hash_type = PKT_HASH_TYPE_L3;
> +		break;
> +	case VIRTIO_NET_HASH_REPORT_NONE:
> +	default:
> +		rss_hash_type = PKT_HASH_TYPE_NONE;
> +	}
> +	skb_set_hash(skb, (unsigned int)hdr_hash->hash_value, rss_hash_type);
> +}
> +
>  static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>  			void *buf, unsigned int len, void **ctx,
>  			unsigned int *xdp_xmit,

I just noticed this- endian-ness broken here, you need _le16_to_cpu for
hash_report and __le_32_to_cpu for hash_value.

-- 
MST


^ permalink raw reply

* Re: igc: missing HW timestamps at TX
From: Ferenc Fejes @ 2022-08-11  8:54 UTC (permalink / raw)
  To: vinicius.gomes@intel.com, netdev@vger.kernel.org
  Cc: marton12050@gmail.com, peti.antal99@gmail.com
In-Reply-To: <695ec13e018d1111cf3e16a309069a72d55ea70e.camel@ericsson.com>

Hi Vinicius!

On Tue, 2022-07-19 at 09:40 +0200, Fejes Ferenc wrote:
> Hi Vinicius!
> 
> On Mon, 2022-07-18 at 11:46 -0300, Vinicius Costa Gomes wrote:
> > Hi Ferenc,
> > 
> > Ferenc Fejes <ferenc.fejes@ericsson.com> writes:
> > 
> > > (Ctrl+Enter'd by mistake)
> > > 
> > > My question here: is there anything I can quickly try to avoid
> > > that
> > > behavior? Even when I send only a few (like 10) packets but on
> > > fast
> > > rate (5us between packets) I get missing TX HW timestamps. The
> > > receive
> > > side looks much more roboust, I cannot noticed missing HW
> > > timestamps
> > > there.
> > 
> > There's a limitation in the i225/i226 in the number of "in flight"
> > TX
> > timestamps they are able to handle. The hardware has 4 sets of
> > registers
> > to handle timestamps.
> > 
> > There's an aditional issue that the driver as it is right now, only
> > uses
> > one set of those registers.
> > 
> > I have one only briefly tested series that enables the driver to
> > use
> > the
> > full set of TX timestamp registers. Another reason that it was not
> > proposed yet is that I still have to benchmark it and see what is
> > the
> > performance impact.
> 
> Thank you for the quick reply! I'm glad you already have this series
> right off the bat. I'll be back when we done with a quick testing,
> hopefully sooner than later.

Sorry for the late reply. I had time for a few tests, with the patch.
For my tests it looks much better. I send a packet in every 500us with
isochron-send, TX SW and HW timestamping enabled and for 10000 packets
I see zero lost timestamp. Even for 100000 packets only a few dropped
HW timestamps visible.

With iperf TCP test line-rate achiveable just like without the patch.

> > 
> > If you are feeling adventurous and feel like helping test it, here
> > is
> > the link:
> > 
> > https%3A%2F%2Fgithub.com%2Fvcgomes%2Fnet-next%2Ftree%2Figc-
> > multiple-tstamp-timers-lock-new
> > 

Is there any test in partucular you interested in? My testbed is
configured so I can do some.

> > 
> > Cheers,
> 
> Best,
> Ferenc

Best,
Ferenc


^ permalink raw reply

* Re: [RFC 1/5] vhost: reorder used descriptors in a batch
From: Guo Zhi @ 2022-08-11  8:58 UTC (permalink / raw)
  To: jasowang
  Cc: eperezma, sgarzare, Michael Tsirkin, netdev, linux-kernel,
	kvm list, virtualization
In-Reply-To: <CACGkMEt4GzC7t0qqc2SgUWDRB9Amr+XDKiYOKmogrOyfCBFwvA@mail.gmail.com>



----- Original Message -----
> From: "jasowang" <jasowang@redhat.com>
> To: "Guo Zhi" <qtxuning1999@sjtu.edu.cn>
> Cc: "eperezma" <eperezma@redhat.com>, "sgarzare" <sgarzare@redhat.com>, "Michael Tsirkin" <mst@redhat.com>, "netdev"
> <netdev@vger.kernel.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "kvm list" <kvm@vger.kernel.org>,
> "virtualization" <virtualization@lists.linux-foundation.org>
> Sent: Thursday, August 4, 2022 1:04:16 PM
> Subject: Re: [RFC 1/5] vhost: reorder used descriptors in a batch

> On Tue, Aug 2, 2022 at 10:12 PM Guo Zhi <qtxuning1999@sjtu.edu.cn> wrote:
>>
>>
>>
>> ----- Original Message -----
>> > From: "jasowang" <jasowang@redhat.com>
>> > To: "Guo Zhi" <qtxuning1999@sjtu.edu.cn>
>> > Cc: "eperezma" <eperezma@redhat.com>, "sgarzare" <sgarzare@redhat.com>, "Michael
>> > Tsirkin" <mst@redhat.com>, "netdev"
>> > <netdev@vger.kernel.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "kvm
>> > list" <kvm@vger.kernel.org>,
>> > "virtualization" <virtualization@lists.linux-foundation.org>
>> > Sent: Friday, July 29, 2022 3:32:02 PM
>> > Subject: Re: [RFC 1/5] vhost: reorder used descriptors in a batch
>>
>> > On Thu, Jul 28, 2022 at 4:26 PM Guo Zhi <qtxuning1999@sjtu.edu.cn> wrote:
>> >>
>> >> On 2022/7/26 15:36, Jason Wang wrote:
>> >>
>> >>
>> >> 在 2022/7/21 16:43, Guo Zhi 写道:
>> >>
>> >> Device may not use descriptors in order, for example, NIC and SCSI may
>> >> not call __vhost_add_used_n with buffers in order.  It's the task of
>> >> __vhost_add_used_n to order them.
>> >>
>> >>
>> >>
>> >> I'm not sure this is ture. Having ooo descriptors is probably by design to have
>> >> better performance.
>> >>
>> >> This might be obvious for device that may have elevator or QOS stuffs.
>> >>
>> >> I suspect the right thing to do here is, for the device that can't perform
>> >> better in the case of IN_ORDER, let's simply not offer IN_ORDER (zerocopy or
>> >> scsi). And for the device we know it can perform better, non-zercopy ethernet
>> >> device we can do that.
>> >>
>> >>
>> >>   This commit reorder the buffers using
>> >> vq->heads, only the batch is begin from the expected start point and is
>> >> continuous can the batch be exposed to driver.  And only writing out a
>> >> single used ring for a batch of descriptors, according to VIRTIO 1.1
>> >> spec.
>> >>
>> >>
>> >>
>> >> So this sounds more like a "workaround" of the device that can't consume buffer
>> >> in order, I suspect it can help in performance.
>> >>
>> >> More below.
>> >>
>> >>
>> >>
>> >> Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
>> >> ---
>> >>   drivers/vhost/vhost.c | 44 +++++++++++++++++++++++++++++++++++++++++--
>> >>   drivers/vhost/vhost.h |  3 +++
>> >>   2 files changed, 45 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> >> index 40097826c..e2e77e29f 100644
>> >> --- a/drivers/vhost/vhost.c
>> >> +++ b/drivers/vhost/vhost.c
>> >> @@ -317,6 +317,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>> >>       vq->used_flags = 0;
>> >>       vq->log_used = false;
>> >>       vq->log_addr = -1ull;
>> >> +    vq->next_used_head_idx = 0;
>> >>       vq->private_data = NULL;
>> >>       vq->acked_features = 0;
>> >>       vq->acked_backend_features = 0;
>> >> @@ -398,6 +399,8 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
>> >>                         GFP_KERNEL);
>> >>           if (!vq->indirect || !vq->log || !vq->heads)
>> >>               goto err_nomem;
>> >> +
>> >> +        memset(vq->heads, 0, sizeof(*vq->heads) * dev->iov_limit);
>> >>       }
>> >>       return 0;
>> >>   @@ -2374,12 +2377,49 @@ static int __vhost_add_used_n(struct vhost_virtqueue
>> >>   *vq,
>> >>                   unsigned count)
>> >>   {
>> >>       vring_used_elem_t __user *used;
>> >> +    struct vring_desc desc;
>> >>       u16 old, new;
>> >>       int start;
>> >> +    int begin, end, i;
>> >> +    int copy_n = count;
>> >> +
>> >> +    if (vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
>> >>
>> >>
>> >>
>> >> How do you guarantee that ids of heads are contiguous?
>> >>
>> >> There is no need to be contiguous for ids of heads.
>> >>
>> >> For example, I have three buffer { .id = 0, 15}, {.id = 20, 30} {.id = 15, 20}
>> >> for vhost_add_used_n. Then I will let the vq->heads[0].len=15.
>> >> vq->heads[15].len=5, vq->heads[20].len=10 as reorder. Once I found there is no
>> >> hold in the batched descriptors. I will expose them to driver.
>> >
>> > So spec said:
>> >
>> > "If VIRTIO_F_IN_ORDER has been negotiated, driver uses descriptors in
>> > ring order: starting from offset 0 in the table, and wrapping around
>> > at the end of the table."
>> >
>> > And
>> >
>> > "VIRTIO_F_IN_ORDER(35)This feature indicates that all buffers are used
>> > by the device in the same order in which they have been made
>> > available."
>> >
>> > This means your example is not an IN_ORDER device.
>> >
>> > The driver should submit buffers (assuming each buffer have one
>> > descriptor) in order {id = 0, 15}, {id = 1, 30} and {id = 2, 20}.
>> >
>> > And even if it is submitted in order, we can not use a batch because:
>> >
>> > "The skipped buffers (for which no used ring entry was written) are
>> > assumed to have been used (read or written) by the device completely."
>> >
>> > This means for TX we are probably ok, but for rx, unless we know the
>> > buffers were written completely, we can't write them in a batch.
>> >
>> > I'd suggest to do cross testing for this series:
>> >
>> > 1) testing vhost IN_ORDER support with DPDK virtio PMD
>> > 2) testing virtio IN_ORDER with DPDK vhost-user via testpmd
>> >
>> > Thanks
>> >
>> You are correct, for rx we can't do a batch because we have to let the driver
>> know the length of buffers.
> 
> Note that we can do a batch for rx when we know all the buffers have
> been fully written.
> 
>>
>> I think these circumstances can offer batch:
>> 1. tx
>> 2. rx with RX_MRGBUF feature, which introduce a header for each received buffer
>>
>> Consider batch is not a mandatory requirement for in order feature according to
>> spec.
>> I'd like to let current RFC patch focus on in order implementation, and send
>> another
>> patch series to improve performance by batching on above circumstances.
> 
> That's fine, how about simply starting from the patch that offers
> IN_ORDER when zerocopy is disabled?
> 

Yeah, I'd like to start from vsock device, which doesn't use zerocopy

Thanks
> Thanks
> 
>>
>> What's your opinon.
>>
>> Thanks
>> >
>> >>
>> >>
>> >> +        /* calculate descriptor chain length for each used buffer */
>> >>
>> >>
>> >>
>> >> I'm a little bit confused about this comment, we have heads[i].len for this?
>> >>
>> >> Maybe I should not use vq->heads, some misleading.
>> >>
>> >>
>> >> +        for (i = 0; i < count; i++) {
>> >> +            begin = heads[i].id;
>> >> +            end = begin;
>> >> +            vq->heads[begin].len = 0;
>> >>
>> >>
>> >>
>> >> Does this work for e.g RX virtqueue?
>> >>
>> >>
>> >> +            do {
>> >> +                vq->heads[begin].len += 1;
>> >> +                if (unlikely(vhost_get_desc(vq, &desc, end))) {
>> >>
>> >>
>> >>
>> >> Let's try hard to avoid more userspace copy here, it's the source of performance
>> >> regression.
>> >>
>> >> Thanks
>> >>
>> >>
>> >> +                    vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
>> >> +                           end, vq->desc + end);
>> >> +                    return -EFAULT;
>> >> +                }
>> >> +            } while ((end = next_desc(vq, &desc)) != -1);
>> >> +        }
>> >> +
>> >> +        count = 0;
>> >> +        /* sort and batch continuous used ring entry */
>> >> +        while (vq->heads[vq->next_used_head_idx].len != 0) {
>> >> +            count++;
>> >> +            i = vq->next_used_head_idx;
>> >> +            vq->next_used_head_idx = (vq->next_used_head_idx +
>> >> +                          vq->heads[vq->next_used_head_idx].len)
>> >> +                          % vq->num;
>> >> +            vq->heads[i].len = 0;
>> >> +        }
>> >> +        /* only write out a single used ring entry with the id corresponding
>> >> +         * to the head entry of the descriptor chain describing the last buffer
>> >> +         * in the batch.
>> >> +         */
>> >> +        heads[0].id = i;
>> >> +        copy_n = 1;
>> >> +    }
>> >>         start = vq->last_used_idx & (vq->num - 1);
>> >>       used = vq->used->ring + start;
>> >> -    if (vhost_put_used(vq, heads, start, count)) {
>> >> +    if (vhost_put_used(vq, heads, start, copy_n)) {
>> >>           vq_err(vq, "Failed to write used");
>> >>           return -EFAULT;
>> >>       }
>> >> @@ -2410,7 +2450,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct
>> >> vring_used_elem *heads,
>> >>         start = vq->last_used_idx & (vq->num - 1);
>> >>       n = vq->num - start;
>> >> -    if (n < count) {
>> >> +    if (n < count && !vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
>> >>           r = __vhost_add_used_n(vq, heads, n);
>> >>           if (r < 0)
>> >>               return r;
>> >> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>> >> index d9109107a..7b2c0fbb5 100644
>> >> --- a/drivers/vhost/vhost.h
>> >> +++ b/drivers/vhost/vhost.h
>> >> @@ -107,6 +107,9 @@ struct vhost_virtqueue {
>> >>       bool log_used;
>> >>       u64 log_addr;
>> >>   +    /* Sort heads in order */
>> >> +    u16 next_used_head_idx;
>> >> +
>> >>       struct iovec iov[UIO_MAXIOV];
>> >>       struct iovec iotlb_iov[64];
>> >>       struct iovec *indirect;
>> >>
>> >>
>> >>
>>

^ permalink raw reply

* Re: [PATCH net-next 3/6] net: atm: remove support for ZeitNet ZN122x ATM devices
From: Arnd Bergmann @ 2022-08-11  9:18 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Jakub Kicinski, Arnd Bergmann, davem, pabeni, netdev,
	Chas Williams, linux-atm-general, Thomas Bogendoerfer, linux-mips
In-Reply-To: <de3170f3-6035-21e4-8ca5-427ca878b3a4@kernel.org>

On Thu, Aug 11, 2022 at 7:19 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> On 10. 08. 22, 18:42, Jakub Kicinski wrote:
> > On Wed, 10 Aug 2022 11:11:32 +0200 Arnd Bergmann wrote:
> >>> This unfortunately breaks linux-atm:
> >>> zntune.c:18:10: fatal error: linux/atm_zatm.h: No such file or directory
> >>>
> >>> The source does also:
> >>> ioctl(s,ZATM_SETPOOL,&sioc)
> >>> ioctl(s,zero ? ZATM_GETPOOLZ : ZATM_GETPOOL,&sioc)
> >>> etc.
> >>>
> >>> So we should likely revert the below:
> >>
> >> I suppose there is no chance of also getting the linux-atm package updated
> >> to not include those source files, right? The last release I found on
> >> sourceforge
> >> is 12 years old, but maybe I was looking in the wrong place.
> >
> > Is linux-atm used for something remotely modern? PPPoA? Maybe it's
> > time to ditch it completely? I'll send the revert in any case.
>
> Sorry, I have no idea. openSUSE is just a provider of an rpm -- if there
> any users? Who knows...

I think in theory this is the subsystem that DSL drivers would use, but
there is only one driver for the "Solos ADSL2+".

OpenWRT used to support the TI AR7 platform (later owned by Infineon,
Lantiq, and Intel, now Maxlinear) with the "sangam-atm" driver for DSL,
but that driver was never available in mainline Linux and is now gone
from OpenWRT as well.

It appears that the later hardware that is still supported uses a custom
atm driver implementation rather than the in-kernel subsystem, using
a different set of ioctls:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=tree;f=package/kernel/lantiq/ltq-atm/src

There are also DSL SoCs from (at least) Broadcom, Realtek, Mediatek and
Qualcomm, but no open source drivers, so I guess they probably all
use their own kernel subsystems.

       Arnd

^ permalink raw reply

* [PATCH 1/1] net: qrtr: start MHI channel after endpoit creation
From: Maxim Kochetkov @ 2022-08-11  9:21 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, linux-arm-msm, Maxim Kochetkov,
	Hemant Kumar, Manivannan Sadhasivam

MHI channel may generates event/interrupt right after enabling.
It may leads to 2 race conditions issues.

1)
Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check:

	if (!qdev || mhi_res->transaction_status)
		return;

Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at
this moment. In this situation qrtr-ns will be unable to enumerate
services in device.
---------------------------------------------------------------

2)
Such event may come at the moment after dev_set_drvdata() and
before qrtr_endpoint_register(). In this case kernel will panic with
accessing wrong pointer at qcom_mhi_qrtr_dl_callback():

	rc = qrtr_endpoint_post(&qdev->ep, mhi_res->buf_addr,
				mhi_res->bytes_xferd);

Because endpoint is not created yet.
--------------------------------------------------------------
So move mhi_prepare_for_transfer_autoqueue after endpoint creation
to fix it.

Fixes: a2e2cc0dbb11 ("net: qrtr: Start MHI channels during init")
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
---
 net/qrtr/mhi.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
index 18196e1c8c2f..17520d9e7a51 100644
--- a/net/qrtr/mhi.c
+++ b/net/qrtr/mhi.c
@@ -78,11 +78,6 @@ static int qcom_mhi_qrtr_probe(struct mhi_device *mhi_dev,
 	struct qrtr_mhi_dev *qdev;
 	int rc;
 
-	/* start channels */
-	rc = mhi_prepare_for_transfer_autoqueue(mhi_dev);
-	if (rc)
-		return rc;
-
 	qdev = devm_kzalloc(&mhi_dev->dev, sizeof(*qdev), GFP_KERNEL);
 	if (!qdev)
 		return -ENOMEM;
@@ -96,6 +91,11 @@ static int qcom_mhi_qrtr_probe(struct mhi_device *mhi_dev,
 	if (rc)
 		return rc;
 
+	/* start channels */
+	rc = mhi_prepare_for_transfer_autoqueue(mhi_dev);
+	if (rc)
+		return rc;
+
 	dev_dbg(qdev->dev, "Qualcomm MHI QRTR driver probed\n");
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH] fec: Restart PPS after link state change
From: Csókás Bence @ 2022-08-11  9:23 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Richard Cochran, Fugang Duan
In-Reply-To: <YvRH06S/7E6J8RY0@lunn.ch>


On 2022. 08. 11. 2:05, Andrew Lunn wrote:
>>>
>>>>    	/* Whack a reset.  We should wait for this.
>>>>    	 * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
>>>> @@ -1119,6 +1120,13 @@ fec_restart(struct net_device *ndev)
>>>>    	if (fep->bufdesc_ex)
>>>>    		fec_ptp_start_cyclecounter(ndev);
>>>> +	/* Restart PPS if needed */
>>>> +	if (fep->pps_enable) {
>>>> +		/* Clear flag so fec_ptp_enable_pps() doesn't return immediately */
>>>> +		fep->pps_enable = 0;
>>>
>>> If reset causes PPS to stop, maybe it would be better to do this
>>> unconditionally?
>>
>> But if it wasn't enabled before the reset in the first place, we wouldn't
>> want to unexpectedly start it.
> 
> We should decide what fep->pps_enable actually means. It should be
> enabled, or it is actually enabled? Then it becomes clear if the reset
> function should clear it to reflect the hardware, or if the
> fec_ptp_enable_pps() should not be looking at it, and needs to read
> the status from the hardware.
> 

`fep->pps_enable` is the state of the PPS the driver *believes* to be 
the case. After a reset, this belief may or may not be true anymore: if 
the driver believed formerly that the PPS is down, then after a reset, 
its belief will still be correct, thus nothing needs to be done about 
the situation. If, however, the driver thought that PPS was up, after 
controller reset, it no longer holds, so we need to update our 
world-view (`fep->pps_enable = 0;`), and then correct for the fact that 
PPS just unexpectedly stopped.

>>> Also, if it is always outputting, don't you need to stop it in
>>> fec_drv_remove(). You probably don't want to still going after the
>>> driver is unloaded.
>>
>> Good point, that is one exception we could make to the above statement
>> (though even in this case, userspace *really* should disable PPS before
>> unloading the module).
> 
> Never trust userspace. Ever.

Amen. Said issue is already fixed in the next version of the patch.

> 
>        Andrew

Bence

^ permalink raw reply

* Re: [PATCH net-next] net/smc: Introduce TCP ULP support
From: Tony Lu @ 2022-08-11  9:29 UTC (permalink / raw)
  To: Al Viro; +Cc: kgraul, kuba, davem, netdev, linux-s390, linux-rdma,
	linux-fsdevel
In-Reply-To: <Yus1SycZxcd+wHwz@ZenIV>

On Thu, Aug 04, 2022 at 03:56:11AM +0100, Al Viro wrote:
> 	Half a year too late, but then it hadn't been posted on fsdevel.
> Which it really should have been, due to
> 
> > +	/* replace tcp socket to smc */
> > +	smcsock->file = tcp->file;
> > +	smcsock->file->private_data = smcsock;
> > +	smcsock->file->f_inode = SOCK_INODE(smcsock); /* replace inode when sock_close */
> > +	smcsock->file->f_path.dentry->d_inode = SOCK_INODE(smcsock); /* dput() in __fput */
> > +	tcp->file = NULL;
> 
> this.  It violates a bunch of rather fundamental assertions about the
> data structures you are playing with, and I'm not even going into the
> lifetime and refcounting issues.
> 
> 	* ->d_inode of a busy positive dentry never changes while refcount
> of dentry remains positive.  A lot of places in VFS rely upon that.
> 	* ->f_inode of a file never changes, period.
> 	* ->private_data of a struct file associated with a socket never
> changes; it can be accessed lockless, with no precautions beyond "make sure
> that refcount of struct file will remain positive".
>
> PS: more than one thread could be calling methods of that struct socket at the
> same time; what's to stop e.g. connect(2) on the same sucker (e.g. called on
> the same descriptor from a different thread that happens to share the same
> descriptor table) to be sitting there trying to lock the struct sock currently
> held locked by caller of tcp_set_ulp()?

Sorry for the late reply.

SMC ULP tries to make original TCP sockets behave like SMC. The original
TCP sockets will belong to this new SMC socket, and it can only be
accessed in kernel with struct socket in SMC. The SMC and TCP sockets are
bonded together.

So this patch replaces the file of TCP to SMC socket which is allocated
in kernel. It is guaranteed that the TCP socket is always freed before
the newly replaced SMC socket.

There is an other approach to archive this by changing af_ops of sockets.
I will fix it without breaking the assertions.

Tony Lu

^ permalink raw reply

* [PATCH v2] fec: Restart PPS after link state change
From: Csókás Bence @ 2022-08-11  9:34 UTC (permalink / raw)
  To: netdev; +Cc: Richard Cochran, Csókás Bence

On link state change, the controller gets reset,
causing PPS to drop out. So we restart it if needed.

Signed-off-by: Csókás Bence <csokas.bence@prolan.hu>
---
 drivers/net/ethernet/freescale/fec_main.c | 32 +++++++++++++++++++++--
 drivers/net/ethernet/freescale/fec_ptp.c  |  3 +++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 366c52b62d4b..546a152df4f4 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -257,6 +257,9 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 #define FEC_MMFR_TA		(2 << 16)
 #define FEC_MMFR_DATA(v)	(v & 0xffff)
 /* FEC ECR bits definition */
+#define FEC_ECR_RESET   	(1 << 0)
+#define FEC_ECR_ETHEREN 	(1 << 1)
+#define FEC_ECR_EN1588  	(1 << 4)
 #define FEC_ECR_MAGICEN		(1 << 2)
 #define FEC_ECR_SLEEP		(1 << 3)
 
@@ -955,6 +958,7 @@ fec_restart(struct net_device *ndev)
 	u32 temp_mac[2];
 	u32 rcntl = OPT_FRAME_SIZE | 0x04;
 	u32 ecntl = 0x2; /* ETHEREN */
+	struct ptp_clock_request ptp_rq = { .type = PTP_CLK_REQ_PPS };
 
 	/* Whack a reset.  We should wait for this.
 	 * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
@@ -1106,7 +1110,7 @@ fec_restart(struct net_device *ndev)
 	}
 
 	if (fep->bufdesc_ex)
-		ecntl |= (1 << 4);
+		ecntl |= FEC_ECR_EN1588;
 
 #ifndef CONFIG_M5272
 	/* Enable the MIB statistic event counters */
@@ -1120,6 +1124,13 @@ fec_restart(struct net_device *ndev)
 	if (fep->bufdesc_ex)
 		fec_ptp_start_cyclecounter(ndev);
 
+	/* Restart PPS if needed */
+	if (fep->pps_enable) {
+		/* Clear flag so fec_ptp_enable_pps() doesn't return immediately */
+		fep->pps_enable = 0;
+		fep->ptp_caps.enable(&fep->ptp_caps, &ptp_rq, 1);
+	}
+
 	/* Enable interrupts we wish to service */
 	if (fep->link)
 		writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
@@ -1155,6 +1166,8 @@ fec_stop(struct net_device *ndev)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	u32 rmii_mode = readl(fep->hwp + FEC_R_CNTRL) & (1 << 8);
 	u32 val;
+	struct ptp_clock_request ptp_rq = { .type = PTP_CLK_REQ_PPS };
+	u32 ecntl = 0;
 
 	/* We cannot expect a graceful transmit stop without link !!! */
 	if (fep->link) {
@@ -1185,12 +1198,27 @@ fec_stop(struct net_device *ndev)
 	}
 	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
 
+	if (fep->bufdesc_ex)
+		ecntl |= FEC_ECR_EN1588;
+
 	/* We have to keep ENET enabled to have MII interrupt stay working */
 	if (fep->quirks & FEC_QUIRK_ENET_MAC &&
 		!(fep->wol_flag & FEC_WOL_FLAG_SLEEP_ON)) {
-		writel(2, fep->hwp + FEC_ECNTRL);
+		ecntl |= FEC_ECR_ETHEREN;
 		writel(rmii_mode, fep->hwp + FEC_R_CNTRL);
 	}
+
+	writel(ecntl, fep->hwp + FEC_ECNTRL);
+
+	if (fep->bufdesc_ex)
+		fec_ptp_start_cyclecounter(ndev);
+
+	/* Restart PPS if needed */
+	if (fep->pps_enable) {
+		/* Clear flag so fec_ptp_enable_pps() doesn't return immediately */
+		fep->pps_enable = 0;
+		fep->ptp_caps.enable(&fep->ptp_caps, &ptp_rq, 1);
+	}
 }
 
 
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index a5077eff305b..869d149efc53 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -628,6 +628,9 @@ void fec_ptp_stop(struct platform_device *pdev)
 	struct net_device *ndev = platform_get_drvdata(pdev);
 	struct fec_enet_private *fep = netdev_priv(ndev);
 
+	if (fep->pps_enable)
+		fec_ptp_enable_pps(fep, 0);
+
 	cancel_delayed_work_sync(&fep->time_keep);
 	if (fep->ptp_clock)
 		ptp_clock_unregister(fep->ptp_clock);
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH 1/1] net: qrtr: start MHI channel after endpoit creation
From: Manivannan Sadhasivam @ 2022-08-11  9:37 UTC (permalink / raw)
  To: Maxim Kochetkov
  Cc: netdev, davem, edumazet, kuba, pabeni, linux-arm-msm,
	Hemant Kumar
In-Reply-To: <20220811092145.1648008-1-fido_max@inbox.ru>

On Thu, Aug 11, 2022 at 12:21:45PM +0300, Maxim Kochetkov wrote:
> MHI channel may generates event/interrupt right after enabling.
> It may leads to 2 race conditions issues.
> 
> 1)
> Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check:
> 
> 	if (!qdev || mhi_res->transaction_status)
> 		return;
> 
> Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at
> this moment. In this situation qrtr-ns will be unable to enumerate
> services in device.
> ---------------------------------------------------------------
> 
> 2)
> Such event may come at the moment after dev_set_drvdata() and
> before qrtr_endpoint_register(). In this case kernel will panic with
> accessing wrong pointer at qcom_mhi_qrtr_dl_callback():
> 
> 	rc = qrtr_endpoint_post(&qdev->ep, mhi_res->buf_addr,
> 				mhi_res->bytes_xferd);
> 
> Because endpoint is not created yet.
> --------------------------------------------------------------
> So move mhi_prepare_for_transfer_autoqueue after endpoint creation
> to fix it.
> 
> Fixes: a2e2cc0dbb11 ("net: qrtr: Start MHI channels during init")
> Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
> Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com>
> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
> ---
>  net/qrtr/mhi.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
> index 18196e1c8c2f..17520d9e7a51 100644
> --- a/net/qrtr/mhi.c
> +++ b/net/qrtr/mhi.c
> @@ -78,11 +78,6 @@ static int qcom_mhi_qrtr_probe(struct mhi_device *mhi_dev,
>  	struct qrtr_mhi_dev *qdev;
>  	int rc;
>  
> -	/* start channels */
> -	rc = mhi_prepare_for_transfer_autoqueue(mhi_dev);
> -	if (rc)
> -		return rc;
> -
>  	qdev = devm_kzalloc(&mhi_dev->dev, sizeof(*qdev), GFP_KERNEL);
>  	if (!qdev)
>  		return -ENOMEM;
> @@ -96,6 +91,11 @@ static int qcom_mhi_qrtr_probe(struct mhi_device *mhi_dev,
>  	if (rc)
>  		return rc;
>  
> +	/* start channels */
> +	rc = mhi_prepare_for_transfer_autoqueue(mhi_dev);
> +	if (rc)
> +		return rc;

I missed the fact that you need to call qrtr_endpoint_unregister() in
the error path.

Please fix it.

Thanks,
Mani

> +
>  	dev_dbg(qdev->dev, "Qualcomm MHI QRTR driver probed\n");
>  
>  	return 0;
> -- 
> 2.34.1
> 

^ permalink raw reply

* Re: [PATCH] can: rx-offload: Break loop on queue full
From: Marc Kleine-Budde @ 2022-08-11  9:43 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Wolfgang Grandegger, linux-can, netdev, kernel
In-Reply-To: <20220811083039.xi4fkj2cl4k22wm7@pengutronix.de>

[-- Attachment #1: Type: text/plain, Size: 2208 bytes --]

On 11.08.2022 10:30:39, Marc Kleine-Budde wrote:
> On 10.08.2022 16:45:36, Uwe Kleine-König wrote:
> > The following happend on an i.MX25 using flexcan with many packets on
> > the bus:
> > 
> > The rx-offload queue reached a length more than skb_queue_len_max. So in
> > can_rx_offload_offload_one() the drop variable was set to true which
> > made the call to .mailbox_read() (here: flexcan_mailbox_read()) just
> > return ERR_PTR(-ENOBUFS) (plus some irrelevant hardware interaction) and
> > so can_rx_offload_offload_one() returned ERR_PTR(-ENOBUFS), too.
> > 
> > Now can_rx_offload_irq_offload_fifo() looks as follows:
> > 
> > 	while (1) {
> > 		skb = can_rx_offload_offload_one(offload, 0);
> > 		if (IS_ERR(skb))
> > 			continue;
> > 		...
> > 	}
> > 
> > As the i.MX25 is a single core CPU while the rx-offload processing is
> > active there is no thread to process packets from the offload queue and
> > so it doesn't get shorter.
> > 
> > The result is a tight loop: can_rx_offload_offload_one() does nothing
> > relevant and returns an error code and so
> > can_rx_offload_irq_offload_fifo() calls can_rx_offload_offload_one()
> > again.
> > 
> > To break that loop don't continue calling can_rx_offload_offload_one()
> > after it reported an error.
> > 
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> > ---
> > Hello,
> > 
> > this patch just implements the obvious change to break the loop. I'm not
> > 100% certain that there is no corner case where the break is wrong. The
> > problem exists at least since v5.6, didn't go back further to check.
> > 
> > This fixes a hard hang on said i.MX25.
> 
> As Uwe suggested in an IRC conversation, the correct fix for the flexcan
> driver is to return NULL if there is no CAN frame pending.
> 
> I'll send a -v2.

https://lore.kernel.org/all/20220811094254.1864367-1-mkl@pengutronix.de

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH v2 1/1] net: qrtr: start MHI channel after endpoit creation
From: Maxim Kochetkov @ 2022-08-11  9:48 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, linux-arm-msm, Maxim Kochetkov,
	Hemant Kumar, Manivannan Sadhasivam

MHI channel may generates event/interrupt right after enabling.
It may leads to 2 race conditions issues.

1)
Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check:

	if (!qdev || mhi_res->transaction_status)
		return;

Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at
this moment. In this situation qrtr-ns will be unable to enumerate
services in device.
---------------------------------------------------------------

2)
Such event may come at the moment after dev_set_drvdata() and
before qrtr_endpoint_register(). In this case kernel will panic with
accessing wrong pointer at qcom_mhi_qrtr_dl_callback():

	rc = qrtr_endpoint_post(&qdev->ep, mhi_res->buf_addr,
				mhi_res->bytes_xferd);

Because endpoint is not created yet.
--------------------------------------------------------------
So move mhi_prepare_for_transfer_autoqueue after endpoint creation
to fix it.

Fixes: a2e2cc0dbb11 ("net: qrtr: Start MHI channels during init")
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
---
 net/qrtr/mhi.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
index 18196e1c8c2f..9ced13c0627a 100644
--- a/net/qrtr/mhi.c
+++ b/net/qrtr/mhi.c
@@ -78,11 +78,6 @@ static int qcom_mhi_qrtr_probe(struct mhi_device *mhi_dev,
 	struct qrtr_mhi_dev *qdev;
 	int rc;
 
-	/* start channels */
-	rc = mhi_prepare_for_transfer_autoqueue(mhi_dev);
-	if (rc)
-		return rc;
-
 	qdev = devm_kzalloc(&mhi_dev->dev, sizeof(*qdev), GFP_KERNEL);
 	if (!qdev)
 		return -ENOMEM;
@@ -96,6 +91,13 @@ static int qcom_mhi_qrtr_probe(struct mhi_device *mhi_dev,
 	if (rc)
 		return rc;
 
+	/* start channels */
+	rc = mhi_prepare_for_transfer_autoqueue(mhi_dev);
+	if (rc) {
+		qrtr_endpoint_unregister(&qdev->ep);
+		return rc;
+	}
+
 	dev_dbg(qdev->dev, "Qualcomm MHI QRTR driver probed\n");
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH v7 3/4] vhost-vdpa: uAPI to suspend the device
From: Eugenio Perez Martin @ 2022-08-11  9:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm list, linux-kernel, Jason Wang, virtualization, netdev,
	Dinan Gunawardena, Martin Porter, Wu Zongyong, Uminski, Piotr,
	Dawar, Gautam, ecree.xilinx, Martin Petrus Hubertus Habets,
	Stefano Garzarella, Pablo Cascon Katchadourian, habetsm.xilinx,
	Laurent Vivier, Zhu Lingshan, Kamde, Tanuj, Longpeng, Cindy Lu,
	Harpreet Singh Anand, Parav Pandit, Si-Wei Liu, Eli Cohen,
	Xie Yongji, Zhang Min, Dan Carpenter, Christophe JAILLET
In-Reply-To: <20220811042847-mutt-send-email-mst@kernel.org>

On Thu, Aug 11, 2022 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Aug 10, 2022 at 07:15:11PM +0200, Eugenio Pérez wrote:
> > The ioctl adds support for suspending the device from userspace.
> >
> > This is a must before getting virtqueue indexes (base) for live migration,
> > since the device could modify them after userland gets them. There are
> > individual ways to perform that action for some devices
> > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> >
> > After a successful return of the ioctl call the device must not process
> > more virtqueue descriptors. The device can answer to read or writes of
> > config fields as if it were not suspended. In particular, writing to
> > "queue_enable" with a value of 1 will not make the device start
> > processing buffers of the virtqueue.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > Message-Id: <20220623160738.632852-4-eperezma@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> You are not supposed to include upstream maintainer's signoff
> like this.
>

I'm very sorry, I modified the commits in your vhost branch and I left
the signoff (and message-id) lines by mistake.

> > ---
> > v7: Delete argument to ioctl, unused
> > ---
> >  drivers/vhost/vdpa.c       | 19 +++++++++++++++++++
> >  include/uapi/linux/vhost.h |  9 +++++++++
> >  2 files changed, 28 insertions(+)
> >
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index 3d636e192061..7fa671ac4bdf 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -478,6 +478,22 @@ static long vhost_vdpa_get_vqs_count(struct vhost_vdpa *v, u32 __user *argp)
> >       return 0;
> >  }
> >
> > +/* After a successful return of ioctl the device must not process more
> > + * virtqueue descriptors. The device can answer to read or writes of config
> > + * fields as if it were not suspended. In particular, writing to "queue_enable"
> > + * with a value of 1 will not make the device start processing buffers.
> > + */
> > +static long vhost_vdpa_suspend(struct vhost_vdpa *v)
> > +{
> > +     struct vdpa_device *vdpa = v->vdpa;
> > +     const struct vdpa_config_ops *ops = vdpa->config;
> > +
> > +     if (!ops->suspend)
> > +             return -EOPNOTSUPP;
> > +
> > +     return ops->suspend(vdpa);
> > +}
> > +
> >  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> >                                  void __user *argp)
> >  {
> > @@ -654,6 +670,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> >       case VHOST_VDPA_GET_VQS_COUNT:
> >               r = vhost_vdpa_get_vqs_count(v, argp);
> >               break;
> > +     case VHOST_VDPA_SUSPEND:
> > +             r = vhost_vdpa_suspend(v);
> > +             break;
> >       default:
> >               r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> >               if (r == -ENOIOCTLCMD)
> > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > index cab645d4a645..f9f115a7c75b 100644
> > --- a/include/uapi/linux/vhost.h
> > +++ b/include/uapi/linux/vhost.h
> > @@ -171,4 +171,13 @@
> >  #define VHOST_VDPA_SET_GROUP_ASID    _IOW(VHOST_VIRTIO, 0x7C, \
> >                                            struct vhost_vring_state)
> >
> > +/* Suspend a device so it does not process virtqueue requests anymore
> > + *
> > + * After the return of ioctl the device must preserve all the necessary state
> > + * (the virtqueue vring base plus the possible device specific states) that is
> > + * required for restoring in the future. The device must not change its
> > + * configuration after that point.
> > + */
> > +#define VHOST_VDPA_SUSPEND           _IO(VHOST_VIRTIO, 0x7D)
> > +
> >  #endif
> > --
> > 2.31.1
>


^ permalink raw reply

* [PATCH] fec: Fix timer capture timing in `fec_ptp_enable_pps()`
From: Csókás Bence @ 2022-08-11 10:13 UTC (permalink / raw)
  To: netdev; +Cc: Richard Cochran, Csókás Bence

Code reimplements functionality already in `fec_ptp_read()`,
but misses check for FEC_QUIRK_BUG_CAPTURE. Replace with function call.

Signed-off-by: Csókás Bence <csokas.bence@prolan.hu>
---
 drivers/net/ethernet/freescale/fec_ptp.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index 869d149efc53..11620e2a485c 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -139,11 +139,7 @@ static int fec_ptp_enable_pps(struct fec_enet_private *fep, uint enable)
 		 * NSEC_PER_SEC - ts.tv_nsec. Add the remaining nanoseconds
 		 * to current timer would be next second.
 		 */
-		tempval = readl(fep->hwp + FEC_ATIME_CTRL);
-		tempval |= FEC_T_CTRL_CAPTURE;
-		writel(tempval, fep->hwp + FEC_ATIME_CTRL);
-
-		tempval = readl(fep->hwp + FEC_ATIME);
+		tempval = fep->cc.read(&fep->cc);
 		/* Convert the ptp local counter to 1588 timestamp */
 		ns = timecounter_cyc2time(&fep->tc, tempval);
 		ts = ns_to_timespec64(ns);
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v7 1/4] vdpa: Add suspend operation
From: Dan Carpenter @ 2022-08-11 10:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Pérez, kvm, linux-kernel, Jason Wang, virtualization,
	netdev, dinang, martinpo, Wu Zongyong, Piotr.Uminski,
	gautam.dawar, ecree.xilinx, martinh, Stefano Garzarella, pabloc,
	habetsm.xilinx, lvivier, Zhu Lingshan, tanuj.kamde, Longpeng,
	lulu, hanand, Parav Pandit, Si-Wei Liu, Eli Cohen, Xie Yongji,
	Zhang Min, Christophe JAILLET
In-Reply-To: <20220811042717-mutt-send-email-mst@kernel.org>

On Thu, Aug 11, 2022 at 04:27:32AM -0400, Michael S. Tsirkin wrote:
> On Wed, Aug 10, 2022 at 07:15:09PM +0200, Eugenio Pérez wrote:
> > This operation is optional: It it's not implemented, backend feature bit
> > will not be exposed.
> > 
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > Message-Id: <20220623160738.632852-2-eperezma@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> What is this message id doing here?
> 

I like the Message-Id tag.  It means you can `b4 mbox <mesg-id>` and get
the thread.

Linus has complained (rough remembering) that everyone is using the
Link: tag for links to the patch itself.  It's supposed to be for Links
to bugzilla or to the spec or whatever.  Extra information, too much to
put in the commit message.  Now the Link tag is useless because it either
points to the patch or to a bugzilla.  Depend on what you want it to do,
it *always* points to the opposite thing.

But I can't remember what people settled on as the alternative to use
to link to lore...

In theory, we should be able to figure out the link to lore automatically
and there have been a couple projects which tried to do this but they
can't make it work 100%.  Maintainers massage and reformat the patches
too much before applying them.

regards,
dan carpenter

^ permalink raw reply

* Re: [syzbot] inconsistent lock state in find_vmap_area
From: syzbot @ 2022-08-11 10:43 UTC (permalink / raw)
  To: alan.maguire, andrii, ast, bp, bpf, daniel, dave.hansen, gor, hpa,
	john.fastabend, kafai, kpsingh, linux-kernel, linux-usb, luto,
	mingo, netdev, peterz, rdunlap, songliubraving, syzkaller-bugs,
	tglx, x86, yhs
In-Reply-To: <000000000000a256df05e39a74e7@google.com>

syzbot has bisected this issue to:

commit 43174f0d4597325cb91f1f1f55263eb6e6101036
Author: Alan Maguire <alan.maguire@oracle.com>
Date:   Mon Nov 29 10:00:40 2021 +0000

    libbpf: Silence uninitialized warning/error in btf_dump_dump_type_data

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1350eb05080000
start commit:   200e340f2196 Merge tag 'pull-work.dcache' of git://git.ker..
git tree:       upstream
final oops:     https://syzkaller.appspot.com/x/report.txt?x=10d0eb05080000
console output: https://syzkaller.appspot.com/x/log.txt?x=1750eb05080000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a3f4d6985d3164cd
dashboard link: https://syzkaller.appspot.com/bug?extid=8d19062486784d15dda9
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1252c5a7080000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1160a8e3080000

Reported-by: syzbot+8d19062486784d15dda9@syzkaller.appspotmail.com
Fixes: 43174f0d4597 ("libbpf: Silence uninitialized warning/error in btf_dump_dump_type_data")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* Re: [RFCv7 PATCH net-next 01/36] net: introduce operation helpers for netdev features
From: Alexander Lobakin @ 2022-08-11 10:49 UTC (permalink / raw)
  To: shenjian (K)
  Cc: Alexander Lobakin, davem, kuba, andrew, ecree.xilinx, hkallweit1,
	saeed, leon, netdev, linuxarm
In-Reply-To: <444c4f87-ed36-721a-f619-97c7725e2c87@huawei.com>

From: "shenjian (K)" <shenjian15@huawei.com>
Date: Wed, 10 Aug 2022 19:32:28 +0800

> 在 2022/8/10 17:43, Alexander Lobakin 写道:
> > From: Jian Shen <shenjian15@huawei.com>
> > Date: Wed, 10 Aug 2022 11:05:49 +0800
> >
> >> Introduce a set of bitmap operation helpers for netdev features,
> >> then we can use them to replace the logical operation with them.
> >>
> >> The implementation of these helpers are based on the old prototype
> >> of netdev_features_t is still u64. These helpers will be rewritten
> >> on the last patch, when the prototype changes.
> >>
> >> To avoid interdependencies between netdev_features_helper.h and
> >> netdevice.h, put the helpers for testing feature in the netdevice.h,
> >> and move advandced helpers like netdev_get_wanted_features() and
> >> netdev_intersect_features() to netdev_features_helper.h.
> >>
> >> Signed-off-by: Jian Shen <shenjian15@huawei.com>
> >> ---
> >>   include/linux/netdev_features.h        |  11 +
> >>   include/linux/netdev_features_helper.h | 707 +++++++++++++++++++++++++
> > 'netdev_feature_helpers.h' fits more I guess, doesn't it? It
> > contains several helpers, not only one.
> ok， will rename it.
> 
> > And BTW, do you think it's worth to create a new file rather than
> > put everything just in netdev_features.h?
> Jakub suggested me to move them to a new file, then it can be includued
> at users appropriately. 
> [https://www.spinics.net/lists/netdev/msg809370.html]
> 
> And it's unable to put everything in netdev_features.h, because these 
> helpers
> need to see the definition of struct net_device which is defined in 
> netdevice.h.
> It leading interdependence for netdeice.h include netdev_features.h.

Ah, correct then, sure! I missed that fact.

> 
> 
> >>   include/linux/netdevice.h              |  45 +-
> >>   net/8021q/vlan_dev.c                   |   1 +
> >>   net/core/dev.c                         |   1 +
> >>   5 files changed, 747 insertions(+), 18 deletions(-)
> >>   create mode 100644 include/linux/netdev_features_helper.h
> >>
> >> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> >> index 7c2d77d75a88..9d434b4e6e6e 100644
> >> --- a/include/linux/netdev_features.h
> >> +++ b/include/linux/netdev_features.h
> >> @@ -11,6 +11,17 @@
> >>   
> >>   typedef u64 netdev_features_t;
> >>   
> >> +struct netdev_feature_set {
> >> +	unsigned int cnt;
> >> +	unsigned short feature_bits[];
> >> +};
> >> +
> >> +#define DECLARE_NETDEV_FEATURE_SET(name, features...)			\
> >> +	const struct netdev_feature_set name = {			\
> >> +		.cnt = sizeof((unsigned short[]){ features }) / sizeof(unsigned short),	\
> >> +		.feature_bits = { features },				\
> >> +	}
> >> +
> >>   enum {
> >>   	NETIF_F_SG_BIT,			/* Scatter/gather IO. */
> >>   	NETIF_F_IP_CSUM_BIT,		/* Can checksum TCP/UDP over IPv4. */
> >> diff --git a/include/linux/netdev_features_helper.h b/include/linux/netdev_features_helper.h
> >> new file mode 100644
> >> index 000000000000..5423927d139b
> >> --- /dev/null
> >> +++ b/include/linux/netdev_features_helper.h
> >> @@ -0,0 +1,707 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> >> +/*
> >> + * Network device features helpers.
> >> + */
> >> +#ifndef _LINUX_NETDEV_FEATURES_HELPER_H
> >> +#define _LINUX_NETDEV_FEATURES_HELPER_H
> >> +
> >> +#include <linux/netdevice.h>
> >> +
> >> +static inline void netdev_features_zero(netdev_features_t *dst)
> >> +{
> >> +	*dst = 0;
> >> +}
> >> +
> >> +/* active_feature prefer to netdev->features */
> >> +#define netdev_active_features_zero(ndev) \
> >> +		netdev_features_zero(&ndev->features)
> > netdev_features_t sometimes is being placed and used on the stack.
> > I think it's better to pass just `netdev_features_t *` to those
> > helpers, this way you wouldn't also need to create a new helper
> > for each net_device::*_features.
> My purpose of defining  helpers for each net_device::*_features is to
> avoiding driver to change  net_device::*_features directly.

But why? My point is that you have to create a whole bunch of
copy'n'paste functions differing only by the &net_device field
name.

> 
> >> +
> >> +#define netdev_hw_features_zero(ndev) \
> >> +		netdev_features_zero(&ndev->hw_features)

Oh BTW: wrap `ndev` in the netdev_features_zero() call into braces,
`netdev_feature_zero(&(ndev)->hw_features)`, otherwise it may cause
unwanted sneaky logical changes or build failures.

> >> +
> >> +#define netdev_wanted_features_zero(ndev) \
> > [...]
> >
> >> +#define netdev_gso_partial_features_and(ndev, __features) \
> >> +		netdev_features_and(ndev->gso_partial_features, __features)
> >> +
> >> +/* helpers for netdev features '&=' operation */
> >> +static inline void
> >> +netdev_features_mask(netdev_features_t *dst,
> >> +			   const netdev_features_t features)
> >> +{
> >> +	*dst = netdev_features_and(*dst, features);
> > A small proposal: if you look at bitmap_and() for example, it
> > returns 1 if the resulting bitmap is non-empty and 0 if it is. What
> > about doing the same here? It would probably help to do reduce
> > boilerplating in the drivers where we only want to know if there's
> > anything left after masking.
> > Same for xor, toggle etc.
> Thanks for point this.  Return whether empty, then I can remove 
> netdev_features_intersects
> helpers. But there are also many places to use 'f1 & f2' as return value 
> or input param, then
> I need to define more temporay features to store the result, and then 
> return the temporay
> features or pass into it.

No, netdev_features_intersects() is okay, leave it as it is. Just
look on bitmap_*() prototypes and return its values when applicable.

> 
> >> +}
> >> +
> >> +static inline void
> >> +netdev_active_features_mask(struct net_device *ndev,
> >> +			    const netdev_features_t features)
> >> +{
> >> +	ndev->features = netdev_active_features_and(ndev, features);
> >> +}
> > [...]
> >
> >> +/* helpers for netdev features 'set bit array' operation */
> >> +static inline void
> >> +netdev_features_set_array(const struct netdev_feature_set *set,
> >> +			  netdev_features_t *dst)
> >> +{
> >> +	int i;
> >> +
> >> +	for (i = 0; i < set->cnt; i++)
> > Nit: kernel is C11 now, you can do just `for (u32 i = 0; i ...`.
> > (and yeah, it's better to use unsigned types when you don't plan
> > to store negative values there).
> ok, will fix it.
> 
> >> +		netdev_feature_add(set->feature_bits[i], dst);
> >> +}
> > [...]
> >
> >> -- 
> >> 2.33.0
> > Thanks,
> > Olek
> >
> > .

Thanks,
Olek

^ permalink raw reply

* Re: [RFCv7 PATCH net-next 02/36] net: replace general features macroes with global netdev_features variables
From: Alexander Lobakin @ 2022-08-11 11:05 UTC (permalink / raw)
  To: shenjian (K)
  Cc: Alexander Lobakin, davem, kuba, andrew, ecree.xilinx, hkallweit1,
	saeed, leon, netdev, linuxarm
In-Reply-To: <7eb9ad01-cf1f-afea-0c16-4b269462236f@huawei.com>

From: "shenjian (K)" <shenjian15@huawei.com>
Date: Wed, 10 Aug 2022 20:01:15 +0800

> 在 2022/8/10 17:58, Alexander Lobakin 写道:
> > From: Jian Shen <shenjian15@huawei.com>
> > Date: Wed, 10 Aug 2022 11:05:50 +0800
> >
> >> There are many netdev_features bits group used in kernel. The definition
> >> will be illegal when using feature bit more than 64. Replace these macroes
> >> with global netdev_features variables, initialize them when netdev module
> >> init.
> >>
> >> Signed-off-by: Jian Shen <shenjian15@huawei.com>
> >> ---

[...]

> >> @@ -11362,6 +11363,86 @@ static struct pernet_operations __net_initdata default_device_ops = {
> >>   	.exit_batch = default_device_exit_batch,
> >>   };
> >>   
> >> +static void __init netdev_features_init(void)
> > Given that you're creating a new file dedicated to netdev features,
> > I'd place that initializer there. You can then declare its proto in
> > net/core/dev.h.
> I want to make sure it cann't be called outside net/core/dev.c, for some
> drivers include net/core/dev.h, then they can see it.

net/core/dev.h is internal, nobody outside net/core/ uses it and
this was its purpose.

> 
> >> +{
> >> +	netdev_features_t features;
> >> +
> >> +	netdev_features_set_array(&netif_f_ip_csum_feature_set,
> >> +				  &netdev_ip_csum_features);
> >> +	netdev_features_set_array(&netif_f_csum_feature_set_mask,
> >> +				  &netdev_csum_features_mask);
> >> +
> >> +	netdev_features_set_array(&netif_f_gso_feature_set_mask,
> >> +				  &netdev_gso_features_mask);
> >> +	netdev_features_set_array(&netif_f_general_tso_feature_set,
> >> +				  &netdev_general_tso_features);
> >> +	netdev_features_set_array(&netif_f_all_tso_feature_set,
> >> +				  &netdev_all_tso_features);
> >> +	netdev_features_set_array(&netif_f_tso_ecn_feature_set,
> >> +				  &netdev_tso_ecn_features);
> >> +	netdev_features_set_array(&netif_f_all_fcoe_feature_set,
> >> +				  &netdev_all_fcoe_features);
> >> +	netdev_features_set_array(&netif_f_gso_soft_feature_set,
> >> +				  &netdev_gso_software_features);
> >> +	netdev_features_set_array(&netif_f_gso_encap_feature_set,
> >> +				  &netdev_gso_encap_all_features);
> >> +
> >> +	netdev_csum_gso_features_mask =
> >> +		netdev_features_or(netdev_gso_features_mask,
> >> +				   netdev_csum_features_mask);
> > (I forgot to mention this in 01/36 ._.)
> >
> > As you're converting to bitmaps, you should probably avoid direct
> > assignments. All the bitmap_*() modification functions take a pointer
> > to the destination as a first argument. So it should be
> >
> > netdev_features_or(netdev_features_t *dst, const netdev_features_t *src1,
> > 		   const netdev_features_t *src1);
> The netdev_features_t will be convert to a structure which only contained
> a feature bitmap. So assginement is ok.

Yeah I realized it later, probably a good idea.

> 
> 
> >> +
> >> +	netdev_features_set_array(&netif_f_one_for_all_feature_set,
> >> +				  &netdev_one_for_all_features);
> > Does it make sense to prefix features and the corresponding sets
> > differently? Why not just 'netdev_' for both of them?
> For all the feature bits are named "NETFI_F_XXX_BIT",

Right, but then why are netdev_*_features prefixed with 'netdev',
not 'netif_f'? :D Those sets are tied tightly with the feature
structures, so I think they should have the same prefix. I'd go
with 'netdev' for both.

> 
> 
> >> +	netdev_features_set_array(&netif_f_all_for_all_feature_set,
> >> +				  &netdev_all_for_all_features);

[...]

> >> -- 
> >> 2.33.0
> > Thanks,
> > Olek
> >
> > .
> 
> Thank,
> Jian

Thanks,
Olek

^ permalink raw reply

* Re: CFS for Netdev 0x16 open!
From: Ferenc Fejes @ 2022-08-11 11:17 UTC (permalink / raw)
  To: jhs@mojatatu.com, linux-wireless@vger.kernel.org,
	people@netdevconf.info, lwn@lwn.net,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
  Cc: prog-committee-0x16@netdevconf.info, board@netdevconf.info
In-Reply-To: <CAM0EoMn_4k_w_maX=0=tmiK5k1nTEWpByGP+83qiJHdM0DbigA@mail.gmail.com>

Dear Jamal!

On Tue, 2022-06-07 at 12:36 -0400, Jamal Hadi Salim wrote:
> We are pleased to announce the opening of Call For
> Submissions(CFS) for Netdev 0x16.
> 
> For overview of topics, submissions and requirements
> please visit:
> https://netdevconf.info/0x16/submit-proposal.html
> 
> For all submitted sessions, we employ a blind
> review process carried out by the Program Committee.
> Please refer to:
> https://www.netdevconf.info/0x16/pc_review.html
> 
> Important dates:
> Closing of CFS: Wed, Sept. 7, 2022
> Notification by: Thu, Sept. 15, 2022
> Conference dates: Oct 24th - 28th, 2022

Sorry for spamming it here, but unfortunaltey seems like the
"submissions-0x16@netdevconf.info" address not working (or not yet). I
sent a question to that address but it bounced back all of my mailboxes
including gmail (assuming there must be on the recepient side if any).

> 
> cheers,
> jamal (on behalf of the Netdev Society)

Best,
Ferenc

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox