Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Luiz Angelo Daros de Luca @ 2023-11-03 17:05 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Vladimir Oltean, Andrew Lunn, Florian Fainelli, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	open list:NETWORKING DRIVERS, open list
In-Reply-To: <CACRpkdaoBo0S0RgLhacObd3pbjtWAfr6s3oizQAHqdB76gaG5A@mail.gmail.com>

> > [    3.967086] realtek-smi switch: set MAC: 42:E4:F5:XX:XX:XX
>
> Unrelated: I have seen other DSA switches "inherit" the MAC of the
> conduit interface (BRCM). I wonder if we could do that too instead
> of this random MAC number. Usually the conduit interface has a
> properly configured MAC.
>
> > [    3.976779] realtek-smi switch: missing child interrupt-controller node
> > [    3.983455] realtek-smi switch: no interrupt support
> > [    4.158891] realtek-smi switch: no LED for port 5
>
> Are the LEDs working? My device has no LEDs so I have never
> tested it, despite I coded it. (Also these days we can actually
> support each LED individually configured from device tree using
> the LED API, but that would be quite a bit of code, so only some
> fun for the aspiring developer...)

No at all. LEDs sometimes change state all together but they normally
are just kept on.

My device is not funny to play with. It has only 32MB of RAM and it
frequently OOM. Even flashing a new image requires some sokoban,
unloading all possible kernel modules. It is not usable anymore in the
real world but I might take a look at LEDs just for fun. The LEDs in
the old swconfig driver do work and I can compare the code.

> > Maybe the ag71xx driver is doing something differently.
>
> I'll look at it!
>
> Thanks a lot Luiz,

I'm glad to help.

Regards,

Luiz

^ permalink raw reply

* Re: [EXT] Re: [net PATCH] octeontx2: Fix klockwork and coverity issues
From: Andrew Lunn @ 2023-11-03 17:06 UTC (permalink / raw)
  To: Suman Ghosh
  Cc: Sunil Kovvuri Goutham, Geethasowjanya Akula,
	Subbaraya Sundeep Bhatta, Hariprasad Kelam, Linu Cherian,
	Jerin Jacob Kollanukkaran, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	horms@kernel.org, Ratheesh Kannoth
In-Reply-To: <SJ0PR18MB521648A973DF026521B57D39DBA5A@SJ0PR18MB5216.namprd18.prod.outlook.com>

> [Suman] Yes, backporting to net was the intention. Do we need to put all the fixes tags? Because the fixes are from different commits.

When you break it up into multiple patches, the fixes-tag will get
simpler.

> Are there any generic submission steps for klockwork fixes?

Take a look at

https://docs.kernel.org/process/stable-kernel-rules.html

In particular, the "It must fix a real bug that bothers people".

Some issues found by static code tools don't actually bother
people. So you might want some patches to net-next, not net.

     Andrew

^ permalink raw reply

* Re: [PATCH v2 1/2] tg3: Increment tx_dropped in tg3_tso_bug()
From: Alex Pakhunov @ 2023-11-03 17:07 UTC (permalink / raw)
  To: michael.chan
  Cc: alexey.pakhunov, linux-kernel, mchan, netdev, prashant,
	siva.kallam, vincent.wong2
In-Reply-To: <CACKFLik-Ey1eptrCkhSEp0Oi66kBKnVWa+yDk7-_uzxqSTHb6A@mail.gmail.com>

> This is prone to race conditions if we have more than one TX queue.

Yes, indeed.

> The original driver code only supported one TX queue and the counters
> were never modified properly to support multiple queues.  We should
> convert them to per queue counters by moving tx_dropped and rx_dropped
> to the tg3_napi struct.

I'm not super familiar with the recommended approach for handling locks in
network drivers, so I spent a bit of tme looking at what tg3 does.

It seems that there are a few ways to remove the race condition when
working with these counters:

1. Use atomic increments. It is easy but every update is more expensive
   than it needs to be. We might be able to say that there specific
   counters are updated rarely, so maybe we don't care too much.
2. netif_tx_lock is already taken when tx_droped is incremented - wrap
   rx_dropped increment and reading both counters in netif_tx_lock. This
   seems legal since tg3_tx() can take netif_tx_lock. I'm not sure how to
   order netif_tx_lock and tp->lock, since tg3_get_stats64() takes
   the latter. Should netif_tx_lock be takes inside tp->lock? Should they
   be not nested?
3. Using tp->lock to protect rx_dropped (tg3_poll_link() already takes it
   so it must be legal) and netif_tx_lock to protect tx_dropped.

There are probably other options. Can you recommend an aproach?

Also, this seems like a larger change that should be done separately from
fixing the TX stall. Should we land just "[PATCH v2 2/2]"? Should we land
the whole patch (since it does not make race condition much worse) and fix
the race condition separately?

Alex.

^ permalink raw reply

* Re: [PATCH iwl-next] iavf: Remove queue tracking fields from iavf_adminq_ring
From: Simon Horman @ 2023-11-03 17:09 UTC (permalink / raw)
  To: Ivan Vecera
  Cc: netdev, Jesse Brandeburg, Tony Nguyen, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
	linux-kernel, Jacob Keller, Wojciech Drewek
In-Reply-To: <20231026083932.2623631-1-ivecera@redhat.com>

On Thu, Oct 26, 2023 at 10:39:32AM +0200, Ivan Vecera wrote:
> Fields 'head', 'tail', 'len', 'bah' and 'bal' in iavf_adminq_ring
> are used to store register offsets. These offsets are initialized
> and remains constant so there is no need to store them in the
> iavf_adminq_ring structure.
> 
> Remove these fields from iavf_adminq_ring and use register offset
> constants instead. Remove iavf_adminq_init_regs() that originally
> stores these constants into these fields.
> 
> Finally improve iavf_check_asq_alive() that assumes that
> non-zero value of hw->aq.asq.len indicates fully initialized
> AdminQ send queue. Replace it by check for non-zero value
> of field hw->aq.asq.count that is non-zero when the sending
> queue is initialized and is zeroed during shutdown of
> the queue.
> 
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Thanks, this is a nice cleanup.

Reviewed-by: Simon Horman <horms@kernel.org>

...

^ permalink raw reply

* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Luiz Angelo Daros de Luca @ 2023-11-03 17:13 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Linus Walleij, Andrew Lunn, Florian Fainelli, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20231102103123.hklqlsbb5kbq53mm@skbuf>

>
> Hi Luiz,
>
> On Wed, Nov 01, 2023 at 09:35:30AM -0300, Luiz Angelo Daros de Luca wrote:
> > Hi Linus,
> >
> > Sorry but I noticed no issues:
> >
> > From the router:
> >
> > No. Time Source Destination Protocol Length Info
> > 1 0.000000000 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) request id=0x0789, seq=23/5888, ttl=64 (reply in 2)
> > 2 0.000040094 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) reply id=0x0789, seq=23/5888, ttl=64 (request in 1)
> >
> > From the host:
> >
> > No. Time Source Destination Protocol Length Info
> > 1 0.000000000 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) request id=0x0002, seq=8/2048, ttl=64 (reply in 2)
> > 2 0.000391800 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) reply id=0x0002, seq=8/2048, ttl=64 (request in 1)
> >
> > If I go over that limit, it fragments the packet as expected.
>
> Could you run the shell command that sweeps over the entire range, fromhere?
> https://lore.kernel.org/netdev/20231030222035.oqos7v7sdq5u6mti@skbuf/

Sure. I might be able to run it on Monday. I'm away from the device
and it might have OOM. It has too little RAM to survive too much time
by itself. However, I doubt it might fail at a specific packet range
as it would hang an interactive SSH session quite easily.

Regards,

Luiz

^ permalink raw reply

* Re: [PATCH iwl-next] i40e: Remove AQ register definitions for VF types
From: Simon Horman @ 2023-11-03 17:13 UTC (permalink / raw)
  To: Ivan Vecera
  Cc: netdev, Jesse Brandeburg, Tony Nguyen, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
	linux-kernel, Jacob Keller, Wojciech Drewek
In-Reply-To: <20231026083822.2622930-1-ivecera@redhat.com>

On Thu, Oct 26, 2023 at 10:38:22AM +0200, Ivan Vecera wrote:
> The i40e driver does not handle its VF device types so there
> is no need to keep AdminQ register definitions for such
> device types. Remove them.
> 
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Thanks, another nice cleanup.

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* [RFC v1 0/8] vhost-vdpa: add support for iommufd
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev


Hi All
This code provides the iommufd support for vdpa device
This code fixes the bugs from the last version and also add the asid support. rebase on kernel
v6,6-rc3
Test passed in the physical device (vp_vdpa), but  there are still some problems in the emulated device (vdpa_sim_net), 
I will continue working on it

The kernel code is
https://gitlab.com/lulu6/vhost/-/tree/iommufdRFC_v1

Signed-off-by: Cindy Lu <lulu@redhat.com>


Cindy Lu (8):
  vhost/iommufd: Add the functions support iommufd
  Kconfig: Add the new file vhost/iommufd
  vhost: Add 3 new uapi to support iommufd
  vdpa: Add new vdpa_config_ops to support iommufd
  vdpa_sim :Add support for iommufd
  vdpa: change the map/unmap process to support iommufd
  vp_vdpa::Add support for iommufd
  iommu: expose the function iommu_device_use_default_domain

 drivers/iommu/iommu.c             |   2 +
 drivers/vdpa/vdpa_sim/vdpa_sim.c  |   8 ++
 drivers/vdpa/virtio_pci/vp_vdpa.c |   4 +
 drivers/vhost/Kconfig             |   1 +
 drivers/vhost/Makefile            |   1 +
 drivers/vhost/iommufd.c           | 178 +++++++++++++++++++++++++
 drivers/vhost/vdpa.c              | 210 +++++++++++++++++++++++++++++-
 drivers/vhost/vhost.h             |  21 +++
 include/linux/vdpa.h              |  38 +++++-
 include/uapi/linux/vhost.h        |  66 ++++++++++
 10 files changed, 525 insertions(+), 4 deletions(-)
 create mode 100644 drivers/vhost/iommufd.c

-- 
2.34.3


^ permalink raw reply

* [RFC v1 1/8] vhost/iommufd: Add the functions support iommufd
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev
In-Reply-To: <20231103171641.1703146-1-lulu@redhat.com>

Add a new file vhost/iommufd.c to support the function of
iommufd, This file contains iommufd function of emulated device and
the physical device. 

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/vhost/iommufd.c | 178 ++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/vhost.h   |  21 +++++
 2 files changed, 199 insertions(+)
 create mode 100644 drivers/vhost/iommufd.c

diff --git a/drivers/vhost/iommufd.c b/drivers/vhost/iommufd.c
new file mode 100644
index 000000000000..113dda50a9b6
--- /dev/null
+++ b/drivers/vhost/iommufd.c
@@ -0,0 +1,178 @@
+#include <linux/vdpa.h>
+#include <linux/iommufd.h>
+
+#include "vhost.h"
+
+MODULE_IMPORT_NS(IOMMUFD);
+
+int vdpa_iommufd_bind(struct vdpa_device *vdpa, struct iommufd_ctx *ictx,
+		      u32 *ioas_id, u32 *device_id)
+{
+	int ret;
+
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	/*
+        * If the driver doesn't provide this op then it means the device does
+        * not do DMA at all. So nothing to do.
+        */
+	if (!vdpa->config->bind_iommufd)
+		return 0;
+	ret = vdpa->config->bind_iommufd(vdpa, ictx, device_id);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+void vdpa_iommufd_unbind(struct vdpa_device *vdpa)
+{
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	if (vdpa->config->unbind_iommufd)
+		vdpa->config->unbind_iommufd(vdpa);
+}
+
+int vdpa_iommufd_physical_bind(struct vdpa_device *vdpa,
+			       struct iommufd_ctx *ictx, u32 *out_device_id)
+{
+	struct device *dma_dev = vdpa_get_dma_dev(vdpa);
+	struct iommufd_device *idev;
+
+	idev = iommufd_device_bind(ictx, dma_dev, out_device_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	vdpa->iommufd_device = idev;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vdpa_iommufd_physical_bind);
+
+void vdpa_iommufd_physical_unbind(struct vdpa_device *vdpa)
+{
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	if (vdpa->iommufd_attached) {
+		iommufd_device_detach(vdpa->iommufd_device);
+		vdpa->iommufd_attached = false;
+	}
+	iommufd_device_unbind(vdpa->iommufd_device);
+	vdpa->iommufd_device = NULL;
+}
+EXPORT_SYMBOL_GPL(vdpa_iommufd_physical_unbind);
+
+int vdpa_iommufd_physical_attach_ioas(struct vdpa_device *vdpa,
+				      u32 *iommufd_ioasid)
+{
+	int rc;
+
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	if (WARN_ON(!vdpa->iommufd_device))
+		return -EINVAL;
+
+	if (vdpa->iommufd_attached)
+		rc = iommufd_device_replace(vdpa->iommufd_device,
+					    iommufd_ioasid);
+	else
+		rc = iommufd_device_attach(vdpa->iommufd_device,
+					   iommufd_ioasid);
+	if (rc)
+		return rc;
+	vdpa->iommufd_attached = true;
+
+	return 0;
+}
+
+EXPORT_SYMBOL_GPL(vdpa_iommufd_physical_attach_ioas);
+int vdpa_iommufd_physical_detach_ioas(struct vdpa_device *vdpa)
+{
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	if (WARN_ON(!vdpa->iommufd_device) || !vdpa->iommufd_attached)
+		return -1;
+
+	iommufd_device_detach(vdpa->iommufd_device);
+	vdpa->iommufd_attached = false;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vdpa_iommufd_physical_detach_ioas);
+
+static void vdpa_emulated_unmap(void *data, unsigned long iova,
+				unsigned long length)
+{
+	struct vdpa_device *vdpa = data;
+	/* todo: need to unmap the iova-lenth in all ASID*/
+
+	//	vdpa->config->dma_unmap(vdpa, 0, iova, length);
+}
+
+static const struct iommufd_access_ops vdpa_user_ops = {
+	.needs_pin_pages = 1,
+	.unmap = vdpa_emulated_unmap,
+};
+
+int vdpa_iommufd_emulated_bind(struct vdpa_device *vdpa,
+			       struct iommufd_ctx *ictx, u32 *out_device_id)
+{
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	struct iommufd_access *user;
+
+	user = iommufd_access_create(ictx, &vdpa_user_ops, vdpa, out_device_id);
+	if (IS_ERR(user))
+		return PTR_ERR(user);
+	vdpa->iommufd_access = user;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vdpa_iommufd_emulated_bind);
+
+void vdpa_iommufd_emulated_unbind(struct vdpa_device *vdpa)
+{
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	if (vdpa->iommufd_access) {
+		iommufd_access_destroy(vdpa->iommufd_access);
+		vdpa->iommufd_attached = false;
+		vdpa->iommufd_access = NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(vdpa_iommufd_emulated_unbind);
+
+int vdpa_iommufd_emulated_attach_ioas(struct vdpa_device *vdpa,
+				      u32 *iommufd_ioasid)
+{
+	int rc;
+
+	struct iommufd_access *user;
+
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	if (vdpa->iommufd_attached) {
+		rc = iommufd_access_replace(vdpa->iommufd_access,
+					    *iommufd_ioasid);
+	} else {
+		rc = iommufd_access_attach(vdpa->iommufd_access,
+					   *iommufd_ioasid);
+	}
+	user = vdpa->iommufd_access;
+
+	if (rc)
+		return rc;
+	vdpa->iommufd_attached = true;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vdpa_iommufd_emulated_attach_ioas);
+
+int vdpa_iommufd_emulated_detach_ioas(struct vdpa_device *vdpa)
+{
+	vhost_vdpa_lockdep_assert_held(vdpa);
+
+	if (WARN_ON(!vdpa->iommufd_access) || !vdpa->iommufd_attached)
+		return -1;
+
+	iommufd_access_detach(vdpa->iommufd_access);
+	vdpa->iommufd_attached = false;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vdpa_iommufd_emulated_detach_ioas);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index f60d5f7bef94..179012e350f9 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -310,6 +310,27 @@ static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
 }
 #endif
 
+struct iommufd_ctx;
+struct vdpa_device;
+void vhost_vdpa_lockdep_assert_held(struct vdpa_device *vdpa);
+
+#if IS_ENABLED(CONFIG_IOMMUFD)
+int vdpa_iommufd_bind(struct vdpa_device *vdpa, struct iommufd_ctx *ictx,
+		      u32 *ioas_id, u32 *device_id);
+void vdpa_iommufd_unbind(struct vdpa_device *vdpa);
+#else
+static inline int vdpa_iommufd_bind(struct vdpa_device *vdpa,
+				    struct iommufd_ctx *ictx, u32 *ioas_id,
+				    u32 *device_id)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vdpa_iommufd_unbind(struct vdpa_device *vdpa)
+{
+}
+#endif
+
 /* Memory accessors */
 static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)
 {
-- 
2.34.3


^ permalink raw reply related

* [RFC v1 2/8] Kconfig: Add the new file vhost/iommufd
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev
In-Reply-To: <20231103171641.1703146-1-lulu@redhat.com>

Change the makefile and Kconfig, to add the
new file vhost/iommufd.c

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/vhost/Kconfig  | 1 +
 drivers/vhost/Makefile | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index b455d9ab6f3d..a4becfb36d77 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -72,6 +72,7 @@ config VHOST_VDPA
 	select VHOST
 	select IRQ_BYPASS_MANAGER
 	depends on VDPA
+	depends on IOMMUFD || !IOMMUFD
 	help
 	  This kernel module can be loaded in host kernel to accelerate
 	  guest virtio devices with the vDPA-based backends.
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index f3e1897cce85..cda7f6b7f8da 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_VHOST_RING) += vringh.o
 
 obj-$(CONFIG_VHOST_VDPA) += vhost_vdpa.o
 vhost_vdpa-y := vdpa.o
+vhost_vdpa-$(CONFIG_IOMMUFD) += iommufd.o
 
 obj-$(CONFIG_VHOST)	+= vhost.o
 
-- 
2.34.3


^ permalink raw reply related

* [RFC v1 3/8] vhost: Add 3 new uapi to support iommufd
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev
In-Reply-To: <20231103171641.1703146-1-lulu@redhat.com>

VHOST_VDPA_SET_IOMMU_FD: bind the device to iommufd device

VDPA_DEVICE_ATTACH_IOMMUFD_AS: Attach a vdpa device to an iommufd
address space specified by IOAS id.

VDPA_DEVICE_DETACH_IOMMUFD_AS: Detach a vdpa device
from the iommufd address space

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/vhost/vdpa.c       | 171 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vhost.h |  66 ++++++++++++++
 2 files changed, 237 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 78379ffd2336..dfaddd833364 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/iommu.h>
+#include <linux/iommufd.h>
 #include <linux/uuid.h>
 #include <linux/vdpa.h>
 #include <linux/nospec.h>
@@ -25,6 +26,8 @@
 
 #include "vhost.h"
 
+MODULE_IMPORT_NS(IOMMUFD);
+
 enum {
 	VHOST_VDPA_BACKEND_FEATURES =
 	(1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
@@ -69,6 +72,15 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,
 				   struct vhost_iotlb *iotlb, u64 start,
 				   u64 last, u32 asid);
 
+void vhost_vdpa_lockdep_assert_held(struct vdpa_device *vdpa)
+{
+	struct vhost_vdpa *v = vdpa_get_drvdata(vdpa);
+
+	if (WARN_ON(!v))
+		return;
+	lockdep_assert_held(&v->vdev.mutex);
+}
+
 static inline u32 iotlb_to_asid(struct vhost_iotlb *iotlb)
 {
 	struct vhost_vdpa_as *as = container_of(iotlb, struct
@@ -551,6 +563,149 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
 
 	return ops->suspend(vdpa);
 }
+static long vhost_vdpa_iommufd_set_device(struct vhost_vdpa *v,
+					  void __user *argp)
+{
+	struct device *dma_dev = vdpa_get_dma_dev(v->vdpa);
+	struct vhost_vdpa_set_iommufd set_iommufd;
+	struct vdpa_device *vdpa = v->vdpa;
+	struct iommufd_ctx *ictx;
+	unsigned long minsz;
+	u32 ioas_id, dev_id;
+	struct fd f;
+	long r = 0;
+
+	minsz = offsetofend(struct vhost_vdpa_set_iommufd, iommufd_ioasid);
+	if (copy_from_user(&set_iommufd, argp, minsz))
+		return -EFAULT;
+
+	/* Unset IOMMUFD */
+	if (set_iommufd.iommufd < 0) {
+		if (!vdpa->iommufd_ictx || !vdpa->iommufd_device)
+			return -EINVAL;
+		if (atomic_read(&vdpa->iommufd_users)) {
+			atomic_dec(&vdpa->iommufd_users);
+			return 0;
+		}
+		vdpa_iommufd_unbind(v->vdpa);
+		vdpa->iommufd_device = NULL;
+		vdpa->iommufd_ictx = NULL;
+		return iommu_attach_device(v->domain, dma_dev);
+	}
+
+	/* For same device but different groups, ++refcount only */
+	if (vdpa->iommufd_device)
+		goto out_inc;
+
+	r = -EBADF;
+	f = fdget(set_iommufd.iommufd);
+	if (!f.file)
+		goto out;
+
+	r = -EINVAL;
+	ictx = iommufd_ctx_from_file(f.file);
+	if (IS_ERR(ictx))
+		goto out_fdput;
+
+	if (v->domain) {
+		iommu_device_unuse_default_domain(dma_dev);
+		iommu_detach_device(v->domain, dma_dev);
+	}
+
+	ioas_id = set_iommufd.iommufd_ioasid;
+	r = vdpa_iommufd_bind(vdpa, ictx, &ioas_id, &dev_id);
+	if (r)
+		goto out_reattach;
+
+	set_iommufd.out_dev_id = dev_id;
+	r = copy_to_user(argp + minsz, &set_iommufd.out_dev_id,
+			 sizeof(set_iommufd.out_dev_id)) ?
+		    -EFAULT :
+		    0;
+	if (r)
+		goto out_device_unbind;
+
+	vdpa->iommufd_ictx = ictx;
+
+out_inc:
+	atomic_inc(&vdpa->iommufd_users);
+
+	goto out_fdput;
+
+out_device_unbind:
+
+	vdpa_iommufd_unbind(vdpa);
+out_reattach:
+	iommu_device_use_default_domain(dma_dev);
+	iommu_attach_device(v->domain, dma_dev);
+	iommufd_ctx_put(ictx);
+out_fdput:
+	fdput(f);
+out:
+	return r;
+}
+int vhost_vdpa_iommufd_ioas_attach(struct vhost_vdpa *v, void __user *arg)
+{
+	struct vdpa_device_attach_iommufd_as attach;
+	unsigned long minsz;
+	int ret;
+
+	minsz = offsetofend(struct vdpa_device_attach_iommufd_as, ioas_id);
+
+	if (copy_from_user(&attach, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (attach.argsz < minsz || attach.flags)
+		return -EINVAL;
+
+	if (!v->vdpa->config->bind_iommufd)
+		return -ENODEV;
+
+	if (!v->vdpa->iommufd_ictx) {
+		ret = -EINVAL;
+		return ret;
+	}
+
+	ret = v->vdpa->config->attach_ioas(v->vdpa, &attach.ioas_id);
+
+	if (ret)
+		return ret;
+
+	ret = copy_to_user(
+		      (void __user *)arg +
+			      offsetofend(struct vdpa_device_attach_iommufd_as,
+					  flags),
+		      &attach.ioas_id, sizeof(attach.ioas_id)) ?
+		      -EFAULT :
+		      0;
+
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+int vhost_vdpa_iommufd_ioas_detach(struct vhost_vdpa *v, void __user *arg)
+{
+	struct vdpa_device_detach_iommufd_as detach;
+	unsigned long minsz;
+
+	minsz = offsetofend(struct vdpa_device_detach_iommufd_as, flags);
+
+	if (copy_from_user(&detach, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (detach.argsz < minsz || detach.flags)
+		return -EINVAL;
+
+	if (!v->vdpa->config->bind_iommufd)
+		return -ENODEV;
+
+	if (v->vdpa->iommufd_ictx) {
+		return -EINVAL;
+	}
+	return v->vdpa->config->detach_ioas(v->vdpa);
+}
 
 /* After a successful return of this ioctl the device resumes processing
  * virtqueue descriptors. The device becomes fully operational the same way it
@@ -744,6 +899,18 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 	case VHOST_SET_LOG_FD:
 		r = -ENOIOCTLCMD;
 		break;
+	case VHOST_VDPA_SET_IOMMU_FD:
+
+		r = vhost_vdpa_iommufd_set_device(v, argp);
+		break;
+	case VDPA_DEVICE_ATTACH_IOMMUFD_AS:
+		r = vhost_vdpa_iommufd_ioas_attach(v, (void __user *)arg);
+		break;
+
+	case VDPA_DEVICE_DETACH_IOMMUFD_AS:
+		r = vhost_vdpa_iommufd_ioas_detach(v, (void __user *)arg);
+		break;
+
 	case VHOST_VDPA_SET_CONFIG_CALL:
 		r = vhost_vdpa_set_config_call(v, argp);
 		break;
@@ -896,6 +1063,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
 	} else if (ops->set_map) {
 		if (!v->in_batch)
 			r = ops->set_map(vdpa, asid, iotlb);
+	} else if (!vdpa->iommufd_ictx) {
+		/* Legacy iommu domain pathway without IOMMUFD */
+		r = iommu_map(v->domain, iova, pa, size,
+			      perm_to_iommu_flags(perm), GFP_KERNEL);
 	} else {
 		r = iommu_map(v->domain, iova, pa, size,
 			      perm_to_iommu_flags(perm), GFP_KERNEL);
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index f5c48b61ab62..07e1b2c443ca 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -219,4 +219,70 @@
  */
 #define VHOST_VDPA_RESUME		_IO(VHOST_VIRTIO, 0x7E)
 
+/* vhost_vdpa_set_iommufd
+ * Input parameters:
+ * @iommufd: file descriptor from /dev/iommu; pass -1 to unset
+ * @iommufd_ioasid: IOAS identifier returned from ioctl(IOMMU_IOAS_ALLOC)
+ * Output parameters:
+ * @out_dev_id: device identifier
+ */
+struct vhost_vdpa_set_iommufd {
+	__s32 iommufd;
+	__u32 iommufd_ioasid;
+	__u32 out_dev_id;
+};
+
+#define VHOST_VDPA_SET_IOMMU_FD \
+	_IOW(VHOST_VIRTIO, 0x7F, struct vhost_vdpa_set_iommufd)
+
+/*
+ * VDPA_DEVICE_ATTACH_IOMMUFD_AS -
+ * _IOW(VHOST_VIRTIO, 0x7f, struct vdpa_device_attach_iommufd_as)
+ *
+ * Attach a vdpa device to an iommufd address space specified by IOAS
+ * id.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VHOST_VDPA_SET_IOMMU_FD
+ *
+ * Undo by VDPA_DEVICE_DETACH_IOMMUFD_AS or device fd close.
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ * @ioas_id:	Input the target id which can represent an ioas
+ *		allocated via iommufd subsystem.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vdpa_device_attach_iommufd_as {
+	__u32 argsz;
+	__u32 flags;
+	__u32 ioas_id;
+};
+
+#define VDPA_DEVICE_ATTACH_IOMMUFD_AS \
+	_IOW(VHOST_VIRTIO, 0x82, struct vdpa_device_attach_iommufd_as)
+
+/*
+ * VDPA_DEVICE_DETACH_IOMMUFD_AS
+ *
+ * Detach a vdpa device from the iommufd address space it has been
+ * attached to. After it, device should be in a blocking DMA state.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VHOST_VDPA_SET_IOMMU_FD
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vdpa_device_detach_iommufd_as {
+	__u32 argsz;
+	__u32 flags;
+};
+
+#define VDPA_DEVICE_DETACH_IOMMUFD_AS \
+	_IOW(VHOST_VIRTIO, 0x83, struct vdpa_device_detach_iommufd_as)
+
 #endif
-- 
2.34.3


^ permalink raw reply related

* [RFC v1 4/8] vdpa: Add new vdpa_config_ops to support iommufd
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev
In-Reply-To: <20231103171641.1703146-1-lulu@redhat.com>

Add 4 new vdpa_config_ops function to support iommufd

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 include/linux/vdpa.h | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 0e652026b776..233d80f9d910 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -5,6 +5,7 @@
 #include <linux/kernel.h>
 #include <linux/device.h>
 #include <linux/interrupt.h>
+#include <linux/iommufd.h>
 #include <linux/vhost_iotlb.h>
 #include <linux/virtio_net.h>
 #include <linux/if_ether.h>
@@ -97,6 +98,12 @@ struct vdpa_device {
 	struct vdpa_mgmt_dev *mdev;
 	unsigned int ngroups;
 	unsigned int nas;
+	struct iommufd_access *iommufd_access;
+	struct iommufd_device *iommufd_device;
+	struct iommufd_ctx *iommufd_ictx;
+	unsigned long *vq_bitmap;
+	atomic_t iommufd_users;
+	bool iommufd_attached;
 };
 
 /**
@@ -332,6 +339,17 @@ struct vdpa_map_file {
  *				@vdev: vdpa device
  * @free:			Free resources that belongs to vDPA (optional)
  *				@vdev: vdpa device
+ * @bind_iommufd:              use vdpa_iommufd_physical_bind for an IOMMU
+ *                             backed device.
+ *                             otherwise use vdpa_iommufd_emulated_bind
+ * @unbind_iommufd:            use vdpa_iommufd_physical_unbind for an IOMMU
+ *                             backed device.
+ *                             otherwise, use vdpa_iommufd_emulated_unbind
+ * @attach_ioas:               use vdpa_iommufd_physical_attach_ioas for an
+ *                             IOMMU backed device.
+ * @detach_ioas:               Opposite of attach_ioas
+ * @free:			Free resources that belongs to vDPA (optional)
+ *				@vdev: vdpa device
  */
 struct vdpa_config_ops {
 	/* Virtqueue ops */
@@ -402,6 +420,13 @@ struct vdpa_config_ops {
 
 	/* Free device resources */
 	void (*free)(struct vdpa_device *vdev);
+	/* IOMMUFD ops */
+	int (*bind_iommufd)(struct vdpa_device *vdev, struct iommufd_ctx *ictx,
+			    u32 *out_device_id);
+	void (*unbind_iommufd)(struct vdpa_device *vdev);
+	int (*attach_ioas)(struct vdpa_device *vdev, u32 *pt_id);
+	int (*detach_ioas)(struct vdpa_device *vdev);
+
 };
 
 struct vdpa_device *__vdpa_alloc_device(struct device *parent,
@@ -570,4 +595,15 @@ struct vdpa_mgmt_dev {
 int vdpa_mgmtdev_register(struct vdpa_mgmt_dev *mdev);
 void vdpa_mgmtdev_unregister(struct vdpa_mgmt_dev *mdev);
 
-#endif /* _LINUX_VDPA_H */
+int vdpa_iommufd_physical_bind(struct vdpa_device *vdpa,
+			       struct iommufd_ctx *ictx, u32 *out_device_id);
+void vdpa_iommufd_physical_unbind(struct vdpa_device *vdpa);
+int vdpa_iommufd_physical_attach_ioas(struct vdpa_device *vdpa, u32 *pt_id);
+int vdpa_iommufd_physical_detach_ioas(struct vdpa_device *vdpa);
+int vdpa_iommufd_emulated_bind(struct vdpa_device *vdpa,
+			       struct iommufd_ctx *ictx, u32 *out_device_id);
+void vdpa_iommufd_emulated_unbind(struct vdpa_device *vdpa);
+int vdpa_iommufd_emulated_attach_ioas(struct vdpa_device *vdpa, u32 *pt_id);
+int vdpa_iommufd_emulated_detach_ioas(struct vdpa_device *vdpa);
+
+#endif
-- 
2.34.3


^ permalink raw reply related

* [RFC v1 5/8] vdpa_sim :Add support for iommufd
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev
In-Reply-To: <20231103171641.1703146-1-lulu@redhat.com>

Add new vdpa_config_ops function to support iommufd

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 76d41058add9..9400ec32ec41 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -762,6 +762,10 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
 	.bind_mm		= vdpasim_bind_mm,
 	.unbind_mm		= vdpasim_unbind_mm,
 	.free                   = vdpasim_free,
+	.bind_iommufd           = vdpa_iommufd_emulated_bind,
+	.unbind_iommufd         = vdpa_iommufd_emulated_unbind,
+	.attach_ioas            = vdpa_iommufd_emulated_attach_ioas,
+	.detach_ioas            = vdpa_iommufd_emulated_detach_ioas,
 };
 
 static const struct vdpa_config_ops vdpasim_batch_config_ops = {
@@ -799,6 +803,10 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
 	.bind_mm		= vdpasim_bind_mm,
 	.unbind_mm		= vdpasim_unbind_mm,
 	.free                   = vdpasim_free,
+	.bind_iommufd        = vdpa_iommufd_emulated_bind,
+	.unbind_iommufd        = vdpa_iommufd_emulated_unbind,
+	.attach_ioas           = vdpa_iommufd_emulated_attach_ioas,
+	.detach_ioas           = vdpa_iommufd_emulated_detach_ioas,
 };
 
 MODULE_VERSION(DRV_VERSION);
-- 
2.34.3


^ permalink raw reply related

* [RFC v1 7/8] vp_vdpa::Add support for iommufd
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev
In-Reply-To: <20231103171641.1703146-1-lulu@redhat.com>

Add new vdpa_config_ops function to support iommufd

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/vdpa/virtio_pci/vp_vdpa.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c
index 281287fae89f..dd2c372d36a6 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -460,6 +460,10 @@ static const struct vdpa_config_ops vp_vdpa_ops = {
 	.set_config	= vp_vdpa_set_config,
 	.set_config_cb  = vp_vdpa_set_config_cb,
 	.get_vq_irq	= vp_vdpa_get_vq_irq,
+	.bind_iommufd = vdpa_iommufd_physical_bind,
+	.unbind_iommufd = vdpa_iommufd_physical_unbind,
+	.attach_ioas = vdpa_iommufd_physical_attach_ioas,
+	.detach_ioas = vdpa_iommufd_physical_detach_ioas,
 };
 
 static void vp_vdpa_free_irq_vectors(void *data)
-- 
2.34.3


^ permalink raw reply related

* [RFC v1 6/8] vdpa: change the map/unmap process to support iommufd
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev
In-Reply-To: <20231103171641.1703146-1-lulu@redhat.com>

Add the check for iommufd_ictx,If vdpa don't have the iommufd_ictx
then will use the Legacy iommu domain pathway

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/vhost/vdpa.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index dfaddd833364..0e2dba59e1ce 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1067,9 +1067,6 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
 		/* Legacy iommu domain pathway without IOMMUFD */
 		r = iommu_map(v->domain, iova, pa, size,
 			      perm_to_iommu_flags(perm), GFP_KERNEL);
-	} else {
-		r = iommu_map(v->domain, iova, pa, size,
-			      perm_to_iommu_flags(perm), GFP_KERNEL);
 	}
 	if (r) {
 		vhost_iotlb_del_range(iotlb, iova, iova + size - 1);
@@ -1095,8 +1092,10 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v,
 	if (ops->set_map) {
 		if (!v->in_batch)
 			ops->set_map(vdpa, asid, iotlb);
+	} else if (!vdpa->iommufd_ictx) {
+		/* Legacy iommu domain pathway without IOMMUFD */
+		iommu_unmap(v->domain, iova, size);
 	}
-
 }
 
 static int vhost_vdpa_va_map(struct vhost_vdpa *v,
@@ -1149,7 +1148,36 @@ static int vhost_vdpa_va_map(struct vhost_vdpa *v,
 
 	return ret;
 }
+#if 0
+int vhost_pin_pages(struct vdpa_device *device, dma_addr_t iova, int npage,
+		    int prot, struct page **pages)
+{
+	if (!pages || !npage)
+		return -EINVAL;
+	//if (!device->config->dma_unmap)
+	//return -EINVAL;
+
+	if (0) { //device->iommufd_access) {
+		int ret;
+
+		if (iova > ULONG_MAX)
+			return -EINVAL;
 
+		ret = iommufd_access_pin_pages(
+			device->iommufd_access, iova, npage * PAGE_SIZE, pages,
+			(prot & IOMMU_WRITE) ? IOMMUFD_ACCESS_RW_WRITE : 0);
+		if (ret) {
+
+			return ret;
+		}
+
+		return npage;
+	} else {
+		return pin_user_pages(iova, npage, prot, pages);
+	}
+	return -EINVAL;
+}
+#endif
 static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 			     struct vhost_iotlb *iotlb,
 			     u64 iova, u64 size, u64 uaddr, u32 perm)
@@ -1418,9 +1446,13 @@ static void vhost_vdpa_free_domain(struct vhost_vdpa *v)
 	struct device *dma_dev = vdpa_get_dma_dev(vdpa);
 
 	if (v->domain) {
-		iommu_detach_device(v->domain, dma_dev);
+		if (!vdpa->iommufd_ictx) {
+			iommu_detach_device(v->domain, dma_dev);
+		}
 		iommu_domain_free(v->domain);
 	}
+	if (vdpa->iommufd_ictx)
+		vdpa_iommufd_unbind(vdpa);
 
 	v->domain = NULL;
 }
@@ -1645,6 +1677,7 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
 	}
 
 	atomic_set(&v->opened, 0);
+	atomic_set(&vdpa->iommufd_users, 0);
 	v->minor = minor;
 	v->vdpa = vdpa;
 	v->nvqs = vdpa->nvqs;
-- 
2.34.3


^ permalink raw reply related

* [RFC v1 8/8] iommu: expose the function iommu_device_use_default_domain
From: Cindy Lu @ 2023-11-03 17:16 UTC (permalink / raw)
  To: lulu, jasowang, mst, yi.l.liu, jgg, linux-kernel, virtualization,
	netdev
In-Reply-To: <20231103171641.1703146-1-lulu@redhat.com>

Expose the function iommu_device_use_default_domain() and
iommu_device_unuse_default_domain(),
While vdpa bind the iommufd device and detach the iommu device,
vdpa need to call the function
iommu_device_unuse_default_domain() to release the owner

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3bfc56df4f78..987cbf8c9a87 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3164,6 +3164,7 @@ int iommu_device_use_default_domain(struct device *dev)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iommu_device_use_default_domain);
 
 /**
  * iommu_device_unuse_default_domain() - Device driver stops handling device
@@ -3187,6 +3188,7 @@ void iommu_device_unuse_default_domain(struct device *dev)
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
 }
+EXPORT_SYMBOL_GPL(iommu_device_unuse_default_domain);
 
 static int __iommu_group_alloc_blocking_domain(struct iommu_group *group)
 {
-- 
2.34.3


^ permalink raw reply related

* Re: [PATCH net-next 02/19] netfilter: nft_set_rbtree: prefer sync gc to async worker
From: Simon Horman @ 2023-11-03 17:34 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw
In-Reply-To: <20231025212555.132775-3-pablo@netfilter.org>

On Wed, Oct 25, 2023 at 11:25:38PM +0200, Pablo Neira Ayuso wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> There is no need for asynchronous garbage collection, rbtree inserts
> can only happen from the netlink control plane.
> 
> We already perform on-demand gc on insertion, in the area of the
> tree where the insertion takes place, but we don't do a full tree
> walk there for performance reasons.
> 
> Do a full gc walk at the end of the transaction instead and
> remove the async worker.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

...

> @@ -515,11 +523,7 @@ static void nft_rbtree_remove(const struct net *net,
>  	struct nft_rbtree *priv = nft_set_priv(set);
>  	struct nft_rbtree_elem *rbe = elem->priv;
>  
> -	write_lock_bh(&priv->lock);
> -	write_seqcount_begin(&priv->count);
> -	rb_erase(&rbe->node, &priv->root);
> -	write_seqcount_end(&priv->count);
> -	write_unlock_bh(&priv->lock);
> +	nft_rbtree_erase(priv, rbe);
>  }
>  
>  static void nft_rbtree_activate(const struct net *net,
> @@ -613,45 +617,40 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
>  	read_unlock_bh(&priv->lock);
>  }
>  
> -static void nft_rbtree_gc(struct work_struct *work)
> +static void nft_rbtree_gc_remove(struct net *net, struct nft_set *set,
> +				 struct nft_rbtree *priv,
> +				 struct nft_rbtree_elem *rbe)
>  {
> +	struct nft_set_elem elem = {
> +		.priv	= rbe,
> +	};
> +
> +	nft_setelem_data_deactivate(net, set, &elem);
> +	nft_rbtree_erase(priv, rbe);
> +}
> +
> +static void nft_rbtree_gc(struct nft_set *set)
> +{
> +	struct nft_rbtree *priv = nft_set_priv(set);
>  	struct nft_rbtree_elem *rbe, *rbe_end = NULL;
>  	struct nftables_pernet *nft_net;

Hi Florian and Pablo,

I understand that this patch has been accepted upstream,
and that by implication this feedback is rather slow,
but I noticed that with this patch nft_net is now
set but otherwise unused in this function.

As flagged by clang-16 and gcc-13 W=1 builds.

> -	struct nft_rbtree *priv;
> +	struct rb_node *node, *next;
>  	struct nft_trans_gc *gc;
> -	struct rb_node *node;
> -	struct nft_set *set;
> -	unsigned int gc_seq;
>  	struct net *net;
>  
> -	priv = container_of(work, struct nft_rbtree, gc_work.work);
>  	set  = nft_set_container_of(priv);
>  	net  = read_pnet(&set->net);
>  	nft_net = nft_pernet(net);
> -	gc_seq  = READ_ONCE(nft_net->gc_seq);
>  
> -	if (nft_set_gc_is_pending(set))
> -		goto done;
> -
> -	gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL);
> +	gc = nft_trans_gc_alloc(set, 0, GFP_KERNEL);
>  	if (!gc)
> -		goto done;
> -
> -	read_lock_bh(&priv->lock);
> -	for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
> +		return;
>  
> -		/* Ruleset has been updated, try later. */
> -		if (READ_ONCE(nft_net->gc_seq) != gc_seq) {
> -			nft_trans_gc_destroy(gc);
> -			gc = NULL;
> -			goto try_later;
> -		}
> +	for (node = rb_first(&priv->root); node ; node = next) {
> +		next = rb_next(node);
>  
>  		rbe = rb_entry(node, struct nft_rbtree_elem, node);
>  
> -		if (nft_set_elem_is_dead(&rbe->ext))
> -			goto dead_elem;
> -
>  		/* elements are reversed in the rbtree for historical reasons,
>  		 * from highest to lowest value, that is why end element is
>  		 * always visited before the start element.

...

^ permalink raw reply

* Re: [PATCH] net: phy: broadcom: Wire suspend/resume for BCM54612E
From: Marco von Rosenberg @ 2023-11-03 17:37 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli
  Cc: Florian Fainelli, Broadcom internal kernel review list,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel,
	Marco von Rosenberg
In-Reply-To: <999020b2-692b-4582-8ca0-e19c7b45ee92@gmail.com>

On Friday, November 3, 2023 4:39:55 AM CET Florian Fainelli wrote:
> We have an unconditional call to __phy_resume() in phy_start() and we
> should always have a call to phy_start() regardless of the path though
> you have a point Andrew that we should ensure that by the time
> phy_init_hw() is called we have taken the device out of IDDQ-SR.
> 
> > I agree with all of your points. This is just one way which happens to
> > solve this specific problem. Of course it might be asymmetric to see the
> > patch as a solution to my problem. However is there anything
> > fundamentally wrong with adding suspend/resume callbacks? I see some
> > other drivers having these callbacks defined and some not (it seems a bit
> > inconsistent throughout the drivers in broadcom.c to be honest).
> > 
> > I'm wondering if I should just omit this whole "motivation" paragraph in
> > the commit message and just use the commit message of commit 38b6a9073007
> > ("net: phy: broadcom: Wire suspend/resume for BCM50610 and BCM50610M") as
> > a template. I mean, regardless of my motivation, I would say it makes
> > sense for this PHY to support suspend and resume.
> 
> I would remove the motivation aspect from the paragraph and we could
> also improve the driver a bit to ensure that IDDQ-SR is disabled upon
> config_init(). Other than that your patch is just fine with me. Can you
> re-submit in a few days when net-next opens again?

Ok, I'll re-submit the patch when net-next is open again with an updated 
commit message. And I agree, disabling IDDQ-SR in config_init() would make 
sense for a future patch since this would fix this potential issue also for 
other PHYs.

	Marco



^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net: dsa: realtek: support reset controller
From: Linus Walleij @ 2023-11-03 17:37 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Luiz Angelo Daros de Luca, netdev, alsi, andrew, vivien.didelot,
	f.fainelli, davem, kuba, pabeni, robh+dt, krzk+dt, arinc.unal
In-Reply-To: <20231102155521.2yo5qpugdhkjy22x@skbuf>

On Thu, Nov 2, 2023 at 4:55 PM Vladimir Oltean <olteanv@gmail.com> wrote:

> To be clear, something like this is what you mean, right?

Hey, it's 90% done already! :D
And we do away with all the hopeless IMPLY stuff.

> +realtek-interface-objs                 := realtek-interface-common.o
> +ifdef CONFIG_NET_DSA_REALTEK_MDIO
> +realtek-interface-objs                 += realtek-mdio.o
> +endif
> +ifdef CONFIG_NET_DSA_REALTEK_SMI
> +realtek-interface-objs                 += realtek-smi.o
> +endif

I would try to do

realtek-interface-objs-y                := realtek-interface-common.o
realtek-interface-objs-$(CONFIG_NET_DSA_REALTEK_MDIO) += realtek-mdio.o
realtek-interface-objs-$(CONFIG_NET_DSA_REALTEK_SMI) += realtek-smi.o
realtek-interface-objs := $(realtek-interface-objs-y)

I think it's equivalent just more compact?

Other than that it looks like what I would have done myself.

Yours,
Linus Walleij

^ permalink raw reply

* Re: [RFC v1 8/8] iommu: expose the function iommu_device_use_default_domain
From: Jason Gunthorpe @ 2023-11-03 17:37 UTC (permalink / raw)
  To: Cindy Lu; +Cc: jasowang, mst, yi.l.liu, linux-kernel, virtualization, netdev
In-Reply-To: <20231103171641.1703146-9-lulu@redhat.com>

On Sat, Nov 04, 2023 at 01:16:41AM +0800, Cindy Lu wrote:
> Expose the function iommu_device_use_default_domain() and
> iommu_device_unuse_default_domain(),
> While vdpa bind the iommufd device and detach the iommu device,
> vdpa need to call the function
> iommu_device_unuse_default_domain() to release the owner

Definately not. You need to set the driver_managed_dma flag.

Jason

^ permalink raw reply

* RE: [PATCH RFC net-next v2 0/2] selftests/dpll: DPLL subsystem integration tests
From: Michalik, Michal @ 2023-11-03 17:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev@vger.kernel.org, vadim.fedorenko@linux.dev,
	Kubalewski, Arkadiusz, jonathan.lemon@gmail.com,
	pabeni@redhat.com, poros, Olech, Milena, mschmidt,
	linux-clk@vger.kernel.org, bvanassche@acm.org, kuba@kernel.org,
	davem@davemloft.net, edumazet@google.com
In-Reply-To: <ZUDNFHlO4GtA/UAh@nanopsycho>

On 31 October 2023 10:47 AM CET, Jiri Pirko wrote:

Jiri, much thanks for taking time and doing great review - I have learned
a lot, especially in patch 1/2 having the netdevsim DPLL implementation.

I will do my best to post the RFC v3 early next week.

> 
> Mon, Oct 30, 2023 at 05:53:24PM CET, michal.michalik@intel.com wrote:
>>The recently merged common DPLL interface discussed on a newsletter[1]
> 
> "newsletter"? Sounds a bit odd to me :)
> 

Yeah, you are right - will fix.

>>is introducing new, complex subsystem which requires proper integration
>>testing - this patch adds core for such framework, as well as the
> 
> "Patchset" perhaps? Also, what do you mean by "core"? The sentence
> sounds a bit weird to me.
> 

Will work to improve this.

>>initial test cases. Framework does not require neither any special
>>hardware nor any special system architecture.
>>
>>To properly test the DPLL subsystem this patch adds fake DPLL devices and it's
> 
> For patch desctiption, please stay within 72cols.
> Also, "it's" is most probably wrong in this sentence.
> 

Improve this as well.

>>pins implementation to netdevsim. Creating netdevsim devices and adding ports
>>to it register new DPLL devices and pins. First port of each netdevsim device
> 
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This sentence does not make
> sense to me. Pehaps rephrase a bit?
> 
> 
>>acts as a entitiy which registers two DPLL devices: EEC and PPS DPLLs. First
> 
> typo: "entitiy"

Ok.

>>port also register the common pins: PPS and GNSS. Additionally each port
>>register also RCLK (recovered clock) pin for itself. That allow us to check
>>mutliple scenarios which might be problematic in real implementations (like
>>different ordering etc.)
>>
>>Patch adds few helper scripts, which are:
>>1) tools/testing/selftests/dpll/run_dpll_tests.sh
> 
> Please make this part of
> tools/testing/selftests/drivers/net/netdevsim/
> No special threat of dpll needed.
> 

That make a lot of sense to move it there, thanks for noticing that.

>>    Script is checking for all dependencies, creates temporary
>>    environment, installs required libraries and run all tests - can be
>>    used standalone
>>2) tools/testing/selftests/dpll/ynlfamilyhandler.py˙
>>    Library for easier ynl use in the pytest framework - can be used
>>    standalone
>>
>>[1] https://lore.kernel.org/netdev/169494842736.21621.10730860855645661664.git-patchwork-notify@kernel.org/
>>
>>Changelog:
>>v1 -> v2:
>>- moved from separate module to implementation in netdevsim
>>
>>Michal Michalik (2):
>>  netdevsim: implement DPLL for subsystem selftests
>>  selftests/dpll: add DPLL system integration selftests
>>
>> drivers/net/Kconfig                              |   1 +
>> drivers/net/netdevsim/Makefile                   |   2 +-
>> drivers/net/netdevsim/dpll.c                     | 438 +++++++++++++++++++++++
>> drivers/net/netdevsim/dpll.h                     |  81 +++++
>> drivers/net/netdevsim/netdev.c                   |  20 ++
>> drivers/net/netdevsim/netdevsim.h                |   4 +
>> tools/testing/selftests/Makefile                 |   1 +
>> tools/testing/selftests/dpll/Makefile            |   8 +
>> tools/testing/selftests/dpll/__init__.py         |   0
>> tools/testing/selftests/dpll/config              |   2 +
>> tools/testing/selftests/dpll/consts.py           |  34 ++
>> tools/testing/selftests/dpll/dpll_utils.py       | 109 ++++++
>> tools/testing/selftests/dpll/requirements.txt    |   3 +
>> tools/testing/selftests/dpll/run_dpll_tests.sh   |  75 ++++
>> tools/testing/selftests/dpll/test_dpll.py        | 414 +++++++++++++++++++++
>> tools/testing/selftests/dpll/ynlfamilyhandler.py |  49 +++
>> 16 files changed, 1240 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/net/netdevsim/dpll.c
>> create mode 100644 drivers/net/netdevsim/dpll.h
>> create mode 100644 tools/testing/selftests/dpll/Makefile
>> create mode 100644 tools/testing/selftests/dpll/__init__.py
>> create mode 100644 tools/testing/selftests/dpll/config
>> create mode 100644 tools/testing/selftests/dpll/consts.py
>> create mode 100644 tools/testing/selftests/dpll/dpll_utils.py
>> create mode 100644 tools/testing/selftests/dpll/requirements.txt
>> create mode 100755 tools/testing/selftests/dpll/run_dpll_tests.sh
>> create mode 100644 tools/testing/selftests/dpll/test_dpll.py
>> create mode 100644 tools/testing/selftests/dpll/ynlfamilyhandler.py
>>
>>-- 
>>2.9.5
>>
>>base-commit: 55c900477f5b3897d9038446f72a281cae0efd86

^ permalink raw reply

* RE: [PATCH RFC net-next v2 1/2] netdevsim: implement DPLL for subsystem selftests
From: Michalik, Michal @ 2023-11-03 17:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev@vger.kernel.org, vadim.fedorenko@linux.dev,
	Kubalewski, Arkadiusz, jonathan.lemon@gmail.com,
	pabeni@redhat.com, poros, Olech, Milena, mschmidt,
	linux-clk@vger.kernel.org, bvanassche@acm.org, kuba@kernel.org,
	davem@davemloft.net, edumazet@google.com
In-Reply-To: <ZUDeJiafjggGvLU/@nanopsycho>

On 31 October 2023 12:00 PM CET, Jiri Pirko wrote:
> 
> Mon, Oct 30, 2023 at 05:53:25PM CET, michal.michalik@intel.com wrote:
>>DPLL subsystem integration tests require a module which mimics the
>>behavior of real driver which supports DPLL hardware. To fully test the
>>subsystem the netdevsim is amended with DPLL implementation.
>>
>>Signed-off-by: Michal Michalik <michal.michalik@intel.com>
>>---
>> drivers/net/Kconfig               |   1 +
>> drivers/net/netdevsim/Makefile    |   2 +-
>> drivers/net/netdevsim/dpll.c      | 438 ++++++++++++++++++++++++++++++++++++++
>> drivers/net/netdevsim/dpll.h      |  81 +++++++
>> drivers/net/netdevsim/netdev.c    |  20 ++
>> drivers/net/netdevsim/netdevsim.h |   4 +
>> 6 files changed, 545 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/net/netdevsim/dpll.c
>> create mode 100644 drivers/net/netdevsim/dpll.h
>>
>>diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>>index af0da4b..633ec89 100644
>>--- a/drivers/net/Kconfig
>>+++ b/drivers/net/Kconfig
>>@@ -626,6 +626,7 @@ config NETDEVSIM
>> 	depends on PSAMPLE || PSAMPLE=n
>> 	depends on PTP_1588_CLOCK_MOCK || PTP_1588_CLOCK_MOCK=n
>> 	select NET_DEVLINK
>>+	select DPLL
>> 	help
>> 	  This driver is a developer testing tool and software model that can
>> 	  be used to test various control path networking APIs, especially
>>diff --git a/drivers/net/netdevsim/Makefile b/drivers/net/netdevsim/Makefile
>>index f8de93b..f338ffb 100644
>>--- a/drivers/net/netdevsim/Makefile
>>+++ b/drivers/net/netdevsim/Makefile
>>@@ -3,7 +3,7 @@
>> obj-$(CONFIG_NETDEVSIM) += netdevsim.o
>> 
>> netdevsim-objs := \
>>-	netdev.o dev.o ethtool.o fib.o bus.o health.o hwstats.o udp_tunnels.o
>>+	netdev.o dev.o ethtool.o fib.o bus.o health.o hwstats.o udp_tunnels.o dpll.o
>> 
>> ifeq ($(CONFIG_BPF_SYSCALL),y)
>> netdevsim-objs += \
>>diff --git a/drivers/net/netdevsim/dpll.c b/drivers/net/netdevsim/dpll.c
>>new file mode 100644
>>index 0000000..050f68e
>>--- /dev/null
>>+++ b/drivers/net/netdevsim/dpll.c
>>@@ -0,0 +1,438 @@
>>+// SPDX-License-Identifier: GPL-2.0
>>+/*
>>+ * Copyright (c) 2023, Intel Corporation.
>>+ * Author: Michal Michalik <michal.michalik@intel.com>
>>+ */
>>+#include "dpll.h"
>>+
>>+static struct dpll_pin_properties *
>>+create_pin_properties(const char *label, enum dpll_pin_type type,
> 
> Please make sure to follow the namespace prefix notation in your patch
> everywhere, functions, structs, defines.
> "nsim_"
> 

Thanks - I overlooked that, will change.

>>+		      unsigned long caps, u32 freq_supp_num, u64 fmin, u64 fmax)
>>+{
>>+	struct dpll_pin_frequency *freq_supp;
>>+	struct dpll_pin_properties *pp;
>>+
>>+	pp = kzalloc(sizeof(*pp), GFP_KERNEL);
>>+	if (!pp)
>>+		return ERR_PTR(-ENOMEM);
>>+
>>+	freq_supp = kzalloc(sizeof(*freq_supp), GFP_KERNEL);
>>+	if (!freq_supp)
>>+		goto err;
>>+	*freq_supp =
>>+		(struct dpll_pin_frequency)DPLL_PIN_FREQUENCY_RANGE(fmin, fmax);
> 
> Drop the cast.
> 

Without the cast it does not compile.

>>+
>>+	pp->board_label = kasprintf(GFP_KERNEL, "%s_brd", label);
>>+	pp->panel_label = kasprintf(GFP_KERNEL, "%s_pnl", label);
>>+	pp->package_label = kasprintf(GFP_KERNEL, "%s_pcg", label);
>>+	pp->freq_supported_num = freq_supp_num;
>>+	pp->freq_supported = freq_supp;
>>+	pp->capabilities = caps;
>>+	pp->type = type;
>>+
>>+	return pp;
>>+err:
>>+	kfree(pp);
>>+	return ERR_PTR(-ENOMEM);
>>+}
>>+
>>+static struct dpll_pd *create_dpll_pd(int temperature, enum dpll_mode mode)
>>+{
>>+	struct dpll_pd *pd;
>>+
>>+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
>>+	if (!pd)
>>+		return ERR_PTR(-ENOMEM);
>>+
>>+	pd->temperature = temperature;
>>+	pd->mode = mode;
>>+
>>+	return pd;
>>+}
>>+
>>+static struct pin_pd *create_pin_pd(u64 frequency, u32 prio,
>>+				    enum dpll_pin_direction direction)
>>+{
>>+	struct pin_pd *pd;
>>+
>>+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
>>+	if (!pd)
>>+		return ERR_PTR(-ENOMEM);
>>+
>>+	pd->state_dpll = DPLL_PIN_STATE_DISCONNECTED;
>>+	pd->state_pin = DPLL_PIN_STATE_DISCONNECTED;
>>+	pd->frequency = frequency;
>>+	pd->direction = direction;
>>+	pd->prio = prio;
>>+
>>+	return pd;
>>+}
>>+
>>+static int
>>+dds_ops_mode_get(const struct dpll_device *dpll, void *dpll_priv,
>>+		 enum dpll_mode *mode, struct netlink_ext_ack *extack)
>>+{
>>+	*mode = ((struct dpll_pd *)(dpll_priv))->mode;
> 
> Please have variable assigned by dpll_priv instead of this.
> Also, fix in the rest of the code.
> 

Please excuse me, I don't think I understand what you mean. Do you mean I should
create a local variable struct dpll_pd and use it for assignment like that?
	```
	struct dpll_pd *pd = dpll_priv;
	*mode = pd->mode;
	```

>>+	return 0;
>>+};
>>+
>>+static bool
>>+dds_ops_mode_supported(const struct dpll_device *dpll, void *dpll_priv,
>>+		       const enum dpll_mode mode,
>>+		       struct netlink_ext_ack *extack)
>>+{
>>+	return true;
>>+};
>>+
>>+static int
>>+dds_ops_lock_status_get(const struct dpll_device *dpll, void *dpll_priv,
>>+			enum dpll_lock_status *status,
>>+			struct netlink_ext_ack *extack)
>>+{
>>+	if (((struct dpll_pd *)dpll_priv)->mode == DPLL_MODE_MANUAL)
>>+		*status = DPLL_LOCK_STATUS_LOCKED;
>>+	else
>>+		*status = DPLL_LOCK_STATUS_UNLOCKED;
> 
> I don't understand the logic of this function. According to mode you
> return if status is locked or not? For this, you should expose a debugfs
> knob so the user can emulate changes of the HW.
> 

Assumption was, we are testing the API not trying to simulate the actual DPLL HW
behavior. I was going for, the "simpler the better" principle. But since
somebody else might want to use it differently and test more complex scenarios,
I will add debugfs entries for this interaction.

>>+	return 0;
>>+};
>>+
>>+static int
>>+dds_ops_temp_get(const struct dpll_device *dpll, void *dpll_priv, s32 *temp,
>>+		 struct netlink_ext_ack *extack)
>>+{
>>+	*temp = ((struct dpll_pd *)dpll_priv)->temperature;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_frequency_set(const struct dpll_pin *pin, void *pin_priv,
>>+		  const struct dpll_device *dpll, void *dpll_priv,
>>+		  const u64 frequency, struct netlink_ext_ack *extack)
>>+{
>>+	((struct pin_pd *)pin_priv)->frequency = frequency;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_frequency_get(const struct dpll_pin *pin, void *pin_priv,
>>+		  const struct dpll_device *dpll, void *dpll_priv,
>>+		  u64 *frequency, struct netlink_ext_ack *extack)
>>+{
>>+	*frequency = ((struct pin_pd *)pin_priv)->frequency;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_direction_set(const struct dpll_pin *pin, void *pin_priv,
>>+		  const struct dpll_device *dpll, void *dpll_priv,
>>+		  const enum dpll_pin_direction direction,
>>+		  struct netlink_ext_ack *extack)
>>+{
>>+	((struct pin_pd *)pin_priv)->direction = direction;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_direction_get(const struct dpll_pin *pin, void *pin_priv,
>>+		  const struct dpll_device *dpll, void *dpll_priv,
>>+		  enum dpll_pin_direction *direction,
>>+		  struct netlink_ext_ack *extack)
>>+{
>>+	*direction = ((struct pin_pd *)pin_priv)->direction;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_state_on_pin_get(const struct dpll_pin *pin, void *pin_priv,
>>+		     const struct dpll_pin *parent_pin, void *parent_priv,
>>+		     enum dpll_pin_state *state,
>>+		     struct netlink_ext_ack *extack)
>>+{
>>+	*state = ((struct pin_pd *)pin_priv)->state_pin;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_state_on_dpll_get(const struct dpll_pin *pin, void *pin_priv,
>>+		      const struct dpll_device *dpll, void *dpll_priv,
>>+		      enum dpll_pin_state *state,
>>+		      struct netlink_ext_ack *extack)
>>+{
>>+	*state = ((struct pin_pd *)pin_priv)->state_dpll;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_state_on_pin_set(const struct dpll_pin *pin, void *pin_priv,
>>+		     const struct dpll_pin *parent_pin, void *parent_priv,
>>+		     const enum dpll_pin_state state,
>>+		     struct netlink_ext_ack *extack)
>>+{
>>+	((struct pin_pd *)pin_priv)->state_pin = state;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_state_on_dpll_set(const struct dpll_pin *pin, void *pin_priv,
>>+		      const struct dpll_device *dpll, void *dpll_priv,
>>+		      const enum dpll_pin_state state,
>>+		      struct netlink_ext_ack *extack)
>>+{
>>+	((struct pin_pd *)pin_priv)->state_dpll = state;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_prio_get(const struct dpll_pin *pin, void *pin_priv,
>>+	     const struct dpll_device *dpll, void *dpll_priv,
>>+	     u32 *prio, struct netlink_ext_ack *extack)
>>+{
>>+	*prio = ((struct pin_pd *)pin_priv)->prio;
>>+	return 0;
>>+};
>>+
>>+static int
>>+pin_prio_set(const struct dpll_pin *pin, void *pin_priv,
>>+	     const struct dpll_device *dpll, void *dpll_priv,
>>+	     const u32 prio, struct netlink_ext_ack *extack)
>>+{
>>+	((struct pin_pd *)pin_priv)->prio = prio;
>>+	return 0;
>>+};
>>+
>>+static void
>>+free_pin_properties(struct dpll_pin_properties *pp)
>>+{
>>+	if (pp) {
> 
> How exactly pp could be null?
> 

In normal circumstances it seems it cannot, I will remove the check.

>>+		kfree(pp->board_label);
>>+		kfree(pp->panel_label);
>>+		kfree(pp->package_label);
>>+		kfree(pp->freq_supported);
>>+		kfree(pp);
>>+	}
>>+}
>>+
>>+static struct dpll_device_ops dds_ops = {
>>+	.mode_get = dds_ops_mode_get,
>>+	.mode_supported = dds_ops_mode_supported,
>>+	.lock_status_get = dds_ops_lock_status_get,
>>+	.temp_get = dds_ops_temp_get,
>>+};
>>+
>>+static struct dpll_pin_ops pin_ops = {
>>+	.frequency_set = pin_frequency_set,
>>+	.frequency_get = pin_frequency_get,
>>+	.direction_set = pin_direction_set,
>>+	.direction_get = pin_direction_get,
>>+	.state_on_pin_get = pin_state_on_pin_get,
>>+	.state_on_dpll_get = pin_state_on_dpll_get,
>>+	.state_on_pin_set = pin_state_on_pin_set,
>>+	.state_on_dpll_set = pin_state_on_dpll_set,
>>+	.prio_get = pin_prio_get,
>>+	.prio_set = pin_prio_set,
>>+};
>>+
>>+int nsim_dpll_init_owner(struct nsim_dpll_info *dpll, int devid)
>>+{
>>+	/* Create EEC DPLL */
>>+	dpll->dpll_e = dpll_device_get(DPLLS_CLOCK_ID + devid, EEC_DPLL_DEV,
> 
> "#define DPLLS_CLOCK_ID 234"
> 
> You guys always come up with some funky way of making up ids and names.
> Why this can't be randomly generated u64?
> 

As mentioned, "the simpler the better". Having it randomly generated implies
need of exposing this randomly generated number to tests. Test need to be able
unambiguously get this clock. But since I will be adding debugfs entries to
change the lock state, I can also add one for returning clock id. I need that
because I'm using more devices per one netdevsim module.

>>+				       THIS_MODULE);
>>+	if (IS_ERR(dpll->dpll_e))
>>+		goto dpll_e;
>>+	dpll->dpll_e_pd = create_dpll_pd(EEC_DPLL_TEMPERATURE,
>>+					 DPLL_MODE_AUTOMATIC);
>>+	if (IS_ERR(dpll->dpll_e))
>>+		goto dpll_e_pd;
>>+	if (dpll_device_register(dpll->dpll_e, DPLL_TYPE_EEC, &dds_ops,
>>+				 (void *)dpll->dpll_e_pd))
> 
> Please avoid pointless casts. (void *) cast for arg of type (void *) are
> especially pointless. Please make sure to fix this in the rest of the
> code as well.
> 

Thanks, will do.

>>+		goto e_reg;
>>+
>>+	/* Create PPS DPLL */
>>+	dpll->dpll_p = dpll_device_get(DPLLS_CLOCK_ID + devid, PPS_DPLL_DEV,
>>+				       THIS_MODULE);
>>+	if (IS_ERR(dpll->dpll_p))
>>+		goto dpll_p;
>>+	dpll->dpll_p_pd = create_dpll_pd(PPS_DPLL_TEMPERATURE,
>>+					 DPLL_MODE_MANUAL);
>>+	if (IS_ERR(dpll->dpll_p_pd))
>>+		goto dpll_p_pd;
>>+	if (dpll_device_register(dpll->dpll_p, DPLL_TYPE_PPS, &dds_ops,
> 
> You always use "int err" to store the return value of function calls
> returning 0/-EXXX like this one.
> 

Lesson learned, will fix.

>>+				 (void *)dpll->dpll_p_pd))
>>+		goto p_reg;
>>+
>>+	/* Create first pin (GNSS) */
>>+	dpll->pp_gnss = create_pin_properties("GNSS", DPLL_PIN_TYPE_GNSS,
>>+					      PIN_GNSS_CAPABILITIES,
>>+					      1, DPLL_PIN_FREQUENCY_1_HZ,
>>+					      DPLL_PIN_FREQUENCY_1_HZ);
>>+	if (IS_ERR(dpll->pp_gnss))
>>+		goto pp_gnss;
>>+	dpll->p_gnss = dpll_pin_get(DPLLS_CLOCK_ID + devid, PIN_GNSS,
>>+				    THIS_MODULE,
>>+				    dpll->pp_gnss);
>>+	if (IS_ERR(dpll->p_gnss))
>>+		goto p_gnss;
>>+	dpll->p_gnss_pd = create_pin_pd(DPLL_PIN_FREQUENCY_1_HZ,
>>+					PIN_GNSS_PRIORITY,
>>+					DPLL_PIN_DIRECTION_INPUT);
>>+	if (IS_ERR(dpll->p_gnss_pd))
>>+		goto p_gnss_pd;
>>+	if (dpll_pin_register(dpll->dpll_e, dpll->p_gnss, &pin_ops,
>>+			      (void *)dpll->p_gnss_pd))
>>+		goto e_gnss_reg;
>>+
>>+	/* Create second pin (PPS) */
>>+	dpll->pp_pps = create_pin_properties("PPS", DPLL_PIN_TYPE_EXT,
>>+					     PIN_PPS_CAPABILITIES,
>>+					     1, DPLL_PIN_FREQUENCY_1_HZ,
>>+					     DPLL_PIN_FREQUENCY_1_HZ);
>>+	if (IS_ERR(dpll->pp_pps))
>>+		goto pp_pps;
>>+	dpll->p_pps = dpll_pin_get(DPLLS_CLOCK_ID + devid, PIN_PPS, THIS_MODULE,
>>+				   dpll->pp_pps);
>>+	if (IS_ERR(dpll->p_pps))
>>+		goto p_pps;
>>+	dpll->p_pps_pd = create_pin_pd(DPLL_PIN_FREQUENCY_1_HZ,
>>+				       PIN_PPS_PRIORITY,
>>+				       DPLL_PIN_DIRECTION_INPUT);
>>+	if (IS_ERR(dpll->p_pps_pd))
>>+		goto p_pps_pd;
>>+	if (dpll_pin_register(dpll->dpll_e, dpll->p_pps, &pin_ops,
>>+			      (void *)dpll->p_pps_pd))
>>+		goto e_pps_reg;
>>+	if (dpll_pin_register(dpll->dpll_p, dpll->p_pps, &pin_ops,
>>+			      (void *)dpll->p_pps_pd))
>>+		goto p_pps_reg;
>>+
>>+	return 0;
>>+
>>+p_pps_reg:
>>+	dpll_pin_unregister(dpll->dpll_e, dpll->p_pps, &pin_ops,
>>+			    (void *)dpll->p_pps_pd);
>>+e_pps_reg:
>>+	kfree(dpll->p_pps_pd);
>>+p_pps_pd:
>>+	dpll_pin_put(dpll->p_pps);
>>+p_pps:
>>+	free_pin_properties(dpll->pp_pps);
>>+pp_pps:
>>+	dpll_pin_unregister(dpll->dpll_e, dpll->p_gnss, &pin_ops,
>>+			    (void *)dpll->p_gnss_pd);
>>+e_gnss_reg:
>>+	kfree(dpll->p_gnss_pd);
>>+p_gnss_pd:
>>+	dpll_pin_put(dpll->p_gnss);
>>+p_gnss:
>>+	free_pin_properties(dpll->pp_gnss);
>>+pp_gnss:
>>+	dpll_device_unregister(dpll->dpll_p, &dds_ops, (void *)dpll->dpll_p_pd);
>>+p_reg:
>>+	kfree(dpll->dpll_p_pd);
>>+dpll_p_pd:
>>+	dpll_device_put(dpll->dpll_p);
>>+dpll_p:
>>+	dpll_device_unregister(dpll->dpll_e, &dds_ops, (void *)dpll->dpll_e_pd);
>>+e_reg:
>>+	kfree(dpll->dpll_e_pd);
>>+dpll_e_pd:
>>+	dpll_device_put(dpll->dpll_e);
>>+dpll_e:
>>+	return -1;
>>+}
>>+
>>+void nsim_dpll_free_owner(struct nsim_dpll_info *dpll)
>>+{
>>+	/* Free GNSS pin */
>>+	dpll_pin_unregister(dpll->dpll_e, dpll->p_gnss, &pin_ops,
>>+			    (void *)dpll->p_gnss_pd);
>>+	dpll_pin_put(dpll->p_gnss);
>>+	free_pin_properties(dpll->pp_gnss);
>>+	kfree(dpll->p_gnss_pd);
>>+
>>+	/* Free PPS pin */
>>+	dpll_pin_unregister(dpll->dpll_e, dpll->p_pps, &pin_ops,
>>+			    (void *)dpll->p_pps_pd);
>>+	dpll_pin_unregister(dpll->dpll_p, dpll->p_pps, &pin_ops,
>>+			    (void *)dpll->p_pps_pd);
>>+	dpll_pin_put(dpll->p_pps);
>>+	free_pin_properties(dpll->pp_pps);
>>+	kfree(dpll->p_pps_pd);
>>+
>>+	/* Free DPLL EEC */
>>+	dpll_device_unregister(dpll->dpll_e, &dds_ops, (void *)dpll->dpll_e_pd);
>>+	dpll_device_put(dpll->dpll_e);
>>+	kfree(dpll->dpll_e_pd);
>>+
>>+	/* Free DPLL PPS */
>>+	dpll_device_unregister(dpll->dpll_p, &dds_ops, (void *)dpll->dpll_p_pd);
>>+	dpll_device_put(dpll->dpll_p);
>>+	kfree(dpll->dpll_p_pd);
>>+}
>>+
>>+int nsim_rclk_init(struct nsim_dpll_info *dpll, int devid, unsigned int index)
>>+{
>>+	char *name = kasprintf(GFP_KERNEL, "RCLK_%i", index);
>>+
>>+	/* Get EEC DPLL */
>>+	dpll->dpll_e = dpll_device_get(DPLLS_CLOCK_ID + devid, EEC_DPLL_DEV,
>>+				       THIS_MODULE);
>>+	if (IS_ERR(dpll->dpll_e))
>>+		goto dpll;
>>+
>>+	/* Get PPS DPLL */
>>+	dpll->dpll_p = dpll_device_get(DPLLS_CLOCK_ID + devid, PPS_DPLL_DEV,
>>+				       THIS_MODULE);
>>+	if (IS_ERR(dpll->dpll_p))
>>+		goto dpll;
>>+
>>+	/* Create Recovered clock pin (RCLK) */
>>+	dpll->pp_rclk = create_pin_properties(name,
>>+					      DPLL_PIN_TYPE_SYNCE_ETH_PORT,
>>+					      PIN_RCLK_CAPABILITIES, 1, 1e6,
>>+					      125e6);
>>+	if (IS_ERR(dpll->pp_rclk))
>>+		goto dpll;
>>+	dpll->p_rclk = dpll_pin_get(DPLLS_CLOCK_ID + devid, PIN_RCLK + index,
>>+				    THIS_MODULE, dpll->pp_rclk);
>>+	if (IS_ERR(dpll->p_rclk))
>>+		goto p_rclk;
>>+	dpll->p_rclk_pd = create_pin_pd(DPLL_PIN_FREQUENCY_10_MHZ,
>>+					PIN_RCLK_PRIORITY,
>>+					DPLL_PIN_DIRECTION_INPUT);
>>+	if (IS_ERR(dpll->p_rclk_pd))
>>+		goto p_rclk_pd;
>>+	if (dpll_pin_register(dpll->dpll_e, dpll->p_rclk, &pin_ops,
>>+			      (void *)dpll->p_rclk_pd))
>>+		goto dpll_e_reg;
>>+	if (dpll_pin_register(dpll->dpll_p, dpll->p_rclk, &pin_ops,
>>+			      (void *)dpll->p_rclk_pd))
>>+		goto dpll_p_reg;
>>+
>>+	return 0;
>>+
>>+dpll_p_reg:
>>+	dpll_pin_unregister(dpll->dpll_e, dpll->p_rclk, &pin_ops,
>>+			    (void *)dpll->p_rclk_pd);
>>+dpll_e_reg:
>>+	kfree(dpll->p_rclk_pd);
>>+p_rclk_pd:
>>+	dpll_pin_put(dpll->p_rclk);
>>+p_rclk:
>>+	free_pin_properties(dpll->pp_rclk);
>>+dpll:
>>+	return -1;
>>+}
>>+
>>+void nsim_rclk_free(struct nsim_dpll_info *dpll)
>>+{
>>+	/* Free RCLK pin */
>>+	dpll_pin_unregister(dpll->dpll_e, dpll->p_rclk, &pin_ops,
>>+			    (void *)dpll->p_rclk_pd);
>>+	dpll_pin_unregister(dpll->dpll_p, dpll->p_rclk, &pin_ops,
>>+			    (void *)dpll->p_rclk_pd);
>>+	dpll_pin_put(dpll->p_rclk);
>>+	free_pin_properties(dpll->pp_rclk);
>>+	kfree(dpll->p_rclk_pd);
>>+	dpll_device_put(dpll->dpll_e);
>>+	dpll_device_put(dpll->dpll_p);
>>+}
>>diff --git a/drivers/net/netdevsim/dpll.h b/drivers/net/netdevsim/dpll.h
>>new file mode 100644
>>index 0000000..17db7f7
>>--- /dev/null
>>+++ b/drivers/net/netdevsim/dpll.h
>>@@ -0,0 +1,81 @@
>>+/* SPDX-License-Identifier: GPL-2.0 */
>>+/*
>>+ * Copyright (c) 2023, Intel Corporation.
>>+ * Author: Michal Michalik <michal.michalik@intel.com>
>>+ */
>>+
>>+#ifndef NSIM_DPLL_H
>>+#define NSIM_DPLL_H
> 
> Why you need a separate header for this? Just put necessary parts in
> netdevsim.h and leave the rest in the .c file.
> 

Good idea, thanks.

>>+
>>+#include <linux/types.h>
>>+#include <linux/netlink.h>
>>+
>>+#include <linux/dpll.h>
>>+#include <uapi/linux/dpll.h>
> 
> Why exactly do you need to include uapi header directly?
> 

You are right - will refactor that.

>>+
>>+#define EEC_DPLL_DEV 0
>>+#define EEC_DPLL_TEMPERATURE 20
>>+#define PPS_DPLL_DEV 1
>>+#define PPS_DPLL_TEMPERATURE 30
>>+#define DPLLS_CLOCK_ID 234
>>+
>>+#define PIN_GNSS 0
>>+#define PIN_GNSS_CAPABILITIES 2 /* DPLL_PIN_CAPS_PRIORITY_CAN_CHANGE */
>>+#define PIN_GNSS_PRIORITY 5
>>+
>>+#define PIN_PPS 1
>>+#define PIN_PPS_CAPABILITIES 7 /* DPLL_PIN_CAPS_DIRECTION_CAN_CHANGE
>>+				* || DPLL_PIN_CAPS_PRIORITY_CAN_CHANGE
>>+				* || DPLL_PIN_CAPS_STATE_CAN_CHANGE
> 
> You are kidding, correct? :)
> 

Not really, it's written directly because the tests are able to read the value
from here (they don't understand DPLL subsystem defines). Since we are changing
most of the code I will try to make the tests access this data differently (e.g.
via debugfs).

>>+				*/
>>+#define PIN_PPS_PRIORITY 6
>>+
>>+#define PIN_RCLK 2
>>+#define PIN_RCLK_CAPABILITIES 6 /* DPLL_PIN_CAPS_PRIORITY_CAN_CHANGE
>>+				 * || DPLL_PIN_CAPS_STATE_CAN_CHANGE
>>+				 */
>>+#define PIN_RCLK_PRIORITY 7
>>+
>>+#define EEC_PINS_NUMBER 3
>>+#define PPS_PINS_NUMBER 2
>>+
>>+struct dpll_pd {
> 
> Have not clue what do you mean by "pd".
> 

I meant "private data", will change this to something more meaningful.

>>+	enum dpll_mode mode;
>>+	int temperature;
>>+};
>>+
>>+struct pin_pd {
>>+	u64 frequency;
>>+	enum dpll_pin_direction direction;
>>+	enum dpll_pin_state state_pin;
>>+	enum dpll_pin_state state_dpll;
>>+	u32 prio;
>>+};
>>+
>>+struct nsim_dpll_info {
> 
> Drop "info".
> 

OK.

>>+	bool owner;
>>+
>>+	struct dpll_device *dpll_e;
>>+	struct dpll_pd *dpll_e_pd;
>>+	struct dpll_device *dpll_p;
>>+	struct dpll_pd *dpll_p_pd;
>>+
>>+	struct dpll_pin_properties *pp_gnss;
> 
> Why pointer? Just embed the properties struct, no?
> 

I can change both private data and properties to be embeded.

>>+	struct dpll_pin *p_gnss;
>>+	struct pin_pd *p_gnss_pd;
>>+
>>+	struct dpll_pin_properties *pp_pps;
>>+	struct dpll_pin *p_pps;
>>+	struct pin_pd *p_pps_pd;
>>+
>>+	struct dpll_pin_properties *pp_rclk;
>>+	struct dpll_pin *p_rclk;
>>+	struct pin_pd *p_rclk_pd;
>>+};
>>+
>>+int nsim_dpll_init_owner(struct nsim_dpll_info *dpll, int devid);
>>+void nsim_dpll_free_owner(struct nsim_dpll_info *dpll);
>>+int nsim_rclk_init(struct nsim_dpll_info *dpll, int devid, unsigned int index);
>>+void nsim_rclk_free(struct nsim_dpll_info *dpll);
>>+
>>+#endif /* NSIM_DPLL_H */
>>diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
>>index aecaf5f..78a936f 100644
>>--- a/drivers/net/netdevsim/netdev.c
>>+++ b/drivers/net/netdevsim/netdev.c
>>@@ -25,6 +25,7 @@
>> #include <net/udp_tunnel.h>
>> 
>> #include "netdevsim.h"
>>+#include "dpll.h"
>> 
>> static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> {
>>@@ -344,6 +345,20 @@ static int nsim_init_netdevsim(struct netdevsim *ns)
>> 	if (err)
>> 		goto err_ipsec_teardown;
>> 	rtnl_unlock();
>>+
>>+	if (ns->nsim_dev_port->port_index == 0) {
> 
> Does not make any sense to treat port 0 any different.
> 
> Please, move the init of dpll device to drivers/net/netdevsim/dev.c
> probably somewhere in nsim_drv_probe().
> 

Great idea - I will do it.

>>+		err = nsim_dpll_init_owner(&ns->dpll,
>>+					   ns->nsim_dev->nsim_bus_dev->dev.id);
>>+		if (err)
>>+			goto err_ipsec_teardown;
>>+	}
>>+
>>+	err = nsim_rclk_init(&ns->dpll, ns->nsim_dev->nsim_bus_dev->dev.id,
>>+			     ns->nsim_dev_port->port_index);
> 
> This is not related to netdev directly. Please move the pin init into
> drivers/net/netdevsim/dev.c, probably somewhere inside
> __nsim_dev_port_add()
> 
> Also, you don't call netdev_dpll_pin_set() and netdev_dpll_pin_clear().
> That is actually what you should call here in netdev initialization.
> 

Good catch - I will move to dev.c and use netdev_dpll_pin_set/clear()

>>+
>>+	if (err)
>>+		goto err_ipsec_teardown;
>>+
>> 	return 0;
>> 
>> err_ipsec_teardown:
>>@@ -419,6 +434,11 @@ void nsim_destroy(struct netdevsim *ns)
>> 	if (nsim_dev_port_is_pf(ns->nsim_dev_port))
>> 		nsim_udp_tunnels_info_destroy(dev);
>> 	mock_phc_destroy(ns->phc);
>>+
>>+	nsim_rclk_free(&ns->dpll);
>>+	if (ns->nsim_dev_port->port_index == 0)
>>+		nsim_dpll_free_owner(&ns->dpll);
>>+
>> 	free_netdev(dev);
>> }
>> 
>>diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
>>index 028c825..3d0138a 100644
>>--- a/drivers/net/netdevsim/netdevsim.h
>>+++ b/drivers/net/netdevsim/netdevsim.h
>>@@ -26,6 +26,8 @@
>> #include <net/xdp.h>
>> #include <net/macsec.h>
>> 
>>+#include "dpll.h"
>>+
>> #define DRV_NAME	"netdevsim"
>> 
>> #define NSIM_XDP_MAX_MTU	4000
>>@@ -125,6 +127,8 @@ struct netdevsim {
>> 	} udp_ports;
>> 
>> 	struct nsim_ethtool ethtool;
>>+
>>+	struct nsim_dpll_info dpll;
>> };
>> 
>> struct netdevsim *
>>-- 
>>2.9.5
>>


^ permalink raw reply

* Re: [PATCH net-next 02/19] netfilter: nft_set_rbtree: prefer sync gc to async worker
From: Florian Westphal @ 2023-11-03 17:55 UTC (permalink / raw)
  To: Simon Horman
  Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, kuba, pabeni,
	edumazet, fw
In-Reply-To: <20231103173442.GB768996@kernel.org>

Simon Horman <horms@kernel.org> wrote:
> Hi Florian and Pablo,
> 
> I understand that this patch has been accepted upstream,
> and that by implication this feedback is rather slow,
> but I noticed that with this patch nft_net is now
> set but otherwise unused in this function.

There is a patch queued for this, we'll pass this to -net next week.

^ permalink raw reply

* [PATCH net 0/4] vsock: fix server prevents clients from reconnecting
From: f.storniolo95 @ 2023-11-03 17:55 UTC (permalink / raw)
  To: luigi.leonardi, kvm
  Cc: davem, edumazet, mst, imbrenda, kuba, asias, stefanha, pabeni,
	sgarzare, netdev, linux-kernel, virtualization, Filippo Storniolo

From: Filippo Storniolo <f.storniolo95@gmail.com>

This patch series introduce fix and tests for the following vsock bug:
If the same remote peer, using the same port, tries to connect
to a server on a listening port more than once, the server will
reject the connection, causing a "connection reset by peer"
error on the remote peer. This is due to the presence of a
dangling socket from a previous connection in both the connected
and bound socket lists.
The inconsistency of the above lists only occurs when the remote
peer disconnects and the server remains active.
This bug does not occur when the server socket is closed.

More details on the first patch changelog.
The remaining patches are refactoring and test.

Filippo Storniolo (4):
  vsock/virtio: remove socket from connected/bound list on shutdown
  test/vsock fix: add missing check on socket creation
  test/vsock: refactor vsock_accept
  test/vsock: add dobule bind connect test

 net/vmw_vsock/virtio_transport_common.c | 16 +++--
 tools/testing/vsock/util.c              | 87 +++++++++++++++++++++----
 tools/testing/vsock/util.h              |  3 +
 tools/testing/vsock/vsock_test.c        | 50 ++++++++++++++
 4 files changed, 139 insertions(+), 17 deletions(-)

-- 
2.41.0


^ permalink raw reply

* [PATCH net 1/4] vsock/virtio: remove socket from connected/bound list on shutdown
From: f.storniolo95 @ 2023-11-03 17:55 UTC (permalink / raw)
  To: luigi.leonardi, kvm
  Cc: davem, edumazet, mst, imbrenda, kuba, asias, stefanha, pabeni,
	sgarzare, netdev, linux-kernel, virtualization, Filippo Storniolo,
	Daan De Meyer
In-Reply-To: <20231103175551.41025-1-f.storniolo95@gmail.com>

From: Filippo Storniolo <f.storniolo95@gmail.com>

If the same remote peer, using the same port, tries to connect
to a server on a listening port more than once, the server will
reject the connection, causing a "connection reset by peer"
error on the remote peer. This is due to the presence of a
dangling socket from a previous connection in both the connected
and bound socket lists.
The inconsistency of the above lists only occurs when the remote
peer disconnects and the server remains active.

This bug does not occur when the server socket is closed:
virtio_transport_release() will eventually schedule a call to
virtio_transport_do_close() and the latter will remove the socket
from the bound and connected socket lists and clear the sk_buff.

However, virtio_transport_do_close() will only perform the above
actions if it has been scheduled, and this will not happen
if the server is processing the shutdown message from a remote peer.

To fix this, introduce a call to vsock_remove_sock()
when the server is handling a client disconnect.
This is to remove the socket from the bound and connected socket
lists without clearing the sk_buff.

Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko")
Reported-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Tested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Filippo Storniolo <f.storniolo95@gmail.com>
---
 net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index e22c81435ef7..4c595dd1fd64 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1369,11 +1369,17 @@ virtio_transport_recv_connected(struct sock *sk,
 			vsk->peer_shutdown |= RCV_SHUTDOWN;
 		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_SEND)
 			vsk->peer_shutdown |= SEND_SHUTDOWN;
-		if (vsk->peer_shutdown == SHUTDOWN_MASK &&
-		    vsock_stream_has_data(vsk) <= 0 &&
-		    !sock_flag(sk, SOCK_DONE)) {
-			(void)virtio_transport_reset(vsk, NULL);
-			virtio_transport_do_close(vsk, true);
+		if (vsk->peer_shutdown == SHUTDOWN_MASK) {
+			if (vsock_stream_has_data(vsk) <= 0 && !sock_flag(sk, SOCK_DONE)) {
+				(void)virtio_transport_reset(vsk, NULL);
+				virtio_transport_do_close(vsk, true);
+			}
+			/* Remove this socket anyway because the remote peer sent
+			 * the shutdown. This way a new connection will succeed
+			 * if the remote peer uses the same source port,
+			 * even if the old socket is still unreleased, but now disconnected.
+			 */
+			vsock_remove_sock(vsk);
 		}
 		if (le32_to_cpu(virtio_vsock_hdr(skb)->flags))
 			sk->sk_state_change(sk);
-- 
2.41.0


^ permalink raw reply related

* [PATCH net 2/4] test/vsock fix: add missing check on socket creation
From: f.storniolo95 @ 2023-11-03 17:55 UTC (permalink / raw)
  To: luigi.leonardi, kvm
  Cc: davem, edumazet, mst, imbrenda, kuba, asias, stefanha, pabeni,
	sgarzare, netdev, linux-kernel, virtualization, Filippo Storniolo
In-Reply-To: <20231103175551.41025-1-f.storniolo95@gmail.com>

From: Filippo Storniolo <f.storniolo95@gmail.com>

Add check on socket() return value in vsock_listen()
and vsock_connect()

Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Filippo Storniolo <f.storniolo95@gmail.com>
---
 tools/testing/vsock/util.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 92336721321a..698b0b44a2ee 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -104,6 +104,10 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
 	control_expectln("LISTENING");
 
 	fd = socket(AF_VSOCK, type, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
 
 	timeout_begin(TIMEOUT);
 	do {
@@ -158,6 +162,10 @@ static int vsock_accept(unsigned int cid, unsigned int port,
 	int old_errno;
 
 	fd = socket(AF_VSOCK, type, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
 
 	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
 		perror("bind");
-- 
2.41.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox