netdev.vger.kernel.org archive mirror
* [PATCH bpf-next 0/2] xsk: fix bug when trying to use both copy and zero-copy mode
@ 2018-09-18 10:12 Magnus Karlsson
  2018-09-18 10:12 ` [PATCH bpf-next 1/2] net: add umem reference in netdev{_rx}_queue Magnus Karlsson
  2018-09-18 10:12 ` [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id Magnus Karlsson
  0 siblings, 2 replies; 11+ messages in thread
From: Magnus Karlsson @ 2018-09-18 10:12 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev

Previously, the xsk code did not record which umem was bound to a
specific queue id. This was not required if all drivers were zero-copy
enabled as this had to be recorded in the driver anyway. So if a user
tried to bind two umems to the same queue, the driver would say
no. But if copy-mode was first enabled and then zero-copy mode (or the
reverse order), we mistakenly enabled both of them on the same umem
leading to buggy behavior. The main culprit for this is that we did
not store the association of umem to queue id in the copy case and
only relied on the driver reporting this. As this relation was not
stored in the driver for copy mode (it does not rely on the AF_XDP
NDOs), this obviously could not work.

This patch fixes the problem by always recording the umem to queue id
relationship in the netdev_queue and netdev_rx_queue structs. This
way we always know what kind of umem has been bound to a queue id and
can act appropriately at bind time.
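
To illustrate the intended user-visible behavior, here is a rough
user-space sketch (umem registration via XDP_UMEM_REG, ring setup and
error handling are omitted, and "eth0" is only an example interface, so
this is illustrative rather than a complete program). With this series,
the second bind below should fail with EBUSY instead of silently mixing
copy and zero-copy on the same queue:

  #include <linux/if_xdp.h>
  #include <net/if.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <unistd.h>

  #ifndef AF_XDP
  #define AF_XDP 44
  #endif

  int main(void)
  {
          struct sockaddr_xdp sxdp = {};
          int fd1 = socket(AF_XDP, SOCK_RAW, 0);
          int fd2 = socket(AF_XDP, SOCK_RAW, 0);

          /* A complete program registers a umem and creates the
           * fill/completion and rx/tx rings before bind(); that setup
           * is left out here.
           */
          sxdp.sxdp_family = AF_XDP;
          sxdp.sxdp_ifindex = if_nametoindex("eth0"); /* example netdev */
          sxdp.sxdp_queue_id = 0;

          sxdp.sxdp_flags = XDP_COPY;
          if (bind(fd1, (struct sockaddr *)&sxdp, sizeof(sxdp)))
                  perror("bind copy mode");

          sxdp.sxdp_flags = XDP_ZEROCOPY; /* same dev and queue id */
          if (bind(fd2, (struct sockaddr *)&sxdp, sizeof(sxdp)))
                  perror("bind zero-copy"); /* EBUSY with this series */

          close(fd2);
          close(fd1);
          return 0;
  }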

Patch 1: Introduces a umem reference in the netdev_rx_queue and
         netdev_queue structs.
Patch 2: Records which queue_id is bound to which umem and make sure
         that you cannot bind two different umems to the same queue_id.

I based this patch set on bpf-next commit 70e88c758a6b
("selftests/bpf: fix bpf_flow.c build")

Thanks: Magnus

Magnus Karlsson (2):
  net: add umem reference in netdev{_rx}_queue
  xsk: fix bug when trying to use both copy and zero-copy on one queue
    id

 include/linux/netdevice.h |  6 ++++
 net/xdp/xdp_umem.c        | 87 ++++++++++++++++++++++++++++++++++++++---------
 net/xdp/xdp_umem.h        |  2 +-
 3 files changed, 77 insertions(+), 18 deletions(-)


* [PATCH bpf-next 1/2] net: add umem reference in netdev{_rx}_queue
  2018-09-18 10:12 [PATCH bpf-next 0/2] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
@ 2018-09-18 10:12 ` Magnus Karlsson
  2018-09-18 10:12 ` [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id Magnus Karlsson
  1 sibling, 0 replies; 11+ messages in thread
From: Magnus Karlsson @ 2018-09-18 10:12 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev

These references to the umem will be used to store information
on what kind of AF_XDP umem is bound to a queue id, if any.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/linux/netdevice.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e2b3bd7..9915d47 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -609,6 +609,9 @@ struct netdev_queue {
 
 	/* Subordinate device that the queue has been assigned to */
 	struct net_device	*sb_dev;
+#ifdef CONFIG_XDP_SOCKETS
+	struct xdp_umem         *umem;
+#endif
 /*
  * write-mostly part
  */
@@ -738,6 +741,9 @@ struct netdev_rx_queue {
 	struct kobject			kobj;
 	struct net_device		*dev;
 	struct xdp_rxq_info		xdp_rxq;
+#ifdef CONFIG_XDP_SOCKETS
+	struct xdp_umem                 *umem;
+#endif
 } ____cacheline_aligned_in_smp;
 
 /*
-- 
2.7.4


* [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-18 10:12 [PATCH bpf-next 0/2] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
  2018-09-18 10:12 ` [PATCH bpf-next 1/2] net: add umem reference in netdev{_rx}_queue Magnus Karlsson
@ 2018-09-18 10:12 ` Magnus Karlsson
  2018-09-18 17:22   ` Y Song
  1 sibling, 1 reply; 11+ messages in thread
From: Magnus Karlsson @ 2018-09-18 10:12 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev

Previously, the xsk code did not record which umem was bound to a
specific queue id. This was not required if all drivers were zero-copy
enabled as this had to be recorded in the driver anyway. So if a user
tried to bind two umems to the same queue, the driver would say
no. But if copy-mode was first enabled and then zero-copy mode (or the
reverse order), we mistakenly enabled both of them on the same umem
leading to buggy behavior. The main culprit for this is that we did
not store the association of umem to queue id in the copy case and
only relied on the driver reporting this. As this relation was not
stored in the driver for copy mode (it does not rely on the AF_XDP
NDOs), this obviously could not work.

This patch fixes the problem by always recording the umem to queue id
relationship in the netdev_queue and netdev_rx_queue structs. This way
we always know what kind of umem has been bound to a queue id and can
act appropriately at bind time.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xdp_umem.c | 87 +++++++++++++++++++++++++++++++++++++++++++-----------
 net/xdp/xdp_umem.h |  2 +-
 2 files changed, 71 insertions(+), 18 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index b3b632c..12300b5 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -42,6 +42,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
 	}
 }
 
+/* The umem is stored both in the _rx struct and the _tx struct as we do
+ * not know if the device has more tx queues than rx, or the opposite.
+ * This might also change during run time.
+ */
+static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
+				u16 queue_id)
+{
+	if (queue_id < dev->real_num_rx_queues)
+		dev->_rx[queue_id].umem = umem;
+	if (queue_id < dev->real_num_tx_queues)
+		dev->_tx[queue_id].umem = umem;
+}
+
+static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
+					      u16 queue_id)
+{
+	if (queue_id < dev->real_num_rx_queues)
+		return dev->_rx[queue_id].umem;
+	if (queue_id < dev->real_num_tx_queues)
+		return dev->_tx[queue_id].umem;
+
+	return NULL;
+}
+
+static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
+{
+	/* Zero out the entry independent on how many queues are configured
+	 * at this point in time, as it might be used in the future.
+	 */
+	if (queue_id < dev->num_rx_queues)
+		dev->_rx[queue_id].umem = NULL;
+	if (queue_id < dev->num_tx_queues)
+		dev->_tx[queue_id].umem = NULL;
+}
+
 int xdp_umem_query(struct net_device *dev, u16 queue_id)
 {
 	struct netdev_bpf bpf;
@@ -58,11 +93,11 @@ int xdp_umem_query(struct net_device *dev, u16 queue_id)
 }
 
 int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			u32 queue_id, u16 flags)
+			u16 queue_id, u16 flags)
 {
 	bool force_zc, force_copy;
 	struct netdev_bpf bpf;
-	int err;
+	int err = 0;
 
 	force_zc = flags & XDP_ZEROCOPY;
 	force_copy = flags & XDP_COPY;
@@ -70,18 +105,28 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	if (force_zc && force_copy)
 		return -EINVAL;
 
+	rtnl_lock();
+	if (xdp_get_umem_from_qid(dev, queue_id)) {
+		err = -EBUSY;
+		goto rtnl_unlock;
+	}
+
+	xdp_reg_umem_at_qid(dev, umem, queue_id);
+	umem->dev = dev;
+	umem->queue_id = queue_id;
 	if (force_copy)
-		return 0;
+		/* For copy-mode, we are done. */
+		goto rtnl_unlock;
 
-	if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
-		return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
+	if (!dev->netdev_ops->ndo_bpf ||
+	    !dev->netdev_ops->ndo_xsk_async_xmit) {
+		err = -EOPNOTSUPP;
+		goto err_unreg_umem;
+	}
 
-	rtnl_lock();
 	err = xdp_umem_query(dev, queue_id);
-	if (err) {
-		err = err < 0 ? -EOPNOTSUPP : -EBUSY;
-		goto err_rtnl_unlock;
-	}
+	if (err)
+		goto err_unreg_umem;
 
 	bpf.command = XDP_SETUP_XSK_UMEM;
 	bpf.xsk.umem = umem;
@@ -89,18 +134,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 
 	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
 	if (err)
-		goto err_rtnl_unlock;
+		goto err_unreg_umem;
 	rtnl_unlock();
 
 	dev_hold(dev);
-	umem->dev = dev;
-	umem->queue_id = queue_id;
 	umem->zc = true;
 	return 0;
 
-err_rtnl_unlock:
+err_unreg_umem:
+	xdp_clear_umem_at_qid(dev, queue_id);
+	if (!force_zc)
+		err = 0; /* fallback to copy mode */
+rtnl_unlock:
 	rtnl_unlock();
-	return force_zc ? err : 0; /* fail or fallback */
+	return err;
 }
 
 static void xdp_umem_clear_dev(struct xdp_umem *umem)
@@ -108,7 +155,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
 	struct netdev_bpf bpf;
 	int err;
 
-	if (umem->dev) {
+	if (umem->zc) {
 		bpf.command = XDP_SETUP_XSK_UMEM;
 		bpf.xsk.umem = NULL;
 		bpf.xsk.queue_id = umem->queue_id;
@@ -121,7 +168,13 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
 			WARN(1, "failed to disable umem!\n");
 
 		dev_put(umem->dev);
-		umem->dev = NULL;
+		umem->zc = false;
+	}
+
+	if (umem->dev) {
+		rtnl_lock();
+		xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
+		rtnl_unlock();
 	}
 }
 
diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
index c8be1ad..2760322 100644
--- a/net/xdp/xdp_umem.h
+++ b/net/xdp/xdp_umem.h
@@ -9,7 +9,7 @@
 #include <net/xdp_sock.h>
 
 int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			u32 queue_id, u16 flags);
+			u16 queue_id, u16 flags);
 bool xdp_umem_validate_queues(struct xdp_umem *umem);
 void xdp_get_umem(struct xdp_umem *umem);
 void xdp_put_umem(struct xdp_umem *umem);
-- 
2.7.4


* Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-18 10:12 ` [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id Magnus Karlsson
@ 2018-09-18 17:22   ` Y Song
  2018-09-19  1:55     ` Jakub Kicinski
  2018-09-19  7:14     ` Magnus Karlsson
  0 siblings, 2 replies; 11+ messages in thread
From: Y Song @ 2018-09-18 17:22 UTC (permalink / raw)
  To: magnus.karlsson
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	netdev

On Tue, Sep 18, 2018 at 3:13 AM Magnus Karlsson
<magnus.karlsson@intel.com> wrote:
>
> Previously, the xsk code did not record which umem was bound to a
> specific queue id. This was not required if all drivers were zero-copy
> enabled as this had to be recorded in the driver anyway. So if a user
> tried to bind two umems to the same queue, the driver would say
> no. But if copy-mode was first enabled and then zero-copy mode (or the
> reverse order), we mistakenly enabled both of them on the same umem
> leading to buggy behavior. The main culprit for this is that we did
> not store the association of umem to queue id in the copy case and
> only relied on the driver reporting this. As this relation was not
> stored in the driver for copy mode (it does not rely on the AF_XDP
> NDOs), this obviously could not work.
>
> This patch fixes the problem by always recording the umem to queue id
> relationship in the netdev_queue and netdev_rx_queue structs. This way
> we always know what kind of umem has been bound to a queue id and can
> act appropriately at bind time.
>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  net/xdp/xdp_umem.c | 87 +++++++++++++++++++++++++++++++++++++++++++-----------
>  net/xdp/xdp_umem.h |  2 +-
>  2 files changed, 71 insertions(+), 18 deletions(-)
>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index b3b632c..12300b5 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -42,6 +42,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
>         }
>  }
>
> +/* The umem is stored both in the _rx struct and the _tx struct as we do
> + * not know if the device has more tx queues than rx, or the opposite.
> + * This might also change during run time.
> + */
> +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> +                               u16 queue_id)
> +{
> +       if (queue_id < dev->real_num_rx_queues)
> +               dev->_rx[queue_id].umem = umem;
> +       if (queue_id < dev->real_num_tx_queues)
> +               dev->_tx[queue_id].umem = umem;
> +}
> +
> +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> +                                             u16 queue_id)
> +{
> +       if (queue_id < dev->real_num_rx_queues)
> +               return dev->_rx[queue_id].umem;
> +       if (queue_id < dev->real_num_tx_queues)
> +               return dev->_tx[queue_id].umem;
> +
> +       return NULL;
> +}
> +
> +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> +{
> +       /* Zero out the entry independent on how many queues are configured
> +        * at this point in time, as it might be used in the future.
> +        */
> +       if (queue_id < dev->num_rx_queues)
> +               dev->_rx[queue_id].umem = NULL;
> +       if (queue_id < dev->num_tx_queues)
> +               dev->_tx[queue_id].umem = NULL;
> +}
> +

I am not sure whether the following scenario can happen or not.
Could you clarify?
   1. suppose initially we have num_rx_queues = num_tx_queues = 10
       xdp_reg_umem_at_qid() set umem1 to queue_id = 8
   2. num_tx_queues is changed to 5
   3. xdp_clear_umem_at_qid() is called for queue_id = 8,
       and dev->_rx[8].umem = NULL.
   4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
       dev->_rx[8].umem = umem2
   5. num_tx_queues is changed to 10
  Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
a problem?

>  int xdp_umem_query(struct net_device *dev, u16 queue_id)
>  {
>         struct netdev_bpf bpf;
> @@ -58,11 +93,11 @@ int xdp_umem_query(struct net_device *dev, u16 queue_id)
>  }
>
>  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> -                       u32 queue_id, u16 flags)
> +                       u16 queue_id, u16 flags)
>  {
>         bool force_zc, force_copy;
>         struct netdev_bpf bpf;
> -       int err;
> +       int err = 0;
>
>         force_zc = flags & XDP_ZEROCOPY;
>         force_copy = flags & XDP_COPY;
> @@ -70,18 +105,28 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
>         if (force_zc && force_copy)
>                 return -EINVAL;
>
> +       rtnl_lock();
> +       if (xdp_get_umem_from_qid(dev, queue_id)) {
> +               err = -EBUSY;
> +               goto rtnl_unlock;
> +       }
> +
> +       xdp_reg_umem_at_qid(dev, umem, queue_id);
> +       umem->dev = dev;
> +       umem->queue_id = queue_id;
>         if (force_copy)
> -               return 0;
> +               /* For copy-mode, we are done. */
> +               goto rtnl_unlock;
>
> -       if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
> -               return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
> +       if (!dev->netdev_ops->ndo_bpf ||
> +           !dev->netdev_ops->ndo_xsk_async_xmit) {
> +               err = -EOPNOTSUPP;
> +               goto err_unreg_umem;
> +       }
>
> -       rtnl_lock();
>         err = xdp_umem_query(dev, queue_id);
> -       if (err) {
> -               err = err < 0 ? -EOPNOTSUPP : -EBUSY;
> -               goto err_rtnl_unlock;
> -       }
> +       if (err)
> +               goto err_unreg_umem;
>
>         bpf.command = XDP_SETUP_XSK_UMEM;
>         bpf.xsk.umem = umem;
> @@ -89,18 +134,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
>
>         err = dev->netdev_ops->ndo_bpf(dev, &bpf);
>         if (err)
> -               goto err_rtnl_unlock;
> +               goto err_unreg_umem;
>         rtnl_unlock();
>
>         dev_hold(dev);
> -       umem->dev = dev;
> -       umem->queue_id = queue_id;
>         umem->zc = true;
>         return 0;
>
> -err_rtnl_unlock:
> +err_unreg_umem:
> +       xdp_clear_umem_at_qid(dev, queue_id);

You did not clear umem->dev and umem->queue_id; is that a problem here?
For example in xdp_umem_clear_dev(), umem->dev is checked.

> +       if (!force_zc)
> +               err = 0; /* fallback to copy mode */
> +rtnl_unlock:
>         rtnl_unlock();
> -       return force_zc ? err : 0; /* fail or fallback */
> +       return err;
>  }
>
>  static void xdp_umem_clear_dev(struct xdp_umem *umem)
> @@ -108,7 +155,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
>         struct netdev_bpf bpf;
>         int err;
>
> -       if (umem->dev) {
> +       if (umem->zc) {
>                 bpf.command = XDP_SETUP_XSK_UMEM;
>                 bpf.xsk.umem = NULL;
>                 bpf.xsk.queue_id = umem->queue_id;
> @@ -121,7 +168,13 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
>                         WARN(1, "failed to disable umem!\n");
>
>                 dev_put(umem->dev);
> -               umem->dev = NULL;
> +               umem->zc = false;
> +       }
> +
> +       if (umem->dev) {
> +               rtnl_lock();
> +               xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
> +               rtnl_unlock();

Previously, umem->dev is reset to NULL. Now, it is left as is. I
assume it is not possible
that this function xdp_umem_clear_dev() is called again for this umem, right?

>         }
>  }
>
> diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
> index c8be1ad..2760322 100644
> --- a/net/xdp/xdp_umem.h
> +++ b/net/xdp/xdp_umem.h
> @@ -9,7 +9,7 @@
>  #include <net/xdp_sock.h>
>
>  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> -                       u32 queue_id, u16 flags);
> +                       u16 queue_id, u16 flags);
>  bool xdp_umem_validate_queues(struct xdp_umem *umem);
>  void xdp_get_umem(struct xdp_umem *umem);
>  void xdp_put_umem(struct xdp_umem *umem);
> --
> 2.7.4
>


* Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-18 17:22   ` Y Song
@ 2018-09-19  1:55     ` Jakub Kicinski
  2018-09-19  7:18       ` Magnus Karlsson
  2018-09-19  7:14     ` Magnus Karlsson
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2018-09-19  1:55 UTC (permalink / raw)
  To: Y Song
  Cc: magnus.karlsson, Björn Töpel, Alexei Starovoitov,
	Daniel Borkmann, netdev

On Tue, 18 Sep 2018 10:22:11 -0700, Y Song wrote:
> > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > + * not know if the device has more tx queues than rx, or the opposite.
> > + * This might also change during run time.
> > + */
> > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > +                               u16 queue_id)
> > +{
> > +       if (queue_id < dev->real_num_rx_queues)
> > +               dev->_rx[queue_id].umem = umem;
> > +       if (queue_id < dev->real_num_tx_queues)
> > +               dev->_tx[queue_id].umem = umem;
> > +}
> > +
> > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > +                                             u16 queue_id)
> > +{
> > +       if (queue_id < dev->real_num_rx_queues)
> > +               return dev->_rx[queue_id].umem;
> > +       if (queue_id < dev->real_num_tx_queues)
> > +               return dev->_tx[queue_id].umem;
> > +
> > +       return NULL;
> > +}
> > +
> > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > +{
> > +       /* Zero out the entry independent on how many queues are configured
> > +        * at this point in time, as it might be used in the future.
> > +        */
> > +       if (queue_id < dev->num_rx_queues)
> > +               dev->_rx[queue_id].umem = NULL;
> > +       if (queue_id < dev->num_tx_queues)
> > +               dev->_tx[queue_id].umem = NULL;
> > +}
> > +  
> 
> I am not sure whether the following scenario can happen or not.
> Could you clarify?
>    1. suppose initially we have num_rx_queues = num_tx_queues = 10
>        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
>    2. num_tx_queues is changed to 5
>    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
>        and dev->_rx[8].umem = NULL.
>    4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
>        dev->_rx[8].umem = umem2
>    5. num_tx_queues is changed to 10
>   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> a problem?

Plus IIRC the check of qid vs real_num_[rt]x_queues in xsk_bind() is 
not under rtnl_lock so it doesn't count for much.  Why not do all the
checks against num_[rt]x_queues here, instead of real_..?


* Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-18 17:22   ` Y Song
  2018-09-19  1:55     ` Jakub Kicinski
@ 2018-09-19  7:14     ` Magnus Karlsson
  2018-09-20  4:47       ` Y Song
  1 sibling, 1 reply; 11+ messages in thread
From: Magnus Karlsson @ 2018-09-19  7:14 UTC (permalink / raw)
  To: ys114321
  Cc: Karlsson, Magnus, Björn Töpel, ast, Daniel Borkmann,
	Network Development

On Tue, Sep 18, 2018 at 7:23 PM Y Song <ys114321@gmail.com> wrote:
>
> On Tue, Sep 18, 2018 at 3:13 AM Magnus Karlsson
> <magnus.karlsson@intel.com> wrote:
> >
> > Previously, the xsk code did not record which umem was bound to a
> > specific queue id. This was not required if all drivers were zero-copy
> > enabled as this had to be recorded in the driver anyway. So if a user
> > tried to bind two umems to the same queue, the driver would say
> > no. But if copy-mode was first enabled and then zero-copy mode (or the
> > reverse order), we mistakenly enabled both of them on the same umem
> > leading to buggy behavior. The main culprit for this is that we did
> > not store the association of umem to queue id in the copy case and
> > only relied on the driver reporting this. As this relation was not
> > stored in the driver for copy mode (it does not rely on the AF_XDP
> > NDOs), this obviously could not work.
> >
> > This patch fixes the problem by always recording the umem to queue id
> > relationship in the netdev_queue and netdev_rx_queue structs. This way
> > we always know what kind of umem has been bound to a queue id and can
> > act appropriately at bind time.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > ---
> >  net/xdp/xdp_umem.c | 87 +++++++++++++++++++++++++++++++++++++++++++-----------
> >  net/xdp/xdp_umem.h |  2 +-
> >  2 files changed, 71 insertions(+), 18 deletions(-)
> >
> > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > index b3b632c..12300b5 100644
> > --- a/net/xdp/xdp_umem.c
> > +++ b/net/xdp/xdp_umem.c
> > @@ -42,6 +42,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
> >         }
> >  }
> >
> > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > + * not know if the device has more tx queues than rx, or the opposite.
> > + * This might also change during run time.
> > + */
> > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > +                               u16 queue_id)
> > +{
> > +       if (queue_id < dev->real_num_rx_queues)
> > +               dev->_rx[queue_id].umem = umem;
> > +       if (queue_id < dev->real_num_tx_queues)
> > +               dev->_tx[queue_id].umem = umem;
> > +}
> > +
> > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > +                                             u16 queue_id)
> > +{
> > +       if (queue_id < dev->real_num_rx_queues)
> > +               return dev->_rx[queue_id].umem;
> > +       if (queue_id < dev->real_num_tx_queues)
> > +               return dev->_tx[queue_id].umem;
> > +
> > +       return NULL;
> > +}
> > +
> > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > +{
> > +       /* Zero out the entry independent on how many queues are configured
> > +        * at this point in time, as it might be used in the future.
> > +        */
> > +       if (queue_id < dev->num_rx_queues)
> > +               dev->_rx[queue_id].umem = NULL;
> > +       if (queue_id < dev->num_tx_queues)
> > +               dev->_tx[queue_id].umem = NULL;
> > +}
> > +
>
> I am not sure whether the following scenario can happen or not.
> Could you clarify?
>    1. suppose initially we have num_rx_queues = num_tx_queues = 10
>        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
>    2. num_tx_queues is changed to 5

You probably mean real_num_tx_queues here. This is the current number
of queues configured via e.g. ethtool. num_tx_queues will not change
unless you change device (or device driver).

>    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
>        and dev->_rx[8].umem = NULL.

At this point both _rx[8].umem and _tx[8].umem are set to NULL as the
test is against num_[rx|tx]_queues which is the max allowed for the
device, not the current allocated one which is real_num_tx_queues.
With this in mind, the scenario below will not happen (a tiny stand-alone
model of this follows right after the quoted steps). Do you agree?

>    4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
>        dev->_rx[8].umem = umem2
>    5. num_tx_queues is changed to 10
>   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> a problem?
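
To make that concrete, here is a tiny stand-alone model of the tx side
only (illustrative user-space code with made-up names and sizes, not the
kernel code itself): registration is bounded by the currently active
queue count, while clearing is bounded by the fixed maximum, so
shrinking and later re-growing the active count cannot leave a stale
umem pointer behind.

  #include <assert.h>
  #include <stddef.h>

  #define NUM_TX_QUEUES 10                /* models dev->num_tx_queues      */

  static void *tx_umem[NUM_TX_QUEUES];    /* models dev->_tx[i].umem        */
  static int real_num_tx = 10;            /* models dev->real_num_tx_queues */

  static void reg(int qid, void *umem)
  {
          if (qid < real_num_tx)          /* register only on active queues */
                  tx_umem[qid] = umem;
  }

  static void clear(int qid)
  {
          if (qid < NUM_TX_QUEUES)        /* clear against the fixed maximum */
                  tx_umem[qid] = NULL;
  }

  int main(void)
  {
          int umem1, umem2;               /* stand-ins for two umems        */

          reg(8, &umem1);                 /* 1. umem1 bound to queue 8      */
          real_num_tx = 5;                /* 2. active tx queues reduced    */
          clear(8);                       /* 3. entry wiped despite shrink  */
          reg(8, &umem2);                 /* 4. no-op while queue inactive  */
          real_num_tx = 10;               /* 5. active tx queues grown back */
          assert(tx_umem[8] == NULL);     /* no stale umem is left behind   */
          return 0;
  }
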
>
> >  int xdp_umem_query(struct net_device *dev, u16 queue_id)
> >  {
> >         struct netdev_bpf bpf;
> > @@ -58,11 +93,11 @@ int xdp_umem_query(struct net_device *dev, u16 queue_id)
> >  }
> >
> >  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > -                       u32 queue_id, u16 flags)
> > +                       u16 queue_id, u16 flags)
> >  {
> >         bool force_zc, force_copy;
> >         struct netdev_bpf bpf;
> > -       int err;
> > +       int err = 0;
> >
> >         force_zc = flags & XDP_ZEROCOPY;
> >         force_copy = flags & XDP_COPY;
> > @@ -70,18 +105,28 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> >         if (force_zc && force_copy)
> >                 return -EINVAL;
> >
> > +       rtnl_lock();
> > +       if (xdp_get_umem_from_qid(dev, queue_id)) {
> > +               err = -EBUSY;
> > +               goto rtnl_unlock;
> > +       }
> > +
> > +       xdp_reg_umem_at_qid(dev, umem, queue_id);
> > +       umem->dev = dev;
> > +       umem->queue_id = queue_id;
> >         if (force_copy)
> > -               return 0;
> > +               /* For copy-mode, we are done. */
> > +               goto rtnl_unlock;
> >
> > -       if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
> > -               return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
> > +       if (!dev->netdev_ops->ndo_bpf ||
> > +           !dev->netdev_ops->ndo_xsk_async_xmit) {
> > +               err = -EOPNOTSUPP;
> > +               goto err_unreg_umem;
> > +       }
> >
> > -       rtnl_lock();
> >         err = xdp_umem_query(dev, queue_id);
> > -       if (err) {
> > -               err = err < 0 ? -EOPNOTSUPP : -EBUSY;
> > -               goto err_rtnl_unlock;
> > -       }
> > +       if (err)
> > +               goto err_unreg_umem;
> >
> >         bpf.command = XDP_SETUP_XSK_UMEM;
> >         bpf.xsk.umem = umem;
> > @@ -89,18 +134,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> >
> >         err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> >         if (err)
> > -               goto err_rtnl_unlock;
> > +               goto err_unreg_umem;
> >         rtnl_unlock();
> >
> >         dev_hold(dev);
> > -       umem->dev = dev;
> > -       umem->queue_id = queue_id;
> >         umem->zc = true;
> >         return 0;
> >
> > -err_rtnl_unlock:
> > +err_unreg_umem:
> > +       xdp_clear_umem_at_qid(dev, queue_id);
>
> You did not clear umem->dev and umem->queue_id; is that a problem here?
> For example in xdp_umem_clear_dev(), umem->dev is checked.

As the umem might be shared, I cannot clear these variables as they
are used by another socket when it is shared.

The umem->dev check in xdp_umem_clear_dev() is for sockets that are
killed when only halfway set up. In that case, umem->dev might still
be NULL.
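
(For context, this is how a second socket ends up sharing a umem in the
first place: it passes XDP_SHARED_UMEM and the owning socket's fd at
bind time. A rough sketch only; the helper name is made up and umem
registration plus ring setup are omitted.)

  #include <linux/if_xdp.h>
  #include <sys/socket.h>

  #ifndef AF_XDP
  #define AF_XDP 44
  #endif

  /* Bind fd2 so that it shares the umem already registered on fd1.
   * Sharing requires the same device and the same queue id as fd1.
   */
  static int bind_shared(int fd2, int fd1, unsigned int ifindex,
                         unsigned int queue_id)
  {
          struct sockaddr_xdp sxdp = {
                  .sxdp_family = AF_XDP,
                  .sxdp_ifindex = ifindex,
                  .sxdp_queue_id = queue_id,
                  .sxdp_flags = XDP_SHARED_UMEM,
                  .sxdp_shared_umem_fd = fd1, /* socket owning the umem */
          };

          return bind(fd2, (struct sockaddr *)&sxdp, sizeof(sxdp));
  }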

> > +       if (!force_zc)
> > +               err = 0; /* fallback to copy mode */
> > +rtnl_unlock:
> >         rtnl_unlock();
> > -       return force_zc ? err : 0; /* fail or fallback */
> > +       return err;
> >  }
> >
> >  static void xdp_umem_clear_dev(struct xdp_umem *umem)
> > @@ -108,7 +155,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
> >         struct netdev_bpf bpf;
> >         int err;
> >
> > -       if (umem->dev) {
> > +       if (umem->zc) {
> >                 bpf.command = XDP_SETUP_XSK_UMEM;
> >                 bpf.xsk.umem = NULL;
> >                 bpf.xsk.queue_id = umem->queue_id;
> > @@ -121,7 +168,13 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
> >                         WARN(1, "failed to disable umem!\n");
> >
> >                 dev_put(umem->dev);
> > -               umem->dev = NULL;
> > +               umem->zc = false;
> > +       }
> > +
> > +       if (umem->dev) {
> > +               rtnl_lock();
> > +               xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
> > +               rtnl_unlock();
>
> Previously, umem->dev is reset to NULL. Now, it is left as is. I
> assume it is not possible
> that this function xdp_umem_clear_dev() is called again for this umem, right?

This function will only be called once for each umem / queue_id pair.
We only allow one such binding with this patch. This is what was
broken with the original code.

Thanks Song for your review. I appreciate that you take a look at my code.

/Magnus

> >         }
> >  }
> >
> > diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
> > index c8be1ad..2760322 100644
> > --- a/net/xdp/xdp_umem.h
> > +++ b/net/xdp/xdp_umem.h
> > @@ -9,7 +9,7 @@
> >  #include <net/xdp_sock.h>
> >
> >  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > -                       u32 queue_id, u16 flags);
> > +                       u16 queue_id, u16 flags);
> >  bool xdp_umem_validate_queues(struct xdp_umem *umem);
> >  void xdp_get_umem(struct xdp_umem *umem);
> >  void xdp_put_umem(struct xdp_umem *umem);
> > --
> > 2.7.4
> >


* Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-19  1:55     ` Jakub Kicinski
@ 2018-09-19  7:18       ` Magnus Karlsson
  2018-09-20  9:17         ` Magnus Karlsson
  0 siblings, 1 reply; 11+ messages in thread
From: Magnus Karlsson @ 2018-09-19  7:18 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: ys114321, Karlsson, Magnus, Björn Töpel, ast,
	Daniel Borkmann, Network Development

On Wed, Sep 19, 2018 at 3:58 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Tue, 18 Sep 2018 10:22:11 -0700, Y Song wrote:
> > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > + * not know if the device has more tx queues than rx, or the opposite.
> > > + * This might also change during run time.
> > > + */
> > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > > +                               u16 queue_id)
> > > +{
> > > +       if (queue_id < dev->real_num_rx_queues)
> > > +               dev->_rx[queue_id].umem = umem;
> > > +       if (queue_id < dev->real_num_tx_queues)
> > > +               dev->_tx[queue_id].umem = umem;
> > > +}
> > > +
> > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > +                                             u16 queue_id)
> > > +{
> > > +       if (queue_id < dev->real_num_rx_queues)
> > > +               return dev->_rx[queue_id].umem;
> > > +       if (queue_id < dev->real_num_tx_queues)
> > > +               return dev->_tx[queue_id].umem;
> > > +
> > > +       return NULL;
> > > +}
> > > +
> > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > +{
> > > +       /* Zero out the entry independent on how many queues are configured
> > > +        * at this point in time, as it might be used in the future.
> > > +        */
> > > +       if (queue_id < dev->num_rx_queues)
> > > +               dev->_rx[queue_id].umem = NULL;
> > > +       if (queue_id < dev->num_tx_queues)
> > > +               dev->_tx[queue_id].umem = NULL;
> > > +}
> > > +
> >
> > I am not sure whether the following scenario can happen or not.
> > Could you clarify?
> >    1. suppose initially we have num_rx_queues = num_tx_queues = 10
> >        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> >    2. num_tx_queues is changed to 5
> >    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> >        and dev->_rx[8].umem = NULL.
> >    4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
> >        dev->_rx[8].umem = umem2
> >    5. num_tx_queues is changed to 10
> >   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> > a problem?
>
> Plus IIRC the check of qid vs real_num_[rt]x_queues in xsk_bind() is
> not under rtnl_lock so it doesn't count for much.  Why not do all the
> checks against num_[rt]x_queues here, instead of real_..?

You are correct, using two separate rtnl_lock regions is broken. Will spin a
v2 tomorrow when I am back in the office.

Thanks Jakub for catching this. I really appreciate you reviewing my code.

/Magnus


* Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-19  7:14     ` Magnus Karlsson
@ 2018-09-20  4:47       ` Y Song
  0 siblings, 0 replies; 11+ messages in thread
From: Y Song @ 2018-09-20  4:47 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: magnus.karlsson, Björn Töpel, Alexei Starovoitov,
	Daniel Borkmann, netdev

On Wed, Sep 19, 2018 at 12:15 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Tue, Sep 18, 2018 at 7:23 PM Y Song <ys114321@gmail.com> wrote:
> >
> > On Tue, Sep 18, 2018 at 3:13 AM Magnus Karlsson
> > <magnus.karlsson@intel.com> wrote:
> > >
> > > Previously, the xsk code did not record which umem was bound to a
> > > specific queue id. This was not required if all drivers were zero-copy
> > > enabled as this had to be recorded in the driver anyway. So if a user
> > > tried to bind two umems to the same queue, the driver would say
> > > no. But if copy-mode was first enabled and then zero-copy mode (or the
> > > reverse order), we mistakenly enabled both of them on the same umem
> > > leading to buggy behavior. The main culprit for this is that we did
> > > not store the association of umem to queue id in the copy case and
> > > only relied on the driver reporting this. As this relation was not
> > > stored in the driver for copy mode (it does not rely on the AF_XDP
> > > NDOs), this obviously could not work.
> > >
> > > This patch fixes the problem by always recording the umem to queue id
> > > relationship in the netdev_queue and netdev_rx_queue structs. This way
> > > we always know what kind of umem has been bound to a queue id and can
> > > act appropriately at bind time.
> > >
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > ---
> > >  net/xdp/xdp_umem.c | 87 +++++++++++++++++++++++++++++++++++++++++++-----------
> > >  net/xdp/xdp_umem.h |  2 +-
> > >  2 files changed, 71 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > > index b3b632c..12300b5 100644
> > > --- a/net/xdp/xdp_umem.c
> > > +++ b/net/xdp/xdp_umem.c
> > > @@ -42,6 +42,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
> > >         }
> > >  }
> > >
> > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > + * not know if the device has more tx queues than rx, or the opposite.
> > > + * This might also change during run time.
> > > + */
> > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > > +                               u16 queue_id)
> > > +{
> > > +       if (queue_id < dev->real_num_rx_queues)
> > > +               dev->_rx[queue_id].umem = umem;
> > > +       if (queue_id < dev->real_num_tx_queues)
> > > +               dev->_tx[queue_id].umem = umem;
> > > +}
> > > +
> > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > +                                             u16 queue_id)
> > > +{
> > > +       if (queue_id < dev->real_num_rx_queues)
> > > +               return dev->_rx[queue_id].umem;
> > > +       if (queue_id < dev->real_num_tx_queues)
> > > +               return dev->_tx[queue_id].umem;
> > > +
> > > +       return NULL;
> > > +}
> > > +
> > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > +{
> > > +       /* Zero out the entry independent on how many queues are configured
> > > +        * at this point in time, as it might be used in the future.
> > > +        */
> > > +       if (queue_id < dev->num_rx_queues)
> > > +               dev->_rx[queue_id].umem = NULL;
> > > +       if (queue_id < dev->num_tx_queues)
> > > +               dev->_tx[queue_id].umem = NULL;
> > > +}
> > > +
> >
> > I am not sure whether the following scenario can happen or not.
> > Could you clarify?
> >    1. suppose initially we have num_rx_queues = num_tx_queues = 10
> >        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> >    2. num_tx_queues is changed to 5
>
> You probably mean real_num_tx_queues here. This is the current number
> of queues configured via e.g. ethtool. num_tx_queues will not change
> unless you change device (or device driver).

I am actually not really familiar with this area of
code. Your explanation definitely helps my understanding.

>
> >    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> >        and dev->_rx[8].umem = NULL.
>
> At this point both _rx[8].umem and _tx[8].umem are set to NULL as the
> test is against num_[rx|tx]_queues which is the max allowed for the
> device, not the current allocated one which is real_num_tx_queues.
> With this in mind, the scenario below will not happen. Do you agree?

I agree. If num_{rx,tx}_queues are not changed, then clear should set
both to NULL and steps 4/5 won't happen any more.

>
> >    4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
> >        dev->_rx[8].umem = umem2
> >    5. num_tx_queues is changed to 10
> >   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> > a problem?
> >
> > >  int xdp_umem_query(struct net_device *dev, u16 queue_id)
> > >  {
> > >         struct netdev_bpf bpf;
> > > @@ -58,11 +93,11 @@ int xdp_umem_query(struct net_device *dev, u16 queue_id)
> > >  }
> > >
> > >  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > > -                       u32 queue_id, u16 flags)
> > > +                       u16 queue_id, u16 flags)
> > >  {
> > >         bool force_zc, force_copy;
> > >         struct netdev_bpf bpf;
> > > -       int err;
> > > +       int err = 0;
> > >
> > >         force_zc = flags & XDP_ZEROCOPY;
> > >         force_copy = flags & XDP_COPY;
> > > @@ -70,18 +105,28 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > >         if (force_zc && force_copy)
> > >                 return -EINVAL;
> > >
> > > +       rtnl_lock();
> > > +       if (xdp_get_umem_from_qid(dev, queue_id)) {
> > > +               err = -EBUSY;
> > > +               goto rtnl_unlock;
> > > +       }
> > > +
> > > +       xdp_reg_umem_at_qid(dev, umem, queue_id);
> > > +       umem->dev = dev;
> > > +       umem->queue_id = queue_id;
> > >         if (force_copy)
> > > -               return 0;
> > > +               /* For copy-mode, we are done. */
> > > +               goto rtnl_unlock;
> > >
> > > -       if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
> > > -               return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
> > > +       if (!dev->netdev_ops->ndo_bpf ||
> > > +           !dev->netdev_ops->ndo_xsk_async_xmit) {
> > > +               err = -EOPNOTSUPP;
> > > +               goto err_unreg_umem;
> > > +       }
> > >
> > > -       rtnl_lock();
> > >         err = xdp_umem_query(dev, queue_id);
> > > -       if (err) {
> > > -               err = err < 0 ? -EOPNOTSUPP : -EBUSY;
> > > -               goto err_rtnl_unlock;
> > > -       }
> > > +       if (err)
> > > +               goto err_unreg_umem;
> > >
> > >         bpf.command = XDP_SETUP_XSK_UMEM;
> > >         bpf.xsk.umem = umem;
> > > @@ -89,18 +134,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > >
> > >         err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> > >         if (err)
> > > -               goto err_rtnl_unlock;
> > > +               goto err_unreg_umem;
> > >         rtnl_unlock();
> > >
> > >         dev_hold(dev);
> > > -       umem->dev = dev;
> > > -       umem->queue_id = queue_id;
> > >         umem->zc = true;
> > >         return 0;
> > >
> > > -err_rtnl_unlock:
> > > +err_unreg_umem:
> > > +       xdp_clear_umem_at_qid(dev, queue_id);
> >
> > You did not clear umem->dev and umem->queue_id; is that a problem here?
> > For example in xdp_umem_clear_dev(), umem->dev is checked.
>
> As the umem might be shared, I cannot clear these variables as they
> are used by another socket when it is shared.
>
> The umem->dev check in xdp_umem_clear_dev() is for sockets that are
> killed when only halfway set up. In that case, umem->dev might still
> be NULL.

Makes sense.

>
> > > +       if (!force_zc)
> > > +               err = 0; /* fallback to copy mode */
> > > +rtnl_unlock:
> > >         rtnl_unlock();
> > > -       return force_zc ? err : 0; /* fail or fallback */
> > > +       return err;
> > >  }
> > >
> > >  static void xdp_umem_clear_dev(struct xdp_umem *umem)
> > > @@ -108,7 +155,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
> > >         struct netdev_bpf bpf;
> > >         int err;
> > >
> > > -       if (umem->dev) {
> > > +       if (umem->zc) {
> > >                 bpf.command = XDP_SETUP_XSK_UMEM;
> > >                 bpf.xsk.umem = NULL;
> > >                 bpf.xsk.queue_id = umem->queue_id;
> > > @@ -121,7 +168,13 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
> > >                         WARN(1, "failed to disable umem!\n");
> > >
> > >                 dev_put(umem->dev);
> > > -               umem->dev = NULL;
> > > +               umem->zc = false;
> > > +       }
> > > +
> > > +       if (umem->dev) {
> > > +               rtnl_lock();
> > > +               xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
> > > +               rtnl_unlock();
> >
> > Previously, umem->dev is reset to NULL. Now, it is left as is. I
> > assume it is not possible
> > that this function xdp_umem_clear_dev() is called again for this umem, right?
>
> This function will only be called once for each umem / queue_id pair.
> We only allow one such binding with this patch. This is what was
> broken with the original code.

Thanks for the explanation.

>
> Thanks Song for your review. I appreciate that you take a look at my code.
>
> /Magnus
>
> > >         }
> > >  }
> > >
> > > diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
> > > index c8be1ad..2760322 100644
> > > --- a/net/xdp/xdp_umem.h
> > > +++ b/net/xdp/xdp_umem.h
> > > @@ -9,7 +9,7 @@
> > >  #include <net/xdp_sock.h>
> > >
> > >  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > > -                       u32 queue_id, u16 flags);
> > > +                       u16 queue_id, u16 flags);
> > >  bool xdp_umem_validate_queues(struct xdp_umem *umem);
> > >  void xdp_get_umem(struct xdp_umem *umem);
> > >  void xdp_put_umem(struct xdp_umem *umem);
> > > --
> > > 2.7.4
> > >


* Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-19  7:18       ` Magnus Karlsson
@ 2018-09-20  9:17         ` Magnus Karlsson
  2018-09-20 14:41           ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Magnus Karlsson @ 2018-09-20  9:17 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: ys114321, Karlsson, Magnus, Björn Töpel, ast,
	Daniel Borkmann, Network Development

On Wed, Sep 19, 2018 at 9:18 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Wed, Sep 19, 2018 at 3:58 AM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> >
> > On Tue, 18 Sep 2018 10:22:11 -0700, Y Song wrote:
> > > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > > + * not know if the device has more tx queues than rx, or the opposite.
> > > > + * This might also change during run time.
> > > > + */
> > > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > > > +                               u16 queue_id)
> > > > +{
> > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > +               dev->_rx[queue_id].umem = umem;
> > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > +               dev->_tx[queue_id].umem = umem;
> > > > +}
> > > > +
> > > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > > +                                             u16 queue_id)
> > > > +{
> > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > +               return dev->_rx[queue_id].umem;
> > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > +               return dev->_tx[queue_id].umem;
> > > > +
> > > > +       return NULL;
> > > > +}
> > > > +
> > > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > > +{
> > > > +       /* Zero out the entry independent on how many queues are configured
> > > > +        * at this point in time, as it might be used in the future.
> > > > +        */
> > > > +       if (queue_id < dev->num_rx_queues)
> > > > +               dev->_rx[queue_id].umem = NULL;
> > > > +       if (queue_id < dev->num_tx_queues)
> > > > +               dev->_tx[queue_id].umem = NULL;
> > > > +}
> > > > +
> > >
> > > I am not sure whether the following scenario can happen or not.
> > > Could you clarify?
> > >    1. suppose initially we have num_rx_queues = num_tx_queues = 10
> > >        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> > >    2. num_tx_queues is changed to 5
> > >    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> > >        and dev->_rx[8].umem = NULL.
> > >    4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
> > >        dev->_rx[8].umem = umem2
> > >    5. num_tx_queues is changed to 10
> > >   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> > > a problem?
> >
> > Plus IIRC the check of qid vs real_num_[rt]x_queues in xsk_bind() is
> > not under rtnl_lock so it doesn't count for much.  Why not do all the
> > checks against num_[rt]x_queues here, instead of real_..?
>
> > You are correct, using two separate rtnl_lock regions is broken. Will spin a
> v2 tomorrow when I am back in the office.
>
> Thanks Jakub for catching this. I really appreciate you reviewing my code.

Sorry, forgot to answer your question about why real_num_ instead of
num_. If we used num_ instead, we would get a behavior where we can
open a socket on a queue id, let us say 10, that will not be active if
real_num_ is below 10. So no traffic will flow on this socket until we
issue an ethtool command to state that we have 10 or more queues
configured. While this behavior is sane (and consistent), I believe it
will lead to a more complicated driver implementation, break the
current uapi (since we will return an error if you try to bind to a
queue id that is not active at the moment) and I like my suggestion
below better :-).

What I would like to suggest is that the current model at bind() is
kept, that is it will fail if you try to bind to a non-active queue
id. But we add some code in ethtool_set_channels in net/core/ethtool.c
to check if you have an active af_xdp socket on a queue id and try to
make it inactive, we return an error and disallow the change. This
would IMHO be like not allowing a loaded module to be unloaded if
someone is using it or not allowing unmounting a file system if files
are in use. What do you think? If you think this is the right way to
go, I can implement this in a follow up patch or in this patch set if
that makes more sense, let me know. This suggestion would actually
make it simpler to implement zero-copy support in the driver. Some of
the complications we are having today are due to the fact that ethtool
can come in from the side and change things for us when an AF_XDP
socket is active on a queue id. And implementing zero copy support in
the driver needs to get simpler IMO.
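
Roughly the kind of check meant here (a hypothetical sketch, not actual
kernel code: the function name is made up and it assumes the
xdp_get_umem_from_qid() helper from patch 2 would be exported for this
purpose):

  #include <linux/ethtool.h>
  #include <linux/kernel.h>
  #include <linux/netdevice.h>

  /* Refuse a channel change that would deactivate, on the rx or tx
   * side, a queue id that still has an AF_XDP umem bound to it.
   */
  static int ethtool_xsk_queues_in_use(struct net_device *dev,
                                       struct ethtool_channels *ch)
  {
          u32 new_min = ch->combined_count + min(ch->rx_count, ch->tx_count);
          u32 old_max = max(dev->real_num_rx_queues, dev->real_num_tx_queues);
          u32 i;

          for (i = new_min; i < old_max; i++)
                  if (xdp_get_umem_from_qid(dev, i)) /* assumed exported */
                          return -EBUSY;

          return 0;
  }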

Please let me know what you think.

Thanks: Magnus

> /Magnus


* Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-20  9:17         ` Magnus Karlsson
@ 2018-09-20 14:41           ` Jakub Kicinski
  2018-09-24  8:37             ` Magnus Karlsson
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2018-09-20 14:41 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: ys114321, Karlsson, Magnus, Björn Töpel, ast,
	Daniel Borkmann, Network Development

On Thu, 20 Sep 2018 11:17:17 +0200, Magnus Karlsson wrote:
> On Wed, Sep 19, 2018 at 9:18 AM Magnus Karlsson wrote:
> > On Wed, Sep 19, 2018 at 3:58 AM Jakub Kicinski wrote:  
> > >
> > > On Tue, 18 Sep 2018 10:22:11 -0700, Y Song wrote:  
> > > > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > > > + * not know if the device has more tx queues than rx, or the opposite.
> > > > > + * This might also change during run time.
> > > > > + */
> > > > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > > > > +                               u16 queue_id)
> > > > > +{
> > > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > > +               dev->_rx[queue_id].umem = umem;
> > > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > > +               dev->_tx[queue_id].umem = umem;
> > > > > +}
> > > > > +
> > > > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > > > +                                             u16 queue_id)
> > > > > +{
> > > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > > +               return dev->_rx[queue_id].umem;
> > > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > > +               return dev->_tx[queue_id].umem;
> > > > > +
> > > > > +       return NULL;
> > > > > +}
> > > > > +
> > > > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > > > +{
> > > > > +       /* Zero out the entry independent on how many queues are configured
> > > > > +        * at this point in time, as it might be used in the future.
> > > > > +        */
> > > > > +       if (queue_id < dev->num_rx_queues)
> > > > > +               dev->_rx[queue_id].umem = NULL;
> > > > > +       if (queue_id < dev->num_tx_queues)
> > > > > +               dev->_tx[queue_id].umem = NULL;
> > > > > +}
> > > > > +  
> > > >
> > > > I am not sure whether the following scenario can happen or not.
> > > > Could you clarify?
> > > >    1. suppose initially we have num_rx_queues = num_tx_queues = 10
> > > >        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> > > >    2. num_tx_queues is changed to 5
> > > >    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> > > >        and dev->_rx[8].umem = NULL.
> > > >    4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
> > > >        dev->_rx[8].umem = umem2
> > > >    5. num_tx_queues is changed to 10
> > > >   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> > > > a problem?  
> > >
> > > Plus IIRC the check of qid vs real_num_[rt]x_queues in xsk_bind() is
> > > not under rtnl_lock so it doesn't count for much.  Why not do all the
> > > checks against num_[rt]x_queues here, instead of real_..?  
> >
> > You are correct, using two separate rtnl_lock regions is broken. Will spin a
> > v2 tomorrow when I am back in the office.
> >
> > Thanks Jakub for catching this. I really appreciate you reviewing my code.  
> 
> Sorry, forgot to answer your question about why real_num_ instead of
> num_. If we used num_ instead, we would get a behavior where we can
> open a socket on a queue id, let us say 10, that will not be active if
> real_num_ is below 10. So no traffic will flow on this socket until we
> issue an ethtool command to state that we have 10 or more queues
> configured. While this behavior is sane (and consistent), I believe it
> will lead to a more complicated driver implementation, break the
> current uapi (since we will return an error if you try to bind to a
> queue id that is not active at the moment) and I like my suggestion
> below better :-).
> 
> What I would like to suggest is that the current model at bind() is
> kept, that is it will fail if you try to bind to a non-active queue
> id. But we add some code in ethtool_set_channels in net/core/ethtool.c
> to check if you have an active af_xdp socket on a queue id and try to
> make it inactive, we return an error and disallow the change. This
> would IMHO be like not allowing a loaded module to be unloaded if
> someone is using it or not allowing unmounting a file system if files
> are in use. What do you think? If you think this is the right way to
> go, I can implement this in a follow up patch or in this patch set if
> that makes more sense, let me know. This suggestion would actually
> make it simpler to implement zero-copy support in the driver. Some of
> the complications we are having today are due to the fact that ethtool
> can come in from the side and change things for us when an AF_XDP
> socket is active on a queue id. And implementing zero copy support in
> the driver needs to get simpler IMO.
> 
> Please let me know what you think.

I think I'd like that:
https://patchwork.ozlabs.org/project/netdev/list/?series=57843&state=*
in particular:
https://patchwork.ozlabs.org/patch/949891/
;)

If we fix the resize indeed you could stick to real_* but to be honest,
I'm not entirely clear on what the advantage would be?  The arrays are
sized to num..queues, do you envision a use case where having
xdp_get_umem_from_qid() return a NULL when asked for qid > real_..
would help?  Is there one for xdp_reg_umem_at_qid()?

I have no strong preference here, just wondering.


* Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-09-20 14:41           ` Jakub Kicinski
@ 2018-09-24  8:37             ` Magnus Karlsson
  0 siblings, 0 replies; 11+ messages in thread
From: Magnus Karlsson @ 2018-09-24  8:37 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: Y Song, Karlsson, Magnus, Björn Töpel, ast,
	Daniel Borkmann, Network Development

On Thu, Sep 20, 2018 at 4:41 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 20 Sep 2018 11:17:17 +0200, Magnus Karlsson wrote:
> > On Wed, Sep 19, 2018 at 9:18 AM Magnus Karlsson wrote:
> > > On Wed, Sep 19, 2018 at 3:58 AM Jakub Kicinski wrote:
> > > >
> > > > On Tue, 18 Sep 2018 10:22:11 -0700, Y Song wrote:
> > > > > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > > > > + * not know if the device has more tx queues than rx, or the opposite.
> > > > > > + * This might also change during run time.
> > > > > > + */
> > > > > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > > > > > +                               u16 queue_id)
> > > > > > +{
> > > > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > > > +               dev->_rx[queue_id].umem = umem;
> > > > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > > > +               dev->_tx[queue_id].umem = umem;
> > > > > > +}
> > > > > > +
> > > > > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > > > > +                                             u16 queue_id)
> > > > > > +{
> > > > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > > > +               return dev->_rx[queue_id].umem;
> > > > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > > > +               return dev->_tx[queue_id].umem;
> > > > > > +
> > > > > > +       return NULL;
> > > > > > +}
> > > > > > +
> > > > > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > > > > +{
> > > > > > +       /* Zero out the entry independent on how many queues are configured
> > > > > > +        * at this point in time, as it might be used in the future.
> > > > > > +        */
> > > > > > +       if (queue_id < dev->num_rx_queues)
> > > > > > +               dev->_rx[queue_id].umem = NULL;
> > > > > > +       if (queue_id < dev->num_tx_queues)
> > > > > > +               dev->_tx[queue_id].umem = NULL;
> > > > > > +}
> > > > > > +
> > > > >
> > > > > I am not sure whether the following scenario can happen or not.
> > > > > Could you clarify?
> > > > >    1. suppose initially we have num_rx_queues = num_tx_queues = 10
> > > > >        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> > > > >    2. num_tx_queues is changed to 5
> > > > >    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> > > > >        and dev->_rx[8].umem = NULL.
> > > > >    4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
> > > > >        dev->_rx[8].umem = umem2
> > > > >    5. num_tx_queues is changed to 10
> > > > >   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> > > > > a problem?
> > > >
> > > > Plus IIRC the check of qid vs real_num_[rt]x_queues in xsk_bind() is
> > > > not under rtnl_lock so it doesn't count for much.  Why not do all the
> > > > checks against num_[rt]x_queues here, instead of real_..?
> > >
> > > You are correct, using two separate rtnl_lock regions is broken. Will spin a
> > > v2 tomorrow when I am back in the office.
> > >
> > > Thanks Jakub for catching this. I really appreciate you reviewing my code.
> >
> > Sorry, forgot to answer your question about why real_num_ instead of
> > num_. If we used num_ instead, we would get a behavior where we can
> > open a socket on a queue id, let us say 10, that will not be active if
> > real_num_ is below 10. So no traffic will flow on this socket until we
> > issue an ethtool command to state that we have 10 or more queues
> > configured. While this behavior is sane (and consistent), I believe it
> > will lead to a more complicated driver implementation, break the
> > current uapi (since we will return an error if you try to bind to a
> > queue id that is not active at the moment) and I like my suggestion
> > below better :-).
> >
> > What I would like to suggest is that the current model at bind() is
> > kept, that is it will fail if you try to bind to a non-active queue
> > id. But we add some code in ethtool_set_channels in net/core/ethtool.c
> > to check if you have an active af_xdp socket on a queue id and try to
> > make it inactive, we return an error and disallow the change. This
> > would IMHO be like not allowing a loaded module to be unloaded if
> > someone is using it or not allowing unmounting a file system if files
> > are in use. What do you think? If you think this is the right way to
> > go, I can implement this in a follow up patch or in this patch set if
> > that makes more sense, let me know. This suggestion would actually
> > make it simpler to implement zero-copy support in the driver. Some of
> > the complications we are having today are due to the fact that ethtool
> > can come in from the side and change things for us when an AF_XDP
> > socket is active on a queue id. And implementing zero copy support in
> > the driver needs to get simpler IMO.
> >
> > Please let me know what you think.
>
> I think I'd like that:
> https://patchwork.ozlabs.org/project/netdev/list/?series=57843&state=*
> in particular:
> https://patchwork.ozlabs.org/patch/949891/
> ;)

Perfect :-). I will drop the code I wrote to fix this and use yours in
my patch set. Please let me know if that does not work for you.

> If we fix the resize indeed you could stick to real_* but to be honest,
> I'm not entirely clear on what the advantage would be?  The arrays are
> sized to num..queues, do you envision a use case where having
> xdp_get_umem_from_qid() return a NULL when asked for qid > real_..
> would help?  Is there one for xdp_reg_umem_at_qid()?

I just think that testing against real_ combined with not allowing
removal of queues that have an active af_xdp socket makes it consistent.
You cannot start an af_xdp socket on a non-active queue_id. You have
to make the queue active (increase real_num_ so it encompasses the
queue in question) first, then you can start your socket, and in order
to decrease real_num_, you first have to remove the socket. AF_XDP
sockets only exist on active queue ids. This should make the state
handling simpler, IMHO.

/Magnus

> I have no strong preference here, just wondering.

