Re: [PATCH net-next v5 5/8] TUN & TAP: Provide ptr_ring_consume_batched wrappers for vhost_net

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Simon Schippers <simon.schippers@tu-dortmund.de>
Cc: willemdebruijn.kernel@gmail.com, jasowang@redhat.com,
	eperezma@redhat.com, stephen@networkplumber.org,
	leiyang@redhat.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
	kvm@vger.kernel.org, Tim Gebauer <tim.gebauer@tu-dortmund.de>
Subject: Re: [PATCH net-next v5 5/8] TUN & TAP: Provide ptr_ring_consume_batched wrappers for vhost_net
Date: Tue, 23 Sep 2025 12:23:56 -0400	[thread overview]
Message-ID: <20250923122147-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20250922221553.47802-6-simon.schippers@tu-dortmund.de>

On Tue, Sep 23, 2025 at 12:15:50AM +0200, Simon Schippers wrote:
> The wrappers tun_ring_consume_batched/tap_ring_consume_batched are similar
> to the wrappers tun_ring_consume/tap_ring_consume. They deal with
> consuming a batch of entries of the ptr_ring and then waking the
> netdev queue whenever entries get invalidated to be used again by the
> producer.
> To avoid waking the netdev queue when the ptr_ring is full, it is checked
> if the netdev queue is stopped before invalidating entries. Like that the
> netdev queue can be safely woken after invalidating entries.
> 
> The READ_ONCE in __ptr_ring_peek, paired with the smp_wmb() in
> __ptr_ring_produce within tun_net_xmit guarantees that the information
> about the netdev queue being stopped is visible after __ptr_ring_peek is
> called.

READ_ONCE generally can't pair with smp_wmb


From Documentation/memory-barriers.txt


SMP BARRIER PAIRING
-------------------
                                 
When dealing with CPU-CPU interactions, certain types of memory barrier should
always be paired.  A lack of appropriate pairing is almost certainly an error.


....


A write barrier pairs
with an address-dependency barrier, a control dependency, an acquire barrier,
a release barrier, a read barrier, or a general barrier.




> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> ---
>  drivers/net/tap.c      | 52 ++++++++++++++++++++++++++++++++++++++++
>  drivers/net/tun.c      | 54 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/if_tap.h |  6 +++++
>  include/linux/if_tun.h |  7 ++++++
>  4 files changed, 119 insertions(+)
> 
> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
> index f8292721a9d6..651d48612329 100644
> --- a/drivers/net/tap.c
> +++ b/drivers/net/tap.c
> @@ -1216,6 +1216,58 @@ struct socket *tap_get_socket(struct file *file)
>  }
>  EXPORT_SYMBOL_GPL(tap_get_socket);
>  
> +int tap_ring_consume_batched(struct file *file,
> +			     void **array, int n)
> +{
> +	struct tap_queue *q = file->private_data;
> +	struct netdev_queue *txq;
> +	struct net_device *dev;
> +	bool will_invalidate;
> +	bool stopped;
> +	void *ptr;
> +	int i;
> +
> +	spin_lock(&q->ring.consumer_lock);
> +	ptr = __ptr_ring_peek(&q->ring);
> +
> +	if (!ptr) {
> +		spin_unlock(&q->ring.consumer_lock);
> +		return 0;
> +	}
> +
> +	i = 0;
> +	do {
> +		/* Check if the queue stopped before zeroing out, so no
> +		 * ptr get produced in the meantime, because this could
> +		 * result in waking even though the ptr_ring is full.
> +		 * The order of the operations is ensured by barrier().
> +		 */
> +		will_invalidate = __ptr_ring_will_invalidate(&q->ring);
> +		if (unlikely(will_invalidate)) {
> +			rcu_read_lock();
> +			dev = rcu_dereference(q->tap)->dev;
> +			txq = netdev_get_tx_queue(dev, q->queue_index);
> +			stopped = netif_tx_queue_stopped(txq);
> +		}
> +		barrier();
> +		__ptr_ring_discard_one(&q->ring, will_invalidate);
> +
> +		if (unlikely(will_invalidate)) {
> +			if (stopped)
> +				netif_tx_wake_queue(txq);
> +			rcu_read_unlock();
> +		}
> +
> +		array[i++] = ptr;
> +		if (i >= n)
> +			break;
> +	} while ((ptr = __ptr_ring_peek(&q->ring)));
> +	spin_unlock(&q->ring.consumer_lock);
> +
> +	return i;
> +}
> +EXPORT_SYMBOL_GPL(tap_ring_consume_batched);
> +
>  struct ptr_ring *tap_get_ptr_ring(struct file *file)
>  {
>  	struct tap_queue *q;
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 682df8157b55..7566b22780fb 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -3759,6 +3759,60 @@ struct socket *tun_get_socket(struct file *file)
>  }
>  EXPORT_SYMBOL_GPL(tun_get_socket);
>  
> +int tun_ring_consume_batched(struct file *file,
> +			     void **array, int n)
> +{
> +	struct tun_file *tfile = file->private_data;
> +	struct netdev_queue *txq;
> +	struct net_device *dev;
> +	bool will_invalidate;
> +	bool stopped;
> +	void *ptr;
> +	int i;
> +
> +	spin_lock(&tfile->tx_ring.consumer_lock);
> +	ptr = __ptr_ring_peek(&tfile->tx_ring);
> +
> +	if (!ptr) {
> +		spin_unlock(&tfile->tx_ring.consumer_lock);
> +		return 0;
> +	}
> +
> +	i = 0;
> +	do {
> +		/* Check if the queue stopped before zeroing out, so no
> +		 * ptr get produced in the meantime, because this could
> +		 * result in waking even though the ptr_ring is full.
> +		 * The order of the operations is ensured by barrier().
> +		 */
> +		will_invalidate =
> +			__ptr_ring_will_invalidate(&tfile->tx_ring);
> +		if (unlikely(will_invalidate)) {
> +			rcu_read_lock();
> +			dev = rcu_dereference(tfile->tun)->dev;
> +			txq = netdev_get_tx_queue(dev,
> +						  tfile->queue_index);
> +			stopped = netif_tx_queue_stopped(txq);
> +		}
> +		barrier();
> +		__ptr_ring_discard_one(&tfile->tx_ring, will_invalidate);
> +
> +		if (unlikely(will_invalidate)) {
> +			if (stopped)
> +				netif_tx_wake_queue(txq);
> +			rcu_read_unlock();
> +		}
> +
> +		array[i++] = ptr;
> +		if (i >= n)
> +			break;
> +	} while ((ptr = __ptr_ring_peek(&tfile->tx_ring)));
> +	spin_unlock(&tfile->tx_ring.consumer_lock);
> +
> +	return i;
> +}
> +EXPORT_SYMBOL_GPL(tun_ring_consume_batched);
> +
>  struct ptr_ring *tun_get_tx_ring(struct file *file)
>  {
>  	struct tun_file *tfile;
> diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
> index 553552fa635c..2e5542d6aef4 100644
> --- a/include/linux/if_tap.h
> +++ b/include/linux/if_tap.h
> @@ -11,6 +11,7 @@ struct socket;
>  #if IS_ENABLED(CONFIG_TAP)
>  struct socket *tap_get_socket(struct file *);
>  struct ptr_ring *tap_get_ptr_ring(struct file *file);
> +int tap_ring_consume_batched(struct file *file, void **array, int n);
>  #else
>  #include <linux/err.h>
>  #include <linux/errno.h>
> @@ -22,6 +23,11 @@ static inline struct ptr_ring *tap_get_ptr_ring(struct file *f)
>  {
>  	return ERR_PTR(-EINVAL);
>  }
> +static inline int tap_ring_consume_batched(struct file *f,
> +						void **array, int n)
> +{
> +	return 0;
> +}
>  #endif /* CONFIG_TAP */
>  
>  /*
> diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
> index 80166eb62f41..5b41525ac007 100644
> --- a/include/linux/if_tun.h
> +++ b/include/linux/if_tun.h
> @@ -22,6 +22,7 @@ struct tun_msg_ctl {
>  #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
>  struct socket *tun_get_socket(struct file *);
>  struct ptr_ring *tun_get_tx_ring(struct file *file);
> +int tun_ring_consume_batched(struct file *file, void **array, int n);
>  
>  static inline bool tun_is_xdp_frame(void *ptr)
>  {
> @@ -55,6 +56,12 @@ static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
>  	return ERR_PTR(-EINVAL);
>  }
>  
> +static inline int tun_ring_consume_batched(struct file *file,
> +						void **array, int n)
> +{
> +	return 0;
> +}
> +
>  static inline bool tun_is_xdp_frame(void *ptr)
>  {
>  	return false;
> -- 
> 2.43.0

next prev parent reply	other threads:[~2025-09-23 16:24 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-22 22:15 [PATCH net-next v5 0/8] TUN/TAP & vhost_net: netdev queue flow control to avoid ptr_ring tail drop Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 1/8] __ptr_ring_full_next: Returns if ring will be full after next insertion Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 2/8] Move the decision of invalidation out of __ptr_ring_discard_one Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr_ring Simon Schippers
2025-09-23 14:47   ` Michael S. Tsirkin
2025-09-24  5:41     ` Simon Schippers
2025-09-24  5:50       ` Michael S. Tsirkin
2025-09-22 22:15 ` [PATCH net-next v5 4/8] TUN & TAP: Wake netdev queue after consuming an entry Simon Schippers
2025-09-23 14:54   ` Michael S. Tsirkin
2025-09-23 16:36   ` Michael S. Tsirkin
2025-09-24  5:56     ` Simon Schippers
2025-09-24  6:55       ` Michael S. Tsirkin
2025-09-24  7:42         ` Simon Schippers
2025-09-24  7:49           ` Michael S. Tsirkin
2025-09-24  8:40             ` Simon Schippers
2025-09-24  9:00               ` Michael S. Tsirkin
2025-09-28 21:27     ` Simon Schippers
2025-09-28 22:33       ` Michael S. Tsirkin
2025-09-29  9:43         ` Simon Schippers
2025-10-11  9:15           ` Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 5/8] TUN & TAP: Provide ptr_ring_consume_batched wrappers for vhost_net Simon Schippers
2025-09-23 16:23   ` Michael S. Tsirkin [this message]
2025-09-22 22:15 ` [PATCH net-next v5 6/8] TUN & TAP: Provide ptr_ring_unconsume " Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 7/8] TUN & TAP: Methods to determine whether file is TUN/TAP " Simon Schippers
2025-09-22 22:15 ` [PATCH net-next v5 8/8] vhost_net: Replace rx_ring with calls of TUN/TAP wrappers Simon Schippers
2025-09-23 14:14   ` kernel test robot
2025-09-26 13:47   ` kernel test robot
2025-09-23 14:55 ` [PATCH net-next v5 0/8] TUN/TAP & vhost_net: netdev queue flow control to avoid ptr_ring tail drop Michael S. Tsirkin
2025-09-24  5:59   ` Simon Schippers
2025-09-24  6:12     ` Michael S. Tsirkin
2025-09-24  7:18 ` Michael S. Tsirkin
2025-09-24  7:33   ` Jason Wang
2025-09-24  7:41     ` Michael S. Tsirkin
2025-09-24  8:08       ` Jason Wang
2025-09-24  8:09         ` Michael S. Tsirkin
2025-09-24  8:30           ` Jason Wang
2025-09-24  8:54             ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250923122147-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=leiyang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=simon.schippers@tu-dortmund.de \
    --cc=stephen@networkplumber.org \
    --cc=tim.gebauer@tu-dortmund.de \
    --cc=virtualization@lists.linux.dev \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.