netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] tun zerocopy stats
@ 2017-10-06 22:25 Willem de Bruijn
  2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-06 22:25 UTC (permalink / raw)
  To: netdev; +Cc: mst, jasowang, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Add zerocopy transfer statistics to the vhost_net/tun zerocopy path.

I've been using this to verify recent changes to zerocopy tuning [1].
Sharing more widely, as it may be useful in similar future work.

Use ethtool stats as interface, as these are defined per device
driver and can easily be extended.

Make the zerocopy release callback take an extra hop through the tun
driver to allow the driver to increment its counters.

Care must be taken to avoid adding an alloc/free to this hot path.
Since the caller already must allocate a ubuf_info, make it allocate
two at a time and grant one to the tun device.

 1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices
 2/3: add zerocopy tx and tx_err counters
 3/3: convert vhost_net to pass a pair of ubuf_info to tun

[1] http://patchwork.ozlabs.org/patch/822613/

Willem de Bruijn (3):
  tun: ethtool stats
  tun: expand ethtool stats with zerocopy
  vhost_net: support tun zerocopy stats

 drivers/net/tun.c   | 111 ++++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/net.c |   7 ++--
 2 files changed, 108 insertions(+), 10 deletions(-)

-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH RFC 1/3] tun: ethtool stats
  2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn
@ 2017-10-06 22:25 ` Willem de Bruijn
  2017-10-06 22:30   ` Stephen Hemminger
  2017-10-06 22:25 ` [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy Willem de Bruijn
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-06 22:25 UTC (permalink / raw)
  To: netdev; +Cc: mst, jasowang, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Support ethtool -S on tun devices. This interface allows exporting
device-specific stats not present in rtnl stats.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/tun.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 57e4c31fa84a..df6ef9670d05 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -194,6 +194,15 @@ struct tun_flow_entry {
 
 #define TUN_NUM_FLOW_ENTRIES 1024
 
+static const struct {
+	const char string[ETH_GSTRING_LEN];
+} tun_ethtool_stats_keys[] = {
+	{ "rx_packets" },
+	{ "tx_packets" },
+	{ "rx_bytes" },
+	{ "tx_bytes" },
+};
+
 /* Since the socket were moved to tun_file, to preserve the behavior of persist
  * device, socket filter, sndbuf and vnet header size were restore when the
  * file were attached to a persist device.
@@ -2954,6 +2963,32 @@ static int tun_set_coalesce(struct net_device *dev,
 	return 0;
 }
 
+static int tun_get_sset_count(struct net_device *dev, int string_set)
+{
+	if (string_set == ETH_SS_STATS)
+		return ARRAY_SIZE(tun_ethtool_stats_keys);
+
+	return -EINVAL;
+}
+
+static void tun_get_strings(struct net_device *dev, u32 stringset, u8 *data)
+{
+	if (stringset == ETH_SS_STATS)
+		memcpy(data, &tun_ethtool_stats_keys,
+		       sizeof(tun_ethtool_stats_keys));
+}
+
+static void tun_get_ethtool_stats(struct net_device *dev,
+				  struct ethtool_stats *stats, u64 *data)
+{
+	const int ethtool_stats_bytelen =
+		ARRAY_SIZE(tun_ethtool_stats_keys) * sizeof(u64);
+	struct rtnl_link_stats64 link_stats64 = {0};
+
+	tun_net_get_stats64(dev, &link_stats64);
+	memcpy(data, &link_stats64, ethtool_stats_bytelen);
+}
+
 static const struct ethtool_ops tun_ethtool_ops = {
 	.get_drvinfo	= tun_get_drvinfo,
 	.get_msglevel	= tun_get_msglevel,
@@ -2963,6 +2998,9 @@ static const struct ethtool_ops tun_ethtool_ops = {
 	.get_coalesce   = tun_get_coalesce,
 	.set_coalesce   = tun_set_coalesce,
 	.get_link_ksettings = tun_get_link_ksettings,
+	.get_sset_count	= tun_get_sset_count,
+	.get_strings	= tun_get_strings,
+	.get_ethtool_stats = tun_get_ethtool_stats,
 };
 
 static int tun_queue_resize(struct tun_struct *tun)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy
  2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn
  2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn
@ 2017-10-06 22:25 ` Willem de Bruijn
  2017-10-06 22:32   ` Willem de Bruijn
  2017-10-06 22:25 ` [PATCH RFC 3/3] vhost_net: support tun zerocopy stats Willem de Bruijn
  2017-10-10  3:52 ` [PATCH RFC 0/3] " David Miller
  3 siblings, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-06 22:25 UTC (permalink / raw)
  To: netdev; +Cc: mst, jasowang, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Add two counters to tun ethool stats to count zerocopy completions.
- tx_zerocopy counts completions that succeeded without copy.
- tx_zerocopy_err counts those that triggered a copy.

Tun intercepts completions by replacing the zerocopy completion
handler (ubuf_info) prepared by the packet source (vhost-net)
with its own.

To avoid adding another ubuf_info alloc/free in the datapath, only
enter this mode if the packet source passed an array of two ubuf_info
to the tun device.

Pass msg_controllen to tun_get_user to detect this.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/tun.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 69 insertions(+), 10 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index df6ef9670d05..286787d8e875 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -149,6 +149,8 @@ struct tun_pcpu_stats {
 	u32 rx_dropped;
 	u32 tx_dropped;
 	u32 rx_frame_errors;
+	u64 tx_zerocopy;
+	u64 tx_zerocopy_err;
 };
 
 /* A tun_file connects an open character device to a tuntap netdevice. It
@@ -201,6 +203,8 @@ static const struct {
 	{ "tx_packets" },
 	{ "rx_bytes" },
 	{ "tx_bytes" },
+	{ "tx_zerocopy" },
+	{ "tx_zerocopy_err" },
 };
 
 /* Since the socket were moved to tun_file, to preserve the behavior of persist
@@ -1082,8 +1086,9 @@ static void tun_set_headroom(struct net_device *dev, int new_hr)
 	tun->align = new_hr;
 }
 
-static void
-tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
+static void __tun_net_get_stats64(struct net_device *dev,
+				  struct rtnl_link_stats64 *stats,
+				  u64 *ethtool_stats)
 {
 	u32 rx_dropped = 0, tx_dropped = 0, rx_frame_errors = 0;
 	struct tun_struct *tun = netdev_priv(dev);
@@ -1103,6 +1108,16 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 			txbytes		= p->tx_bytes;
 		} while (u64_stats_fetch_retry(&p->syncp, start));
 
+		if (ethtool_stats) {
+			ethtool_stats[0] += rxpackets;
+			ethtool_stats[1] += txpackets;
+			ethtool_stats[2] += rxbytes;
+			ethtool_stats[3] += txbytes;
+			ethtool_stats[4] += p->tx_zerocopy;
+			ethtool_stats[5] += p->tx_zerocopy_err;
+			continue;
+		}
+
 		stats->rx_packets	+= rxpackets;
 		stats->rx_bytes		+= rxbytes;
 		stats->tx_packets	+= txpackets;
@@ -1113,11 +1128,21 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 		rx_frame_errors	+= p->rx_frame_errors;
 		tx_dropped	+= p->tx_dropped;
 	}
+
+	if (ethtool_stats)
+		return;
+
 	stats->rx_dropped  = rx_dropped;
 	stats->rx_frame_errors = rx_frame_errors;
 	stats->tx_dropped = tx_dropped;
 }
 
+static void tun_net_get_stats64(struct net_device *dev,
+				struct rtnl_link_stats64 *stats)
+{
+	__tun_net_get_stats64(dev, stats, NULL);
+}
+
 static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 		       struct netlink_ext_ack *extack)
 {
@@ -1537,10 +1562,44 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	return NULL;
 }
 
+static void tun_zerocopy_callback(struct ubuf_info *ubuf, bool success)
+{
+	struct tun_struct *tun = ubuf->ctx;
+
+	if (success)
+		this_cpu_inc(tun->pcpu_stats->tx_zerocopy);
+	else
+		this_cpu_inc(tun->pcpu_stats->tx_zerocopy_err);
+
+	ubuf = ubuf - 1;
+	ubuf->callback(ubuf, success);
+}
+
+static void tun_set_zerocopy(struct tun_struct *tun, struct sk_buff *skb,
+			     struct ubuf_info *ubuf, size_t len)
+{
+	const int ulen = sizeof(*ubuf);
+
+	if (len != ulen && len != ulen * 2) {
+		WARN_ON_ONCE(1);
+		return;
+	}
+
+	/* if caller passed two ubuf, one is for tun to interpose callback */
+	if (len == ulen * 2) {
+		ubuf = ubuf + 1;
+		ubuf->callback = tun_zerocopy_callback;
+		ubuf->ctx = tun;
+		refcount_set(&ubuf->refcnt, 1);
+	}
+
+	skb_shinfo(skb)->destructor_arg = ubuf;
+}
+
 /* Get packet from user space buffer */
 static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
-			    void *msg_control, struct iov_iter *from,
-			    int noblock, bool more)
+			    void *msg_control, size_t msg_controllen,
+			    struct iov_iter *from, int noblock, bool more)
 {
 	struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
 	struct sk_buff *skb;
@@ -1713,7 +1772,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
-		skb_shinfo(skb)->destructor_arg = msg_control;
+		tun_set_zerocopy(tun, skb, (void *)msg_control, msg_controllen);
 		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
 	} else if (msg_control) {
@@ -1798,7 +1857,7 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (!tun)
 		return -EBADFD;
 
-	result = tun_get_user(tun, tfile, NULL, from,
+	result = tun_get_user(tun, tfile, NULL, 0, from,
 			      file->f_flags & O_NONBLOCK, false);
 
 	tun_put(tun);
@@ -2058,7 +2117,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	if (!tun)
 		return -EBADFD;
 
-	ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
+	ret = tun_get_user(tun, tfile, m->msg_control, m->msg_controllen,
+			   &m->msg_iter,
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
 	tun_put(tun);
@@ -2983,10 +3043,9 @@ static void tun_get_ethtool_stats(struct net_device *dev,
 {
 	const int ethtool_stats_bytelen =
 		ARRAY_SIZE(tun_ethtool_stats_keys) * sizeof(u64);
-	struct rtnl_link_stats64 link_stats64 = {0};
 
-	tun_net_get_stats64(dev, &link_stats64);
-	memcpy(data, &link_stats64, ethtool_stats_bytelen);
+	memset(data, 0, ethtool_stats_bytelen);
+	__tun_net_get_stats64(dev, NULL, data);
 }
 
 static const struct ethtool_ops tun_ethtool_ops = {
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 3/3] vhost_net: support tun zerocopy stats
  2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn
  2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn
  2017-10-06 22:25 ` [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy Willem de Bruijn
@ 2017-10-06 22:25 ` Willem de Bruijn
  2017-10-10  3:52 ` [PATCH RFC 0/3] " David Miller
  3 siblings, 0 replies; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-06 22:25 UTC (permalink / raw)
  To: netdev; +Cc: mst, jasowang, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Allow the tun device to intercept zerocopy completions to increment
its counters.

Pass an array of two struct ubuf_info to the device. The first is
initialized as usual and used to notify vhost_net. The second is
granted to the tun device for its callback.

Use the existing per-device ubuf_info pool, but take out two
elements at a time, so double the size.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/vhost/net.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 58585ec8699e..22988a7df656 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -47,6 +47,7 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
 /* MAX number of TX used buffers for outstanding zerocopy */
 #define VHOST_MAX_PEND 128
 #define VHOST_GOODCOPY_LEN 256
+#define UBUFLEN 2	/* Number of consecutive ubuf_info passed to tun */
 
 /*
  * For transmit, used buffer len is unused; we override it to track buffer
@@ -254,7 +255,7 @@ static int vhost_net_set_ubuf_info(struct vhost_net *n)
 		if (!zcopy)
 			continue;
 		n->vqs[i].ubuf_info = kmalloc(sizeof(*n->vqs[i].ubuf_info) *
-					      UIO_MAXIOV, GFP_KERNEL);
+					      UIO_MAXIOV * UBUFLEN, GFP_KERNEL);
 		if  (!n->vqs[i].ubuf_info)
 			goto err;
 	}
@@ -526,7 +527,7 @@ static void handle_tx(struct vhost_net *net)
 		/* use msg_control to pass vhost zerocopy ubuf info to skb */
 		if (zcopy_used) {
 			struct ubuf_info *ubuf;
-			ubuf = nvq->ubuf_info + nvq->upend_idx;
+			ubuf = nvq->ubuf_info + (nvq->upend_idx * UBUFLEN);
 
 			vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
 			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
@@ -535,7 +536,7 @@ static void handle_tx(struct vhost_net *net)
 			ubuf->desc = nvq->upend_idx;
 			refcount_set(&ubuf->refcnt, 1);
 			msg.msg_control = ubuf;
-			msg.msg_controllen = sizeof(ubuf);
+			msg.msg_controllen = sizeof(*ubuf) * UBUFLEN;
 			ubufs = nvq->ubufs;
 			atomic_inc(&ubufs->refcount);
 			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 1/3] tun: ethtool stats
  2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn
@ 2017-10-06 22:30   ` Stephen Hemminger
  2017-10-06 22:37     ` Willem de Bruijn
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2017-10-06 22:30 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, mst, jasowang, Willem de Bruijn

On Fri,  6 Oct 2017 18:25:14 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:

> From: Willem de Bruijn <willemb@google.com>
> 
> Support ethtool -S on tun devices. This interface allows exporting
> device-specific stats not present in rtnl stats.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  drivers/net/tun.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 57e4c31fa84a..df6ef9670d05 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -194,6 +194,15 @@ struct tun_flow_entry {
>  
>  #define TUN_NUM_FLOW_ENTRIES 1024
>  
> +static const struct {
> +	const char string[ETH_GSTRING_LEN];
> +} tun_ethtool_stats_keys[] = {
> +	{ "rx_packets" },
> +	{ "tx_packets" },
> +	{ "rx_bytes" },
> +	{ "tx_bytes" }

It looks like you are just exporting the statistics taht are already
through normal path. The purpose of ethtool stats is to provide
statistics unique to the device, not to repeat what is available throug
ip, ifconfig, etc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy
  2017-10-06 22:25 ` [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy Willem de Bruijn
@ 2017-10-06 22:32   ` Willem de Bruijn
  0 siblings, 0 replies; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-06 22:32 UTC (permalink / raw)
  To: Network Development; +Cc: Michael S. Tsirkin, Jason Wang, Willem de Bruijn

On Fri, Oct 6, 2017 at 6:25 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Add two counters to tun ethool stats to count zerocopy completions.
> - tx_zerocopy counts completions that succeeded without copy.
> - tx_zerocopy_err counts those that triggered a copy.
>
> Tun intercepts completions by replacing the zerocopy completion
> handler (ubuf_info) prepared by the packet source (vhost-net)
> with its own.
>
> To avoid adding another ubuf_info alloc/free in the datapath, only
> enter this mode if the packet source passed an array of two ubuf_info
> to the tun device.
>
> Pass msg_controllen to tun_get_user to detect this.
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  drivers/net/tun.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 69 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index df6ef9670d05..286787d8e875 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -149,6 +149,8 @@ struct tun_pcpu_stats {
>         u32 rx_dropped;
>         u32 tx_dropped;
>         u32 rx_frame_errors;
> +       u64 tx_zerocopy;
> +       u64 tx_zerocopy_err;
>  };
>
>  /* A tun_file connects an open character device to a tuntap netdevice. It
> @@ -201,6 +203,8 @@ static const struct {
>         { "tx_packets" },
>         { "rx_bytes" },
>         { "tx_bytes" },
> +       { "tx_zerocopy" },
> +       { "tx_zerocopy_err" },
>  };
>
>  /* Since the socket were moved to tun_file, to preserve the behavior of persist
> @@ -1082,8 +1086,9 @@ static void tun_set_headroom(struct net_device *dev, int new_hr)
>         tun->align = new_hr;
>  }
>
> -static void
> -tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> +static void __tun_net_get_stats64(struct net_device *dev,
> +                                 struct rtnl_link_stats64 *stats,
> +                                 u64 *ethtool_stats)
>  {
>         u32 rx_dropped = 0, tx_dropped = 0, rx_frame_errors = 0;
>         struct tun_struct *tun = netdev_priv(dev);
> @@ -1103,6 +1108,16 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>                         txbytes         = p->tx_bytes;
>                 } while (u64_stats_fetch_retry(&p->syncp, start));
>
> +               if (ethtool_stats) {
> +                       ethtool_stats[0] += rxpackets;
> +                       ethtool_stats[1] += txpackets;
> +                       ethtool_stats[2] += rxbytes;
> +                       ethtool_stats[3] += txbytes;
> +                       ethtool_stats[4] += p->tx_zerocopy;
> +                       ethtool_stats[5] += p->tx_zerocopy_err;
> +                       continue;
> +               }
> +
>                 stats->rx_packets       += rxpackets;
>                 stats->rx_bytes         += rxbytes;
>                 stats->tx_packets       += txpackets;
> @@ -1113,11 +1128,21 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>                 rx_frame_errors += p->rx_frame_errors;
>                 tx_dropped      += p->tx_dropped;
>         }
> +
> +       if (ethtool_stats)
> +               return;
> +
>         stats->rx_dropped  = rx_dropped;
>         stats->rx_frame_errors = rx_frame_errors;
>         stats->tx_dropped = tx_dropped;
>  }
>
> +static void tun_net_get_stats64(struct net_device *dev,
> +                               struct rtnl_link_stats64 *stats)
> +{
> +       __tun_net_get_stats64(dev, stats, NULL);
> +}
> +
>  static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>                        struct netlink_ext_ack *extack)
>  {
> @@ -1537,10 +1562,44 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>         return NULL;
>  }
>
> +static void tun_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> +{
> +       struct tun_struct *tun = ubuf->ctx;

This might need an explicit reference to avoid the tun device going
away. Noticed that too late.. need to take a closer look.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 1/3] tun: ethtool stats
  2017-10-06 22:30   ` Stephen Hemminger
@ 2017-10-06 22:37     ` Willem de Bruijn
  2017-10-06 23:12       ` Stephen Hemminger
  0 siblings, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-06 22:37 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Network Development, Michael S. Tsirkin, Jason Wang,
	Willem de Bruijn

On Fri, Oct 6, 2017 at 6:30 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri,  6 Oct 2017 18:25:14 -0400
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Support ethtool -S on tun devices. This interface allows exporting
>> device-specific stats not present in rtnl stats.
>>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> ---
>>  drivers/net/tun.c | 38 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 57e4c31fa84a..df6ef9670d05 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -194,6 +194,15 @@ struct tun_flow_entry {
>>
>>  #define TUN_NUM_FLOW_ENTRIES 1024
>>
>> +static const struct {
>> +     const char string[ETH_GSTRING_LEN];
>> +} tun_ethtool_stats_keys[] = {
>> +     { "rx_packets" },
>> +     { "tx_packets" },
>> +     { "rx_bytes" },
>> +     { "tx_bytes" }
>
> It looks like you are just exporting the statistics taht are already
> through normal path. The purpose of ethtool stats is to provide
> statistics unique to the device, not to repeat what is available throug
> ip, ifconfig, etc.

A follow-up patch adds non-standard statistics.

I included these common ones, because the new zerocopy counters
are only interesting in relation to tx_packets.

It also seems customary among existing network drivers to include
at least these base counters in the ethtool output.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 1/3] tun: ethtool stats
  2017-10-06 22:37     ` Willem de Bruijn
@ 2017-10-06 23:12       ` Stephen Hemminger
  2017-10-06 23:26         ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2017-10-06 23:12 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Michael S. Tsirkin, Jason Wang,
	Willem de Bruijn

On Fri, 6 Oct 2017 18:37:01 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:

> > It looks like you are just exporting the statistics taht are already
> > through normal path. The purpose of ethtool stats is to provide
> > statistics unique to the device, not to repeat what is available throug
> > ip, ifconfig, etc.  
> 
> A follow-up patch adds non-standard statistics.
> 
> I included these common ones, because the new zerocopy counters
> are only interesting in relation to tx_packets.
> 
> It also seems customary among existing network drivers to include
> at least these base counters in the ethtool output.

Only some drivers that slipped through before I started noticing.
Do Not Repeat.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 1/3] tun: ethtool stats
  2017-10-06 23:12       ` Stephen Hemminger
@ 2017-10-06 23:26         ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2017-10-06 23:26 UTC (permalink / raw)
  To: stephen; +Cc: willemdebruijn.kernel, netdev, mst, jasowang, willemb

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Fri, 6 Oct 2017 16:12:45 -0700

> On Fri, 6 Oct 2017 18:37:01 -0400
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> 
>> > It looks like you are just exporting the statistics taht are already
>> > through normal path. The purpose of ethtool stats is to provide
>> > statistics unique to the device, not to repeat what is available throug
>> > ip, ifconfig, etc.  
>> 
>> A follow-up patch adds non-standard statistics.
>> 
>> I included these common ones, because the new zerocopy counters
>> are only interesting in relation to tx_packets.
>> 
>> It also seems customary among existing network drivers to include
>> at least these base counters in the ethtool output.
> 
> Only some drivers that slipped through before I started noticing.
> Do Not Repeat.

Agreed, that is a bad practice unless it is being done to export
per-queue versions of those standard netdev statistics.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/3] tun zerocopy stats
  2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn
                   ` (2 preceding siblings ...)
  2017-10-06 22:25 ` [PATCH RFC 3/3] vhost_net: support tun zerocopy stats Willem de Bruijn
@ 2017-10-10  3:52 ` David Miller
  2017-10-10 15:29   ` Willem de Bruijn
  3 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2017-10-10  3:52 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, mst, jasowang, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Fri,  6 Oct 2017 18:25:13 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> Add zerocopy transfer statistics to the vhost_net/tun zerocopy path.
> 
> I've been using this to verify recent changes to zerocopy tuning [1].
> Sharing more widely, as it may be useful in similar future work.
> 
> Use ethtool stats as interface, as these are defined per device
> driver and can easily be extended.
> 
> Make the zerocopy release callback take an extra hop through the tun
> driver to allow the driver to increment its counters.
> 
> Care must be taken to avoid adding an alloc/free to this hot path.
> Since the caller already must allocate a ubuf_info, make it allocate
> two at a time and grant one to the tun device.
> 
>  1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices
>  2/3: add zerocopy tx and tx_err counters
>  3/3: convert vhost_net to pass a pair of ubuf_info to tun
> 
> [1] http://patchwork.ozlabs.org/patch/822613/

This looks mostly fine to me, but I don't know enough about how vhost
and tap interact to tell whether this makes sense to upstream.

What are the runtime costs for these new statistics?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/3] tun zerocopy stats
  2017-10-10  3:52 ` [PATCH RFC 0/3] " David Miller
@ 2017-10-10 15:29   ` Willem de Bruijn
  2017-10-10 17:23     ` Stephen Hemminger
  2017-10-10 17:39     ` David Miller
  0 siblings, 2 replies; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-10 15:29 UTC (permalink / raw)
  To: David Miller
  Cc: Network Development, Michael S. Tsirkin, Jason Wang,
	Willem de Bruijn

On Mon, Oct 9, 2017 at 11:52 PM, David Miller <davem@davemloft.net> wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Fri,  6 Oct 2017 18:25:13 -0400
>
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Add zerocopy transfer statistics to the vhost_net/tun zerocopy path.
>>
>> I've been using this to verify recent changes to zerocopy tuning [1].
>> Sharing more widely, as it may be useful in similar future work.
>>
>> Use ethtool stats as interface, as these are defined per device
>> driver and can easily be extended.
>>
>> Make the zerocopy release callback take an extra hop through the tun
>> driver to allow the driver to increment its counters.
>>
>> Care must be taken to avoid adding an alloc/free to this hot path.
>> Since the caller already must allocate a ubuf_info, make it allocate
>> two at a time and grant one to the tun device.
>>
>>  1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices
>>  2/3: add zerocopy tx and tx_err counters
>>  3/3: convert vhost_net to pass a pair of ubuf_info to tun
>>
>> [1] http://patchwork.ozlabs.org/patch/822613/
>
> This looks mostly fine to me, but I don't know enough about how vhost
> and tap interact to tell whether this makes sense to upstream.

Thanks for taking a look. The need for monitoring these stats has
come up in a couple of patch evaluation discussions, so I wanted
to share at least one implementation to get the data.

Because the choice to use zerocopy is based on heuristics and
there is a cost if it mispredicts, I think we even want to being able
to continuously monitor this in production.

The implementation is probably not ready for that as is.

> What are the runtime costs for these new statistics?

It currently doubles the size of the ubuf_info memory pool. That can be
fixed, as the current size is UIO_MAXIOV (1024), but the number of
zerocopy packets in flight is bound by VHOST_MAX_PEND (128).

It also adds an indirect function call to call to each zerocopy skb free
path, though.

If there is a way to expose these stats through vhost_net directly,
instead of through tun, that may be better. But I did not see a
suitable interface. Perhaps debugfs.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/3] tun zerocopy stats
  2017-10-10 15:29   ` Willem de Bruijn
@ 2017-10-10 17:23     ` Stephen Hemminger
  2017-10-10 17:39     ` David Miller
  1 sibling, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2017-10-10 17:23 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Network Development, Michael S. Tsirkin, Jason Wang,
	Willem de Bruijn

On Tue, 10 Oct 2017 11:29:33 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:

> On Mon, Oct 9, 2017 at 11:52 PM, David Miller <davem@davemloft.net> wrote:
> > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > Date: Fri,  6 Oct 2017 18:25:13 -0400
> >  
> >> From: Willem de Bruijn <willemb@google.com>
> >>
> >> Add zerocopy transfer statistics to the vhost_net/tun zerocopy path.
> >>
> >> I've been using this to verify recent changes to zerocopy tuning [1].
> >> Sharing more widely, as it may be useful in similar future work.
> >>
> >> Use ethtool stats as interface, as these are defined per device
> >> driver and can easily be extended.
> >>
> >> Make the zerocopy release callback take an extra hop through the tun
> >> driver to allow the driver to increment its counters.
> >>
> >> Care must be taken to avoid adding an alloc/free to this hot path.
> >> Since the caller already must allocate a ubuf_info, make it allocate
> >> two at a time and grant one to the tun device.
> >>
> >>  1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices
> >>  2/3: add zerocopy tx and tx_err counters
> >>  3/3: convert vhost_net to pass a pair of ubuf_info to tun
> >>
> >> [1] http://patchwork.ozlabs.org/patch/822613/  
> >
> > This looks mostly fine to me, but I don't know enough about how vhost
> > and tap interact to tell whether this makes sense to upstream.  
> 
> Thanks for taking a look. The need for monitoring these stats has
> come up in a couple of patch evaluation discussions, so I wanted
> to share at least one implementation to get the data.
> 
> Because the choice to use zerocopy is based on heuristics and
> there is a cost if it mispredicts, I think we even want to being able
> to continuously monitor this in production.
> 
> The implementation is probably not ready for that as is.

Another alternative is to use tracepoints for this.
If you need statistics in production then per-cpu (or per-queue) stats
would have less impact. Tracepoints have no visible impact unless used.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/3] tun zerocopy stats
  2017-10-10 15:29   ` Willem de Bruijn
  2017-10-10 17:23     ` Stephen Hemminger
@ 2017-10-10 17:39     ` David Miller
  2017-10-10 19:11       ` Willem de Bruijn
  1 sibling, 1 reply; 17+ messages in thread
From: David Miller @ 2017-10-10 17:39 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, mst, jasowang, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Tue, 10 Oct 2017 11:29:33 -0400

> If there is a way to expose these stats through vhost_net directly,
> instead of through tun, that may be better. But I did not see a
> suitable interface. Perhaps debugfs.

Please don't use debugfs, thank you :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/3] tun zerocopy stats
  2017-10-10 17:39     ` David Miller
@ 2017-10-10 19:11       ` Willem de Bruijn
  2017-10-11  3:15         ` Jason Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-10 19:11 UTC (permalink / raw)
  To: David Miller
  Cc: Network Development, Michael S. Tsirkin, Jason Wang,
	Willem de Bruijn

On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Tue, 10 Oct 2017 11:29:33 -0400
>
>> If there is a way to expose these stats through vhost_net directly,
>> instead of through tun, that may be better. But I did not see a
>> suitable interface. Perhaps debugfs.
>
> Please don't use debugfs, thank you :-)

Okay. I'll take a look at tracing for on-demand measurement.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/3] tun zerocopy stats
  2017-10-10 19:11       ` Willem de Bruijn
@ 2017-10-11  3:15         ` Jason Wang
  2017-10-11 21:44           ` Willem de Bruijn
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Wang @ 2017-10-11  3:15 UTC (permalink / raw)
  To: Willem de Bruijn, David Miller
  Cc: Network Development, Michael S. Tsirkin, Willem de Bruijn



On 2017年10月11日 03:11, Willem de Bruijn wrote:
> On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote:
>> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
>> Date: Tue, 10 Oct 2017 11:29:33 -0400
>>
>>> If there is a way to expose these stats through vhost_net directly,
>>> instead of through tun, that may be better. But I did not see a
>>> suitable interface. Perhaps debugfs.
>> Please don't use debugfs, thank you :-)
> Okay. I'll take a look at tracing for on-demand measurement.

This reminds me a past series that adding tracepoints to vhost/net[1]. 
It can count zero/datacopy independently and even contains a sample 
program to show the stats.

Thanks

[1] https://lists.oasis-open.org/archives/virtio-dev/201403/msg00025.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/3] tun zerocopy stats
  2017-10-11  3:15         ` Jason Wang
@ 2017-10-11 21:44           ` Willem de Bruijn
  2017-10-12 11:21             ` Jason Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2017-10-11 21:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Miller, Network Development, Michael S. Tsirkin,
	Willem de Bruijn

On Tue, Oct 10, 2017 at 11:15 PM, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2017年10月11日 03:11, Willem de Bruijn wrote:
>>
>> On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote:
>>>
>>> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
>>> Date: Tue, 10 Oct 2017 11:29:33 -0400
>>>
>>>> If there is a way to expose these stats through vhost_net directly,
>>>> instead of through tun, that may be better. But I did not see a
>>>> suitable interface. Perhaps debugfs.
>>>
>>> Please don't use debugfs, thank you :-)
>>
>> Okay. I'll take a look at tracing for on-demand measurement.
>
>
> This reminds me a past series that adding tracepoints to vhost/net[1]. It
> can count zero/datacopy independently and even contains a sample program to
> show the stats.

Interesting, thanks!

For occasional evaluation, we can also use a bpf kprobe for the time being:

  bpf_program = """
  #include <uapi/linux/ptrace.h>
  #include <bcc/proto.h>

  BPF_ARRAY(count, u64, 2);

  void inc_counter(struct pt_regs *ctx) {
      bool success;
      int key;
      u64 *val;

      success = PT_REGS_PARM2(ctx);
      key = success ? 0 : 1;
      val = count.lookup(&key);
      if (val)
          lock_xadd(val, 1);
  }
  """

  b = bcc.BPF(text=bpf_program)
  b.attach_kprobe(event="vhost_zerocopy_callback", fn_name="inc_counter")

  time.sleep(5)

  print("vhost_zerocopy_callback: Y:%d N:%d" %
        (b["count"][ctypes.c_int(0)].value,
         b["count"][ctypes.c_int(1)].value))

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/3] tun zerocopy stats
  2017-10-11 21:44           ` Willem de Bruijn
@ 2017-10-12 11:21             ` Jason Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Wang @ 2017-10-12 11:21 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Network Development, Michael S. Tsirkin,
	Willem de Bruijn



On 2017年10月12日 05:44, Willem de Bruijn wrote:
> On Tue, Oct 10, 2017 at 11:15 PM, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2017年10月11日 03:11, Willem de Bruijn wrote:
>>> On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote:
>>>> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
>>>> Date: Tue, 10 Oct 2017 11:29:33 -0400
>>>>
>>>>> If there is a way to expose these stats through vhost_net directly,
>>>>> instead of through tun, that may be better. But I did not see a
>>>>> suitable interface. Perhaps debugfs.
>>>> Please don't use debugfs, thank you :-)
>>> Okay. I'll take a look at tracing for on-demand measurement.
>>
>> This reminds me a past series that adding tracepoints to vhost/net[1]. It
>> can count zero/datacopy independently and even contains a sample program to
>> show the stats.
> Interesting, thanks!
>
> For occasional evaluation, we can also use a bpf kprobe for the time being:
>
>    bpf_program = """
>    #include <uapi/linux/ptrace.h>
>    #include <bcc/proto.h>
>
>    BPF_ARRAY(count, u64, 2);
>
>    void inc_counter(struct pt_regs *ctx) {
>        bool success;
>        int key;
>        u64 *val;
>
>        success = PT_REGS_PARM2(ctx);
>        key = success ? 0 : 1;
>        val = count.lookup(&key);
>        if (val)
>            lock_xadd(val, 1);
>    }
>    """
>
>    b = bcc.BPF(text=bpf_program)
>    b.attach_kprobe(event="vhost_zerocopy_callback", fn_name="inc_counter")
>
>    time.sleep(5)
>
>    print("vhost_zerocopy_callback: Y:%d N:%d" %
>          (b["count"][ctypes.c_int(0)].value,
>           b["count"][ctypes.c_int(1)].value))

Thanks for the tips, looks flexible.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-10-12 11:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn
2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn
2017-10-06 22:30   ` Stephen Hemminger
2017-10-06 22:37     ` Willem de Bruijn
2017-10-06 23:12       ` Stephen Hemminger
2017-10-06 23:26         ` David Miller
2017-10-06 22:25 ` [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy Willem de Bruijn
2017-10-06 22:32   ` Willem de Bruijn
2017-10-06 22:25 ` [PATCH RFC 3/3] vhost_net: support tun zerocopy stats Willem de Bruijn
2017-10-10  3:52 ` [PATCH RFC 0/3] " David Miller
2017-10-10 15:29   ` Willem de Bruijn
2017-10-10 17:23     ` Stephen Hemminger
2017-10-10 17:39     ` David Miller
2017-10-10 19:11       ` Willem de Bruijn
2017-10-11  3:15         ` Jason Wang
2017-10-11 21:44           ` Willem de Bruijn
2017-10-12 11:21             ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).