* [PATCH RFC 0/3] tun zerocopy stats @ 2017-10-06 22:25 Willem de Bruijn 2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn ` (3 more replies) 0 siblings, 4 replies; 17+ messages in thread From: Willem de Bruijn @ 2017-10-06 22:25 UTC (permalink / raw) To: netdev; +Cc: mst, jasowang, Willem de Bruijn From: Willem de Bruijn <willemb@google.com> Add zerocopy transfer statistics to the vhost_net/tun zerocopy path. I've been using this to verify recent changes to zerocopy tuning [1]. Sharing more widely, as it may be useful in similar future work. Use ethtool stats as interface, as these are defined per device driver and can easily be extended. Make the zerocopy release callback take an extra hop through the tun driver to allow the driver to increment its counters. Care must be taken to avoid adding an alloc/free to this hot path. Since the caller already must allocate a ubuf_info, make it allocate two at a time and grant one to the tun device. 1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices 2/3: add zerocopy tx and tx_err counters 3/3: convert vhost_net to pass a pair of ubuf_info to tun [1] http://patchwork.ozlabs.org/patch/822613/ Willem de Bruijn (3): tun: ethtool stats tun: expand ethtool stats with zerocopy vhost_net: support tun zerocopy stats drivers/net/tun.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++---- drivers/vhost/net.c | 7 ++-- 2 files changed, 108 insertions(+), 10 deletions(-) -- 2.14.2.920.gcf0c67979c-goog ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH RFC 1/3] tun: ethtool stats 2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn @ 2017-10-06 22:25 ` Willem de Bruijn 2017-10-06 22:30 ` Stephen Hemminger 2017-10-06 22:25 ` [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy Willem de Bruijn ` (2 subsequent siblings) 3 siblings, 1 reply; 17+ messages in thread From: Willem de Bruijn @ 2017-10-06 22:25 UTC (permalink / raw) To: netdev; +Cc: mst, jasowang, Willem de Bruijn From: Willem de Bruijn <willemb@google.com> Support ethtool -S on tun devices. This interface allows exporting device-specific stats not present in rtnl stats. Signed-off-by: Willem de Bruijn <willemb@google.com> --- drivers/net/tun.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 57e4c31fa84a..df6ef9670d05 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -194,6 +194,15 @@ struct tun_flow_entry { #define TUN_NUM_FLOW_ENTRIES 1024 +static const struct { + const char string[ETH_GSTRING_LEN]; +} tun_ethtool_stats_keys[] = { + { "rx_packets" }, + { "tx_packets" }, + { "rx_bytes" }, + { "tx_bytes" }, +}; + /* Since the socket were moved to tun_file, to preserve the behavior of persist * device, socket filter, sndbuf and vnet header size were restore when the * file were attached to a persist device. @@ -2954,6 +2963,32 @@ static int tun_set_coalesce(struct net_device *dev, return 0; } +static int tun_get_sset_count(struct net_device *dev, int string_set) +{ + if (string_set == ETH_SS_STATS) + return ARRAY_SIZE(tun_ethtool_stats_keys); + + return -EINVAL; +} + +static void tun_get_strings(struct net_device *dev, u32 stringset, u8 *data) +{ + if (stringset == ETH_SS_STATS) + memcpy(data, &tun_ethtool_stats_keys, + sizeof(tun_ethtool_stats_keys)); +} + +static void tun_get_ethtool_stats(struct net_device *dev, + struct ethtool_stats *stats, u64 *data) +{ + const int ethtool_stats_bytelen = + ARRAY_SIZE(tun_ethtool_stats_keys) * sizeof(u64); + struct rtnl_link_stats64 link_stats64 = {0}; + + tun_net_get_stats64(dev, &link_stats64); + memcpy(data, &link_stats64, ethtool_stats_bytelen); +} + static const struct ethtool_ops tun_ethtool_ops = { .get_drvinfo = tun_get_drvinfo, .get_msglevel = tun_get_msglevel, @@ -2963,6 +2998,9 @@ static const struct ethtool_ops tun_ethtool_ops = { .get_coalesce = tun_get_coalesce, .set_coalesce = tun_set_coalesce, .get_link_ksettings = tun_get_link_ksettings, + .get_sset_count = tun_get_sset_count, + .get_strings = tun_get_strings, + .get_ethtool_stats = tun_get_ethtool_stats, }; static int tun_queue_resize(struct tun_struct *tun) -- 2.14.2.920.gcf0c67979c-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/3] tun: ethtool stats 2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn @ 2017-10-06 22:30 ` Stephen Hemminger 2017-10-06 22:37 ` Willem de Bruijn 0 siblings, 1 reply; 17+ messages in thread From: Stephen Hemminger @ 2017-10-06 22:30 UTC (permalink / raw) To: Willem de Bruijn; +Cc: netdev, mst, jasowang, Willem de Bruijn On Fri, 6 Oct 2017 18:25:14 -0400 Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > From: Willem de Bruijn <willemb@google.com> > > Support ethtool -S on tun devices. This interface allows exporting > device-specific stats not present in rtnl stats. > > Signed-off-by: Willem de Bruijn <willemb@google.com> > --- > drivers/net/tun.c | 38 ++++++++++++++++++++++++++++++++++++++ > 1 file changed, 38 insertions(+) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 57e4c31fa84a..df6ef9670d05 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -194,6 +194,15 @@ struct tun_flow_entry { > > #define TUN_NUM_FLOW_ENTRIES 1024 > > +static const struct { > + const char string[ETH_GSTRING_LEN]; > +} tun_ethtool_stats_keys[] = { > + { "rx_packets" }, > + { "tx_packets" }, > + { "rx_bytes" }, > + { "tx_bytes" } It looks like you are just exporting the statistics taht are already through normal path. The purpose of ethtool stats is to provide statistics unique to the device, not to repeat what is available throug ip, ifconfig, etc. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/3] tun: ethtool stats 2017-10-06 22:30 ` Stephen Hemminger @ 2017-10-06 22:37 ` Willem de Bruijn 2017-10-06 23:12 ` Stephen Hemminger 0 siblings, 1 reply; 17+ messages in thread From: Willem de Bruijn @ 2017-10-06 22:37 UTC (permalink / raw) To: Stephen Hemminger Cc: Network Development, Michael S. Tsirkin, Jason Wang, Willem de Bruijn On Fri, Oct 6, 2017 at 6:30 PM, Stephen Hemminger <stephen@networkplumber.org> wrote: > On Fri, 6 Oct 2017 18:25:14 -0400 > Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > >> From: Willem de Bruijn <willemb@google.com> >> >> Support ethtool -S on tun devices. This interface allows exporting >> device-specific stats not present in rtnl stats. >> >> Signed-off-by: Willem de Bruijn <willemb@google.com> >> --- >> drivers/net/tun.c | 38 ++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 38 insertions(+) >> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >> index 57e4c31fa84a..df6ef9670d05 100644 >> --- a/drivers/net/tun.c >> +++ b/drivers/net/tun.c >> @@ -194,6 +194,15 @@ struct tun_flow_entry { >> >> #define TUN_NUM_FLOW_ENTRIES 1024 >> >> +static const struct { >> + const char string[ETH_GSTRING_LEN]; >> +} tun_ethtool_stats_keys[] = { >> + { "rx_packets" }, >> + { "tx_packets" }, >> + { "rx_bytes" }, >> + { "tx_bytes" } > > It looks like you are just exporting the statistics taht are already > through normal path. The purpose of ethtool stats is to provide > statistics unique to the device, not to repeat what is available throug > ip, ifconfig, etc. A follow-up patch adds non-standard statistics. I included these common ones, because the new zerocopy counters are only interesting in relation to tx_packets. It also seems customary among existing network drivers to include at least these base counters in the ethtool output. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/3] tun: ethtool stats 2017-10-06 22:37 ` Willem de Bruijn @ 2017-10-06 23:12 ` Stephen Hemminger 2017-10-06 23:26 ` David Miller 0 siblings, 1 reply; 17+ messages in thread From: Stephen Hemminger @ 2017-10-06 23:12 UTC (permalink / raw) To: Willem de Bruijn Cc: Network Development, Michael S. Tsirkin, Jason Wang, Willem de Bruijn On Fri, 6 Oct 2017 18:37:01 -0400 Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > It looks like you are just exporting the statistics taht are already > > through normal path. The purpose of ethtool stats is to provide > > statistics unique to the device, not to repeat what is available throug > > ip, ifconfig, etc. > > A follow-up patch adds non-standard statistics. > > I included these common ones, because the new zerocopy counters > are only interesting in relation to tx_packets. > > It also seems customary among existing network drivers to include > at least these base counters in the ethtool output. Only some drivers that slipped through before I started noticing. Do Not Repeat. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/3] tun: ethtool stats 2017-10-06 23:12 ` Stephen Hemminger @ 2017-10-06 23:26 ` David Miller 0 siblings, 0 replies; 17+ messages in thread From: David Miller @ 2017-10-06 23:26 UTC (permalink / raw) To: stephen; +Cc: willemdebruijn.kernel, netdev, mst, jasowang, willemb From: Stephen Hemminger <stephen@networkplumber.org> Date: Fri, 6 Oct 2017 16:12:45 -0700 > On Fri, 6 Oct 2017 18:37:01 -0400 > Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > >> > It looks like you are just exporting the statistics taht are already >> > through normal path. The purpose of ethtool stats is to provide >> > statistics unique to the device, not to repeat what is available throug >> > ip, ifconfig, etc. >> >> A follow-up patch adds non-standard statistics. >> >> I included these common ones, because the new zerocopy counters >> are only interesting in relation to tx_packets. >> >> It also seems customary among existing network drivers to include >> at least these base counters in the ethtool output. > > Only some drivers that slipped through before I started noticing. > Do Not Repeat. Agreed, that is a bad practice unless it is being done to export per-queue versions of those standard netdev statistics. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy 2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn 2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn @ 2017-10-06 22:25 ` Willem de Bruijn 2017-10-06 22:32 ` Willem de Bruijn 2017-10-06 22:25 ` [PATCH RFC 3/3] vhost_net: support tun zerocopy stats Willem de Bruijn 2017-10-10 3:52 ` [PATCH RFC 0/3] " David Miller 3 siblings, 1 reply; 17+ messages in thread From: Willem de Bruijn @ 2017-10-06 22:25 UTC (permalink / raw) To: netdev; +Cc: mst, jasowang, Willem de Bruijn From: Willem de Bruijn <willemb@google.com> Add two counters to tun ethool stats to count zerocopy completions. - tx_zerocopy counts completions that succeeded without copy. - tx_zerocopy_err counts those that triggered a copy. Tun intercepts completions by replacing the zerocopy completion handler (ubuf_info) prepared by the packet source (vhost-net) with its own. To avoid adding another ubuf_info alloc/free in the datapath, only enter this mode if the packet source passed an array of two ubuf_info to the tun device. Pass msg_controllen to tun_get_user to detect this. Signed-off-by: Willem de Bruijn <willemb@google.com> --- drivers/net/tun.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 69 insertions(+), 10 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index df6ef9670d05..286787d8e875 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -149,6 +149,8 @@ struct tun_pcpu_stats { u32 rx_dropped; u32 tx_dropped; u32 rx_frame_errors; + u64 tx_zerocopy; + u64 tx_zerocopy_err; }; /* A tun_file connects an open character device to a tuntap netdevice. It @@ -201,6 +203,8 @@ static const struct { { "tx_packets" }, { "rx_bytes" }, { "tx_bytes" }, + { "tx_zerocopy" }, + { "tx_zerocopy_err" }, }; /* Since the socket were moved to tun_file, to preserve the behavior of persist @@ -1082,8 +1086,9 @@ static void tun_set_headroom(struct net_device *dev, int new_hr) tun->align = new_hr; } -static void -tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) +static void __tun_net_get_stats64(struct net_device *dev, + struct rtnl_link_stats64 *stats, + u64 *ethtool_stats) { u32 rx_dropped = 0, tx_dropped = 0, rx_frame_errors = 0; struct tun_struct *tun = netdev_priv(dev); @@ -1103,6 +1108,16 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) txbytes = p->tx_bytes; } while (u64_stats_fetch_retry(&p->syncp, start)); + if (ethtool_stats) { + ethtool_stats[0] += rxpackets; + ethtool_stats[1] += txpackets; + ethtool_stats[2] += rxbytes; + ethtool_stats[3] += txbytes; + ethtool_stats[4] += p->tx_zerocopy; + ethtool_stats[5] += p->tx_zerocopy_err; + continue; + } + stats->rx_packets += rxpackets; stats->rx_bytes += rxbytes; stats->tx_packets += txpackets; @@ -1113,11 +1128,21 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) rx_frame_errors += p->rx_frame_errors; tx_dropped += p->tx_dropped; } + + if (ethtool_stats) + return; + stats->rx_dropped = rx_dropped; stats->rx_frame_errors = rx_frame_errors; stats->tx_dropped = tx_dropped; } +static void tun_net_get_stats64(struct net_device *dev, + struct rtnl_link_stats64 *stats) +{ + __tun_net_get_stats64(dev, stats, NULL); +} + static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog, struct netlink_ext_ack *extack) { @@ -1537,10 +1562,44 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, return NULL; } +static void tun_zerocopy_callback(struct ubuf_info *ubuf, bool success) +{ + struct tun_struct *tun = ubuf->ctx; + + if (success) + this_cpu_inc(tun->pcpu_stats->tx_zerocopy); + else + this_cpu_inc(tun->pcpu_stats->tx_zerocopy_err); + + ubuf = ubuf - 1; + ubuf->callback(ubuf, success); +} + +static void tun_set_zerocopy(struct tun_struct *tun, struct sk_buff *skb, + struct ubuf_info *ubuf, size_t len) +{ + const int ulen = sizeof(*ubuf); + + if (len != ulen && len != ulen * 2) { + WARN_ON_ONCE(1); + return; + } + + /* if caller passed two ubuf, one is for tun to interpose callback */ + if (len == ulen * 2) { + ubuf = ubuf + 1; + ubuf->callback = tun_zerocopy_callback; + ubuf->ctx = tun; + refcount_set(&ubuf->refcnt, 1); + } + + skb_shinfo(skb)->destructor_arg = ubuf; +} + /* Get packet from user space buffer */ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, - void *msg_control, struct iov_iter *from, - int noblock, bool more) + void *msg_control, size_t msg_controllen, + struct iov_iter *from, int noblock, bool more) { struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) }; struct sk_buff *skb; @@ -1713,7 +1772,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, /* copy skb_ubuf_info for callback when skb has no error */ if (zerocopy) { - skb_shinfo(skb)->destructor_arg = msg_control; + tun_set_zerocopy(tun, skb, (void *)msg_control, msg_controllen); skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; } else if (msg_control) { @@ -1798,7 +1857,7 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from) if (!tun) return -EBADFD; - result = tun_get_user(tun, tfile, NULL, from, + result = tun_get_user(tun, tfile, NULL, 0, from, file->f_flags & O_NONBLOCK, false); tun_put(tun); @@ -2058,7 +2117,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) if (!tun) return -EBADFD; - ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter, + ret = tun_get_user(tun, tfile, m->msg_control, m->msg_controllen, + &m->msg_iter, m->msg_flags & MSG_DONTWAIT, m->msg_flags & MSG_MORE); tun_put(tun); @@ -2983,10 +3043,9 @@ static void tun_get_ethtool_stats(struct net_device *dev, { const int ethtool_stats_bytelen = ARRAY_SIZE(tun_ethtool_stats_keys) * sizeof(u64); - struct rtnl_link_stats64 link_stats64 = {0}; - tun_net_get_stats64(dev, &link_stats64); - memcpy(data, &link_stats64, ethtool_stats_bytelen); + memset(data, 0, ethtool_stats_bytelen); + __tun_net_get_stats64(dev, NULL, data); } static const struct ethtool_ops tun_ethtool_ops = { -- 2.14.2.920.gcf0c67979c-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy 2017-10-06 22:25 ` [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy Willem de Bruijn @ 2017-10-06 22:32 ` Willem de Bruijn 0 siblings, 0 replies; 17+ messages in thread From: Willem de Bruijn @ 2017-10-06 22:32 UTC (permalink / raw) To: Network Development; +Cc: Michael S. Tsirkin, Jason Wang, Willem de Bruijn On Fri, Oct 6, 2017 at 6:25 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > From: Willem de Bruijn <willemb@google.com> > > Add two counters to tun ethool stats to count zerocopy completions. > - tx_zerocopy counts completions that succeeded without copy. > - tx_zerocopy_err counts those that triggered a copy. > > Tun intercepts completions by replacing the zerocopy completion > handler (ubuf_info) prepared by the packet source (vhost-net) > with its own. > > To avoid adding another ubuf_info alloc/free in the datapath, only > enter this mode if the packet source passed an array of two ubuf_info > to the tun device. > > Pass msg_controllen to tun_get_user to detect this. > > Signed-off-by: Willem de Bruijn <willemb@google.com> > --- > drivers/net/tun.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++------- > 1 file changed, 69 insertions(+), 10 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index df6ef9670d05..286787d8e875 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -149,6 +149,8 @@ struct tun_pcpu_stats { > u32 rx_dropped; > u32 tx_dropped; > u32 rx_frame_errors; > + u64 tx_zerocopy; > + u64 tx_zerocopy_err; > }; > > /* A tun_file connects an open character device to a tuntap netdevice. It > @@ -201,6 +203,8 @@ static const struct { > { "tx_packets" }, > { "rx_bytes" }, > { "tx_bytes" }, > + { "tx_zerocopy" }, > + { "tx_zerocopy_err" }, > }; > > /* Since the socket were moved to tun_file, to preserve the behavior of persist > @@ -1082,8 +1086,9 @@ static void tun_set_headroom(struct net_device *dev, int new_hr) > tun->align = new_hr; > } > > -static void > -tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > +static void __tun_net_get_stats64(struct net_device *dev, > + struct rtnl_link_stats64 *stats, > + u64 *ethtool_stats) > { > u32 rx_dropped = 0, tx_dropped = 0, rx_frame_errors = 0; > struct tun_struct *tun = netdev_priv(dev); > @@ -1103,6 +1108,16 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > txbytes = p->tx_bytes; > } while (u64_stats_fetch_retry(&p->syncp, start)); > > + if (ethtool_stats) { > + ethtool_stats[0] += rxpackets; > + ethtool_stats[1] += txpackets; > + ethtool_stats[2] += rxbytes; > + ethtool_stats[3] += txbytes; > + ethtool_stats[4] += p->tx_zerocopy; > + ethtool_stats[5] += p->tx_zerocopy_err; > + continue; > + } > + > stats->rx_packets += rxpackets; > stats->rx_bytes += rxbytes; > stats->tx_packets += txpackets; > @@ -1113,11 +1128,21 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > rx_frame_errors += p->rx_frame_errors; > tx_dropped += p->tx_dropped; > } > + > + if (ethtool_stats) > + return; > + > stats->rx_dropped = rx_dropped; > stats->rx_frame_errors = rx_frame_errors; > stats->tx_dropped = tx_dropped; > } > > +static void tun_net_get_stats64(struct net_device *dev, > + struct rtnl_link_stats64 *stats) > +{ > + __tun_net_get_stats64(dev, stats, NULL); > +} > + > static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog, > struct netlink_ext_ack *extack) > { > @@ -1537,10 +1562,44 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, > return NULL; > } > > +static void tun_zerocopy_callback(struct ubuf_info *ubuf, bool success) > +{ > + struct tun_struct *tun = ubuf->ctx; This might need an explicit reference to avoid the tun device going away. Noticed that too late.. need to take a closer look. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH RFC 3/3] vhost_net: support tun zerocopy stats 2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn 2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn 2017-10-06 22:25 ` [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy Willem de Bruijn @ 2017-10-06 22:25 ` Willem de Bruijn 2017-10-10 3:52 ` [PATCH RFC 0/3] " David Miller 3 siblings, 0 replies; 17+ messages in thread From: Willem de Bruijn @ 2017-10-06 22:25 UTC (permalink / raw) To: netdev; +Cc: mst, jasowang, Willem de Bruijn From: Willem de Bruijn <willemb@google.com> Allow the tun device to intercept zerocopy completions to increment its counters. Pass an array of two struct ubuf_info to the device. The first is initialized as usual and used to notify vhost_net. The second is granted to the tun device for its callback. Use the existing per-device ubuf_info pool, but take out two elements at a time, so double the size. Signed-off-by: Willem de Bruijn <willemb@google.com> --- drivers/vhost/net.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 58585ec8699e..22988a7df656 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -47,6 +47,7 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" /* MAX number of TX used buffers for outstanding zerocopy */ #define VHOST_MAX_PEND 128 #define VHOST_GOODCOPY_LEN 256 +#define UBUFLEN 2 /* Number of consecutive ubuf_info passed to tun */ /* * For transmit, used buffer len is unused; we override it to track buffer @@ -254,7 +255,7 @@ static int vhost_net_set_ubuf_info(struct vhost_net *n) if (!zcopy) continue; n->vqs[i].ubuf_info = kmalloc(sizeof(*n->vqs[i].ubuf_info) * - UIO_MAXIOV, GFP_KERNEL); + UIO_MAXIOV * UBUFLEN, GFP_KERNEL); if (!n->vqs[i].ubuf_info) goto err; } @@ -526,7 +527,7 @@ static void handle_tx(struct vhost_net *net) /* use msg_control to pass vhost zerocopy ubuf info to skb */ if (zcopy_used) { struct ubuf_info *ubuf; - ubuf = nvq->ubuf_info + nvq->upend_idx; + ubuf = nvq->ubuf_info + (nvq->upend_idx * UBUFLEN); vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head); vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS; @@ -535,7 +536,7 @@ static void handle_tx(struct vhost_net *net) ubuf->desc = nvq->upend_idx; refcount_set(&ubuf->refcnt, 1); msg.msg_control = ubuf; - msg.msg_controllen = sizeof(ubuf); + msg.msg_controllen = sizeof(*ubuf) * UBUFLEN; ubufs = nvq->ubufs; atomic_inc(&ubufs->refcount); nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV; -- 2.14.2.920.gcf0c67979c-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 0/3] tun zerocopy stats 2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn ` (2 preceding siblings ...) 2017-10-06 22:25 ` [PATCH RFC 3/3] vhost_net: support tun zerocopy stats Willem de Bruijn @ 2017-10-10 3:52 ` David Miller 2017-10-10 15:29 ` Willem de Bruijn 3 siblings, 1 reply; 17+ messages in thread From: David Miller @ 2017-10-10 3:52 UTC (permalink / raw) To: willemdebruijn.kernel; +Cc: netdev, mst, jasowang, willemb From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Date: Fri, 6 Oct 2017 18:25:13 -0400 > From: Willem de Bruijn <willemb@google.com> > > Add zerocopy transfer statistics to the vhost_net/tun zerocopy path. > > I've been using this to verify recent changes to zerocopy tuning [1]. > Sharing more widely, as it may be useful in similar future work. > > Use ethtool stats as interface, as these are defined per device > driver and can easily be extended. > > Make the zerocopy release callback take an extra hop through the tun > driver to allow the driver to increment its counters. > > Care must be taken to avoid adding an alloc/free to this hot path. > Since the caller already must allocate a ubuf_info, make it allocate > two at a time and grant one to the tun device. > > 1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices > 2/3: add zerocopy tx and tx_err counters > 3/3: convert vhost_net to pass a pair of ubuf_info to tun > > [1] http://patchwork.ozlabs.org/patch/822613/ This looks mostly fine to me, but I don't know enough about how vhost and tap interact to tell whether this makes sense to upstream. What are the runtime costs for these new statistics? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 0/3] tun zerocopy stats 2017-10-10 3:52 ` [PATCH RFC 0/3] " David Miller @ 2017-10-10 15:29 ` Willem de Bruijn 2017-10-10 17:23 ` Stephen Hemminger 2017-10-10 17:39 ` David Miller 0 siblings, 2 replies; 17+ messages in thread From: Willem de Bruijn @ 2017-10-10 15:29 UTC (permalink / raw) To: David Miller Cc: Network Development, Michael S. Tsirkin, Jason Wang, Willem de Bruijn On Mon, Oct 9, 2017 at 11:52 PM, David Miller <davem@davemloft.net> wrote: > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > Date: Fri, 6 Oct 2017 18:25:13 -0400 > >> From: Willem de Bruijn <willemb@google.com> >> >> Add zerocopy transfer statistics to the vhost_net/tun zerocopy path. >> >> I've been using this to verify recent changes to zerocopy tuning [1]. >> Sharing more widely, as it may be useful in similar future work. >> >> Use ethtool stats as interface, as these are defined per device >> driver and can easily be extended. >> >> Make the zerocopy release callback take an extra hop through the tun >> driver to allow the driver to increment its counters. >> >> Care must be taken to avoid adding an alloc/free to this hot path. >> Since the caller already must allocate a ubuf_info, make it allocate >> two at a time and grant one to the tun device. >> >> 1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices >> 2/3: add zerocopy tx and tx_err counters >> 3/3: convert vhost_net to pass a pair of ubuf_info to tun >> >> [1] http://patchwork.ozlabs.org/patch/822613/ > > This looks mostly fine to me, but I don't know enough about how vhost > and tap interact to tell whether this makes sense to upstream. Thanks for taking a look. The need for monitoring these stats has come up in a couple of patch evaluation discussions, so I wanted to share at least one implementation to get the data. Because the choice to use zerocopy is based on heuristics and there is a cost if it mispredicts, I think we even want to being able to continuously monitor this in production. The implementation is probably not ready for that as is. > What are the runtime costs for these new statistics? It currently doubles the size of the ubuf_info memory pool. That can be fixed, as the current size is UIO_MAXIOV (1024), but the number of zerocopy packets in flight is bound by VHOST_MAX_PEND (128). It also adds an indirect function call to call to each zerocopy skb free path, though. If there is a way to expose these stats through vhost_net directly, instead of through tun, that may be better. But I did not see a suitable interface. Perhaps debugfs. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 0/3] tun zerocopy stats 2017-10-10 15:29 ` Willem de Bruijn @ 2017-10-10 17:23 ` Stephen Hemminger 2017-10-10 17:39 ` David Miller 1 sibling, 0 replies; 17+ messages in thread From: Stephen Hemminger @ 2017-10-10 17:23 UTC (permalink / raw) To: Willem de Bruijn Cc: David Miller, Network Development, Michael S. Tsirkin, Jason Wang, Willem de Bruijn On Tue, 10 Oct 2017 11:29:33 -0400 Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > On Mon, Oct 9, 2017 at 11:52 PM, David Miller <davem@davemloft.net> wrote: > > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > > Date: Fri, 6 Oct 2017 18:25:13 -0400 > > > >> From: Willem de Bruijn <willemb@google.com> > >> > >> Add zerocopy transfer statistics to the vhost_net/tun zerocopy path. > >> > >> I've been using this to verify recent changes to zerocopy tuning [1]. > >> Sharing more widely, as it may be useful in similar future work. > >> > >> Use ethtool stats as interface, as these are defined per device > >> driver and can easily be extended. > >> > >> Make the zerocopy release callback take an extra hop through the tun > >> driver to allow the driver to increment its counters. > >> > >> Care must be taken to avoid adding an alloc/free to this hot path. > >> Since the caller already must allocate a ubuf_info, make it allocate > >> two at a time and grant one to the tun device. > >> > >> 1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices > >> 2/3: add zerocopy tx and tx_err counters > >> 3/3: convert vhost_net to pass a pair of ubuf_info to tun > >> > >> [1] http://patchwork.ozlabs.org/patch/822613/ > > > > This looks mostly fine to me, but I don't know enough about how vhost > > and tap interact to tell whether this makes sense to upstream. > > Thanks for taking a look. The need for monitoring these stats has > come up in a couple of patch evaluation discussions, so I wanted > to share at least one implementation to get the data. > > Because the choice to use zerocopy is based on heuristics and > there is a cost if it mispredicts, I think we even want to being able > to continuously monitor this in production. > > The implementation is probably not ready for that as is. Another alternative is to use tracepoints for this. If you need statistics in production then per-cpu (or per-queue) stats would have less impact. Tracepoints have no visible impact unless used. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 0/3] tun zerocopy stats 2017-10-10 15:29 ` Willem de Bruijn 2017-10-10 17:23 ` Stephen Hemminger @ 2017-10-10 17:39 ` David Miller 2017-10-10 19:11 ` Willem de Bruijn 1 sibling, 1 reply; 17+ messages in thread From: David Miller @ 2017-10-10 17:39 UTC (permalink / raw) To: willemdebruijn.kernel; +Cc: netdev, mst, jasowang, willemb From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Date: Tue, 10 Oct 2017 11:29:33 -0400 > If there is a way to expose these stats through vhost_net directly, > instead of through tun, that may be better. But I did not see a > suitable interface. Perhaps debugfs. Please don't use debugfs, thank you :-) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 0/3] tun zerocopy stats 2017-10-10 17:39 ` David Miller @ 2017-10-10 19:11 ` Willem de Bruijn 2017-10-11 3:15 ` Jason Wang 0 siblings, 1 reply; 17+ messages in thread From: Willem de Bruijn @ 2017-10-10 19:11 UTC (permalink / raw) To: David Miller Cc: Network Development, Michael S. Tsirkin, Jason Wang, Willem de Bruijn On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote: > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > Date: Tue, 10 Oct 2017 11:29:33 -0400 > >> If there is a way to expose these stats through vhost_net directly, >> instead of through tun, that may be better. But I did not see a >> suitable interface. Perhaps debugfs. > > Please don't use debugfs, thank you :-) Okay. I'll take a look at tracing for on-demand measurement. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 0/3] tun zerocopy stats 2017-10-10 19:11 ` Willem de Bruijn @ 2017-10-11 3:15 ` Jason Wang 2017-10-11 21:44 ` Willem de Bruijn 0 siblings, 1 reply; 17+ messages in thread From: Jason Wang @ 2017-10-11 3:15 UTC (permalink / raw) To: Willem de Bruijn, David Miller Cc: Network Development, Michael S. Tsirkin, Willem de Bruijn On 2017年10月11日 03:11, Willem de Bruijn wrote: > On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote: >> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> >> Date: Tue, 10 Oct 2017 11:29:33 -0400 >> >>> If there is a way to expose these stats through vhost_net directly, >>> instead of through tun, that may be better. But I did not see a >>> suitable interface. Perhaps debugfs. >> Please don't use debugfs, thank you :-) > Okay. I'll take a look at tracing for on-demand measurement. This reminds me a past series that adding tracepoints to vhost/net[1]. It can count zero/datacopy independently and even contains a sample program to show the stats. Thanks [1] https://lists.oasis-open.org/archives/virtio-dev/201403/msg00025.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 0/3] tun zerocopy stats 2017-10-11 3:15 ` Jason Wang @ 2017-10-11 21:44 ` Willem de Bruijn 2017-10-12 11:21 ` Jason Wang 0 siblings, 1 reply; 17+ messages in thread From: Willem de Bruijn @ 2017-10-11 21:44 UTC (permalink / raw) To: Jason Wang Cc: David Miller, Network Development, Michael S. Tsirkin, Willem de Bruijn On Tue, Oct 10, 2017 at 11:15 PM, Jason Wang <jasowang@redhat.com> wrote: > > > On 2017年10月11日 03:11, Willem de Bruijn wrote: >> >> On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote: >>> >>> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> >>> Date: Tue, 10 Oct 2017 11:29:33 -0400 >>> >>>> If there is a way to expose these stats through vhost_net directly, >>>> instead of through tun, that may be better. But I did not see a >>>> suitable interface. Perhaps debugfs. >>> >>> Please don't use debugfs, thank you :-) >> >> Okay. I'll take a look at tracing for on-demand measurement. > > > This reminds me a past series that adding tracepoints to vhost/net[1]. It > can count zero/datacopy independently and even contains a sample program to > show the stats. Interesting, thanks! For occasional evaluation, we can also use a bpf kprobe for the time being: bpf_program = """ #include <uapi/linux/ptrace.h> #include <bcc/proto.h> BPF_ARRAY(count, u64, 2); void inc_counter(struct pt_regs *ctx) { bool success; int key; u64 *val; success = PT_REGS_PARM2(ctx); key = success ? 0 : 1; val = count.lookup(&key); if (val) lock_xadd(val, 1); } """ b = bcc.BPF(text=bpf_program) b.attach_kprobe(event="vhost_zerocopy_callback", fn_name="inc_counter") time.sleep(5) print("vhost_zerocopy_callback: Y:%d N:%d" % (b["count"][ctypes.c_int(0)].value, b["count"][ctypes.c_int(1)].value)) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 0/3] tun zerocopy stats 2017-10-11 21:44 ` Willem de Bruijn @ 2017-10-12 11:21 ` Jason Wang 0 siblings, 0 replies; 17+ messages in thread From: Jason Wang @ 2017-10-12 11:21 UTC (permalink / raw) To: Willem de Bruijn Cc: David Miller, Network Development, Michael S. Tsirkin, Willem de Bruijn On 2017年10月12日 05:44, Willem de Bruijn wrote: > On Tue, Oct 10, 2017 at 11:15 PM, Jason Wang <jasowang@redhat.com> wrote: >> >> On 2017年10月11日 03:11, Willem de Bruijn wrote: >>> On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote: >>>> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> >>>> Date: Tue, 10 Oct 2017 11:29:33 -0400 >>>> >>>>> If there is a way to expose these stats through vhost_net directly, >>>>> instead of through tun, that may be better. But I did not see a >>>>> suitable interface. Perhaps debugfs. >>>> Please don't use debugfs, thank you :-) >>> Okay. I'll take a look at tracing for on-demand measurement. >> >> This reminds me a past series that adding tracepoints to vhost/net[1]. It >> can count zero/datacopy independently and even contains a sample program to >> show the stats. > Interesting, thanks! > > For occasional evaluation, we can also use a bpf kprobe for the time being: > > bpf_program = """ > #include <uapi/linux/ptrace.h> > #include <bcc/proto.h> > > BPF_ARRAY(count, u64, 2); > > void inc_counter(struct pt_regs *ctx) { > bool success; > int key; > u64 *val; > > success = PT_REGS_PARM2(ctx); > key = success ? 0 : 1; > val = count.lookup(&key); > if (val) > lock_xadd(val, 1); > } > """ > > b = bcc.BPF(text=bpf_program) > b.attach_kprobe(event="vhost_zerocopy_callback", fn_name="inc_counter") > > time.sleep(5) > > print("vhost_zerocopy_callback: Y:%d N:%d" % > (b["count"][ctypes.c_int(0)].value, > b["count"][ctypes.c_int(1)].value)) Thanks for the tips, looks flexible. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-10-12 11:21 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-10-06 22:25 [PATCH RFC 0/3] tun zerocopy stats Willem de Bruijn 2017-10-06 22:25 ` [PATCH RFC 1/3] tun: ethtool stats Willem de Bruijn 2017-10-06 22:30 ` Stephen Hemminger 2017-10-06 22:37 ` Willem de Bruijn 2017-10-06 23:12 ` Stephen Hemminger 2017-10-06 23:26 ` David Miller 2017-10-06 22:25 ` [PATCH RFC 2/3] tun: expand ethtool stats with zerocopy Willem de Bruijn 2017-10-06 22:32 ` Willem de Bruijn 2017-10-06 22:25 ` [PATCH RFC 3/3] vhost_net: support tun zerocopy stats Willem de Bruijn 2017-10-10 3:52 ` [PATCH RFC 0/3] " David Miller 2017-10-10 15:29 ` Willem de Bruijn 2017-10-10 17:23 ` Stephen Hemminger 2017-10-10 17:39 ` David Miller 2017-10-10 19:11 ` Willem de Bruijn 2017-10-11 3:15 ` Jason Wang 2017-10-11 21:44 ` Willem de Bruijn 2017-10-12 11:21 ` Jason Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).