From: Jesper Dangaard Brouer
To: Alexei Starovoitov
Cc: "David S. Miller", Eric Dumazet, John Fastabend, netdev@vger.kernel.org, brouer@redhat.com
Subject: Re: [RFC PATCH net-next] net: pktgen: packet bursting via skb->xmit_more
Date: Fri, 26 Sep 2014 10:05:42 +0200
Message-ID: <20140926100542.7e543c4e@redhat.com>
In-Reply-To: <1411692382-8898-1-git-send-email-ast@plumgrid.com>

On Thu, 25 Sep 2014 17:46:22 -0700 Alexei Starovoitov wrote:

> This patch demonstrates the effect of delaying update of HW tailptr.
> (based on earlier patch by Jesper)
>
> burst=1 is a default. It sends one packet with xmit_more=false
> burst=2 sends one packet with xmit_more=true and
> 2nd copy of the same packet with xmit_more=false
> burst=3 sends two copies of the same packet with xmit_more=true and
> 3rd copy with xmit_more=false
>
> Performance with ixgbe:
>
> usec 30:
> burst=1  tx:9.2 Mpps
> burst=2  tx:13.6 Mpps
> burst=3  tx:14.5 Mpps full 10G line rate

Perfect, full wirespeed! :-)

> usec 1 (default):
> burst=1,4,100  tx:3.9 Mpps

Here you are limited by the TX ring queue cleanup being too slow, as
described here:
 http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html

> usec 0:
> burst=1  tx:4.9 Mpps
> burst=2  tx:6.6 Mpps
> burst=3  tx:7.9 Mpps
> burst=4  tx:8.7 Mpps
> burst=8  tx:10.3 Mpps
> burst=128  tx:12.4 Mpps
>
> Cc: Jesper Dangaard Brouer
> Signed-off-by: Alexei Starovoitov
> ---

Acked-by: Jesper Dangaard Brouer

> tx queue size, irq affinity left in default.
> pause frames are off.
>
> Nice to finally see line rate generated by one cpu

Yes,

> Comparing to Jesper patch this one amortizes the cost
> of spin_lock and atomic_inc by doing HARD_TX_LOCK and
> atomic_add(N) once across N packets.

Nice additional optimizations :-)

>  net/core/pktgen.c | 33 ++++++++++++++++++++++++++++++---
>  1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 5c728aa..47557ba 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -387,6 +387,7 @@ struct pktgen_dev {
>  	u16 queue_map_min;
>  	u16 queue_map_max;
>  	__u32 skb_priority;	/* skb priority field */
> +	int burst;		/* number of duplicated packets to burst */
>  	int node;		/* Memory node */
>
>  #ifdef CONFIG_XFRM
[...]
> @@ -3299,7 +3313,8 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>  {
>  	struct net_device *odev = pkt_dev->odev;
>  	struct netdev_queue *txq;
> -	int ret;
> +	int burst_cnt, ret;
> +	bool more;
>
>  	/* If device is offline, then don't send */
>  	if (unlikely(!netif_running(odev) || !netif_carrier_ok(odev))) {
> @@ -3347,8 +3362,14 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>  		pkt_dev->last_ok = 0;
>  		goto unlock;
>  	}
> -	atomic_inc(&(pkt_dev->skb->users));
> -	ret = netdev_start_xmit(pkt_dev->skb, odev, txq, false);
> +	atomic_add(pkt_dev->burst, &pkt_dev->skb->users);
> +
> +	burst_cnt = 0;
> +
> +xmit_more:
> +	more = ++burst_cnt < pkt_dev->burst;
> +
> +	ret = netdev_start_xmit(pkt_dev->skb, odev, txq, more);
>
>  	switch (ret) {
>  	case NETDEV_TX_OK:
> @@ -3356,6 +3377,8 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>  		pkt_dev->sofar++;
>  		pkt_dev->seq_num++;
>  		pkt_dev->tx_bytes += pkt_dev->last_pkt_size;
> +		if (more)
> +			goto xmit_more;

I think this will break my VLAN hack mode, which allows me to shoot
pktgen after the qdisc layer, but I'm okay with that: I can simply avoid
the new burst mode, and then it will still work for me.

>  		break;
>  	case NET_XMIT_DROP:
>  	case NET_XMIT_CN:
> @@ -3374,6 +3397,9 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>  		atomic_dec(&(pkt_dev->skb->users));
>  		pkt_dev->last_ok = 0;
>  	}
> +
> +	if (unlikely(pkt_dev->burst - burst_cnt > 0))
> +		atomic_sub(pkt_dev->burst - burst_cnt, &pkt_dev->skb->users);
>  unlock:
>  	HARD_TX_UNLOCK(odev, txq);

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer