From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	linux-kernel@vger.kernel.org, kvm list <kvm@vger.kernel.org>,
	virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org
Subject: Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
Date: Mon, 20 Jul 2020 07:45:50 -0400
Message-ID: <20200720074545-mutt-send-email-mst@kernel.org>
In-Reply-To: <d4e29f0451f7551ee3a408ecfa40de2de2b8aa75.camel@redhat.com>

On Mon, Jul 20, 2020 at 01:16:47PM +0200, Eugenio Pérez wrote:
> 
> On Mon, Jul 20, 2020 at 11:27 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Jul 16, 2020 at 07:16:27PM +0200, Eugenio Perez Martin wrote:
> > > On Fri, Jul 10, 2020 at 7:58 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Fri, Jul 10, 2020 at 07:39:26AM +0200, Eugenio Perez Martin wrote:
> > > > > > > How about playing with the batch size? Make it a mod parameter instead
> > > > > > > of the hard coded 64, and measure for all values 1 to 64 ...
> > > > > > 
> > > > > > Right, according to the test result, 64 seems to be too aggressive in
> > > > > > the case of TX.
> > > > > > 
> > > > > 
> > > > > Got it, thanks both!
> > > > 
> > > > In particular I wonder whether with batch size 1
> > > > we get same performance as without batching
> > > > (would indicate 64 is too aggressive)
> > > > or not (would indicate one of the code changes
> > > > affects performance in an unexpected way).
> > > > 
> > > > --
> > > > MST
> > > > 
> > > 
> > > Hi!
> > > 
> > > Varying batch_size as drivers/vhost/net.c:VHOST_NET_BATCH,
> > 
> > sorry this is not what I meant.
> > 
> > I mean something like this:
> > 
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 0b509be8d7b1..b94680e5721d 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -1279,6 +1279,10 @@ static void handle_rx_net(struct vhost_work *work)
> >         handle_rx(net);
> >  }
> > 
> > +static int batch_num = 0;
> > +module_param(batch_num, int, 0644);
> > +MODULE_PARM_DESC(batch_num, "Number of batched descriptors. (offset from 64)");
> > +
> >  static int vhost_net_open(struct inode *inode, struct file *f)
> >  {
> >         struct vhost_net *n;
> > @@ -1333,7 +1337,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
> >                 vhost_net_buf_init(&n->vqs[i].rxq);
> >         }
> >         vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
> > -                      UIO_MAXIOV + VHOST_NET_BATCH,
> > +                      UIO_MAXIOV + VHOST_NET_BATCH + batch_num,
> >                        VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
> >                        NULL);
> > 
> > 
> > then you can try tweaking batching by playing with the module parameter,
> > without recompiling.
> > 
> > 
> > VHOST_NET_BATCH affects lots of other things.
> > 
> 
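As a rough illustration of the arithmetic in the patch quoted above, the
sketch below mirrors the expression that the patch passes to
vhost_dev_init(). The values used for UIO_MAXIOV (1024) and VHOST_NET_BATCH
(64) are assumptions based on the mainline sources of that period, not
something stated in this mail, and both helper names are hypothetical.

```c
#include <assert.h>

/* Assumed values: UIO_MAXIOV as in include/uapi/linux/uio.h and
 * VHOST_NET_BATCH as in drivers/vhost/net.c ("the hard coded 64"). */
#define UIO_MAXIOV      1024
#define VHOST_NET_BATCH 64

/* Hypothetical helper: the iov limit handed to vhost_dev_init() once the
 * batch_num module parameter from the patch is added to it. */
static int vhost_net_iov_limit(int batch_num)
{
	int limit = UIO_MAXIOV + VHOST_NET_BATCH + batch_num;

	/* A non-positive limit would make no sense for the device. */
	assert(limit > 0);
	return limit;
}

/* Hypothetical helper: batch_num is described as an offset from 64, so this
 * is the value to set for a desired number of batched descriptors. */
static int batch_num_for(int target_batch)
{
	return target_batch - VHOST_NET_BATCH;
}
```

Under these assumptions, batch_num=-63 would correspond to a batch of 1 and
batch_num=0 to the current 64. Because the parameter mode is 0644 it can be
changed by root at runtime, but the patch only reads it in vhost_net_open(),
so a new value would take effect for devices opened afterwards.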
> Ok, got it. Since they were aligned from the start, I thought it was a good idea to keep them in sync.
> 
> > > and testing
> > > the pps as the previous mail describes. This means that we have either
> > > only vhost_net batching (in base testing, as before applying this
> > > patch) or both batching sizes the same.
> > > 
> > > I've also checked that the vhost process (and pktgen) reach 100% CPU.
> > > 
> > > For tx: Batching always decreases performance, in all cases. Not
> > > sure why bufapi made things better the last time.
> > > 
> > > Larger batches improve things up to 64 bufs; beyond that I see pps
> > > increase by only about 1%.
> > > 
> > > For rx: Batching always improves performance. It seems that if we
> > > batch little, bufapi decreases performance, but beyond 64, bufapi is
> > > much better. The bufapi version keeps improving until I set a batching
> > > of 1024. So I guess it is super good to have a bunch of buffers to
> > > receive.
> > > 
> > > Since with this test I cannot disable event_idx or things like that,
> > > what would be the next step for testing?
> > > 
> > > Thanks!
> > > 
> > > --
> > > Results:
> > > # Buf size: 1,16,32,64,128,256,512
> > > 
> > > # Tx
> > > # ===
> > > # Base
> > > 2293304.308,3396057.769,3540860.615,3636056.077,3332950.846,3694276.154,3689820
> > > # Batch
> > > 2286723.857,3307191.643,3400346.571,3452527.786,3460766.857,3431042.5,3440722.286
> > > # Batch + Bufapi
> > > 2257970.769,3151268.385,3260150.538,3379383.846,3424028.846,3433384.308,3385635.231,3406554.538
> > > 
> > > # Rx
> > > # ==
> > > # pktgen results (pps)
> > > 1223275,1668868,1728794,1769261,1808574,1837252,1846436
> > > 1456924,1797901,1831234,1868746,1877508,1931598,1936402
> > > 1368923,1719716,1794373,1865170,1884803,1916021,1975160
> > > 
> > > # Testpmd pps results
> > > 1222698.143,1670604,1731040.6,1769218,1811206,1839308.75,1848478.75
> > > 1450140.5,1799985.75,1834089.75,1871290,1880005.5,1934147.25,1939034
> > > 1370621,1721858,1796287.75,1866618.5,1885466.5,1918670.75,1976173.5,1988760.75,1978316
> > > 
> > > pktgen was run again for rx with 1024 and 2048 buf size, giving
> > > 1988760.75 and 1978316 pps. Testpmd goes the same way.
> > 
> > I don't really understand what this data means.
> > How many descs are batched in each run?
> > 
> 
> Sorry, I should have explained better. I will expand here, but feel free to skip it since we are going to discard the
> data anyway, or to propose a better way to present it.
> 
> It is a CSV with the values I've obtained, in pps, from pktgen and testpmd. This way it is easy to plot them.
> 
> Maybe it is easier to read as tables, if mail readers/gmail do not misalign them.
> 
> > > # Tx
> > > # ===
> 
> Base: The previous code, without any patch applied. testpmd is in txonly mode, and the tap interface drops everything
> with XDP_DROP. We vary VHOST_NET_BATCH (1, 16, 32, ...). As Jason put it in a previous mail:
> 
> TX: testpmd(txonly) -> virtio-user -> vhost_net -> XDP_DROP on TAP
> 
> 
>      1      |     16      |     32      |     64      |     128     |     256     |   512
> ------------+-------------+-------------+-------------+-------------+-------------+---------
> 2293304.308 | 3396057.769 | 3540860.615 | 3636056.077 | 3332950.846 | 3694276.154 | 3689820
> 
> If we add the batching part of the series, but not the bufapi:
> 
>      1      |     16      |     32      |     64      |     128     |     256     |     512
> ------------+-------------+-------------+-------------+-------------+-------------+------------
> 2286723.857 | 3307191.643 | 3400346.571 | 3452527.786 | 3460766.857 | 3431042.5   | 3440722.286
> 
> And if we add the bufapi part, i.e., all the series:
> 
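>      1      |     16      |     32      |     64      |     128     |     256     |     512     |    1024
> ------------+-------------+-------------+-------------+-------------+-------------+-------------+------------
> 2257970.769 | 3151268.385 | 3260150.538 | 3379383.846 | 3424028.846 | 3433384.308 | 3385635.231 | 3406554.538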
> 
> For easier treatment, all in the same table:
> 
>      1      |     16      |     32      |      64     |     128     |    256      |   512      |    1024
> ------------+-------------+-------------+-------------+-------------+-------------+------------+------------
> 2293304.308 | 3396057.769 | 3540860.615 | 3636056.077 | 3332950.846 | 3694276.154 | 3689820    |
> 2286723.857 | 3307191.643 | 3400346.571 | 3452527.786 | 3460766.857 | 3431042.5   | 3440722.286|
> 2257970.769 | 3151268.385 | 3260150.538 | 3379383.846 | 3424028.846 | 3433384.308 | 3385635.231| 3406554.538
>  
> > > # Rx
> > > # ==
> 
> The rx tests are done with pktgen injecting packets into the tap interface, and testpmd in rxonly forward mode. Again,
> each column is a different value of VHOST_NET_BATCH, and the rows are base, +batching, and +buf_api:
> 
> > > # pktgen results (pps)
> 
> (Didn't record extreme cases like >512 bufs batching)
> 
>    1   |   16   |   32   |   64   |   128  |  256   |   512
> -------+--------+--------+--------+--------+--------+--------
> 1223275| 1668868| 1728794| 1769261| 1808574| 1837252| 1846436
> 1456924| 1797901| 1831234| 1868746| 1877508| 1931598| 1936402
> 1368923| 1719716| 1794373| 1865170| 1884803| 1916021| 1975160
> 
> > > # Testpmd pps results
> 
>       1     |     16     |     32     |     64    |    128    |    256     |    512     |    1024    |   2048
> ------------+------------+------------+-----------+-----------+------------+------------+------------+---------
> 1222698.143 | 1670604    | 1731040.6  | 1769218   | 1811206   | 1839308.75 | 1848478.75 |
> 1450140.5   | 1799985.75 | 1834089.75 | 1871290   | 1880005.5 | 1934147.25 | 1939034    |
> 1370621     | 1721858    | 1796287.75 | 1866618.5 | 1885466.5 | 1918670.75 | 1976173.5  | 1988760.75 | 1978316
> 
> The last extreme cases (>512 bufs batched) were recorded just for the bufapi case.
> 
> Does that make sense now?
> 
> Thanks!

yes, thanks!
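
One possible way to read the tables above is as a relative change against
the base row, column by column. The sketch below does that for the Tx
figures; the helper names are hypothetical, the arrays are copied from the
quoted Tx tables for batch sizes 1 to 512, and nothing here is part of the
series itself.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage change of a measurement relative to a baseline (both in pps). */
static double pct_change(double base, double value)
{
	assert(base > 0.0);
	return (value - base) / base * 100.0;
}

/* Fill out[i] with the change of row[i] relative to base[i], per column. */
static void pct_change_row(const double *base, const double *row,
			   double *out, size_t n)
{
	for (size_t i = 0; i < n; i++)
		out[i] = pct_change(base[i], row[i]);
}

/* Tx results from the quoted tables, batch sizes 1,16,32,64,128,256,512. */
static const double tx_base[] = {
	2293304.308, 3396057.769, 3540860.615, 3636056.077,
	3332950.846, 3694276.154, 3689820.0,
};
static const double tx_batch[] = {
	2286723.857, 3307191.643, 3400346.571, 3452527.786,
	3460766.857, 3431042.5, 3440722.286,
};
```

Calling pct_change_row(tx_base, tx_batch, out, 7) gives the per-column
difference between the base and batching rows. For batch size 1 the two Tx
rows differ by roughly 0.3%, which is the comparison asked about earlier in
the thread; whether that is within measurement noise is not something these
numbers alone can show.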
