From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41111)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1V7nOm-0007WD-64 for qemu-devel@nongnu.org; Fri, 09 Aug 2013 10:10:41 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1V7nOg-0006ug-Uz
	for qemu-devel@nongnu.org; Fri, 09 Aug 2013 10:10:36 -0400
Received: from mail-ie0-f176.google.com ([209.85.223.176]:49977)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1V7nOg-0006uY-Po for qemu-devel@nongnu.org; Fri, 09 Aug 2013 10:10:30 -0400
Received: by mail-ie0-f176.google.com with SMTP id 9so2961062iec.35
	for ; Fri, 09 Aug 2013 07:10:29 -0700 (PDT)
From: Anthony Liguori
In-Reply-To: <87ob97nz7x.fsf@rustcorp.com.au>
References: <1375938949-22622-1-git-send-email-rusty@rustcorp.com.au>
	<1375938949-22622-2-git-send-email-rusty@rustcorp.com.au>
	<87li4cgvh1.fsf@codemonkey.ws>
	<87ob97nz7x.fsf@rustcorp.com.au>
Date: Fri, 09 Aug 2013 09:10:26 -0500
Message-ID: <8761vflzu5.fsf@codemonkey.ws>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Subject: Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Rusty Russell , qemu-devel@nongnu.org

Rusty Russell writes:

> Anthony Liguori writes:
>> I suspect this is a premature optimization. With a weak function called
>> directly in the accessors below, I suspect you would see no measurable
>> performance overhead compared to this approach.
>>
>> It's all very predictable so the CPU should do a decent job optimizing
>> the if () away.
>
> Perhaps. I was leery of introducing performance regressions, but the
> actual I/O tends to dominate anyway.
>
> So I tested this, by adding the patch (below) and benchmarking
> qemu-system-i386 on my laptop before and after.
>
> Setup: Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz
> (Performance cpu governer enabled)
> Guest: virtio user net, virtio block on raw file, 1 CPU, 512MB RAM.
> (Qemu run under eatmydata to eliminate syncs)

FYI, cache=unsafe is equivalent to using eatmydata.

> First test: ping -f -c 10000 -q 10.0.2.0 (100 times)
> (Ping chosen since packets stay in qemu's user net code)
>
> BEFORE:
> MIN: 824ms
> MAX: 914ms
> AVG: 876.95ms
> STDDEV: 16ms
>
> AFTER:
> MIN: 872ms
> MAX: 933ms
> AVG: 904.35ms
> STDDEV: 15ms

I can reproduce this, although I also see a larger standard deviation.

BEFORE:
  MIN: 496
  MAX: 1055
  AVG: 873.22
  STDEV: 136.88

AFTER:
  MIN: 494
  MAX: 1456
  AVG: 947.77
  STDEV: 150.89

In my datasets, the stdev is higher in the AFTER case, implying that
there is more variation. Indeed, the MIN is pretty much the same.

GCC is inlining the functions; I'm still surprised that it's measurable
at all.

At any rate, I think the advantage of not increasing the amount of
target-specific code outweighs the performance difference here.

As you said, if there is real I/O, the difference isn't noticeable.

Regards,

Anthony Liguori
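
One rough way to put the two sets of figures above on a common footing is to compute the relative change in the averages and compare the change with the run-to-run spread. The sketch below does only that arithmetic. The function names are hypothetical, the numbers are copied from the BEFORE/AFTER results in this message, and it assumes the reported STDDEV/STDEV values describe the spread of individual runs.

```python
# Summary statistics copied from the two result sets in this thread.
# Units are left out because the second set was reported without them.
RESULTS = {
    "Rusty": {
        "before": {"avg": 876.95, "stddev": 16.0},
        "after": {"avg": 904.35, "stddev": 15.0},
    },
    "Anthony": {
        "before": {"avg": 873.22, "stddev": 136.88},
        "after": {"avg": 947.77, "stddev": 150.89},
    },
}


def relative_overhead(before_avg, after_avg):
    """Fractional change of the AFTER average relative to BEFORE."""
    return (after_avg - before_avg) / before_avg


def delta_in_stddevs(before_avg, after_avg, before_stddev):
    """Change in the average, expressed in BEFORE standard deviations."""
    return (after_avg - before_avg) / before_stddev


def summarize(results):
    """Return {name: (relative overhead, delta in stddevs)} per data set."""
    summary = {}
    for name, data in results.items():
        before, after = data["before"], data["after"]
        summary[name] = (
            relative_overhead(before["avg"], after["avg"]),
            delta_in_stddevs(before["avg"], after["avg"], before["stddev"]),
        )
    return summary


if __name__ == "__main__":
    for name, (overhead, sigmas) in summarize(RESULTS).items():
        print(f"{name}: {overhead:.2%} slower on average, "
              f"{sigmas:.2f} BEFORE-stddevs apart")
```

This says nothing about statistical significance on its own; it only restates the averages as a percentage and relates the shift to the spread each data set reported.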