qemu-devel.nongnu.org archive mirror
    Re: [Qemu-devel] QEMU event loop optimizations
    From: Sergio Lopez @ 2019-04-05 16:29 UTC (permalink / raw)
      To: Stefan Hajnoczi; +Cc: Sergio Lopez, qemu-devel, Paolo Bonzini
    
    
    Stefan Hajnoczi writes:
    
    > Hi Sergio,
    > Here are the forgotten event loop optimizations I mentioned:
    >
    >   https://github.com/stefanha/qemu/commits/event-loop-optimizations
    >
    > The goal was to eliminate or reorder syscalls so that useful work (like
    > executing BHs) occurs as soon as possible after an event is detected.
    >
    > I remember that these optimizations only shave off a handful of
    > microseconds, so they aren't a huge win.  They do become attractive on
    > fast SSDs with <10us read/write latency.
    >
    > These optimizations are aggressive and there is a possibility of
    > introducing regressions.
    >
    > If you have time to pick up this work, try benchmarking each commit
    > individually so performance changes are attributed individually.
    > There's no need to send them together in a single patch series, the
    > changes are quite independent.
    
    It took me a while to find a way to get meaningful numbers to evaluate
    those optimizations. The problem is that here (Xeon E5-2640 v3 and EPYC
    7351P) the cost of event_notifier_set() is just ~0.4us when the code
    path is hot, which makes it hard to differentiate from the noise.
    
    To do so, I've used a patched kernel with a naive io_poll implementation
    for virtio_blk [1], plus a QEMU also patched with poll-inflight [2] (just
    to be sure we're actually polling), and ran the test on semi-isolated
    cores (nohz_full + rcu_nocbs + systemd_isolation) with idle siblings. The
    storage is simulated by null_blk with "completion_nsec=0 no_sched=1
    irqmode=0".
    
    # fio --time_based --runtime=30 --rw=randread --name=randread \
     --filename=/dev/vdb --direct=1 --ioengine=pvsync2 --iodepth=1 --hipri=1
    
    | avg_lat (us) | master | qbsn* |
    |   run1       | 11.32  | 10.96 |
    |   run2       | 11.37  | 10.79 |
    |   run3       | 11.42  | 10.67 |
    |   run4       | 11.32  | 11.06 |
    |   run5       | 11.42  | 11.19 |
    |   run6       | 11.42  | 10.91 |
     * patched with aio: add optimized qemu_bh_schedule_nested() API
    
    Even though there's still some variance in the numbers, the ~0.4us
    improvement is clearly visible.
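
The patch under test isn't quoted here, but the idea behind a "nested" BH
schedule API can be sketched in a standalone model: when a bottom half is
scheduled from the event loop thread itself, the ~0.4us event_notifier_set()
write can be skipped, because the loop re-checks pending BHs before it blocks
again. This is a simplified illustration, not QEMU's actual code; all names
and the counter are illustrative:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of a QEMU-style bottom half (BH). */
typedef struct {
    atomic_bool scheduled;
} BH;

static int notifier_writes;              /* counts simulated event_notifier_set() calls */
static _Thread_local bool in_event_loop; /* true while this thread runs the event loop */

static void event_notifier_set(void)
{
    /* In QEMU this is a write() to an eventfd: cheap, but ~0.4us even hot. */
    notifier_writes++;
}

/* Regular schedule: kicks the notifier so a sleeping loop wakes up.
 * The exchange also coalesces repeated schedules into one kick. */
static void bh_schedule(BH *bh)
{
    if (!atomic_exchange(&bh->scheduled, true)) {
        event_notifier_set();
    }
}

/* Nested variant: legal only from the event loop thread itself.
 * The loop will see the flag before blocking, so no kick is needed. */
static void bh_schedule_nested(BH *bh)
{
    assert(in_event_loop);
    atomic_store(&bh->scheduled, true);
    /* no event_notifier_set(): this is the saved ~0.4us */
}
```

The saving only matters on the hot path, which is consistent with the
numbers above: one eventfd write avoided per request.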
    
    I haven't tested the other 3 patches, as their optimizations only take
    effect when the event loop is not running in polling mode. Without
    polling we incur an additional overhead of at least 10us, plus a lot
    of noise, due to both direct costs (ppoll()...) and indirect ones
    (re-scheduling and TLB/cache pollution), so I don't think we can
    benchmark them reliably. Their impact probably wouldn't be significant
    either, given the costs I've just mentioned.
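
For reference, the polling-mode distinction above can be sketched as follows:
the loop first busy-polls the ready condition for a bounded window, and only
falls back to a blocking syscall when that window expires, so optimizations
on the blocking path are invisible whenever polling succeeds. A simplified
standalone model (not QEMU's actual aio_poll() code; names are illustrative):

```c
#include <stdbool.h>

static int blocking_waits; /* counts simulated ppoll() fallbacks */

/* Busy-poll '*ready' for up to max_spins iterations; take the (simulated)
 * blocking ppoll() path only if polling never sees the event.
 * Returns true when polling caught the event on the fast path. */
static bool wait_for_event(const volatile bool *ready, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (*ready) {
            return true;   /* fast path: no syscall, no reschedule */
        }
    }
    blocking_waits++;      /* slow path: >=10us of ppoll() + wakeup costs */
    return false;
}
```

With the benchmark above running entirely on the fast path, the fallback
counter would stay at zero, which is why those patches can't be measured
this way.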
    
    Sergio.
    
    [1] https://github.com/slp/linux/commit/d369b37db3e298933e8bb88c6eeacff07f39bc13
    [2] https://lists.nongnu.org/archive/html/qemu-devel/2019-04/msg00447.html
    

