From: Anthony Liguori <anthony@codemonkey.ws>
To: Chris Wright <chrisw@redhat.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: [Qemu-devel] Re: KVM call minutes for Sept 21
Date: Tue, 21 Sep 2010 13:23:43 -0500
Message-ID: <4C98F82F.9000801@codemonkey.ws>
In-Reply-To: <20100921180506.GI28009@x200.localdomain>
On 09/21/2010 01:05 PM, Chris Wright wrote:
> Nested VMX
> - looking for forward progress and better collaboration between the
> Intel and IBM teams
> - needs more review (not a new issue)
> - use cases
> - work todo
> - merge baseline patch
> - looks pretty good
> - review is finding mostly small things at this point
> - need some correctness verification (both review from Intel and testing)
> - need a test suite
> - test suite harness will help here
> - a few dozen nested SVM tests are there, can follow for nested VMX
> - nested EPT
> - optimize (reduce vmreads and vmwrites)
> - has long-term maintenance concerns
>
> Hotplug
> - command...guest may or may not respond
> - guest can't be trusted to be direct part of request/response loop
> - solve at QMP level
> - human monitor issues (multiple successive commands to complete a
> single unplug)
> - should be a GUI interface design decision, human monitor is not a
> good design point
> - digression into GUI interface
>
The way this works IRL is:
1) Administrator presses a physical button. This sends an ACPI
notification to the guest.
2) The guest decides how to handle the ACPI notification.
3) To initiate unplug, the guest disables the device and performs an
operation to indicate to the PCI bus that the device is unloaded.
4) Step (3) causes an LED (usually located near the button from step (1)) to change color.
5) Administrator then physically removes the device.
So we need at least a QMP command to perform step (1). Since (3) can
occur independently of (1), its completion should be reported as an
async notification.
device_del should only perform step (5).
A management tool needs to:
pci_unplug_request <slot>
/* wait for PCI_UNPLUGGED event */
device_del <slot>
netdev_del <backend>
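As a rough illustration, here is a minimal Python sketch of that loop
over a QMP socket. Note that pci_unplug_request and the PCI_UNPLUGGED
event are the names proposed above, not commands that exist in QEMU
today, so this is only a sketch of the intended flow:

  import json
  import socket

  def qmp_command(sock, f, name, **args):
      # Send a command and wait for its return (a robust client would
      # queue up any events that arrive in between).
      sock.sendall(json.dumps({"execute": name, "arguments": args}).encode())
      while True:
          msg = json.loads(f.readline())
          if "return" in msg or "error" in msg:
              return msg

  sock = socket.socket(socket.AF_UNIX)
  sock.connect("/tmp/qmp.sock")  # qemu started with -qmp unix:/tmp/qmp.sock,server
  f = sock.makefile()
  f.readline()                   # discard the QMP greeting banner
  qmp_command(sock, f, "qmp_capabilities")

  # Step (1): ask the guest to release the device.
  qmp_command(sock, f, "pci_unplug_request", slot="slot3")   # hypothetical

  # Wait for the async event corresponding to steps (3)/(4).
  while True:
      msg = json.loads(f.readline())
      if msg.get("event") == "PCI_UNPLUGGED":                # hypothetical
          break

  # Step (5): remove the device and then its backend.
  qmp_command(sock, f, "device_del", id="slot3")
  qmp_command(sock, f, "netdev_del", id="netdev0")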
> Drive caching
> - need to formalize the meanings in terms of data integrity guarantees
> - guest write cache (does it directly reflect the host write cache?)
> - live migration, underlying block dev changes, so need to decouple the two
> - O_DIRECT + O_DSYNC
> - O_DSYNC needed based on whether disk cache is available
> - also issues with sparse files (e.g. O_DIRECT to unallocated extent)
> - how to manage w/out needing to flush every write, slow
> - perhaps start with O_DIRECT on raw, non-sparse files only?
> - backend needs to open backing store matching the guest's disk cache state
> - O_DIRECT itself has inconsistent integrity guarantees
> - works well with fully allocated file, dependent on disk cache disable
> (or fs specific flushing)
> - filesystem specific warnings (ext4 w/ barriers on, btrfs)
> - need to be able to open w/ O_DSYNC depending on guest's write cache mode
> - make write cache visible to guest (need a knob for this)
> - qemu default is cache=writethrough, do we need to revisit that?
> - just present user with option whether or not to use host page cache
> - allow guest OS to choose disk write cache setting
> - set up host backend accordingly
> - be nice to preserve write cache settings across boots (outgrowing cmos storage)
> - maybe some host fs-level optimization possible
> - e.g. O_DSYNC to allocated O_DIRECT extent becomes no-op
> - conclusion
> - one direct user tunable, "use host page cache or not"
> - one guest OS tunable, "enable disk cache"
>
IOW, a qdev 'write-cache=on|off' property and a blockdev 'direct=on|off'
property. For completeness, a blockdev 'unsafe=on|off' property.
Open flags are:
write-cache=on, direct=on O_DIRECT
write-cache=off, direct=on O_DIRECT | O_DSYNC
write-cache=on, direct=off 0
write-cache=off, direct=off O_DSYNC
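As a minimal sketch of that mapping (Python, Linux-only since O_DIRECT
is not portable; the image path is made up):

  import os

  def backing_open_flags(write_cache, direct):
      flags = 0
      if direct:
          flags |= os.O_DIRECT  # bypass the host page cache
      if not write_cache:
          flags |= os.O_DSYNC   # no guest write cache: sync every write
      return flags

  # e.g. write-cache=off, direct=on  ->  O_DIRECT | O_DSYNC
  fd = os.open("/images/disk.raw",   # hypothetical backing file
               os.O_RDWR | backing_open_flags(write_cache=False, direct=True))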
It's still unclear what our default mode will be.
The problem is, O_DSYNC has terrible performance on ext4 when barrier=1.
write-cache=on,direct=off is a bad default because if you run a simple
performance test, you'll get better-than-native results, and that upsets people.
write-cache=off,direct=off is a bad default because ext4's default
config sucks with this.
Likewise, write-cache=off,direct=on is a bad default for the same reason.
Regards,
Anthony Liguori