From: Alex Bligh <alex@alex.org.uk>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	qemu-devel@nongnu.org, xen-devel <xen-devel@lists.xen.org>,
	Anthony Liguori <anthony@codemonkey.ws>,
	Alex Bligh <alex@alex.org.uk>
Subject: Re: [Qemu-devel] [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
Date: Mon, 18 Mar 2013 16:53:51 +0000
Message-ID: <861AFE1A9C44444FD8BAEE16@Ximines.local>
In-Reply-To: <51473E82.1020806@redhat.com>

Paolo,

--On 18 March 2013 17:19:14 +0100 Paolo Bonzini <pbonzini@redhat.com> wrote:

> I remembered this incorrectly, sorry.  It's not from a previous run,
> it's from the beginning of this run.  See
> http://wiki.qemu.org/Migration/Storage for more information.
>
> A VM has a disk backed by NFS. It runs on node A, at which point pages
> are introduced to the page cache. It then migrates to node B, which
> entails starting the VM on node B while it is still running on node A.
> Closing has yet to happen on node A, but the file is already open on
> node B; anything that is cached on node B will never be invalidated.
>
> Thus, any changes done to the disk on node A during migration may not
> become visible on node B.

This might be a difference between Xen and KVM. On Xen, migration is
made to a server in a paused state, and it is only unpaused when
the migration to B is complete. There's a sort of extra handshake at
the end.

I believe what's happening is that libxl_domain_suspend, when
called with LIBXL_SUSPEND_LIVE, will do a final fsync()/fdatasync()
at the end, then await a migrate_receiver_ready message, and only
when that has been received will it send a migrate_permission_to_go
message, which unpauses the domain. Before that, I don't believe the
disk is read (I may be wrong about that). The sending code is in
migrate_domain() in xl_cmdimpl.c, and the receiving code is in
migrate_receive() (same file). On Xen at least, I don't think
the VM is ever started on node B whilst it is still running on node
A.
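
To make the ordering described above concrete, here is a toy model in
Python. It is not libxl or xl code: the class names, method names and
log strings are all invented for illustration, and it encodes only the
sequence as I understand it (suspend and flush on A, "receiver ready"
from B, "permission to go" from A, then unpause on B).

```python
class Sender:
    """Node A (hypothetical model): runs the guest until the final copy."""

    def __init__(self, events):
        self.events = events
        self.guest_running = True

    def suspend_and_flush(self):
        # The guest stops on A before any final disk flush happens.
        self.guest_running = False
        self.events.append("A: guest suspended")
        self.events.append("A: final flush of disk writes")

    def grant_permission(self, receiver_ready):
        # Permission is only granted once A is stopped and B is ready.
        if self.guest_running:
            raise RuntimeError("guest is still running on A")
        if not receiver_ready:
            raise RuntimeError("receiver has not reported ready")
        self.events.append("A: sent permission to go")
        return True


class Receiver:
    """Node B (hypothetical model): the domain is created paused."""

    def __init__(self, events):
        self.events = events
        self.guest_running = False

    def report_ready(self):
        self.events.append("B: sent receiver ready")
        return True

    def unpause(self, permission):
        if not permission:
            raise RuntimeError("no permission to go from sender")
        self.guest_running = True
        self.events.append("B: guest unpaused")


def migrate():
    """Run the handshake once and return both nodes plus the event log."""
    events = []
    a, b = Sender(events), Receiver(events)
    a.suspend_and_flush()
    ready = b.report_ready()
    permission = a.grant_permission(ready)
    b.unpause(permission)
    return a, b, events
```

In this model there is no point at which guest_running is true on both
nodes, which is the property I am claiming for Xen. Whether the real
code gives the same guarantee for disk reads on B is exactly the part
I am unsure about.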

>> I've no problem if xl or libvirt or whatever error or warn. My usage
>> is API based, rather than xl / libvirt based.
>
> What makes libvirt not an API (just like libxl)?

Nothing; it's just that I'm using the QMP API and the libxl API. I'm
only saying that whether libvirt or xl warns or errors makes no
difference to me.

>>> If libxl does migration without O_DIRECT, then that's a bug in libxl.
>>> What about blkback?  IIRC it uses bios, so it also bypasses the page
>>> cache.
>>
>> Possibly a bug in xl rather than libxl, but as no emulated devices
>> use O_DIRECT, that bug is already there, and isn't in QEMU.
>
> blkback is the in-kernel PV device, it's not an emulated device.

I mean that an emulated device will already not use O_DIRECT.
So if you are right about live migrate being unsafe without O_DIRECT,
it's already unsafe for emulated devices.

>>>> Stefano did ack the patch, and for a one line change it's been
>>>> through a pretty extensive discussion on xen-devel ...
>>>
>>> It may be a one-line change, but it completely changes the paths that
>>> I/O goes through.  Apparently the discussion was not enough.
>>
>> What would you suggest?
>
> Nothing except fixing the bug in the kernel.

I have already posted patches for that, as Ian Campbell did in 2008,
but no one seems particularly interested. Be my guest in trying to
get them adopted. That's quite obviously the long-term solution.

In the meantime, however, there is a need to run Xen on kernels with
long-term support. Not being able to run Xen in a stable manner is
not an acceptable position.

> No one has yet explained
> why blkback is not susceptible to the same bug.

I would guess it will be if it uses O_DIRECT or whatever the in-kernel
equivalent is, unless it's doing a copy of the guest pages prior to the
write being marked as complete.
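
As an illustration of what I mean by "doing a copy", here is a minimal
Python sketch. It is only a model of the idea, not blkback or QEMU
code, and BounceBufferWriter is a made-up name: the writer takes a
private copy of the guest page at submit time, so anything still in
flight no longer depends on the guest's memory.

```python
class BounceBufferWriter:
    """Hypothetical writer that copies guest data before queueing it."""

    def __init__(self):
        self.in_flight = []

    def submit(self, guest_page):
        # Take an immutable private copy; the guest page can now be
        # reused or unmapped without affecting the queued data.
        private_copy = bytes(guest_page)
        self.in_flight.append(private_copy)
        return private_copy


guest_page = bytearray(b"old data")
writer = BounceBufferWriter()
queued = writer.submit(guest_page)

# The guest reuses its page after the write was reported complete.
guest_page[:3] = b"NEW"
```

Here the queued data still holds the original contents after the guest
has modified its page. A zero-copy path would instead keep referring
to the guest page itself, so it has no such protection.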

I can't claim to be familiar with blkback, but I presume this would
require a similar fix elsewhere.

-- 
Alex Bligh
