qemu-devel.nongnu.org archive mirror
* [Qemu-devel] migration question: disk images on nfs server
@ 2014-02-07  4:35 Alexey Kardashevskiy
  2014-02-07  7:46 ` Orit Wasserman
  0 siblings, 1 reply; 11+ messages in thread
From: Alexey Kardashevskiy @ 2014-02-07  4:35 UTC (permalink / raw)
  To: qemu-devel@nongnu.org

Hi!

I have yet another problem with migration. Or NFS.

There is one NFS server and 2 test POWER8 machines. There is a shared NFS
folder on the server, mounted on both test hosts. There is a qcow2 image
(abc.qcow2) in that shared folder.

We start a guest with this abc.qcow2 on test machine #1, and start
another guest on test machine #2 with "-incoming ..." and the same abc.qcow2.

Now we start migration. In most cases it goes fine, but if we put some load
on machine #1, the destination guest sometimes crashes.

I blame out-of-sync NFS on the test machines. I looked a bit further into
QEMU and could not find a spot where it would fflush(abc.qcow2), close it,
or do any other sync, so it is up to the host NFS mount point to decide
when to sync, and it gets no clue about when to do this.

I do not really understand why the abc.qcow2 image is still open; shouldn't
it be closed after migration has succeeded?

What am I missing here? Should we switch from NFS to GlusterFS (is it always
synchronized)? Or, if we want NFS, should we just boot our guests with
"root=/dev/nfs ip=dhcp nfsroot=..." and avoid keeping disk images on network
storage? Thanks!



-- 
Alexey

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07  4:35 [Qemu-devel] migration question: disk images on nfs server Alexey Kardashevskiy
@ 2014-02-07  7:46 ` Orit Wasserman
  2014-02-07  9:41   ` Marcin Gibuła
  2014-02-07 12:10   ` Alexey Kardashevskiy
  0 siblings, 2 replies; 11+ messages in thread
From: Orit Wasserman @ 2014-02-07  7:46 UTC (permalink / raw)
  To: qemu-devel

On 02/07/2014 06:35 AM, Alexey Kardashevskiy wrote:
> Hi!
>
> I have yet another problem with migration. Or NFS.
>
> There is one NFS server and 2 test POWER8 machines. There is a shared NFS
> folder on the server, mounted to both test hosts. There is an qcow2 image
> (abc.qcow2) in that shared folder.
>
> We start a guest with this abc.qcow2 on the test machine #1. And start
> another guest on the test machine #2 with "-incoming ..." and same abc.qcow2.
>
> Now we start migration. In most cases it goes fine. But if we put some load
> on machine #1, the destination guest sometime crashes.
>
> I blame out-of-sync NFS on the test machines. I looked a bit further in
> QEMU and could not find a spot where it would fflush(abc.qcow2) or close it
> or do any other sync so it is up to the host NFS mountpoint to decide when
> to sync and it definitely does not get a clue when to do this.
>
> I do not really understand why the abc.qcow2 image is still open, should
> not it be closed after migration succeeded?
>
> What do I miss here? Should we switch from NFS to GlusterFS (is it always
> syncronized)? Or if we want NFS, should we just boot our guests with
> "root=/dev/nfs ip=dhcp nfsroot=..." and avoid using disk images in network
> disks? Thanks!
>

For NFS you need to use the sync mount option to force the NFS client to
sync writes to the server.
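For illustration, a mount invocation along those lines might look like this (server name, export path, and mount point are placeholders, not taken from the thread):

```shell
# Hypothetical sketch: mount the shared export with the "sync" option
# on both hosts, so client writes are pushed to the server immediately
# instead of lingering in the client's cache.
mount -t nfs -o sync nfs-server:/export/images /mnt/images

# Equivalent /etc/fstab entry:
# nfs-server:/export/images  /mnt/images  nfs  sync  0  0
```

Note that "sync" here is the client-side mount option from nfs(5), not the server-side export option of the same name in exports(5).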

Orit
>
>


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07  7:46 ` Orit Wasserman
@ 2014-02-07  9:41   ` Marcin Gibuła
  2014-02-07 12:26     ` Paolo Bonzini
  2014-02-07 12:10   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 11+ messages in thread
From: Marcin Gibuła @ 2014-02-07  9:41 UTC (permalink / raw)
  To: qemu-devel

> For NFS you need to use the sync mount option to force the NFS client to
> sync to
> server on writes.

Isn't opening with O_DIRECT enough? (For the Linux NFS client, at least.)

-- 
mg


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07  7:46 ` Orit Wasserman
  2014-02-07  9:41   ` Marcin Gibuła
@ 2014-02-07 12:10   ` Alexey Kardashevskiy
  2014-02-07 12:47     ` Orit Wasserman
  1 sibling, 1 reply; 11+ messages in thread
From: Alexey Kardashevskiy @ 2014-02-07 12:10 UTC (permalink / raw)
  To: Orit Wasserman, qemu-devel

On 07.02.2014 18:46, Orit Wasserman wrote:
> On 02/07/2014 06:35 AM, Alexey Kardashevskiy wrote:
>> Hi!
>>
>> I have yet another problem with migration. Or NFS.
>>
>> There is one NFS server and 2 test POWER8 machines. There is a shared NFS
>> folder on the server, mounted to both test hosts. There is an qcow2 image
>> (abc.qcow2) in that shared folder.
>>
>> We start a guest with this abc.qcow2 on the test machine #1. And start
>> another guest on the test machine #2 with "-incoming ..." and same
>> abc.qcow2.
>>
>> Now we start migration. In most cases it goes fine. But if we put some
>> load
>> on machine #1, the destination guest sometime crashes.
>>
>> I blame out-of-sync NFS on the test machines. I looked a bit further in
>> QEMU and could not find a spot where it would fflush(abc.qcow2) or
>> close it
>> or do any other sync so it is up to the host NFS mountpoint to decide
>> when
>> to sync and it definitely does not get a clue when to do this.
>>
>> I do not really understand why the abc.qcow2 image is still open, should
>> not it be closed after migration succeeded?
>>
>> What do I miss here? Should we switch from NFS to GlusterFS (is it always
>> syncronized)? Or if we want NFS, should we just boot our guests with
>> "root=/dev/nfs ip=dhcp nfsroot=..." and avoid using disk images in
>> network
>> disks? Thanks!
>>
> 
> For NFS you need to use the sync mount option to force the NFS client to
> sync to
> server on writes.

So there is no sync of any kind in QEMU after migration has finished,
correct? It seems too much to enforce the "sync" mount option for NFS when
we really need it only once.


-- 
With best regards

Alexey Kardashevskiy -- icq: 52150396


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07  9:41   ` Marcin Gibuła
@ 2014-02-07 12:26     ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-02-07 12:26 UTC (permalink / raw)
  To: Marcin Gibuła, qemu-devel

Il 07/02/2014 10:41, Marcin Gibuła ha scritto:
>> For NFS you need to use the sync mount option to force the NFS client to
>> sync to
>> server on writes.
> 
> Isn't opening with O_DIRECT enough? (for linux nfs client at least)

Yeah, the man page says:

       If neither sync nor async is specified (or
       if the async option is specified), the NFS client delays sending appli‐
       cation writes to the server until any of these events occur:

              Memory pressure forces reclamation of system memory resources.

              An  application  flushes  file  data  explicitly  with  sync(2),
              msync(2), or fsync(3).

              An application closes a file with close(2).

              The file is locked/unlocked via fcntl(2).

       In other words, under normal circumstances, data written by an applica‐
       tion may not immediately appear on the server that hosts the file.

QEMU does flush file data with fsync(3).  It's not the first time I've heard
about needing the sync option, though.

Paolo


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07 12:10   ` Alexey Kardashevskiy
@ 2014-02-07 12:47     ` Orit Wasserman
  2014-02-07 12:54       ` Marcin Gibuła
  2014-02-08  8:30       ` Kevin Wolf
  0 siblings, 2 replies; 11+ messages in thread
From: Orit Wasserman @ 2014-02-07 12:47 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel

On 02/07/2014 02:10 PM, Alexey Kardashevskiy wrote:
> On 07.02.2014 18:46, Orit Wasserman wrote:
>> On 02/07/2014 06:35 AM, Alexey Kardashevskiy wrote:
>>> Hi!
>>>
>>> I have yet another problem with migration. Or NFS.
>>>
>>> There is one NFS server and 2 test POWER8 machines. There is a shared NFS
>>> folder on the server, mounted to both test hosts. There is an qcow2 image
>>> (abc.qcow2) in that shared folder.
>>>
>>> We start a guest with this abc.qcow2 on the test machine #1. And start
>>> another guest on the test machine #2 with "-incoming ..." and same
>>> abc.qcow2.
>>>
>>> Now we start migration. In most cases it goes fine. But if we put some
>>> load
>>> on machine #1, the destination guest sometime crashes.
>>>
>>> I blame out-of-sync NFS on the test machines. I looked a bit further in
>>> QEMU and could not find a spot where it would fflush(abc.qcow2) or
>>> close it
>>> or do any other sync so it is up to the host NFS mountpoint to decide
>>> when
>>> to sync and it definitely does not get a clue when to do this.
>>>
>>> I do not really understand why the abc.qcow2 image is still open, should
>>> not it be closed after migration succeeded?
>>>
>>> What do I miss here? Should we switch from NFS to GlusterFS (is it always
>>> syncronized)? Or if we want NFS, should we just boot our guests with
>>> "root=/dev/nfs ip=dhcp nfsroot=..." and avoid using disk images in
>>> network
>>> disks? Thanks!
>>>
>>
>> For NFS you need to use the sync mount option to force the NFS client to
>> sync to
>> server on writes.
>
> So there is no any kind of sync in QEMU after migration finished,
> correct? Looks too mucn to enforce "sync" option for NFS as we really
> need it for once.
>

It is more of an NFS issue: if you have a file on NFS that two users on
two different hosts are accessing (at least one of them writing to it), you
will need to enforce the "sync" option. Even if you flush all the data and
close the file, the NFS client can still have cached data that it hasn't
synced to the server.

>


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07 12:47     ` Orit Wasserman
@ 2014-02-07 12:54       ` Marcin Gibuła
  2014-02-07 13:36         ` Orit Wasserman
  2014-02-08  8:30       ` Kevin Wolf
  1 sibling, 1 reply; 11+ messages in thread
From: Marcin Gibuła @ 2014-02-07 12:54 UTC (permalink / raw)
  To: Orit Wasserman, Alexey Kardashevskiy, qemu-devel

> It is more a NFS issue, if you have a file in NFS that two users in
> two different host are accessing (one at least write to it) you will
> need to enforce the "sync" option.
> Even if you flush all the data and close the file the NFS client can still
> have cached data that it didn't sync to the server.

Do you know if this applies to Linux O_DIRECT writes as well?

From the comment in fs/nfs/direct.c:

* When an application requests uncached I/O, all read and write requests
* are made directly to the server; data stored or fetched via these
* requests is not cached in the Linux page cache.  The client does not
* correct unaligned requests from applications.  All requested bytes are
* held on permanent storage before a direct write system call returns to
* an application.



-- 
mg


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07 12:54       ` Marcin Gibuła
@ 2014-02-07 13:36         ` Orit Wasserman
  2014-02-07 13:44           ` Marcin Gibuła
  0 siblings, 1 reply; 11+ messages in thread
From: Orit Wasserman @ 2014-02-07 13:36 UTC (permalink / raw)
  To: Marcin Gibuła, Alexey Kardashevskiy, qemu-devel

On 02/07/2014 02:54 PM, Marcin Gibuła wrote:
>> It is more a NFS issue, if you have a file in NFS that two users in
>> two different host are accessing (one at least write to it) you will
>> need to enforce the "sync" option.
>> Even if you flush all the data and close the file the NFS client can still
>> have cached data that it didn't sync to the server.
>
> Do you know if is applies to linux O_DIRECT writes as well?
>

 From the open(2) man page:

        The behaviour of O_DIRECT with NFS will differ from local
        filesystems.  Older kernels, or kernels configured in certain ways,
        may not support this combination.  The NFS protocol does not support
        passing the flag to the server, so O_DIRECT I/O will bypass the page
        cache only on the client; the server may still cache the I/O.  The
        client asks the server to make the I/O synchronous to preserve the
        synchronous semantics of O_DIRECT.  Some servers will perform poorly
        under these circumstances, especially if the I/O size is small.  Some
        servers may also be configured to lie to clients about the I/O having
        reached stable storage; this will avoid the performance penalty at
        some risk to data integrity in the event of server power failure.
        The Linux NFS client places no alignment restrictions on O_DIRECT
        I/O.
  
To summarize: it depends on your kernel (NFS client).


>  From comment in fs/nfs/direct.c:
>
> * When an application requests uncached I/O, all read and write requests
> * are made directly to the server; data stored or fetched via these
> * requests is not cached in the Linux page cache.  The client does not
> * correct unaligned requests from applications.  All requested bytes are
> * held on permanent storage before a direct write system call returns to
> * an application.
>

>
>


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07 13:36         ` Orit Wasserman
@ 2014-02-07 13:44           ` Marcin Gibuła
  2014-02-07 13:57             ` Orit Wasserman
  0 siblings, 1 reply; 11+ messages in thread
From: Marcin Gibuła @ 2014-02-07 13:44 UTC (permalink / raw)
  To: Orit Wasserman, Alexey Kardashevskiy, qemu-devel

On 07.02.2014 14:36, Orit Wasserman wrote:
>> Do you know if is applies to linux O_DIRECT writes as well?
>>
>
>  From the man of open:
>
>         The behaviour of O_DIRECT with NFS will differ from local
>         filesystems.  Older kernels, or kernels configured in certain ways,
>         may not support this combination.  The NFS protocol does not
> support
>         passing the flag to the server, so O_DIRECT I/O will bypass the
> page
>         cache only on the client; the server may still cache the I/O.  The
>         client asks the server to make the I/O synchronous to preserve the
>         synchronous semantics of O_DIRECT.  Some servers will perform
> poorly
>         under these circumstances, especially if the I/O size is small.
> Some
>         servers may also be configured to lie to clients about the I/O
> having
>         reached stable storage; this will avoid the performance penalty at
>         some risk to data integrity in the event of server power failure.
>         The Linux NFS client places no alignment restrictions on O_DIRECT
>         I/O.
>
> To summaries it depends on your kernel (NFS client).

So, assuming a new kernel (where NFS O_DIRECT translates to no caching on 
the client side) and a cache-coherent server, is that enough, or is the 
'sync' mount option (or the O_SYNC flag) still required for some reason?

-- 
mg


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07 13:44           ` Marcin Gibuła
@ 2014-02-07 13:57             ` Orit Wasserman
  0 siblings, 0 replies; 11+ messages in thread
From: Orit Wasserman @ 2014-02-07 13:57 UTC (permalink / raw)
  To: Marcin Gibuła, Alexey Kardashevskiy, qemu-devel

On 02/07/2014 03:44 PM, Marcin Gibuła wrote:
> On 07.02.2014 14:36, Orit Wasserman wrote:
>>> Do you know if is applies to linux O_DIRECT writes as well?
>>>
>>
>>  From the man of open:
>>
>>         The behaviour of O_DIRECT with NFS will differ from local
>>         filesystems.  Older kernels, or kernels configured in certain ways,
>>         may not support this combination.  The NFS protocol does not
>> support
>>         passing the flag to the server, so O_DIRECT I/O will bypass the
>> page
>>         cache only on the client; the server may still cache the I/O.  The
>>         client asks the server to make the I/O synchronous to preserve the
>>         synchronous semantics of O_DIRECT.  Some servers will perform
>> poorly
>>         under these circumstances, especially if the I/O size is small.
>> Some
>>         servers may also be configured to lie to clients about the I/O
>> having
>>         reached stable storage; this will avoid the performance penalty at
>>         some risk to data integrity in the event of server power failure.
>>         The Linux NFS client places no alignment restrictions on O_DIRECT
>>         I/O.
>>
>> To summaries it depends on your kernel (NFS client).
>
> So, assuming new kernel (where nfs O_DIRECT translates to no cache at client side) and cache coherent server, is it enough or is 'sync' mount (or O_SYNC flag) still required for some reason?
>

I think it should be enough.


* Re: [Qemu-devel] migration question: disk images on nfs server
  2014-02-07 12:47     ` Orit Wasserman
  2014-02-07 12:54       ` Marcin Gibuła
@ 2014-02-08  8:30       ` Kevin Wolf
  1 sibling, 0 replies; 11+ messages in thread
From: Kevin Wolf @ 2014-02-08  8:30 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: Alexey Kardashevskiy, qemu-devel

Am 07.02.2014 um 13:47 hat Orit Wasserman geschrieben:
> On 02/07/2014 02:10 PM, Alexey Kardashevskiy wrote:
> >On 07.02.2014 18:46, Orit Wasserman wrote:
> >>On 02/07/2014 06:35 AM, Alexey Kardashevskiy wrote:
> >>>Hi!
> >>>
> >>>I have yet another problem with migration. Or NFS.
> >>>
> >>>There is one NFS server and 2 test POWER8 machines. There is a shared NFS
> >>>folder on the server, mounted to both test hosts. There is an qcow2 image
> >>>(abc.qcow2) in that shared folder.
> >>>
> >>>We start a guest with this abc.qcow2 on the test machine #1. And start
> >>>another guest on the test machine #2 with "-incoming ..." and same
> >>>abc.qcow2.
> >>>
> >>>Now we start migration. In most cases it goes fine. But if we put some
> >>>load
> >>>on machine #1, the destination guest sometime crashes.
> >>>
> >>>I blame out-of-sync NFS on the test machines. I looked a bit further in
> >>>QEMU and could not find a spot where it would fflush(abc.qcow2) or
> >>>close it
> >>>or do any other sync so it is up to the host NFS mountpoint to decide
> >>>when
> >>>to sync and it definitely does not get a clue when to do this.
> >>>
> >>>I do not really understand why the abc.qcow2 image is still open, should
> >>>not it be closed after migration succeeded?
> >>>
> >>>What do I miss here? Should we switch from NFS to GlusterFS (is it always
> >>>syncronized)? Or if we want NFS, should we just boot our guests with
> >>>"root=/dev/nfs ip=dhcp nfsroot=..." and avoid using disk images in
> >>>network
> >>>disks? Thanks!
> >>>
> >>
> >>For NFS you need to use the sync mount option to force the NFS client to
> >>sync to
> >>server on writes.
> >
> >So there is no any kind of sync in QEMU after migration finished,
> >correct? Looks too mucn to enforce "sync" option for NFS as we really
> >need it for once.
> >
> 
> It is more a NFS issue, if you have a file in NFS that two users in
> two different host are accessing (one at least write to it) you will need to enforce the "sync" option.
> Even if you flush all the data and close the file the NFS client can still
> have cached data that it didn't sync to the server.

Are you sure? This is news to me. qemu does do an fsync() and that
should be enough to get the data flushed to the NFS server.

There may be another problem, though: The destination host may have stale
data in the local page cache, and qemu can't force it to drop that cache
and reload the new data from the server. In this case you need to use
cache=none (i.e. O_DIRECT). I believe this isn't strictly necessary for
NFS and the file system driver will take care of it, though we certainly
recommend cache=none with live migration.
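A sketch of what that recommendation looks like on the command line (the image path, port, and elided machine options are placeholders, not taken from the thread):

```shell
# Source host: open the shared image with cache=none (i.e. O_DIRECT),
# so neither host relies on its local page cache for image data.
qemu-system-ppc64 \
    -drive file=/mnt/images/abc.qcow2,format=qcow2,cache=none ...

# Destination host: same image, same cache mode, waiting for migration.
qemu-system-ppc64 \
    -drive file=/mnt/images/abc.qcow2,format=qcow2,cache=none \
    -incoming tcp:0:4444 ...
```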

Kevin

