* Fix migration issue when the destination is loaded
@ 2009-07-15 15:03 Dor Laor
2009-07-16 9:39 ` Mark McLoughlin
0 siblings, 1 reply; 4+ messages in thread
From: Dor Laor @ 2009-07-15 15:03 UTC (permalink / raw)
To: kvm-devel
[-- Attachment #1: Type: text/plain, Size: 0 bytes --]
[-- Attachment #2: 0001-Fix-migration-issue-when-the-destination-is-loaded.patch --]
[-- Type: text/plain, Size: 1702 bytes --]
>From abbe3b57af6a28bb81e5fb8b09b10802a8ccc3fe Mon Sep 17 00:00:00 2001
From: Dor Laor <dor@redhat.com>
Date: Wed, 15 Jul 2009 17:53:16 +0300
Subject: [PATCH] Fix migration issue when the destination is loaded
If the migration socket is full, we get EAGAIN for the write.
The set_fd_handler2 defers the write for later on. The function
tries to wake up the iothread by qemu_kvm_notify_work.
Since this happens in a loop, multiple times, the pipe that emulates eventfd
becomes full and we get a deadlock.
It is solved by checking for write-readiness using select.
Note that I check for select only for full 8 byte write and not
for partial writes. This is because we'll break the reader otherwise.
Signed-off-by: Dor Laor <dor@redhat.com>
---
qemu-kvm.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index ed7e466..0ea67a7 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2094,12 +2094,28 @@ void qemu_kvm_notify_work(void)
uint64_t value = 1;
char buffer[8];
size_t offset = 0;
+ fd_set wfds;
+ struct timeval tv;
+ int retval;
if (io_thread_fd == -1)
return;
memcpy(buffer, &value, sizeof(value));
+ FD_ZERO(&wfds);
+ FD_SET(io_thread_fd, &wfds);
+ tv.tv_sec = tv.tv_usec = 0;
+ retval = select(io_thread_fd + 1, NULL, &wfds, NULL, &tv);
+ if (retval == -1) {
+ fprintf(stderr, "failed to notify io thread due to select error\n");
+ return;
+ } else if (retval == 0)
+ /* We probably ponded this pipe too much and it is full now */
+ return;
+
+ assert(FD_ISSET(io_thread_fd, &wfds));
+
while (offset < 8) {
ssize_t len;
--
1.6.2.5
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: Fix migration issue when the destination is loaded
2009-07-15 15:03 Fix migration issue when the destination is loaded Dor Laor
@ 2009-07-16 9:39 ` Mark McLoughlin
2009-07-16 12:00 ` Dor Laor
0 siblings, 1 reply; 4+ messages in thread
From: Mark McLoughlin @ 2009-07-16 9:39 UTC (permalink / raw)
To: dlaor; +Cc: kvm-devel
Hi Dor,
On Wed, 2009-07-15 at 18:03 +0300, Dor Laor wrote:
> If the migration socket is full, we get EAGAIN for the write.
> The set_fd_handler2 defers the write for later on. The function
> tries to wake up the iothread by qemu_kvm_notify_work.
> Since this happens in a loop, multiple times, the pipe that emulates
> eventfd becomes full and we get a deadlock.
I'm not sure I follow:
- You're seeing qemu_kvm_notify_work() being called many times
- The call chain is migrate_fd_put_buffer(), qemu_set_fd_handler2(),
main_loop_break()
- This happens when write() in migrate_fd_put_buffer() returns EAGAIN
because the socket buffer has filled up
Correct?
That sounds like migrate_fd_put_buffer() is being called repeatedly
while we know the socket isn't writable?
Shouldn't the buffered file could stop attempting to call put_buffer()
until it has been notified that the underlying fd is writable?
Cheers,
Mark.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Fix migration issue when the destination is loaded
2009-07-16 9:39 ` Mark McLoughlin
@ 2009-07-16 12:00 ` Dor Laor
2009-07-16 15:20 ` Mark McLoughlin
0 siblings, 1 reply; 4+ messages in thread
From: Dor Laor @ 2009-07-16 12:00 UTC (permalink / raw)
To: Mark McLoughlin; +Cc: kvm-devel
On 07/16/2009 12:39 PM, Mark McLoughlin wrote:
> Hi Dor,
>
> On Wed, 2009-07-15 at 18:03 +0300, Dor Laor wrote:
>> If the migration socket is full, we get EAGAIN for the write.
>> The set_fd_handler2 defers the write for later on. The function
>> tries to wake up the iothread by qemu_kvm_notify_work.
>> Since this happens in a loop, multiple times, the pipe that emulates
>> eventfd becomes full and we get a deadlock.
>
> I'm not sure I follow:
>
> - You're seeing qemu_kvm_notify_work() being called many times
>
> - The call chain is migrate_fd_put_buffer(), qemu_set_fd_handler2(),
> main_loop_break()
>
> - This happens when write() in migrate_fd_put_buffer() returns EAGAIN
> because the socket buffer has filled up
>
> Correct?
>
> That sounds like migrate_fd_put_buffer() is being called repeatedly
> while we know the socket isn't writable?
>
> Shouldn't the buffered file could stop attempting to call put_buffer()
> until it has been notified that the underlying fd is writable?
There are two fds here:
The one returning EAGAIN, is the migration socket. That's why
migrate_fd_put_buffer is called and a call back is rescheduled by
qemu_set_fd_handler2.
In the procedure of qemu_set_fd_handler2, the main_loop_break is called.
It needs to notify the iothread. It does is by qemu_eventfd, since it is
being emulated on older kernels, we use a pipe.
The pipe fd is the one that blocks.
I though of setting it to non-blocking but we might get partial writes
with a non blocking fd (in theory) too.
>
> Cheers,
> Mark.
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Fix migration issue when the destination is loaded
2009-07-16 12:00 ` Dor Laor
@ 2009-07-16 15:20 ` Mark McLoughlin
0 siblings, 0 replies; 4+ messages in thread
From: Mark McLoughlin @ 2009-07-16 15:20 UTC (permalink / raw)
To: dlaor; +Cc: kvm-devel
On Thu, 2009-07-16 at 15:00 +0300, Dor Laor wrote:
> On 07/16/2009 12:39 PM, Mark McLoughlin wrote:
> > Hi Dor,
> >
> > On Wed, 2009-07-15 at 18:03 +0300, Dor Laor wrote:
> >> If the migration socket is full, we get EAGAIN for the write.
> >> The set_fd_handler2 defers the write for later on. The function
> >> tries to wake up the iothread by qemu_kvm_notify_work.
> >> Since this happens in a loop, multiple times, the pipe that emulates
> >> eventfd becomes full and we get a deadlock.
> >
> > I'm not sure I follow:
> >
> > - You're seeing qemu_kvm_notify_work() being called many times
> >
> > - The call chain is migrate_fd_put_buffer(), qemu_set_fd_handler2(),
> > main_loop_break()
> >
> > - This happens when write() in migrate_fd_put_buffer() returns EAGAIN
> > because the socket buffer has filled up
> >
> > Correct?
> >
> > That sounds like migrate_fd_put_buffer() is being called repeatedly
> > while we know the socket isn't writable?
> >
> > Shouldn't the buffered file could stop attempting to call put_buffer()
> > until it has been notified that the underlying fd is writable?
>
> There are two fds here:
> The one returning EAGAIN, is the migration socket. That's why
> migrate_fd_put_buffer is called and a call back is rescheduled by
> qemu_set_fd_handler2.
>
> In the procedure of qemu_set_fd_handler2, the main_loop_break is called.
> It needs to notify the iothread. It does is by qemu_eventfd, since it is
> being emulated on older kernels, we use a pipe.
>
> The pipe fd is the one that blocks.
> I though of setting it to non-blocking but we might get partial writes
> with a non blocking fd (in theory) too.
Yes, but if the pipe fd is blocking, that means we're writing a lot of
events to it, which means the write to the migration socket is failing
with EAGAIN a lot.
As soon as we get EAGAIN on the migration socket, we should stop trying
to write to socket until the we get notification that its writable.
If we did that, we'd only be writing a very small number of events to
the pipe and we wouldn't block.
Cheers,
Mark.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-07-16 15:20 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-15 15:03 Fix migration issue when the destination is loaded Dor Laor
2009-07-16 9:39 ` Mark McLoughlin
2009-07-16 12:00 ` Dor Laor
2009-07-16 15:20 ` Mark McLoughlin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).