From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1KY06i-0001x5-V9 for qemu-devel@nongnu.org; Tue, 26 Aug 2008 11:05:21 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1KY06g-0001v6-JH for qemu-devel@nongnu.org; Tue, 26 Aug 2008 11:05:20 -0400 Received: from [199.232.76.173] (port=56456 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1KY06g-0001ur-BY for qemu-devel@nongnu.org; Tue, 26 Aug 2008 11:05:18 -0400 Received: from wr-out-0506.google.com ([64.233.184.232]:43973) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1KY06f-0001PN-KH for qemu-devel@nongnu.org; Tue, 26 Aug 2008 11:05:18 -0400 Received: by wr-out-0506.google.com with SMTP id c46so2140150wra.18 for ; Tue, 26 Aug 2008 08:05:15 -0700 (PDT) Message-ID: <48B41B7E.40708@codemonkey.ws> Date: Tue, 26 Aug 2008 10:04:30 -0500 From: Anthony Liguori MIME-Version: 1.0 Subject: Re: [Xen-devel] Re: [Qemu-devel] [PATCH 01/13] Handle terminating signals. References: <1219336054-15919-1-git-send-email-kraxel@redhat.com> <1219336054-15919-2-git-send-email-kraxel@redhat.com> <18611.56975.584280.471257@mariner.uk.xensource.com> <48B3F411.2020306@redhat.com> <18611.63711.631859.280983@mariner.uk.xensource.com> <48B4027C.1000008@codemonkey.ws> <18612.1900.73781.314743@mariner.uk.xensource.com> In-Reply-To: <18612.1900.73781.314743@mariner.uk.xensource.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Ian Jackson Cc: xen-devel@lists.xensource.com, qemu-devel@nongnu.org, Gerd Hoffmann Ian Jackson wrote: > Anthony Liguori writes ("[Xen-devel] Re: [Qemu-devel] [PATCH 01/13] Handle terminating signals."): > >> The race I know of is that you may get an aio signal completion before >> select but after you've already qemu_aio_poll()'d. In practice, we only >> sleep for 10ms at a time in select() so the race is handled by that. If >> we wanted to increase the amount of time we slept, we would have to >> handle this race. >> > > Yes. And, 10ms is too long anyway for reasonable performance. During > my merge with upstream I found that the qemu aio functionality (which > was done quite differently to the old xen ioemu) caused a severe > performance regression under some conditions because of this race. > Yeah, noticed that too, especially with qcow2. >> In KVM, we sleep for 1s in select() and use signalfd() to receive the >> aio notifications. For older hosts, we emulate signalfd using a thread >> and the pipe-to-self trick. >> > > Why does it need a thread ? You can just write to the pipe in the > signal handler. I'll post my code. > It's a little less perfect. Your signal handler can write() to the pipe but what happens if you get EAGAIN? So what we do in KVM is use sigwait() within a separate thread. We don't set O_NONBLOCK so the thread blocks if the pipe fills up which is exactly the semantics you would want. Below is our implementation. I'll queue up to push this change into QEMU after I finish with the migration patches. Regards, Anthony Liguori #include #include struct sigfd_compat_info { sigset_t mask; int fd; }; static void *sigwait_compat(void *opaque) { struct sigfd_compat_info *info = opaque; int err; sigset_t all; sigfillset(&all); sigprocmask(SIG_BLOCK, &all, NULL); do { siginfo_t siginfo; err = sigwaitinfo(&info->mask, &siginfo); if (err == -1 && errno == EINTR) { err = 0; continue; } if (err > 0) { char buffer[128]; size_t offset = 0; memcpy(buffer, &err, sizeof(err)); while (offset < sizeof(buffer)) { ssize_t len; len = write(info->fd, buffer + offset, sizeof(buffer) - offset); if (len == -1 && errno == EINTR) continue; if (len <= 0) { err = -1; break; } offset += len; } } } while (err >= 0); return NULL; } static int kvm_signalfd_compat(const sigset_t *mask) { pthread_attr_t attr; pthread_t tid; struct sigfd_compat_info *info; int fds[2]; info = malloc(sizeof(*info)); if (info == NULL) { errno = ENOMEM; return -1; } if (pipe(fds) == -1) { free(info); return -1; } memcpy(&info->mask, mask, sizeof(*mask)); info->fd = fds[1]; pthread_attr_init(&attr); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); pthread_create(&tid, &attr, sigwait_compat, info); pthread_attr_destroy(&attr); return fds[0]; } int kvm_signalfd(const sigset_t *mask) { #if defined(SYS_signalfd) int ret; ret = syscall(SYS_signalfd, -1, mask, _NSIG / 8); if (!(ret == -1 && errno == ENOSYS)) return ret; #endif return kvm_signalfd_compat(mask); } > Ian. >