From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KY06i-0001x5-V9
	for qemu-devel@nongnu.org; Tue, 26 Aug 2008 11:05:21 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KY06g-0001v6-JH
	for qemu-devel@nongnu.org; Tue, 26 Aug 2008 11:05:20 -0400
Received: from [199.232.76.173] (port=56456 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KY06g-0001ur-BY
	for qemu-devel@nongnu.org; Tue, 26 Aug 2008 11:05:18 -0400
Received: from wr-out-0506.google.com ([64.233.184.232]:43973)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <anthony@codemonkey.ws>) id 1KY06f-0001PN-KH
	for qemu-devel@nongnu.org; Tue, 26 Aug 2008 11:05:18 -0400
Received: by wr-out-0506.google.com with SMTP id c46so2140150wra.18
	for <qemu-devel@nongnu.org>; Tue, 26 Aug 2008 08:05:15 -0700 (PDT)
Message-ID: <48B41B7E.40708@codemonkey.ws>
Date: Tue, 26 Aug 2008 10:04:30 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Xen-devel] Re: [Qemu-devel] [PATCH 01/13] Handle terminating
	signals.
References: <1219336054-15919-1-git-send-email-kraxel@redhat.com>	<1219336054-15919-2-git-send-email-kraxel@redhat.com>	<m2n.s.1KWGbo-001Nr9@chiark.greenend.org.uk>	<18611.56975.584280.471257@mariner.uk.xensource.com>	<48B3F411.2020306@redhat.com>	<18611.63711.631859.280983@mariner.uk.xensource.com>	<48B4027C.1000008@codemonkey.ws>
	<18612.1900.73781.314743@mariner.uk.xensource.com>
In-Reply-To: <18612.1900.73781.314743@mariner.uk.xensource.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: xen-devel@lists.xensource.com, qemu-devel@nongnu.org, Gerd Hoffmann <kraxel@redhat.com>

Ian Jackson wrote:
> Anthony Liguori writes ("[Xen-devel] Re: [Qemu-devel] [PATCH 01/13] Handle terminating signals."):
>   
>> The race I know of is that you may get an aio signal completion before 
>> select but after you've already qemu_aio_poll()'d.  In practice, we only 
>> sleep for 10ms at a time in select() so the race is handled by that.  If 
>> we wanted to increase the amount of time we slept, we would have to 
>> handle this race.
>>     
>
> Yes.  And, 10ms is too long anyway for reasonable performance.  During
> my merge with upstream I found that the qemu aio functionality (which
> was done quite differently to the old xen ioemu) caused a severe
> performance regression under some conditions because of this race.
>   

Yeah, noticed that too, especially with qcow2.

>> In KVM, we sleep for 1s in select() and use signalfd() to receive the 
>> aio notifications.  For older hosts, we emulate signalfd using a thread 
>> and the pipe-to-self trick.
>>     
>
> Why does it need a thread ?  You can just write to the pipe in the
> signal handler.  I'll post my code.
>   

It's a little less perfect.  Your signal handler can write() to the pipe 
but what happens if you get EAGAIN?

So what we do in KVM is use sigwait() within a separate thread.  We 
don't set O_NONBLOCK so the thread blocks if the pipe fills up which is 
exactly the semantics you would want.

Below is our implementation.  I'll queue up to push this change into 
QEMU after I finish with the migration patches.

Regards,

Anthony Liguori

#include <sys/syscall.h>
#include <pthread.h>

struct sigfd_compat_info
{
    sigset_t mask;
    int fd;
};

static void *sigwait_compat(void *opaque)
{
    struct sigfd_compat_info *info = opaque;
    int err;
    sigset_t all;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, NULL);

    do {
    siginfo_t siginfo;

    err = sigwaitinfo(&info->mask, &siginfo);
    if (err == -1 && errno == EINTR) {
            err = 0;
            continue;
        }

    if (err > 0) {
        char buffer[128];
        size_t offset = 0;

        memcpy(buffer, &err, sizeof(err));
        while (offset < sizeof(buffer)) {
        ssize_t len;

        len = write(info->fd, buffer + offset,
                sizeof(buffer) - offset);
        if (len == -1 && errno == EINTR)
            continue;

        if (len <= 0) {
            err = -1;
            break;
        }

        offset += len;
        }
    }
    } while (err >= 0);

    return NULL;
}

static int kvm_signalfd_compat(const sigset_t *mask)
{
    pthread_attr_t attr;
    pthread_t tid;
    struct sigfd_compat_info *info;
    int fds[2];

    info = malloc(sizeof(*info));
    if (info == NULL) {
    errno = ENOMEM;
    return -1;
    }

    if (pipe(fds) == -1) {
    free(info);
    return -1;
    }

    memcpy(&info->mask, mask, sizeof(*mask));
    info->fd = fds[1];

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    pthread_create(&tid, &attr, sigwait_compat, info);

    pthread_attr_destroy(&attr);

    return fds[0];
}

int kvm_signalfd(const sigset_t *mask)
{
#if defined(SYS_signalfd)
    int ret;

    ret = syscall(SYS_signalfd, -1, mask, _NSIG / 8);
    if (!(ret == -1 && errno == ENOSYS))
    return ret;
#endif

    return kvm_signalfd_compat(mask);
}


> Ian.
>