From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:48318)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <quintela@redhat.com>) id 1SfI9H-0002mO-JN
	for qemu-devel@nongnu.org; Thu, 14 Jun 2012 18:04:17 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <quintela@redhat.com>) id 1SfI9F-0005Mf-Hn
	for qemu-devel@nongnu.org; Thu, 14 Jun 2012 18:04:15 -0400
Received: from mx1.redhat.com ([209.132.183.28]:8289)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <quintela@redhat.com>) id 1SfI9F-0005MS-8X
	for qemu-devel@nongnu.org; Thu, 14 Jun 2012 18:04:13 -0400
From: Juan Quintela <quintela@redhat.com>
In-Reply-To: <77d2fc4b1c99e1520a7fbe873e73119ce9463592.1338802192.git.yamahata@valinux.co.jp>
	(Isaku Yamahata's message of "Mon, 4 Jun 2012 18:57:37 +0900")
References: <cover.1338802190.git.yamahata@valinux.co.jp>
	<cover.1338802190.git.yamahata@valinux.co.jp>
	<77d2fc4b1c99e1520a7fbe873e73119ce9463592.1338802192.git.yamahata@valinux.co.jp>
Date: Thu, 14 Jun 2012 23:34:09 +0200
Message-ID: <87vcitshm6.fsf@elfo.mitica>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] [PATCH v2 35/41] postcopy: introduce helper
	functions for postcopy
Reply-To: quintela@redhat.com
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: benoit.hudzia@gmail.com, aarcange@redhat.com, aliguori@us.ibm.com, kvm@vger.kernel.org, satoshi.itoh@aist.go.jp, stefanha@gmail.com, t.hirofuchi@aist.go.jp, dlaor@redhat.com, qemu-devel@nongnu.org, mdroth@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp, owasserm@redhat.com, avi@redhat.com, pbonzini@redhat.com

Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> +//#define DEBUG_UMEM
> +#ifdef DEBUG_UMEM
> +#include <sys/syscall.h>
> +#define DPRINTF(format, ...)                                            \
> +    do {                                                                \
> +        printf("%d:%ld %s:%d "format, getpid(), syscall(SYS_gettid),    \
> +               __func__, __LINE__, ## __VA_ARGS__);                     \
> +    } while (0)

This should be in a header file that is linux specific?  And (at least
on my systems) gettid is already defined on glibc.


> +#else
> +#define DPRINTF(format, ...)    do { } while (0)
> +#endif


> +
> +#define DEV_UMEM        "/dev/umem"
> +
> +UMem *umem_new(void *hostp, size_t size)
> +{
> +    struct umem_init uinit = {
> +        .size = size,
> +    };
> +    UMem *umem;
> +
> +    assert((size % getpagesize()) == 0);
> +    umem = g_new(UMem, 1);
> +    umem->fd = open(DEV_UMEM, O_RDWR);
> +    if (umem->fd < 0) {
> +        perror("can't open "DEV_UMEM);
> +        abort();

Can we return one error insntead of abort?  the same for the rest of the
file aborts.


> +size_t umem_pages_size(uint64_t nr)
> +{
> +    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);

Can we make sure that the pgoffs field is aligned?  I know that as it is
now it is aligned, but better to be sure?

> +}
> +
> +static void umem_write_cmd(int fd, uint8_t cmd)
> +{
> +    DPRINTF("write cmd %c\n", cmd);
> +
> +    for (;;) {
> +        ssize_t ret = write(fd, &cmd, 1);
> +        if (ret == -1) {
> +            if (errno == EINTR) {
> +                continue;
> +            } else if (errno == EPIPE) {
> +                perror("pipe");
> +                DPRINTF("write cmd %c %zd %d: pipe is closed\n",
> +                        cmd, ret, errno);
> +                break;
> +            }


Grr, we don't have a function that writes does a "safe_write".  The most
similar thing in qemu looks to be send_all().

> +
> +            perror("pipe");

Can we make a different perror() message than previous error?

> +            DPRINTF("write cmd %c %zd %d\n", cmd, ret, errno);
> +            abort();
> +        }
> +
> +        break;
> +    }
> +}
> +
> +static void umem_read_cmd(int fd, uint8_t expect)
> +{
> +    uint8_t cmd;
> +    for (;;) {
> +        ssize_t ret = read(fd, &cmd, 1);
> +        if (ret == -1) {
> +            if (errno == EINTR) {
> +                continue;
> +            }
> +            perror("pipe");
> +            DPRINTF("read error cmd %c %zd %d\n", cmd, ret, errno);
> +            abort();
> +        }
> +
> +        if (ret == 0) {
> +            DPRINTF("read cmd %c %zd: pipe is closed\n", cmd, ret);
> +            abort();
> +        }
> +
> +        break;
> +    }
> +
> +    DPRINTF("read cmd %c\n", cmd);
> +    if (cmd != expect) {
> +        DPRINTF("cmd %c expect %d\n", cmd, expect);
> +        abort();

Ouch.  If we receive garbage, we just exit?

I really think that we should implement error handling.

> +    }
> +}
> +
> +struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset)
> +{
> +    int ret;
> +    uint64_t nr;
> +    size_t size;
> +    struct umem_pages *pages;
> +
> +    ret = qemu_peek_buffer(f, (uint8_t*)&nr, sizeof(nr), *offset);
> +    *offset += sizeof(nr);
> +    DPRINTF("ret %d nr %ld\n", ret, nr);
> +    if (ret != sizeof(nr) || nr == 0) {
> +        return NULL;
> +    }
> +
> +    size = umem_pages_size(nr);
> +    pages = g_malloc(size);

Just thinking about this.  Couldn't we just decide on a "big enough"
buffer, and never send anything bigger than that?  That would remove the
need to have to malloc()/free() a buffer for each reception?


> +/* qemu side handler */
> +struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd,
> +                                                int *offset)
> +{
> +    uint64_t i;
> +    int page_shift = ffs(getpagesize()) - 1;
> +    struct umem_pages *pages = umem_recv_pages(from_umemd, offset);
> +    if (pages == NULL) {
> +        return NULL;
> +    }
> +
> +    for (i = 0; i < pages->nr; i++) {
> +        ram_addr_t addr = pages->pgoffs[i] << page_shift;
> +
> +        /* make pages present by forcibly triggering page fault. */
> +        volatile uint8_t *ram = qemu_get_ram_ptr(addr);
> +        uint8_t dummy_read = ram[0];
> +        (void)dummy_read;   /* suppress unused variable warning */
> +    }
> +
> +    /*
> +     * Very Linux implementation specific.
> +     * Make it sure that other thread doesn't fault on the above virtual
> +     * address. (More exactly other thread doesn't call fault handler with
> +     * the offset.)
> +     * the fault handler is called with mmap_sem read locked.
> +     * madvise() does down/up_write(mmap_sem)
> +     */
> +    qemu_madvise(NULL, 0, MADV_NORMAL);

If it is linux specific, should be inside CONFIG_LINUX ifdef, or a
function hided on some header.

Talking about looking, what protects that no other thread enters this
function before this one calls madvise?   Or I am losing something obvious?

> +
> +struct umem_pages {
> +    uint64_t nr;
> +    uint64_t pgoffs[0];
> +};
> +

QEMU really likes typedefs for structs.

Later, Juan.