Re: [PATCH RFC 20/21] migration: Handle page faults using UFFDIO_CONTINUE

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Juan Quintela <quintela@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org,
	 Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	 James Houghton <jthoughton@google.com>,
	 "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH RFC 20/21] migration: Handle page faults using UFFDIO_CONTINUE
Date: Wed, 01 Feb 2023 20:24:40 +0100	[thread overview]
Message-ID: <87edr9w5mv.fsf@secure.mitica> (raw)
In-Reply-To: <20230117220914.2062125-21-peterx@redhat.com> (Peter Xu's message of "Tue, 17 Jan 2023 17:09:13 -0500")

Peter Xu <peterx@redhat.com> wrote:
> Teach QEMU to be able to handle page faults using UFFDIO_CONTINUE for
> hugetlbfs double mapped ranges.
>
> To copy the data, we need to use the mirror buffer created per ramblock by
> a raw memcpy(), then we can kick the faulted threads using UFFDIO_CONTINUE
> by installing the pgtables.
>
> Move trace_postcopy_place_page(host) upper so that it'll dump something for
> either UFFDIO_COPY or UFFDIO_CONTINUE.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

> ---
>  migration/postcopy-ram.c | 55 ++++++++++++++++++++++++++++++++++++++--
>  migration/trace-events   |  4 +--
>  2 files changed, 55 insertions(+), 4 deletions(-)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 8a2259581e..c4bd338e22 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1350,6 +1350,43 @@ int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset)
>      return 0;
>  }
>  
> +/* Returns the mirror_host addr for a specific host address in ramblock */
> +static inline void *migration_ram_get_mirror_addr(RAMBlock *rb, void *host)
> +{
> +    return (void *)((__u64)rb->host_mirror + ((__u64)host - (__u64)rb->host));

This is gross :-(
I hate this C miss-feature.

What about:
    return (char *)rb->host_mirror + (char*)host - (char*)rb->host;

But I don't know if it (much) clearer.  And no, I don't remember if we
ever need more parents.

gcc used to do "the right" thing on void * arithmetic, but it is not in
the standard, and I don't know what is worse.

> +}
> +
> +static int
> +qemu_uffd_continue(MigrationIncomingState *mis, RAMBlock *rb, void *host,
> +                   void *from)
> +{
> +    void *mirror_addr = migration_ram_get_mirror_addr(rb, host);
> +    /* Doublemap uses small host page size */
> +    uint64_t psize = qemu_real_host_page_size();
> +    struct uffdio_continue req;
> +
> +    /*
> +     * Copy data first into the mirror host pointer; we can't directly copy
> +     * data into rb->host because otherwise our thread will get trapped too.
> +     */
> +    memcpy(mirror_addr, from, psize);
> +
> +    /* Kick off the faluted threads to fetch data from the page cache
                       ^^^^^^^
> */

Faulted

> +    req.range.start = (__u64)host;
> +    req.range.len = psize;
> +    req.mode = 0;
> +	if (ioctl(mis->userfault_fd, UFFDIO_CONTINUE, &req)) {
> +        error_report("%s: UFFDIO_CONTINUE failed for start=%p"
> +                     " len=0x%"PRIx64": %s\n", __func__, host,
> +                     psize, strerror(-req.mapped));
> +        return req.mapped;
> +    }
> +
> +    postcopy_mark_received(mis, rb, host, psize / qemu_target_page_size());
> +
> +    return 0;
> +}
> +
>  /*
>   * Place a host page (from) at (host) atomically
>   * returns 0 on success
> @@ -1359,6 +1396,18 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  {
>      size_t pagesize = migration_ram_pagesize(rb);
>  
> +    trace_postcopy_place_page(rb->idstr, (uint8_t *)host - rb->host, host);
> +
> +    if (postcopy_use_minor_fault(rb)) {
> +        /*
> +         * If minor fault used, we use UFFDIO_CONTINUE instead.
> +         *
> +         * TODO: support shared uffds (e.g. vhost-user). Currently we're
> +         * skipping them.
> +         */
> +        return qemu_uffd_continue(mis, rb, host, from);
> +    }
> +
>      /* copy also acks to the kernel waking the stalled thread up
>       * TODO: We can inhibit that ack and only do it if it was requested
>       * which would be slightly cheaper, but we'd have to be careful
> @@ -1372,7 +1421,6 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>          return -e;
>      }
>  
> -    trace_postcopy_place_page(host);
>      return postcopy_notify_shared_wake(rb,
>                                         qemu_ram_block_host_offset(rb, host));
>  }
> @@ -1385,10 +1433,13 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
>                               RAMBlock *rb)
>  {
>      size_t pagesize = migration_ram_pagesize(rb);
> -    trace_postcopy_place_page_zero(host);
> +    trace_postcopy_place_page_zero(rb->idstr, (uint8_t *)host - rb->host, host);

It is me, or to be standard compliant, you need to cast also rb->host?


>      /* Normal RAMBlocks can zero a page using UFFDIO_ZEROPAGE
>       * but it's not available for everything (e.g. hugetlbpages)
> +     *
> +     * NOTE: when hugetlb double-map enabled, then this ramblock will never
> +     * have RAM_UF_ZEROPAGE, so it'll always go to postcopy_place_page().
>       */
>      if (qemu_ram_is_uf_zeroable(rb)) {
>          if (qemu_ufd_copy_ioctl(mis, host, NULL, pagesize, rb)) {
> diff --git a/migration/trace-events b/migration/trace-events
> index 6b418a0e9e..7baf235d22 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -265,8 +265,8 @@ postcopy_discard_send_range(const char *ramblock, unsigned long start, unsigned
>  postcopy_cleanup_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=0x%zx length=0x%zx"
>  postcopy_init_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=0x%zx length=0x%zx"
>  postcopy_nhp_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=0x%zx length=0x%zx"
> -postcopy_place_page(void *host_addr) "host=%p"
> -postcopy_place_page_zero(void *host_addr) "host=%p"
> +postcopy_place_page(const char *id, size_t offset, void *host_addr) "id=%s offset=0x%zx host=%p"
> +postcopy_place_page_zero(const char *id, size_t offset, void *host_addr) "id=%s offset=0x%zx host=%p"
>  postcopy_ram_enable_notify(void) ""
>  mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %u, cpu: %d, already_received: %d"
>  mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d"

I think that you can split the part of the patch that changes the
traces.  But again, it is up to you.

next prev parent reply	other threads:[~2023-02-01 19:25 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17 22:08 [PATCH RFC 00/21] migration: Support hugetlb doublemaps Peter Xu
2023-01-17 22:08 ` [PATCH RFC 01/21] update linux headers Peter Xu
2023-01-17 22:08 ` [PATCH RFC 02/21] util: Include osdep.h first in util/mmap-alloc.c Peter Xu
2023-01-18 12:00   ` Dr. David Alan Gilbert
2023-01-25  0:19   ` Philippe Mathieu-Daudé
2023-01-30  4:57   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 03/21] physmem: Add qemu_ram_is_hugetlb() Peter Xu
2023-01-18 12:02   ` Dr. David Alan Gilbert
2023-01-30  5:00   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 04/21] madvise: Include linux/mman.h under linux-headers/ Peter Xu
2023-01-18 12:08   ` Dr. David Alan Gilbert
2023-01-30  5:01   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 05/21] madvise: Add QEMU_MADV_SPLIT Peter Xu
2023-01-30  5:01   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 06/21] madvise: Add QEMU_MADV_COLLAPSE Peter Xu
2023-01-18 18:51   ` Dr. David Alan Gilbert
2023-01-18 20:21     ` Peter Xu
2023-01-30  5:02   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 07/21] ramblock: Cache file offset for file-backed ramblocks Peter Xu
2023-01-30  5:02   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 08/21] ramblock: Cache the length to do file mmap() on ramblocks Peter Xu
2023-01-23 18:51   ` Dr. David Alan Gilbert
2023-01-24 20:28     ` Peter Xu
2023-01-30  5:05   ` Juan Quintela
2023-01-30 22:07     ` Peter Xu
2023-01-17 22:09 ` [PATCH RFC 09/21] ramblock: Add RAM_READONLY Peter Xu
2023-01-23 19:42   ` Dr. David Alan Gilbert
2023-01-30  5:06   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 10/21] ramblock: Add ramblock_file_map() Peter Xu
2023-01-24 10:06   ` Dr. David Alan Gilbert
2023-01-24 20:47     ` Peter Xu
2023-01-25  9:24       ` Dr. David Alan Gilbert
2023-01-25 14:46         ` Peter Xu
2023-01-30  5:09   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 11/21] migration: Add hugetlb-doublemap cap Peter Xu
2023-01-24 12:45   ` Dr. David Alan Gilbert
2023-01-24 21:15     ` Peter Xu
2023-01-30  5:13   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 12/21] migration: Introduce page size for-migration-only Peter Xu
2023-01-24 13:20   ` Dr. David Alan Gilbert
2023-01-24 21:36     ` Peter Xu
2023-01-24 22:03       ` Peter Xu
2023-01-30  5:17   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 13/21] migration: Add migration_ram_pagesize_largest() Peter Xu
2023-01-24 17:34   ` Dr. David Alan Gilbert
2023-01-30  5:19   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 14/21] migration: Map hugetlbfs ramblocks twice, and pre-allocate Peter Xu
2023-01-25 14:25   ` Dr. David Alan Gilbert
2023-01-30  5:24   ` Juan Quintela
2023-01-30 22:35     ` Peter Xu
2023-02-01 18:53       ` Juan Quintela
2023-02-06 21:40         ` Peter Xu
2023-01-17 22:09 ` [PATCH RFC 15/21] migration: Teach qemu about minor faults and doublemap Peter Xu
2023-01-30  5:45   ` Juan Quintela
2023-01-30 22:50     ` Peter Xu
2023-02-01 18:55       ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 16/21] migration: Enable doublemap with MADV_SPLIT Peter Xu
2023-02-01 18:59   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 17/21] migration: Rework ram discard logic for hugetlb double-map Peter Xu
2023-02-01 19:03   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 18/21] migration: Allow postcopy_register_shared_ufd() to fail Peter Xu
2023-02-01 19:09   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 19/21] migration: Add postcopy_mark_received() Peter Xu
2023-02-01 19:10   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 20/21] migration: Handle page faults using UFFDIO_CONTINUE Peter Xu
2023-02-01 19:24   ` Juan Quintela [this message]
2023-02-01 19:52     ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 21/21] migration: Collapse huge pages again after postcopy finished Peter Xu
2023-02-01 19:49   ` Juan Quintela

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87edr9w5mv.fsf@secure.mitica \
    --to=quintela@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=jthoughton@google.com \
    --cc=lsoaresp@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.