qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH RFC 11/21] migration: Add hugetlb-doublemap cap
Date: Tue, 24 Jan 2023 16:15:37 -0500
Message-ID: <Y9BKeZjQG53X5kZb@x1n>
In-Reply-To: <Y8/S8g4s42RCBTEV@work-vm>

On Tue, Jan 24, 2023 at 12:45:38PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Add a new cap to allow mapping hugetlbfs backed RAMs in small page sizes.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks.

> 
> although, I'm curious if the protocol actually changes

Yes it does.

It differs not in the form of a changed header or any frame definitions,
but in the order in which the small pages of a huge page are sent.  The old
binary can only send a huge page by sending all of its small pages
sequentially, from index 0 to index N_HUGE-1, while the new binary can send
them out of order.  In the latter case it behaves the same as when huge
pages are not used.
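To make the difference concrete, here is a minimal C sketch of the two send
orders for a single huge page.  It is only an illustration under stated
assumptions, not QEMU's ram.c code: N_HUGE is fixed to a small hypothetical
value, order_sequential() and order_requested_first() are hypothetical
names, and "requested page first" merely stands in for whatever order the
new source actually picks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: small pages per huge page (real values are e.g. 512 for
 * 2M huge pages backed by 4K small pages). */
#define N_HUGE 8

/* Old binary: small pages always go out in order, 0 .. N_HUGE-1. */
static void order_sequential(size_t order[N_HUGE])
{
    for (size_t i = 0; i < N_HUGE; i++) {
        order[i] = i;
    }
}

/* Assumed doublemap-style order: the small page the destination asked for
 * goes first, then the rest, so the huge page arrives out of order. */
static void order_requested_first(size_t order[N_HUGE], size_t requested)
{
    size_t n = 0;

    assert(requested < N_HUGE);
    order[n++] = requested;
    for (size_t i = 0; i < N_HUGE; i++) {
        if (i != requested) {
            order[n++] = i;
        }
    }
}
```

With requested = 5 the second helper yields 5, 0, 1, 2, 3, 4, 6, 7, which
is exactly the kind of stream an old destination has never seen.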

> or whether a doublepage enabled destination would work with an unmodified
> source?

This is an interesting question.

I would expect old -> new to work as usual, because the page frames are not
modified, so the dest node will just see pages being migrated sequentially.
The latency of page requests will be the same as with the old binary,
though: even if the dest host can handle small pages, it won't be able to
get the pages it wants as soon as possible, because the src host decides
which page to send.

Meanwhile I think new -> old shouldn't work, as described above, because
the dest host would see weird things happening, e.g., a huge page whose
small pages start arriving not from index 0 but from index X
(0 < X < N_HUGE).  It should quickly bail out, assuming something is wrong.
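A rough sketch of why an old destination would reject such a stream: it
only accepts the small pages of a huge page in strictly increasing order.
struct huge_rx and old_dest_accept() are hypothetical and much simpler than
QEMU's real postcopy load path; N_HUGE is again an assumed small value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_HUGE 8   /* hypothetical small pages per huge page */

/* Hypothetical per-huge-page receive state on an old destination. */
struct huge_rx {
    size_t next;   /* index of the next small page expected */
};

/* Returns true if the small page at 'index' is acceptable to a destination
 * that only understands in-order huge pages; false means "bail out". */
static bool old_dest_accept(struct huge_rx *rx, size_t index)
{
    assert(index < N_HUGE);
    if (index != rx->next) {
        return false;   /* e.g. the huge page started at X instead of 0 */
    }
    /* Wrap back to 0 once the whole huge page has been received. */
    rx->next = (rx->next + 1) % N_HUGE;
    return true;
}
```

Feeding it index 3 as the first page of a fresh huge page returns false,
which corresponds to the "bail out" case above.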

> I guess potentially you can get away without the dirty clearing
> of the partially sent hugepages that the source normally does.

Good point. It's actually more relevant to a later patch that reworks the
discard logic.  I kept it as-is mainly for two reasons:

 1) It is not yet settled how MADV_DONTNEED should behave on HGM-enabled
    memory ranges where huge pages used to be mapped.  That is part of the
    upstream discussion on the kernel patchset.  I think it's settling, but
    in the current series I kept the logic in a form that works in all
    cases.

 2) Not dirtying the partially sent huge pages always reduces the number of
    small pages migrated, but it can also change the content of discard
    messages due to the frame format of MIG_CMD_POSTCOPY_RAM_DISCARD: we
    can end up with many more scattered ranges, so a lot more messaging may
    be needed.  With the existing logic, since we always re-dirty the
    partially sent pages, the ranges are more likely to be contiguous and
    therefore cheaper to encode.

        * CMD_POSTCOPY_RAM_DISCARD consist of:
        *      byte   version (0)
        *      byte   Length of name field (not including 0)
        *  n x byte   RAM block name
        *      byte   0 terminator (just for safety)
        *  n x        Byte ranges within the named RAMBlock
        *      be64   Start of the range
        *      be64   Length
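For reference, a minimal C sketch that serializes one message following the
layout quoted above.  encode_discard() and struct discard_range are
hypothetical helpers written for illustration, not QEMU's postcopy code.
The point is that every extra range costs 16 bytes (two be64 fields), which
is why scattered ranges inflate the messaging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one discard range. */
struct discard_range {
    uint64_t start;
    uint64_t length;
};

/* Store v big-endian into p[0..7]; returns the bytes written. */
static size_t put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

/* Encode version, name length, name, NUL terminator, then nr (start,
 * length) pairs.  Returns bytes written, or 0 if the name does not fit in
 * the one-byte length field or buf is too small. */
static size_t encode_discard(uint8_t *buf, size_t buflen, const char *name,
                             const struct discard_range *r, size_t nr)
{
    size_t name_len = strlen(name);
    size_t need = 2 + name_len + 1 + nr * 16;
    size_t off = 0;

    if (name_len > 255 || need > buflen) {
        return 0;
    }
    buf[off++] = 0;                   /* version (0) */
    buf[off++] = (uint8_t)name_len;   /* name length, excluding the NUL */
    memcpy(buf + off, name, name_len);
    off += name_len;
    buf[off++] = 0;                   /* 0 terminator */
    for (size_t i = 0; i < nr; i++) {
        off += put_be64(buf + off, r[i].start);
        off += put_be64(buf + off, r[i].length);
    }
    return off;
}
```

For a RAMBlock named "pc.ram" and one range this produces 2 + 6 + 1 + 16 =
25 bytes; each additional scattered range adds another 16.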

I think 1) may no longer hold as the kernel series evolves.  2) may still
be true, but I think it's worth some testing (especially on 1G pages) to see
how it could interfere with the discard procedure.  Maybe it won't be as bad
as I think.  Even if it is, we can evaluate the tradeoff between "slower
discard sync" and "fewer pages to send".  E.g., we could consider changing
the frame layout by bumping postcopy_ram_discard_version.
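One cheap way to start evaluating that tradeoff offline is to count, for a
given dirty bitmap, how many discard ranges and how many small pages each
policy produces.  This sketch assumes a simplified model (a flat bool
bitmap, a hypothetical N_HUGE of 4, and hypothetical helper names); it is
not how QEMU's discard code is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_HUGE 4   /* hypothetical small pages per huge page */

/* Count maximal runs of set bits; each run becomes one discard range. */
static size_t count_ranges(const bool *bits, size_t n)
{
    size_t ranges = 0;
    for (size_t i = 0; i < n; i++) {
        if (bits[i] && (i == 0 || !bits[i - 1])) {
            ranges++;
        }
    }
    return ranges;
}

/* Count set bits, i.e. small pages that must be (re)sent. */
static size_t count_pages(const bool *bits, size_t n)
{
    size_t pages = 0;
    for (size_t i = 0; i < n; i++) {
        pages += bits[i] ? 1 : 0;
    }
    return pages;
}

/* Existing policy: any dirty small page re-dirties its whole huge page. */
static void redirty_whole_huge_pages(const bool *in, bool *out, size_t n)
{
    for (size_t base = 0; base < n; base += N_HUGE) {
        bool any = false;
        for (size_t i = base; i < base + N_HUGE && i < n; i++) {
            any = any || in[i];
        }
        for (size_t i = base; i < base + N_HUGE && i < n; i++) {
            out[i] = any;
        }
    }
}
```

With the bitmap {1,0,1,0, 0,1,0,0, 1,1,0,1}, discarding only the dirty
small pages gives 6 pages in 5 ranges, while re-dirtying whole huge pages
gives 12 pages in a single range.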

I'll take note of this one and provide more updates in the next version.

-- 
Peter Xu



Thread overview: 69+ messages
2023-01-17 22:08 [PATCH RFC 00/21] migration: Support hugetlb doublemaps Peter Xu
2023-01-17 22:08 ` [PATCH RFC 01/21] update linux headers Peter Xu
2023-01-17 22:08 ` [PATCH RFC 02/21] util: Include osdep.h first in util/mmap-alloc.c Peter Xu
2023-01-18 12:00   ` Dr. David Alan Gilbert
2023-01-25  0:19   ` Philippe Mathieu-Daudé
2023-01-30  4:57   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 03/21] physmem: Add qemu_ram_is_hugetlb() Peter Xu
2023-01-18 12:02   ` Dr. David Alan Gilbert
2023-01-30  5:00   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 04/21] madvise: Include linux/mman.h under linux-headers/ Peter Xu
2023-01-18 12:08   ` Dr. David Alan Gilbert
2023-01-30  5:01   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 05/21] madvise: Add QEMU_MADV_SPLIT Peter Xu
2023-01-30  5:01   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 06/21] madvise: Add QEMU_MADV_COLLAPSE Peter Xu
2023-01-18 18:51   ` Dr. David Alan Gilbert
2023-01-18 20:21     ` Peter Xu
2023-01-30  5:02   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 07/21] ramblock: Cache file offset for file-backed ramblocks Peter Xu
2023-01-30  5:02   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 08/21] ramblock: Cache the length to do file mmap() on ramblocks Peter Xu
2023-01-23 18:51   ` Dr. David Alan Gilbert
2023-01-24 20:28     ` Peter Xu
2023-01-30  5:05   ` Juan Quintela
2023-01-30 22:07     ` Peter Xu
2023-01-17 22:09 ` [PATCH RFC 09/21] ramblock: Add RAM_READONLY Peter Xu
2023-01-23 19:42   ` Dr. David Alan Gilbert
2023-01-30  5:06   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 10/21] ramblock: Add ramblock_file_map() Peter Xu
2023-01-24 10:06   ` Dr. David Alan Gilbert
2023-01-24 20:47     ` Peter Xu
2023-01-25  9:24       ` Dr. David Alan Gilbert
2023-01-25 14:46         ` Peter Xu
2023-01-30  5:09   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 11/21] migration: Add hugetlb-doublemap cap Peter Xu
2023-01-24 12:45   ` Dr. David Alan Gilbert
2023-01-24 21:15     ` Peter Xu [this message]
2023-01-30  5:13   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 12/21] migration: Introduce page size for-migration-only Peter Xu
2023-01-24 13:20   ` Dr. David Alan Gilbert
2023-01-24 21:36     ` Peter Xu
2023-01-24 22:03       ` Peter Xu
2023-01-30  5:17   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 13/21] migration: Add migration_ram_pagesize_largest() Peter Xu
2023-01-24 17:34   ` Dr. David Alan Gilbert
2023-01-30  5:19   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 14/21] migration: Map hugetlbfs ramblocks twice, and pre-allocate Peter Xu
2023-01-25 14:25   ` Dr. David Alan Gilbert
2023-01-30  5:24   ` Juan Quintela
2023-01-30 22:35     ` Peter Xu
2023-02-01 18:53       ` Juan Quintela
2023-02-06 21:40         ` Peter Xu
2023-01-17 22:09 ` [PATCH RFC 15/21] migration: Teach qemu about minor faults and doublemap Peter Xu
2023-01-30  5:45   ` Juan Quintela
2023-01-30 22:50     ` Peter Xu
2023-02-01 18:55       ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 16/21] migration: Enable doublemap with MADV_SPLIT Peter Xu
2023-02-01 18:59   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 17/21] migration: Rework ram discard logic for hugetlb double-map Peter Xu
2023-02-01 19:03   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 18/21] migration: Allow postcopy_register_shared_ufd() to fail Peter Xu
2023-02-01 19:09   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 19/21] migration: Add postcopy_mark_received() Peter Xu
2023-02-01 19:10   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 20/21] migration: Handle page faults using UFFDIO_CONTINUE Peter Xu
2023-02-01 19:24   ` Juan Quintela
2023-02-01 19:52     ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 21/21] migration: Collapse huge pages again after postcopy finished Peter Xu
2023-02-01 19:49   ` Juan Quintela
