qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	qemu-devel@nongnu.org, armbru@redhat.com,
	"Juan Quintela" <quintela@redhat.com>,
	"Leonardo Bras" <leobras@redhat.com>,
	"Claudio Fontana" <cfontana@suse.de>,
	"Nikolay Borisov" <nborisov@suse.com>
Subject: Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore
Date: Mon, 6 Nov 2023 16:00:33 -0500	[thread overview]
Message-ID: <ZUlT8aIc+QdTIRnR@x1n> (raw)
In-Reply-To: <871qd3dnqs.fsf@suse.de>

On Mon, Nov 06, 2023 at 10:18:03AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Nov 01, 2023 at 02:28:24PM +0000, Daniel P. Berrangé wrote:
> >> On Wed, Nov 01, 2023 at 10:21:07AM -0400, Peter Xu wrote:
> >> > On Wed, Nov 01, 2023 at 09:26:46AM +0000, Daniel P. Berrangé wrote:
> >> > > On Tue, Oct 31, 2023 at 03:03:50PM -0400, Peter Xu wrote:
> >> > > > On Wed, Oct 25, 2023 at 11:07:33AM -0300, Fabiano Rosas wrote:
> >> > > > > >> +static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
> >> > > > > >> +{
> >> > > > > >> +    g_autofree unsigned long *bitmap = NULL;
> >> > > > > >> +    struct FixedRamHeader header;
> >> > > > > >> +    size_t bitmap_size;
> >> > > > > >> +    long num_pages;
> >> > > > > >> +    int ret = 0;
> >> > > > > >> +
> >> > > > > >> +    ret = fixed_ram_read_header(f, &header);
> >> > > > > >> +    if (ret < 0) {
> >> > > > > >> +        error_report("Error reading fixed-ram header");
> >> > > > > >> +        return -EINVAL;
> >> > > > > >> +    }
> >> > > > > >> +
> >> > > > > >> +    block->pages_offset = header.pages_offset;
> >> > > > > >
> >> > > > > > Do you think it is worth sanity checking that 'pages_offset' is aligned
> >> > > > > > in some way.
> >> > > > > >
> >> > > > > > It is nice that we have flexibility to change the alignment in future
> >> > > > > > if we find 1 MB is not optimal, so I wouldn't want to force 1MB align
> >> > > > > > check htere. Perhaps we could at least sanity check for alignment at
> >> > > > > > TARGET_PAGE_SIZE, to detect a gross data corruption problem ?
> >> > > > > >
> >> > > > > 
> >> > > > > I don't see why not. I'll add it.
> >> > > > 
> >> > > > Is there any explanation on why that 1MB offset, and how the number is
> >> > > > chosen?  Thanks,
> >> > > 
> >> > > The fixed-ram format is anticipating the use of O_DIRECT.
> >> > > 
> >> > > With O_DIRECT both the buffers in memory, and the file handle offset
> >> > > have alignment requirements. The buffer alignments are usually page
> >> > > sized, and QEMU RAM blocks will trivially satisfy those.
> >> > > 
> >> > > The file handle offset alignment varies per filesystem. While you can
> >> > > query the alignment for the FS holding the file with statx(), that is
> >> > > not appropriate todo. If a user saves/restores QEMU state to file, we
> >> > > must assume there is a chance the user will copy the saved state to a
> >> > > different filesystem.
> >> > > 
> >> > > IOW, we want alignment to satisfy the likely worst case.
> >> > > 
> >> > > Picking 1 MB is a nice round number that is large enough that it is
> >> > > almost certainly going to satisfy any filesystem alignment. In fact
> >> > > it is likely massive overkill. None the less 1 MB is also still tiny
> >> > 
> >> > Is that calculated by something like max of possible host (small) page
> >> > sizes?  I've no idea what's it for all archs, the max small page size I'm
> >> > aware of is 64K, but I don't know a lot archs.
> >> 
> >> It wasn't anything as precise as that. It is literally just "1MB" looks
> >> large enough that we don't need to spend time to investigate per arch
> >> page sizes.
> >
> > IMHO we need that precision on reasoning and document it, even if not on
> > the exact number we prefer, which can be prone to change later.  Otherwise
> > that value will be a pure magic soon after a few years or even less, it'll
> > be more of a challenge later to figure things out.
> >
> >> 
> >> Having said that I'm now having slight self-doubt wrt huge pages, though
> >> I swear we investigated it last year when first discussing this feature.
> >> The guest memory will of course already be suitably aligned, but I'm
> >> wondering if the filesystem I/O places any offset alignment constraints
> >> related to non-default page size.
> >
> > AFAIU direct IO is about pinning the IO buffers, playing the role of fs
> > cache instead.  If my understanding is correct, huge pages shouldn't be a
> > problem for such pinning, because it's legal to pin partial of a huge page.
> >
> > After the partial huge pages pinned, they should be treated as normal fs
> > buffers when doing block IO.  And then the offset of file should, per my
> > understanding, not relevant to what is the type of backend of that user
> > buffer anymore that triggers read()/write().
> >
> > But maybe I missed something, if so that will need to be part of
> > documentation of 1MB magic value, IMHO.  We may want to double check with
> > that by doing fixed-ram migration on e.g. 1GB hugetlb memory-backend-file
> > with 1MB file offset per-ramblock.
> 
> Does anyone have any indication that we need to relate the aligment to
> the page size? All I find online points to device block size being the
> limiting factor for filesystems. There's also raw_probe_alignment() at
> file-posix.c which seems to check up to 4k and recommend to disable
> O_DIRECT if an alignment is not found.

Right, it should be more relevant to block size.

> 
> Note that we shouldn't have any problems changing the alignment we
> choose since we have a pointer to the start of the aligned region which
> goes along with the fixed-ram header. We could even do some probing like
> the block layer does if we wanted.

Having 1MB offset is fine, especially as you said we keep recv side fetch
that in the headers always.

Perhaps we make that 1MB a macro, and add a comment explaining whatever we
have on understanding how that 1MB come from?  I think we can also
reference raw_probe_alignment() in the comment.

Meanwhile now I'm wondering whether that 1MB is too conservative.  Only a
problem if there can be tons of ramblocks (e.g. 1000 ramblock means
1MB*1000=1G for the headers).  I can't think of any, though..  We can
definitely leave that for later.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2023-11-06 21:01 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-23 20:35 [PATCH v2 00/29] migration: File based migration with multifd and fixed-ram Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 01/29] tests/qtest: migration events Fabiano Rosas
2023-10-25  9:44   ` Thomas Huth
2023-10-25 10:14   ` Daniel P. Berrangé
2023-10-25 13:21   ` Fabiano Rosas
2023-10-25 13:33     ` Steven Sistare
2023-10-23 20:35 ` [PATCH v2 02/29] tests/qtest: Move QTestMigrationState to libqtest Fabiano Rosas
2023-10-25 10:17   ` Daniel P. Berrangé
2023-10-25 13:19     ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 03/29] tests/qtest: Allow waiting for migration events Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 04/29] migration: Return the saved state from global_state_store Fabiano Rosas
2023-10-25 10:19   ` Daniel P. Berrangé
2023-10-23 20:35 ` [PATCH v2 05/29] migration: Introduce global_state_store_once Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 06/29] migration: Add auto-pause capability Fabiano Rosas
2023-10-24  5:25   ` Markus Armbruster
2023-10-24 18:12     ` Fabiano Rosas
2023-10-25  5:33       ` Markus Armbruster
2023-10-25  8:48   ` Daniel P. Berrangé
2023-10-25 13:57     ` Fabiano Rosas
2023-10-25 14:20       ` Daniel P. Berrangé
2023-10-25 14:58         ` Peter Xu
2023-10-25 15:25           ` Daniel P. Berrangé
2023-10-25 15:36             ` Peter Xu
2023-10-25 15:40               ` Daniel P. Berrangé
2023-10-25 17:20                 ` Peter Xu
2023-10-25 17:31                   ` Daniel P. Berrangé
2023-10-25 19:28                     ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 07/29] migration: Run "file:" migration with a stopped VM Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 08/29] tests/qtest: File migration auto-pause tests Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 09/29] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 10/29] io: Add generic pwritev/preadv interface Fabiano Rosas
2023-10-24  8:18   ` Daniel P. Berrangé
2023-10-24 19:06     ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 11/29] io: implement io_pwritev/preadv for QIOChannelFile Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 12/29] migration/qemu-file: add utility methods for working with seekable channels Fabiano Rosas
2023-10-25 10:22   ` Daniel P. Berrangé
2023-10-23 20:35 ` [PATCH v2 13/29] migration: fixed-ram: Add URI compatibility check Fabiano Rosas
2023-10-25 10:27   ` Daniel P. Berrangé
2023-10-31 16:06   ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 14/29] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
2023-10-24  5:33   ` Markus Armbruster
2023-10-24 18:35     ` Fabiano Rosas
2023-10-25  6:18       ` Markus Armbruster
2023-10-23 20:35 ` [PATCH v2 15/29] migration/ram: Add support for 'fixed-ram' outgoing migration Fabiano Rosas
2023-10-25  9:39   ` Daniel P. Berrangé
2023-10-25 14:03     ` Fabiano Rosas
2023-11-01 15:23     ` Peter Xu
2023-11-01 15:52       ` Daniel P. Berrangé
2023-11-01 16:24         ` Peter Xu
2023-11-01 16:37           ` Daniel P. Berrangé
2023-11-01 17:30             ` Peter Xu
2023-10-31 16:52   ` Peter Xu
2023-10-31 17:33     ` Fabiano Rosas
2023-10-31 17:59       ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore Fabiano Rosas
2023-10-25  9:43   ` Daniel P. Berrangé
2023-10-25 14:07     ` Fabiano Rosas
2023-10-31 19:03       ` Peter Xu
2023-11-01  9:26         ` Daniel P. Berrangé
2023-11-01 14:21           ` Peter Xu
2023-11-01 14:28             ` Daniel P. Berrangé
2023-11-01 15:00               ` Peter Xu
2023-11-06 13:18                 ` Fabiano Rosas
2023-11-06 21:00                   ` Peter Xu [this message]
2023-11-07  9:02                     ` Daniel P. Berrangé
2023-10-31 19:09   ` Peter Xu
2023-10-31 20:00     ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 17/29] tests/qtest: migration-test: Add tests for fixed-ram file-based migration Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 18/29] migration/multifd: Allow multifd without packets Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 19/29] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
2023-10-25  9:52   ` Daniel P. Berrangé
2023-10-25 14:12     ` Fabiano Rosas
2023-10-25 14:23       ` Daniel P. Berrangé
2023-10-25 15:00         ` Fabiano Rosas
2023-10-25 15:26           ` Daniel P. Berrangé
2023-10-31 20:11   ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 20/29] migration/multifd: Add incoming " Fabiano Rosas
2023-10-25 10:29   ` Daniel P. Berrangé
2023-10-25 14:18     ` Fabiano Rosas
2023-10-31 21:28   ` Peter Xu
2023-10-23 20:36 ` [PATCH v2 21/29] migration/multifd: Add pages to the receiving side Fabiano Rosas
2023-10-31 22:10   ` Peter Xu
2023-10-31 23:18     ` Fabiano Rosas
2023-11-01 15:55       ` Peter Xu
2023-11-01 17:20         ` Fabiano Rosas
2023-11-01 17:35           ` Peter Xu
2023-11-01 18:14             ` Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 22/29] io: Add a pwritev/preadv version that takes a discontiguous iovec Fabiano Rosas
2023-10-24  8:50   ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 23/29] migration/ram: Add a wrapper for fixed-ram shadow bitmap Fabiano Rosas
2023-11-01 14:29   ` Peter Xu
2023-10-23 20:36 ` [PATCH v2 24/29] migration/ram: Ignore multifd flush when doing fixed-ram migration Fabiano Rosas
2023-10-25  9:09   ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 25/29] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
2023-10-25  9:23   ` Daniel P. Berrangé
2023-10-25 14:21     ` Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 26/29] migration/multifd: Support incoming " Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 27/29] tests/qtest: Add a multifd + fixed-ram migration test Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 28/29] migration: Add direct-io parameter Fabiano Rosas
2023-10-24  5:41   ` Markus Armbruster
2023-10-24 19:32     ` Fabiano Rosas
2023-10-25  6:23       ` Markus Armbruster
2023-10-25  8:44       ` Daniel P. Berrangé
2023-10-25 14:32         ` Fabiano Rosas
2023-10-25 14:43           ` Daniel P. Berrangé
2023-10-25 17:30             ` Fabiano Rosas
2023-10-25 17:45               ` Daniel P. Berrangé
2023-10-25 18:10                 ` Fabiano Rosas
2023-10-30 22:51             ` Fabiano Rosas
2023-10-31  9:03               ` Daniel P. Berrangé
2023-10-31 13:05                 ` Fabiano Rosas
2023-10-31 13:45                   ` Daniel P. Berrangé
2023-10-31 14:33                     ` Fabiano Rosas
2023-10-31 15:22                       ` Daniel P. Berrangé
2023-10-31 15:52                         ` Fabiano Rosas
2023-10-31 15:58                           ` Daniel P. Berrangé
2023-10-31 19:05                             ` Fabiano Rosas
2023-11-01  9:30                               ` Daniel P. Berrangé
2023-11-01 12:16                                 ` Fabiano Rosas
2023-11-01 12:23                                   ` Daniel P. Berrangé
2023-11-01 12:30                                     ` Fabiano Rosas
2023-10-24  8:33   ` Daniel P. Berrangé
2023-10-24 19:06     ` Fabiano Rosas
2023-10-25  9:07   ` Daniel P. Berrangé
2023-10-25 14:48     ` Fabiano Rosas
2023-10-25 15:22       ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 29/29] tests/qtest: Add a test for migration with direct-io and multifd Fabiano Rosas
2023-10-25  9:25   ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZUlT8aIc+QdTIRnR@x1n \
    --to=peterx@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=cfontana@suse.de \
    --cc=farosas@suse.de \
    --cc=leobras@redhat.com \
    --cc=nborisov@suse.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).