From: Fabiano Rosas <farosas@suse.de>
To: "Peter Xu" <peterx@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, armbru@redhat.com,
Juan Quintela <quintela@redhat.com>,
Leonardo Bras <leobras@redhat.com>,
Claudio Fontana <cfontana@suse.de>,
Nikolay Borisov <nborisov@suse.com>
Subject: Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore
Date: Mon, 06 Nov 2023 10:18:03 -0300 [thread overview]
Message-ID: <871qd3dnqs.fsf@suse.de> (raw)
In-Reply-To: <ZUJn/5GuRXcudsiF@x1n>
Peter Xu <peterx@redhat.com> writes:
> On Wed, Nov 01, 2023 at 02:28:24PM +0000, Daniel P. Berrangé wrote:
>> On Wed, Nov 01, 2023 at 10:21:07AM -0400, Peter Xu wrote:
>> > On Wed, Nov 01, 2023 at 09:26:46AM +0000, Daniel P. Berrangé wrote:
>> > > On Tue, Oct 31, 2023 at 03:03:50PM -0400, Peter Xu wrote:
>> > > > On Wed, Oct 25, 2023 at 11:07:33AM -0300, Fabiano Rosas wrote:
>> > > > > >> +static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>> > > > > >> +{
>> > > > > >> + g_autofree unsigned long *bitmap = NULL;
>> > > > > >> + struct FixedRamHeader header;
>> > > > > >> + size_t bitmap_size;
>> > > > > >> + long num_pages;
>> > > > > >> + int ret = 0;
>> > > > > >> +
>> > > > > >> + ret = fixed_ram_read_header(f, &header);
>> > > > > >> + if (ret < 0) {
>> > > > > >> + error_report("Error reading fixed-ram header");
>> > > > > >> + return -EINVAL;
>> > > > > >> + }
>> > > > > >> +
>> > > > > >> + block->pages_offset = header.pages_offset;
>> > > > > >
>> > > > > > Do you think it is worth sanity checking that 'pages_offset' is aligned
>> > > > > > in some way.
>> > > > > >
>> > > > > > It is nice that we have flexibility to change the alignment in future
>> > > > > > if we find 1 MB is not optimal, so I wouldn't want to force 1MB align
>> > > > > > check htere. Perhaps we could at least sanity check for alignment at
>> > > > > > TARGET_PAGE_SIZE, to detect a gross data corruption problem ?
>> > > > > >
>> > > > >
>> > > > > I don't see why not. I'll add it.
>> > > >
>> > > > Is there any explanation on why that 1MB offset, and how the number is
>> > > > chosen? Thanks,
>> > >
>> > > The fixed-ram format is anticipating the use of O_DIRECT.
>> > >
>> > > With O_DIRECT both the buffers in memory, and the file handle offset
>> > > have alignment requirements. The buffer alignments are usually page
>> > > sized, and QEMU RAM blocks will trivially satisfy those.
>> > >
>> > > The file handle offset alignment varies per filesystem. While you can
>> > > query the alignment for the FS holding the file with statx(), that is
>> > > not appropriate todo. If a user saves/restores QEMU state to file, we
>> > > must assume there is a chance the user will copy the saved state to a
>> > > different filesystem.
>> > >
>> > > IOW, we want alignment to satisfy the likely worst case.
>> > >
>> > > Picking 1 MB is a nice round number that is large enough that it is
>> > > almost certainly going to satisfy any filesystem alignment. In fact
>> > > it is likely massive overkill. None the less 1 MB is also still tiny
>> >
>> > Is that calculated by something like max of possible host (small) page
>> > sizes? I've no idea what's it for all archs, the max small page size I'm
>> > aware of is 64K, but I don't know a lot archs.
>>
>> It wasn't anything as precise as that. It is literally just "1MB" looks
>> large enough that we don't need to spend time to investigate per arch
>> page sizes.
>
> IMHO we need that precision on reasoning and document it, even if not on
> the exact number we prefer, which can be prone to change later. Otherwise
> that value will be a pure magic soon after a few years or even less, it'll
> be more of a challenge later to figure things out.
>
>>
>> Having said that I'm now having slight self-doubt wrt huge pages, though
>> I swear we investigated it last year when first discussing this feature.
>> The guest memory will of course already be suitably aligned, but I'm
>> wondering if the filesystem I/O places any offset alignment constraints
>> related to non-default page size.
>
> AFAIU direct IO is about pinning the IO buffers, playing the role of fs
> cache instead. If my understanding is correct, huge pages shouldn't be a
> problem for such pinning, because it's legal to pin partial of a huge page.
>
> After the partial huge pages pinned, they should be treated as normal fs
> buffers when doing block IO. And then the offset of file should, per my
> understanding, not relevant to what is the type of backend of that user
> buffer anymore that triggers read()/write().
>
> But maybe I missed something, if so that will need to be part of
> documentation of 1MB magic value, IMHO. We may want to double check with
> that by doing fixed-ram migration on e.g. 1GB hugetlb memory-backend-file
> with 1MB file offset per-ramblock.
Does anyone have any indication that we need to relate the aligment to
the page size? All I find online points to device block size being the
limiting factor for filesystems. There's also raw_probe_alignment() at
file-posix.c which seems to check up to 4k and recommend to disable
O_DIRECT if an alignment is not found.
Note that we shouldn't have any problems changing the alignment we
choose since we have a pointer to the start of the aligned region which
goes along with the fixed-ram header. We could even do some probing like
the block layer does if we wanted.
next prev parent reply other threads:[~2023-11-06 13:19 UTC|newest]
Thread overview: 128+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-23 20:35 [PATCH v2 00/29] migration: File based migration with multifd and fixed-ram Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 01/29] tests/qtest: migration events Fabiano Rosas
2023-10-25 9:44 ` Thomas Huth
2023-10-25 10:14 ` Daniel P. Berrangé
2023-10-25 13:21 ` Fabiano Rosas
2023-10-25 13:33 ` Steven Sistare
2023-10-23 20:35 ` [PATCH v2 02/29] tests/qtest: Move QTestMigrationState to libqtest Fabiano Rosas
2023-10-25 10:17 ` Daniel P. Berrangé
2023-10-25 13:19 ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 03/29] tests/qtest: Allow waiting for migration events Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 04/29] migration: Return the saved state from global_state_store Fabiano Rosas
2023-10-25 10:19 ` Daniel P. Berrangé
2023-10-23 20:35 ` [PATCH v2 05/29] migration: Introduce global_state_store_once Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 06/29] migration: Add auto-pause capability Fabiano Rosas
2023-10-24 5:25 ` Markus Armbruster
2023-10-24 18:12 ` Fabiano Rosas
2023-10-25 5:33 ` Markus Armbruster
2023-10-25 8:48 ` Daniel P. Berrangé
2023-10-25 13:57 ` Fabiano Rosas
2023-10-25 14:20 ` Daniel P. Berrangé
2023-10-25 14:58 ` Peter Xu
2023-10-25 15:25 ` Daniel P. Berrangé
2023-10-25 15:36 ` Peter Xu
2023-10-25 15:40 ` Daniel P. Berrangé
2023-10-25 17:20 ` Peter Xu
2023-10-25 17:31 ` Daniel P. Berrangé
2023-10-25 19:28 ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 07/29] migration: Run "file:" migration with a stopped VM Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 08/29] tests/qtest: File migration auto-pause tests Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 09/29] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 10/29] io: Add generic pwritev/preadv interface Fabiano Rosas
2023-10-24 8:18 ` Daniel P. Berrangé
2023-10-24 19:06 ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 11/29] io: implement io_pwritev/preadv for QIOChannelFile Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 12/29] migration/qemu-file: add utility methods for working with seekable channels Fabiano Rosas
2023-10-25 10:22 ` Daniel P. Berrangé
2023-10-23 20:35 ` [PATCH v2 13/29] migration: fixed-ram: Add URI compatibility check Fabiano Rosas
2023-10-25 10:27 ` Daniel P. Berrangé
2023-10-31 16:06 ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 14/29] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
2023-10-24 5:33 ` Markus Armbruster
2023-10-24 18:35 ` Fabiano Rosas
2023-10-25 6:18 ` Markus Armbruster
2023-10-23 20:35 ` [PATCH v2 15/29] migration/ram: Add support for 'fixed-ram' outgoing migration Fabiano Rosas
2023-10-25 9:39 ` Daniel P. Berrangé
2023-10-25 14:03 ` Fabiano Rosas
2023-11-01 15:23 ` Peter Xu
2023-11-01 15:52 ` Daniel P. Berrangé
2023-11-01 16:24 ` Peter Xu
2023-11-01 16:37 ` Daniel P. Berrangé
2023-11-01 17:30 ` Peter Xu
2023-10-31 16:52 ` Peter Xu
2023-10-31 17:33 ` Fabiano Rosas
2023-10-31 17:59 ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore Fabiano Rosas
2023-10-25 9:43 ` Daniel P. Berrangé
2023-10-25 14:07 ` Fabiano Rosas
2023-10-31 19:03 ` Peter Xu
2023-11-01 9:26 ` Daniel P. Berrangé
2023-11-01 14:21 ` Peter Xu
2023-11-01 14:28 ` Daniel P. Berrangé
2023-11-01 15:00 ` Peter Xu
2023-11-06 13:18 ` Fabiano Rosas [this message]
2023-11-06 21:00 ` Peter Xu
2023-11-07 9:02 ` Daniel P. Berrangé
2023-10-31 19:09 ` Peter Xu
2023-10-31 20:00 ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 17/29] tests/qtest: migration-test: Add tests for fixed-ram file-based migration Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 18/29] migration/multifd: Allow multifd without packets Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 19/29] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
2023-10-25 9:52 ` Daniel P. Berrangé
2023-10-25 14:12 ` Fabiano Rosas
2023-10-25 14:23 ` Daniel P. Berrangé
2023-10-25 15:00 ` Fabiano Rosas
2023-10-25 15:26 ` Daniel P. Berrangé
2023-10-31 20:11 ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 20/29] migration/multifd: Add incoming " Fabiano Rosas
2023-10-25 10:29 ` Daniel P. Berrangé
2023-10-25 14:18 ` Fabiano Rosas
2023-10-31 21:28 ` Peter Xu
2023-10-23 20:36 ` [PATCH v2 21/29] migration/multifd: Add pages to the receiving side Fabiano Rosas
2023-10-31 22:10 ` Peter Xu
2023-10-31 23:18 ` Fabiano Rosas
2023-11-01 15:55 ` Peter Xu
2023-11-01 17:20 ` Fabiano Rosas
2023-11-01 17:35 ` Peter Xu
2023-11-01 18:14 ` Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 22/29] io: Add a pwritev/preadv version that takes a discontiguous iovec Fabiano Rosas
2023-10-24 8:50 ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 23/29] migration/ram: Add a wrapper for fixed-ram shadow bitmap Fabiano Rosas
2023-11-01 14:29 ` Peter Xu
2023-10-23 20:36 ` [PATCH v2 24/29] migration/ram: Ignore multifd flush when doing fixed-ram migration Fabiano Rosas
2023-10-25 9:09 ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 25/29] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
2023-10-25 9:23 ` Daniel P. Berrangé
2023-10-25 14:21 ` Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 26/29] migration/multifd: Support incoming " Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 27/29] tests/qtest: Add a multifd + fixed-ram migration test Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 28/29] migration: Add direct-io parameter Fabiano Rosas
2023-10-24 5:41 ` Markus Armbruster
2023-10-24 19:32 ` Fabiano Rosas
2023-10-25 6:23 ` Markus Armbruster
2023-10-25 8:44 ` Daniel P. Berrangé
2023-10-25 14:32 ` Fabiano Rosas
2023-10-25 14:43 ` Daniel P. Berrangé
2023-10-25 17:30 ` Fabiano Rosas
2023-10-25 17:45 ` Daniel P. Berrangé
2023-10-25 18:10 ` Fabiano Rosas
2023-10-30 22:51 ` Fabiano Rosas
2023-10-31 9:03 ` Daniel P. Berrangé
2023-10-31 13:05 ` Fabiano Rosas
2023-10-31 13:45 ` Daniel P. Berrangé
2023-10-31 14:33 ` Fabiano Rosas
2023-10-31 15:22 ` Daniel P. Berrangé
2023-10-31 15:52 ` Fabiano Rosas
2023-10-31 15:58 ` Daniel P. Berrangé
2023-10-31 19:05 ` Fabiano Rosas
2023-11-01 9:30 ` Daniel P. Berrangé
2023-11-01 12:16 ` Fabiano Rosas
2023-11-01 12:23 ` Daniel P. Berrangé
2023-11-01 12:30 ` Fabiano Rosas
2023-10-24 8:33 ` Daniel P. Berrangé
2023-10-24 19:06 ` Fabiano Rosas
2023-10-25 9:07 ` Daniel P. Berrangé
2023-10-25 14:48 ` Fabiano Rosas
2023-10-25 15:22 ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 29/29] tests/qtest: Add a test for migration with direct-io and multifd Fabiano Rosas
2023-10-25 9:25 ` Daniel P. Berrangé
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871qd3dnqs.fsf@suse.de \
--to=farosas@suse.de \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=cfontana@suse.de \
--cc=leobras@redhat.com \
--cc=nborisov@suse.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).