qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: "Peter Xu" <peterx@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, armbru@redhat.com,
	Juan Quintela <quintela@redhat.com>,
	Leonardo Bras <leobras@redhat.com>,
	Claudio Fontana <cfontana@suse.de>,
	Nikolay Borisov <nborisov@suse.com>
Subject: Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore
Date: Mon, 06 Nov 2023 10:18:03 -0300	[thread overview]
Message-ID: <871qd3dnqs.fsf@suse.de> (raw)
In-Reply-To: <ZUJn/5GuRXcudsiF@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Wed, Nov 01, 2023 at 02:28:24PM +0000, Daniel P. Berrangé wrote:
>> On Wed, Nov 01, 2023 at 10:21:07AM -0400, Peter Xu wrote:
>> > On Wed, Nov 01, 2023 at 09:26:46AM +0000, Daniel P. Berrangé wrote:
>> > > On Tue, Oct 31, 2023 at 03:03:50PM -0400, Peter Xu wrote:
>> > > > On Wed, Oct 25, 2023 at 11:07:33AM -0300, Fabiano Rosas wrote:
>> > > > > >> +static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>> > > > > >> +{
>> > > > > >> +    g_autofree unsigned long *bitmap = NULL;
>> > > > > >> +    struct FixedRamHeader header;
>> > > > > >> +    size_t bitmap_size;
>> > > > > >> +    long num_pages;
>> > > > > >> +    int ret = 0;
>> > > > > >> +
>> > > > > >> +    ret = fixed_ram_read_header(f, &header);
>> > > > > >> +    if (ret < 0) {
>> > > > > >> +        error_report("Error reading fixed-ram header");
>> > > > > >> +        return -EINVAL;
>> > > > > >> +    }
>> > > > > >> +
>> > > > > >> +    block->pages_offset = header.pages_offset;
>> > > > > >
>> > > > > > Do you think it is worth sanity checking that 'pages_offset' is aligned
>> > > > > > in some way.
>> > > > > >
>> > > > > > It is nice that we have flexibility to change the alignment in future
>> > > > > > if we find 1 MB is not optimal, so I wouldn't want to force 1MB align
>> > > > > > check htere. Perhaps we could at least sanity check for alignment at
>> > > > > > TARGET_PAGE_SIZE, to detect a gross data corruption problem ?
>> > > > > >
>> > > > > 
>> > > > > I don't see why not. I'll add it.
>> > > > 
>> > > > Is there any explanation on why that 1MB offset, and how the number is
>> > > > chosen?  Thanks,
>> > > 
>> > > The fixed-ram format is anticipating the use of O_DIRECT.
>> > > 
>> > > With O_DIRECT both the buffers in memory, and the file handle offset
>> > > have alignment requirements. The buffer alignments are usually page
>> > > sized, and QEMU RAM blocks will trivially satisfy those.
>> > > 
>> > > The file handle offset alignment varies per filesystem. While you can
>> > > query the alignment for the FS holding the file with statx(), that is
>> > > not appropriate todo. If a user saves/restores QEMU state to file, we
>> > > must assume there is a chance the user will copy the saved state to a
>> > > different filesystem.
>> > > 
>> > > IOW, we want alignment to satisfy the likely worst case.
>> > > 
>> > > Picking 1 MB is a nice round number that is large enough that it is
>> > > almost certainly going to satisfy any filesystem alignment. In fact
>> > > it is likely massive overkill. None the less 1 MB is also still tiny
>> > 
>> > Is that calculated by something like max of possible host (small) page
>> > sizes?  I've no idea what's it for all archs, the max small page size I'm
>> > aware of is 64K, but I don't know a lot archs.
>> 
>> It wasn't anything as precise as that. It is literally just "1MB" looks
>> large enough that we don't need to spend time to investigate per arch
>> page sizes.
>
> IMHO we need that precision on reasoning and document it, even if not on
> the exact number we prefer, which can be prone to change later.  Otherwise
> that value will be a pure magic soon after a few years or even less, it'll
> be more of a challenge later to figure things out.
>
>> 
>> Having said that I'm now having slight self-doubt wrt huge pages, though
>> I swear we investigated it last year when first discussing this feature.
>> The guest memory will of course already be suitably aligned, but I'm
>> wondering if the filesystem I/O places any offset alignment constraints
>> related to non-default page size.
>
> AFAIU direct IO is about pinning the IO buffers, playing the role of fs
> cache instead.  If my understanding is correct, huge pages shouldn't be a
> problem for such pinning, because it's legal to pin partial of a huge page.
>
> After the partial huge pages pinned, they should be treated as normal fs
> buffers when doing block IO.  And then the offset of file should, per my
> understanding, not relevant to what is the type of backend of that user
> buffer anymore that triggers read()/write().
>
> But maybe I missed something, if so that will need to be part of
> documentation of 1MB magic value, IMHO.  We may want to double check with
> that by doing fixed-ram migration on e.g. 1GB hugetlb memory-backend-file
> with 1MB file offset per-ramblock.

Does anyone have any indication that we need to relate the aligment to
the page size? All I find online points to device block size being the
limiting factor for filesystems. There's also raw_probe_alignment() at
file-posix.c which seems to check up to 4k and recommend to disable
O_DIRECT if an alignment is not found.

Note that we shouldn't have any problems changing the alignment we
choose since we have a pointer to the start of the aligned region which
goes along with the fixed-ram header. We could even do some probing like
the block layer does if we wanted.


  reply	other threads:[~2023-11-06 13:19 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-23 20:35 [PATCH v2 00/29] migration: File based migration with multifd and fixed-ram Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 01/29] tests/qtest: migration events Fabiano Rosas
2023-10-25  9:44   ` Thomas Huth
2023-10-25 10:14   ` Daniel P. Berrangé
2023-10-25 13:21   ` Fabiano Rosas
2023-10-25 13:33     ` Steven Sistare
2023-10-23 20:35 ` [PATCH v2 02/29] tests/qtest: Move QTestMigrationState to libqtest Fabiano Rosas
2023-10-25 10:17   ` Daniel P. Berrangé
2023-10-25 13:19     ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 03/29] tests/qtest: Allow waiting for migration events Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 04/29] migration: Return the saved state from global_state_store Fabiano Rosas
2023-10-25 10:19   ` Daniel P. Berrangé
2023-10-23 20:35 ` [PATCH v2 05/29] migration: Introduce global_state_store_once Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 06/29] migration: Add auto-pause capability Fabiano Rosas
2023-10-24  5:25   ` Markus Armbruster
2023-10-24 18:12     ` Fabiano Rosas
2023-10-25  5:33       ` Markus Armbruster
2023-10-25  8:48   ` Daniel P. Berrangé
2023-10-25 13:57     ` Fabiano Rosas
2023-10-25 14:20       ` Daniel P. Berrangé
2023-10-25 14:58         ` Peter Xu
2023-10-25 15:25           ` Daniel P. Berrangé
2023-10-25 15:36             ` Peter Xu
2023-10-25 15:40               ` Daniel P. Berrangé
2023-10-25 17:20                 ` Peter Xu
2023-10-25 17:31                   ` Daniel P. Berrangé
2023-10-25 19:28                     ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 07/29] migration: Run "file:" migration with a stopped VM Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 08/29] tests/qtest: File migration auto-pause tests Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 09/29] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 10/29] io: Add generic pwritev/preadv interface Fabiano Rosas
2023-10-24  8:18   ` Daniel P. Berrangé
2023-10-24 19:06     ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 11/29] io: implement io_pwritev/preadv for QIOChannelFile Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 12/29] migration/qemu-file: add utility methods for working with seekable channels Fabiano Rosas
2023-10-25 10:22   ` Daniel P. Berrangé
2023-10-23 20:35 ` [PATCH v2 13/29] migration: fixed-ram: Add URI compatibility check Fabiano Rosas
2023-10-25 10:27   ` Daniel P. Berrangé
2023-10-31 16:06   ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 14/29] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
2023-10-24  5:33   ` Markus Armbruster
2023-10-24 18:35     ` Fabiano Rosas
2023-10-25  6:18       ` Markus Armbruster
2023-10-23 20:35 ` [PATCH v2 15/29] migration/ram: Add support for 'fixed-ram' outgoing migration Fabiano Rosas
2023-10-25  9:39   ` Daniel P. Berrangé
2023-10-25 14:03     ` Fabiano Rosas
2023-11-01 15:23     ` Peter Xu
2023-11-01 15:52       ` Daniel P. Berrangé
2023-11-01 16:24         ` Peter Xu
2023-11-01 16:37           ` Daniel P. Berrangé
2023-11-01 17:30             ` Peter Xu
2023-10-31 16:52   ` Peter Xu
2023-10-31 17:33     ` Fabiano Rosas
2023-10-31 17:59       ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore Fabiano Rosas
2023-10-25  9:43   ` Daniel P. Berrangé
2023-10-25 14:07     ` Fabiano Rosas
2023-10-31 19:03       ` Peter Xu
2023-11-01  9:26         ` Daniel P. Berrangé
2023-11-01 14:21           ` Peter Xu
2023-11-01 14:28             ` Daniel P. Berrangé
2023-11-01 15:00               ` Peter Xu
2023-11-06 13:18                 ` Fabiano Rosas [this message]
2023-11-06 21:00                   ` Peter Xu
2023-11-07  9:02                     ` Daniel P. Berrangé
2023-10-31 19:09   ` Peter Xu
2023-10-31 20:00     ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 17/29] tests/qtest: migration-test: Add tests for fixed-ram file-based migration Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 18/29] migration/multifd: Allow multifd without packets Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 19/29] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
2023-10-25  9:52   ` Daniel P. Berrangé
2023-10-25 14:12     ` Fabiano Rosas
2023-10-25 14:23       ` Daniel P. Berrangé
2023-10-25 15:00         ` Fabiano Rosas
2023-10-25 15:26           ` Daniel P. Berrangé
2023-10-31 20:11   ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 20/29] migration/multifd: Add incoming " Fabiano Rosas
2023-10-25 10:29   ` Daniel P. Berrangé
2023-10-25 14:18     ` Fabiano Rosas
2023-10-31 21:28   ` Peter Xu
2023-10-23 20:36 ` [PATCH v2 21/29] migration/multifd: Add pages to the receiving side Fabiano Rosas
2023-10-31 22:10   ` Peter Xu
2023-10-31 23:18     ` Fabiano Rosas
2023-11-01 15:55       ` Peter Xu
2023-11-01 17:20         ` Fabiano Rosas
2023-11-01 17:35           ` Peter Xu
2023-11-01 18:14             ` Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 22/29] io: Add a pwritev/preadv version that takes a discontiguous iovec Fabiano Rosas
2023-10-24  8:50   ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 23/29] migration/ram: Add a wrapper for fixed-ram shadow bitmap Fabiano Rosas
2023-11-01 14:29   ` Peter Xu
2023-10-23 20:36 ` [PATCH v2 24/29] migration/ram: Ignore multifd flush when doing fixed-ram migration Fabiano Rosas
2023-10-25  9:09   ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 25/29] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
2023-10-25  9:23   ` Daniel P. Berrangé
2023-10-25 14:21     ` Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 26/29] migration/multifd: Support incoming " Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 27/29] tests/qtest: Add a multifd + fixed-ram migration test Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 28/29] migration: Add direct-io parameter Fabiano Rosas
2023-10-24  5:41   ` Markus Armbruster
2023-10-24 19:32     ` Fabiano Rosas
2023-10-25  6:23       ` Markus Armbruster
2023-10-25  8:44       ` Daniel P. Berrangé
2023-10-25 14:32         ` Fabiano Rosas
2023-10-25 14:43           ` Daniel P. Berrangé
2023-10-25 17:30             ` Fabiano Rosas
2023-10-25 17:45               ` Daniel P. Berrangé
2023-10-25 18:10                 ` Fabiano Rosas
2023-10-30 22:51             ` Fabiano Rosas
2023-10-31  9:03               ` Daniel P. Berrangé
2023-10-31 13:05                 ` Fabiano Rosas
2023-10-31 13:45                   ` Daniel P. Berrangé
2023-10-31 14:33                     ` Fabiano Rosas
2023-10-31 15:22                       ` Daniel P. Berrangé
2023-10-31 15:52                         ` Fabiano Rosas
2023-10-31 15:58                           ` Daniel P. Berrangé
2023-10-31 19:05                             ` Fabiano Rosas
2023-11-01  9:30                               ` Daniel P. Berrangé
2023-11-01 12:16                                 ` Fabiano Rosas
2023-11-01 12:23                                   ` Daniel P. Berrangé
2023-11-01 12:30                                     ` Fabiano Rosas
2023-10-24  8:33   ` Daniel P. Berrangé
2023-10-24 19:06     ` Fabiano Rosas
2023-10-25  9:07   ` Daniel P. Berrangé
2023-10-25 14:48     ` Fabiano Rosas
2023-10-25 15:22       ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 29/29] tests/qtest: Add a test for migration with direct-io and multifd Fabiano Rosas
2023-10-25  9:25   ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871qd3dnqs.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=cfontana@suse.de \
    --cc=leobras@redhat.com \
    --cc=nborisov@suse.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).