From: Maxim Levitsky <mlevitsk@redhat.com>
To: Eric Blake <eblake@redhat.com>, Peter Lieven <pl@kamp.de>,
qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Jan Kara <jack@suse.cz>,
qemu-block@nongnu.org,
"Darrick J . Wong" <darrick.wong@oracle.com>,
Max Reitz <mreitz@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 2/2] qemu-img: align next status sector on destination alignment.
Date: Thu, 18 Mar 2021 11:57:07 +0200 [thread overview]
Message-ID: <a34aa05dad2117ff4f067ba6bfa811823e5b35c0.camel@redhat.com> (raw)
In-Reply-To: <621d739815b25f8b6bc1b8cdd266a89a20bdd97a.camel@redhat.com>
On Thu, 2020-11-12 at 17:04 +0200, Maxim Levitsky wrote:
> On Thu, 2020-11-12 at 07:45 -0600, Eric Blake wrote:
> > On 11/12/20 6:40 AM, Peter Lieven wrote:
> >
> > > > /*
> > > > - * Avoid that s->sector_next_status becomes unaligned to the source
> > > > - * request alignment and/or cluster size to avoid unnecessary read
> > > > - * cycles.
> > > > + * Avoid that s->sector_next_status becomes unaligned to the
> > > > + * source/destination request alignment and/or cluster size to avoid
> > > > + * unnecessary read/write cycles.
> > > > */
> > > > - tail = (sector_num - src_cur_offset + n) % s->src_alignment[src_cur];
> > > > + alignment = MAX(s->src_alignment[src_cur], s->alignment);
> > > > + assert(is_power_of_2(alignment));
> > > > +
> > > > + tail = (sector_num - src_cur_offset + n) % alignment;
> > > > if (n > tail) {
> > > > n -= tail;
> > > > }
> > >
> > > I was also considering including the s->alignment when adding this chance. However, you need the least common multiple of both alignments, not the maximum, otherwise
> > >
> > > you might get misaligned to either source or destination.
> > >
> > >
> > > Why exactly do you need the power of two requirement?
> >
> > The power of two requirement ensures that you h ave the least common
> > multiple of both alignments ;)
> -
> Yes, and in addition to that both alignments are already power of two because we align them
> to it. The assert I added is just in case.
>
> This is how we calculate the destination alignment:
>
> s.alignment = MAX(pow2floor(s.min_sparse),
> DIV_ROUND_UP(out_bs->bl.request_alignment,
> BDRV_SECTOR_SIZE));
>
>
> This is how we calculate the source alignments (it is possible to have several source files)
>
> s.src_alignment[bs_i] = DIV_ROUND_UP(src_bs->bl.request_alignment,
> BDRV_SECTOR_SIZE);
> if (!bdrv_get_info(src_bs, &bdi)) {
> s.src_alignment[bs_i] = MAX(s.src_alignment[bs_i], bdi.cluster_size / BDRV_SECTOR_SIZE);
> }
>
>
> The bl.request_alignment comment mentions that it must be power of 2,
> and cluster sizes are also very likely to be power of 2 in all drivers
> we support. An assert for s.src_alignment won't hurt though.
>
>
> Note though that we don't really read the discard alignment.
> We have max_pdiscard, and pdiscard_alignment, but max_pdiscard
> is only used to split 'too large' discard requests, and pdiscard_alignment
> is advisory and used only in a couple of places.
> Neither are set by file-posix.
>
> We do have request_alignment, which file-posix tries to align to the logical block
> size which is still often 512 for backward compatibility on many devices (even nvme)
>
> Now both source and destination alignments in qemu-img convert are based on request_aligment
> and on min_sparse (which is by default 4K, controlled by qemu-img -S parameter
> (which otherwise controls the minimum number of blocks to be zero, to consider
> discarding them in convert)
>
>
> This means that there is no guarantee to have 4K alignment, and besides,
> with sufficiently fragmented source file, the bdrv_block_status_above
> can return arbitrary short and unaligned extents which can't be aligned,
> thus as I said this patch alone can't guarantee that we won't have
> write and discard sharing the same page.
>
> Another thing that can be discussed is why is_allocated_sectors was patched
> to convert short discards to writes. Probably because underlying hardware
> ignores them or something? In theory we should detect this and fail
> back to regular zeroing in this case.
> Again though, while in case of converting an empty file, this is the only
> source of writes, and eliminating it, also 'fixes' this case, with sufficiently
> fragmented source file we can even without this code get a write and discard
> sharing a page.
>
>
> Best regards,
> Maxim Levitsky
>
> > However, there ARE devices in practice that have advertised a
> > non-power-of-2 discard granularity, such as 15MiB (see commits 430b26a8,
> > 63188c24). Which means you probably don't want to assert power-of-2,
> > and in turn need to worry about least common multiple.
> >
Any update on this patch?
Best regards,
Maxim Levitsky
next prev parent reply other threads:[~2021-03-18 9:58 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-11 15:39 [PATCH 0/2] RFC: Issue with discards on raw block device without O_DIRECT Maxim Levitsky
2020-11-11 15:39 ` [PATCH 1/2] file-posix: allow -EBUSY errors during write zeros on raw block devices Maxim Levitsky
2020-11-16 14:48 ` Kevin Wolf
2021-01-07 12:44 ` Maxim Levitsky
2020-11-11 15:39 ` [PATCH 2/2] qemu-img: align next status sector on destination alignment Maxim Levitsky
2020-11-12 12:40 ` Peter Lieven
2020-11-12 13:45 ` Eric Blake
2020-11-12 15:04 ` Maxim Levitsky
2021-03-18 9:57 ` Maxim Levitsky [this message]
2020-11-11 15:44 ` [PATCH 0/2] RFC: Issue with discards on raw block device without O_DIRECT Maxim Levitsky
2020-11-12 11:19 ` Jan Kara
2020-11-12 12:00 ` Jan Kara
2020-11-12 22:08 ` Darrick J. Wong
2020-12-07 17:23 ` Maxim Levitsky
2020-11-12 15:38 ` Maxim Levitsky
2020-11-13 10:07 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a34aa05dad2117ff4f067ba6bfa811823e5b35c0.camel@redhat.com \
--to=mlevitsk@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=eblake@redhat.com \
--cc=jack@suse.cz \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=pl@kamp.de \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).