Re: [PATCH 2/2] qemu-img: align next status sector on destination alignment.

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Maxim Levitsky <mlevitsk@redhat.com>
To: Eric Blake <eblake@redhat.com>, Peter Lieven <pl@kamp.de>,
	 qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Jan Kara <jack@suse.cz>,
	qemu-block@nongnu.org,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Max Reitz <mreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 2/2] qemu-img: align next status sector on destination alignment.
Date: Thu, 18 Mar 2021 11:57:07 +0200	[thread overview]
Message-ID: <a34aa05dad2117ff4f067ba6bfa811823e5b35c0.camel@redhat.com> (raw)
In-Reply-To: <621d739815b25f8b6bc1b8cdd266a89a20bdd97a.camel@redhat.com>

On Thu, 2020-11-12 at 17:04 +0200, Maxim Levitsky wrote:
> On Thu, 2020-11-12 at 07:45 -0600, Eric Blake wrote:
> > On 11/12/20 6:40 AM, Peter Lieven wrote:
> > 
> > > >          /*
> > > > -         * Avoid that s->sector_next_status becomes unaligned to the source
> > > > -         * request alignment and/or cluster size to avoid unnecessary read
> > > > -         * cycles.
> > > > +         * Avoid that s->sector_next_status becomes unaligned to the
> > > > +         * source/destination request alignment and/or cluster size to avoid
> > > > +         * unnecessary read/write cycles.
> > > >           */
> > > > -        tail = (sector_num - src_cur_offset + n) % s->src_alignment[src_cur];
> > > > +        alignment = MAX(s->src_alignment[src_cur], s->alignment);
> > > > +        assert(is_power_of_2(alignment));
> > > > +
> > > > +        tail = (sector_num - src_cur_offset + n) % alignment;
> > > >          if (n > tail) {
> > > >              n -= tail;
> > > >          }
> > > 
> > > I was also considering including the s->alignment when adding this chance. However, you need the least common multiple of both alignments, not the maximum, otherwise
> > > 
> > > you might get misaligned to either source or destination.
> > > 
> > > 
> > > Why exactly do you need the power of two requirement?
> > 
> > The power of two requirement ensures that you h ave the least common
> > multiple of both alignments ;)
> -
> Yes, and in addition to that both alignments are already power of two because we align them
> to it. The assert I added is just in case.
> 
> This is how we calculate the destination alignment:
>  
> s.alignment = MAX(pow2floor(s.min_sparse),
>                       DIV_ROUND_UP(out_bs->bl.request_alignment,
>                                    BDRV_SECTOR_SIZE));
>  
>  
> This is how we calculate the source alignments (it is possible to have several source files)
>  
> s.src_alignment[bs_i] = DIV_ROUND_UP(src_bs->bl.request_alignment,
>                                              BDRV_SECTOR_SIZE);
> if (!bdrv_get_info(src_bs, &bdi)) {
>     s.src_alignment[bs_i] = MAX(s.src_alignment[bs_i], bdi.cluster_size / BDRV_SECTOR_SIZE);
> }
> 
> 
> The bl.request_alignment comment mentions that it must be power of 2,
> and cluster sizes are also very likely to be power of 2 in all drivers
> we support. An assert for s.src_alignment won't hurt though.
> 
>  
> Note though that we don't really read the discard alignment.
> We have max_pdiscard, and pdiscard_alignment, but max_pdiscard
> is only used to split 'too large' discard requests, and pdiscard_alignment
> is advisory and used only in a couple of places.
> Neither are set by file-posix.
>  
> We do have request_alignment, which file-posix tries to align to the logical block
> size which is still often 512 for backward compatibility on many devices (even nvme)
>  
> Now both source and destination alignments in qemu-img convert are based on request_aligment
> and on min_sparse (which is by default 4K, controlled by qemu-img -S parameter
> (which otherwise controls the minimum number of blocks to be zero, to consider
> discarding them in convert)
>  
> 
> This means that there is no guarantee to have 4K alignment, and besides,
> with sufficiently fragmented source file, the bdrv_block_status_above
> can return arbitrary short and unaligned extents which can't be aligned, 
> thus as I said this patch alone can't guarantee that we won't have 
> write and discard sharing the same page.
>  
> Another thing that can be discussed is why is_allocated_sectors was patched
> to convert short discards to writes. Probably because underlying hardware
> ignores them or something? In theory we should detect this and fail
> back to regular zeroing in this case.
> Again though, while in case of converting an empty file, this is the only
> source of writes, and eliminating it, also 'fixes' this case, with sufficiently
> fragmented source file we can even without this code get a write and discard
> sharing a page.
> 
> 
> Best regards,
> 	Maxim Levitsky
> 
> > However, there ARE devices in practice that have advertised a
> > non-power-of-2 discard granularity, such as 15MiB (see commits 430b26a8,
> > 63188c24).  Which means you probably don't want to assert power-of-2,
> > and in turn need to worry about least common multiple.
> > 

Any update on this patch?

Best regards,
	Maxim Levitsky

next prev parent reply	other threads:[~2021-03-18  9:58 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-11 15:39 [PATCH 0/2] RFC: Issue with discards on raw block device without O_DIRECT Maxim Levitsky
2020-11-11 15:39 ` [PATCH 1/2] file-posix: allow -EBUSY errors during write zeros on raw block devices Maxim Levitsky
2020-11-16 14:48   ` Kevin Wolf
2021-01-07 12:44     ` Maxim Levitsky
2020-11-11 15:39 ` [PATCH 2/2] qemu-img: align next status sector on destination alignment Maxim Levitsky
2020-11-12 12:40   ` Peter Lieven
2020-11-12 13:45     ` Eric Blake
2020-11-12 15:04       ` Maxim Levitsky
2021-03-18  9:57         ` Maxim Levitsky [this message]
2020-11-11 15:44 ` [PATCH 0/2] RFC: Issue with discards on raw block device without O_DIRECT Maxim Levitsky
2020-11-12 11:19   ` Jan Kara
2020-11-12 12:00     ` Jan Kara
2020-11-12 22:08       ` Darrick J. Wong
2020-12-07 17:23         ` Maxim Levitsky
2020-11-12 15:38     ` Maxim Levitsky
2020-11-13 10:07       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a34aa05dad2117ff4f067ba6bfa811823e5b35c0.camel@redhat.com \
    --to=mlevitsk@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=eblake@redhat.com \
    --cc=jack@suse.cz \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).