Re: [Qemu-devel] [PATCH V3] block/iscsi: allow caching of the allocation map

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Eric Blake <eblake@redhat.com>
To: Peter Lieven <pl@kamp.de>, qemu-block@nongnu.org
Cc: pbonzini@redhat.com, famz@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH V3] block/iscsi: allow caching of the allocation map
Date: Tue, 24 May 2016 17:10:23 -0600	[thread overview]
Message-ID: <5744DF5F.3070600@redhat.com> (raw)
In-Reply-To: <1464079240-24362-1-git-send-email-pl@kamp.de>

[-- Attachment #1: Type: text/plain, Size: 6200 bytes --]

On 05/24/2016 02:40 AM, Peter Lieven wrote:
> until now the allocation map was used only as a hint if a cluster
> is allocated or not. If a block was not allocated (or Qemu had
> no info about the allocation status) a get_block_status call was
> issued to check the allocation status and possibly avoid
> a subsequent read of unallocated sectors. If a block known to be
> allocated the get_block_status call was omitted. In the other case
> a get_block_status call was issued before every read to avoid
> the necessity for a consistent allocation map. To avoid the
> potential overhead of calling get_block_status for each and
> every read request this took only place for the bigger requests.
> 
> This patch enhances this mechanism to cache the allocation
> status and avoid calling get_block_status for blocks where
> the allocation status has been queried before. This allows
> for bypassing the read request even for smaller requests and
> additionally omits calling get_block_status for known to be
> unallocated blocks.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---

> +static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
>  {
> -    if (iscsilun->allocationmap == NULL) {
> -        return;
> +    iscsi_allocmap_free(iscsilun);
> +
> +    iscsilun->allocmap_size =
> +        DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks, iscsilun),
> +                     iscsilun->cluster_sectors);
> +

Computes: ceiling( (num_blocks * block_size / 512) / (cluster_size /
512) ); aka number of clusters.  But we don't independently track the
cluster size, so I don't see any simpler way of writing it, even if we
could be more efficient by not having to first scale through qemu's
sector size.

> +    iscsilun->allocmap = bitmap_try_new(iscsilun->allocmap_size);
> +    if (!iscsilun->allocmap) {
> +        return -ENOMEM;
> +    }
> +
> +    if (open_flags & BDRV_O_NOCACHE) {
> +        /* in case that cache.direct = on all allocmap entries are
> +         * treated as invalid to force a relookup of the block
> +         * status on every read request */
> +        return 0;

Can we cache that we are opened BDRV_O_NOCACHE, so that we don't even
have to bother allocating allocmap when we know we are never changing
its bits?  In other words, can you swap this to be before the
bitmap_try_new()?

> +    }
> +
> +    iscsilun->allocmap_valid = bitmap_try_new(iscsilun->allocmap_size);
> +    if (!iscsilun->allocmap_valid) {
> +        /* if we are under memory pressure free the allocmap as well */
> +        iscsi_allocmap_free(iscsilun);
> +        return -ENOMEM;
>      }
> -    bitmap_set(iscsilun->allocationmap,
> -               sector_num / iscsilun->cluster_sectors,
> -               DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));
> +
> +    return 0;
>  }
>  
> -static void iscsi_allocationmap_clear(IscsiLun *iscsilun, int64_t sector_num,
> -                                      int nb_sectors)
> +static void
> +iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
> +                      int nb_sectors, bool allocated, bool valid)
>  {
>      int64_t cluster_num, nb_clusters;
> -    if (iscsilun->allocationmap == NULL) {
> +
> +    if (iscsilun->allocmap == NULL) {
>          return;
>      }

Here, you are short-circuiting when there is no allocmap, but shouldn't
you also short-circuit if you are BDRV_O_NOCACHE?

Should you assert(!(allocated && !valid)) [or by deMorgan's,
assert(!allocated || valid)], to make sure we are only tracking 3 states
rather than 4?

>      cluster_num = DIV_ROUND_UP(sector_num, iscsilun->cluster_sectors);
>      nb_clusters = (sector_num + nb_sectors) / iscsilun->cluster_sectors
>                    - cluster_num;
> -    if (nb_clusters > 0) {
> -        bitmap_clear(iscsilun->allocationmap, cluster_num, nb_clusters);
> +    if (allocated) {
> +        bitmap_set(iscsilun->allocmap,
> +                   sector_num / iscsilun->cluster_sectors,
> +                   DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));

This says that if we have a sub-cluster request, we can round out to
cluster alignment (if our covered part of the cluster is allocated,
presumably the rest is allocated as well)...

> +    } else if (nb_clusters > 0) {
> +        bitmap_clear(iscsilun->allocmap, cluster_num, nb_clusters);

...while this says if we are marking something clear, we have to round
in (while we can trim the aligned middle, we should not mark any
unaligned head or tail as trimmed, in case they remain allocated due to
the unvisited sectors).  Still, it may be worth comments for why your
rounding between the two calls is different.

> +    }
> +
> +    if (iscsilun->allocmap_valid == NULL) {
> +        return;
> +    }

When do we ever have allocmap set but allocmap_valid clear?  Isn't it
easier to assume that both maps are present (we are tracking status) or
absent (we are BDRV_O_NOCACHE)?

> +    if (valid) {
> +        if (nb_clusters > 0) {
> +            bitmap_set(iscsilun->allocmap_valid, cluster_num, nb_clusters);
> +        }
> +    } else {
> +        bitmap_clear(iscsilun->allocmap_valid,
> +                     sector_num / iscsilun->cluster_sectors,
> +                     DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));

Here, the rounding is opposite - you round in for valid (leaving the
head and tail in an unknown state because you didn't visit the whole
cluster), and round out for clear (you don't bother checking whether
your action will change the state of the head and tail, so it's easier
to just mark that cluster as needing a refresh).  Again, comments might
help.  And if you are relying on allocmap_valid before consulting
allocmap, I don't know if it makes any sense to mark any bits of the
allocmap above if those bits will just be masked out by allocmap_valid,
so maybe the earlier code could unconditionally round in, instead of
rounding out for allocated and in for unallocated.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

next prev parent reply	other threads:[~2016-05-24 23:10 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-24  8:40 [Qemu-devel] [PATCH V3] block/iscsi: allow caching of the allocation map Peter Lieven
2016-05-24 23:10 ` Eric Blake [this message]
2016-05-30  6:33   ` Peter Lieven
2016-06-10  9:13     ` Peter Lieven
2016-06-13  9:53     ` Paolo Bonzini
2016-06-12  5:04 ` Fam Zheng
2016-06-13  9:45 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5744DF5F.3070600@redhat.com \
    --to=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).