From: Peter Lieven <pl@kamp.de>
To: Eric Blake <eblake@redhat.com>, qemu-block@nongnu.org
Cc: pbonzini@redhat.com, famz@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH V3] block/iscsi: allow caching of the allocation map
Date: Fri, 10 Jun 2016 11:13:35 +0200 [thread overview]
Message-ID: <575A84BF.6030108@kamp.de> (raw)
In-Reply-To: <574BDEA0.2040708@kamp.de>
Am 30.05.2016 um 08:33 schrieb Peter Lieven:
> Am 25.05.2016 um 01:10 schrieb Eric Blake:
>> On 05/24/2016 02:40 AM, Peter Lieven wrote:
>>> until now the allocation map was used only as a hint if a cluster
>>> is allocated or not. If a block was not allocated (or Qemu had
>>> no info about the allocation status) a get_block_status call was
>>> issued to check the allocation status and possibly avoid
>>> a subsequent read of unallocated sectors. If a block known to be
>>> allocated the get_block_status call was omitted. In the other case
>>> a get_block_status call was issued before every read to avoid
>>> the necessity for a consistent allocation map. To avoid the
>>> potential overhead of calling get_block_status for each and
>>> every read request this took only place for the bigger requests.
>>>
>>> This patch enhances this mechanism to cache the allocation
>>> status and avoid calling get_block_status for blocks where
>>> the allocation status has been queried before. This allows
>>> for bypassing the read request even for smaller requests and
>>> additionally omits calling get_block_status for known to be
>>> unallocated blocks.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>> +static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
>>> {
>>> - if (iscsilun->allocationmap == NULL) {
>>> - return;
>>> + iscsi_allocmap_free(iscsilun);
>>> +
>>> + iscsilun->allocmap_size =
>>> + DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks, iscsilun),
>>> + iscsilun->cluster_sectors);
>>> +
>> Computes: ceiling( (num_blocks * block_size / 512) / (cluster_size /
>> 512) ); aka number of clusters. But we don't independently track the
>> cluster size, so I don't see any simpler way of writing it, even if we
>> could be more efficient by not having to first scale through qemu's
>> sector size.
>>
>>> + iscsilun->allocmap = bitmap_try_new(iscsilun->allocmap_size);
>>> + if (!iscsilun->allocmap) {
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + if (open_flags & BDRV_O_NOCACHE) {
>>> + /* in case that cache.direct = on all allocmap entries are
>>> + * treated as invalid to force a relookup of the block
>>> + * status on every read request */
>>> + return 0;
>> Can we cache that we are opened BDRV_O_NOCACHE, so that we don't even
>> have to bother allocating allocmap when we know we are never changing
>> its bits? In other words, can you swap this to be before the
>> bitmap_try_new()?
>
> The idea of the allocmap in cache.direct = on mode is that we can
> still speed up block jobs by skipping large unallocated areas. In this case
> the allocmap has only a hint character. If we don't know the status
> we issue a get_block_status request and verify the status. If its unallocated
> we return zeroes. If we new through an earlier get block status request
> that the area is allocated we can skip the useless get_block_status request.
> This is the old behaviour without this patch.
>
>>
>>> + }
>>> +
>>> + iscsilun->allocmap_valid = bitmap_try_new(iscsilun->allocmap_size);
>>> + if (!iscsilun->allocmap_valid) {
>>> + /* if we are under memory pressure free the allocmap as well */
>>> + iscsi_allocmap_free(iscsilun);
>>> + return -ENOMEM;
>>> }
>>> - bitmap_set(iscsilun->allocationmap,
>>> - sector_num / iscsilun->cluster_sectors,
>>> - DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));
>>> +
>>> + return 0;
>>> }
>>> -static void iscsi_allocationmap_clear(IscsiLun *iscsilun, int64_t sector_num,
>>> - int nb_sectors)
>>> +static void
>>> +iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
>>> + int nb_sectors, bool allocated, bool valid)
>>> {
>>> int64_t cluster_num, nb_clusters;
>>> - if (iscsilun->allocationmap == NULL) {
>>> +
>>> + if (iscsilun->allocmap == NULL) {
>>> return;
>>> }
>> Here, you are short-circuiting when there is no allocmap, but shouldn't
>> you also short-circuit if you are BDRV_O_NOCACHE?
>>
>> Should you assert(!(allocated && !valid)) [or by deMorgan's,
>> assert(!allocated || valid)], to make sure we are only tracking 3 states
>> rather than 4?
>
> sure, we thats a good enhancement altough allocated and invalid doesn't
> hurt.
>
>>
>>> cluster_num = DIV_ROUND_UP(sector_num, iscsilun->cluster_sectors);
>>> nb_clusters = (sector_num + nb_sectors) / iscsilun->cluster_sectors
>>> - cluster_num;
>>> - if (nb_clusters > 0) {
>>> - bitmap_clear(iscsilun->allocationmap, cluster_num, nb_clusters);
>>> + if (allocated) {
>>> + bitmap_set(iscsilun->allocmap,
>>> + sector_num / iscsilun->cluster_sectors,
>>> + DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));
>> This says that if we have a sub-cluster request, we can round out to
>> cluster alignment (if our covered part of the cluster is allocated,
>> presumably the rest is allocated as well)...
>>
>>> + } else if (nb_clusters > 0) {
>>> + bitmap_clear(iscsilun->allocmap, cluster_num, nb_clusters);
>> ...while this says if we are marking something clear, we have to round
>> in (while we can trim the aligned middle, we should not mark any
>> unaligned head or tail as trimmed, in case they remain allocated due to
>> the unvisited sectors). Still, it may be worth comments for why your
>> rounding between the two calls is different.
>
> Okay
>
>>
>>> + }
>>> +
>>> + if (iscsilun->allocmap_valid == NULL) {
>>> + return;
>>> + }
>> When do we ever have allocmap set but allocmap_valid clear? Isn't it
>> easier to assume that both maps are present (we are tracking status) or
>> absent (we are BDRV_O_NOCACHE)?
>
> Thats the current behaviour. See above.
Ping.
Peter
next prev parent reply other threads:[~2016-06-10 9:13 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-24 8:40 [Qemu-devel] [PATCH V3] block/iscsi: allow caching of the allocation map Peter Lieven
2016-05-24 23:10 ` Eric Blake
2016-05-30 6:33 ` Peter Lieven
2016-06-10 9:13 ` Peter Lieven [this message]
2016-06-13 9:53 ` Paolo Bonzini
2016-06-12 5:04 ` Fam Zheng
2016-06-13 9:45 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=575A84BF.6030108@kamp.de \
--to=pl@kamp.de \
--cc=eblake@redhat.com \
--cc=famz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).