From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:36955)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1b7GlH-0003yQ-1N
	for qemu-devel@nongnu.org; Mon, 30 May 2016 02:33:16 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pl@kamp.de>) id 1b7GlB-0003Ko-1H
	for qemu-devel@nongnu.org; Mon, 30 May 2016 02:33:13 -0400
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:50079 helo=mx01.kamp.de)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1b7GlA-0003KE-NV
	for qemu-devel@nongnu.org; Mon, 30 May 2016 02:33:08 -0400
References: <1464079240-24362-1-git-send-email-pl@kamp.de>
	<5744DF5F.3070600@redhat.com>
From: Peter Lieven <pl@kamp.de>
Message-ID: <574BDEA0.2040708@kamp.de>
Date: Mon, 30 May 2016 08:33:04 +0200
MIME-Version: 1.0
In-Reply-To: <5744DF5F.3070600@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH V3] block/iscsi: allow caching of the
 allocation map
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>, qemu-block@nongnu.org
Cc: pbonzini@redhat.com, famz@redhat.com, qemu-devel@nongnu.org

Am 25.05.2016 um 01:10 schrieb Eric Blake:
> On 05/24/2016 02:40 AM, Peter Lieven wrote:
>> until now the allocation map was used only as a hint if a cluster
>> is allocated or not. If a block was not allocated (or Qemu had
>> no info about the allocation status) a get_block_status call was
>> issued to check the allocation status and possibly avoid
>> a subsequent read of unallocated sectors. If a block known to be
>> allocated the get_block_status call was omitted. In the other case
>> a get_block_status call was issued before every read to avoid
>> the necessity for a consistent allocation map. To avoid the
>> potential overhead of calling get_block_status for each and
>> every read request this took only place for the bigger requests.
>>
>> This patch enhances this mechanism to cache the allocation
>> status and avoid calling get_block_status for blocks where
>> the allocation status has been queried before. This allows
>> for bypassing the read request even for smaller requests and
>> additionally omits calling get_block_status for known to be
>> unallocated blocks.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> +static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
>>   {
>> -    if (iscsilun->allocationmap == NULL) {
>> -        return;
>> +    iscsi_allocmap_free(iscsilun);
>> +
>> +    iscsilun->allocmap_size =
>> +        DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks, iscsilun),
>> +                     iscsilun->cluster_sectors);
>> +
> Computes: ceiling( (num_blocks * block_size / 512) / (cluster_size /
> 512) ); aka number of clusters.  But we don't independently track the
> cluster size, so I don't see any simpler way of writing it, even if we
> could be more efficient by not having to first scale through qemu's
> sector size.
>
>> +    iscsilun->allocmap = bitmap_try_new(iscsilun->allocmap_size);
>> +    if (!iscsilun->allocmap) {
>> +        return -ENOMEM;
>> +    }
>> +
>> +    if (open_flags & BDRV_O_NOCACHE) {
>> +        /* in case that cache.direct = on all allocmap entries are
>> +         * treated as invalid to force a relookup of the block
>> +         * status on every read request */
>> +        return 0;
> Can we cache that we are opened BDRV_O_NOCACHE, so that we don't even
> have to bother allocating allocmap when we know we are never changing
> its bits?  In other words, can you swap this to be before the
> bitmap_try_new()?

The idea of the allocmap in cache.direct = on mode is that we can
still speed up block jobs by skipping large unallocated areas. In this case
the allocmap has only a hint character. If we don't know the status
we issue a get_block_status request and verify the status. If its unallocated
we return zeroes. If we new through an earlier get block status request
that the area is allocated we can skip the useless get_block_status request.
This is the old behaviour without this patch.

>
>> +    }
>> +
>> +    iscsilun->allocmap_valid = bitmap_try_new(iscsilun->allocmap_size);
>> +    if (!iscsilun->allocmap_valid) {
>> +        /* if we are under memory pressure free the allocmap as well */
>> +        iscsi_allocmap_free(iscsilun);
>> +        return -ENOMEM;
>>       }
>> -    bitmap_set(iscsilun->allocationmap,
>> -               sector_num / iscsilun->cluster_sectors,
>> -               DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));
>> +
>> +    return 0;
>>   }
>>   
>> -static void iscsi_allocationmap_clear(IscsiLun *iscsilun, int64_t sector_num,
>> -                                      int nb_sectors)
>> +static void
>> +iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
>> +                      int nb_sectors, bool allocated, bool valid)
>>   {
>>       int64_t cluster_num, nb_clusters;
>> -    if (iscsilun->allocationmap == NULL) {
>> +
>> +    if (iscsilun->allocmap == NULL) {
>>           return;
>>       }
> Here, you are short-circuiting when there is no allocmap, but shouldn't
> you also short-circuit if you are BDRV_O_NOCACHE?
>
> Should you assert(!(allocated && !valid)) [or by deMorgan's,
> assert(!allocated || valid)], to make sure we are only tracking 3 states
> rather than 4?

sure, we thats a good enhancement altough allocated and invalid doesn't
hurt.

>
>>       cluster_num = DIV_ROUND_UP(sector_num, iscsilun->cluster_sectors);
>>       nb_clusters = (sector_num + nb_sectors) / iscsilun->cluster_sectors
>>                     - cluster_num;
>> -    if (nb_clusters > 0) {
>> -        bitmap_clear(iscsilun->allocationmap, cluster_num, nb_clusters);
>> +    if (allocated) {
>> +        bitmap_set(iscsilun->allocmap,
>> +                   sector_num / iscsilun->cluster_sectors,
>> +                   DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));
> This says that if we have a sub-cluster request, we can round out to
> cluster alignment (if our covered part of the cluster is allocated,
> presumably the rest is allocated as well)...
>
>> +    } else if (nb_clusters > 0) {
>> +        bitmap_clear(iscsilun->allocmap, cluster_num, nb_clusters);
> ...while this says if we are marking something clear, we have to round
> in (while we can trim the aligned middle, we should not mark any
> unaligned head or tail as trimmed, in case they remain allocated due to
> the unvisited sectors).  Still, it may be worth comments for why your
> rounding between the two calls is different.

Okay

>
>> +    }
>> +
>> +    if (iscsilun->allocmap_valid == NULL) {
>> +        return;
>> +    }
> When do we ever have allocmap set but allocmap_valid clear?  Isn't it
> easier to assume that both maps are present (we are tracking status) or
> absent (we are BDRV_O_NOCACHE)?

Thats the current behaviour. See above.

Peter