From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55275)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1bBIVb-0001Oa-Ku
	for qemu-devel@nongnu.org; Fri, 10 Jun 2016 05:13:44 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pl@kamp.de>) id 1bBIVW-0004IF-IX
	for qemu-devel@nongnu.org; Fri, 10 Jun 2016 05:13:42 -0400
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:46689 helo=mx01.kamp.de)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1bBIVW-0004I6-86
	for qemu-devel@nongnu.org; Fri, 10 Jun 2016 05:13:38 -0400
Message-ID: <575A84BF.6030108@kamp.de>
Date: Fri, 10 Jun 2016 11:13:35 +0200
From: Peter Lieven <pl@kamp.de>
MIME-Version: 1.0
References: <1464079240-24362-1-git-send-email-pl@kamp.de>
	<5744DF5F.3070600@redhat.com> <574BDEA0.2040708@kamp.de>
In-Reply-To: <574BDEA0.2040708@kamp.de>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH V3] block/iscsi: allow caching of the
 allocation map
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>, qemu-block@nongnu.org
Cc: pbonzini@redhat.com, famz@redhat.com, qemu-devel@nongnu.org

Am 30.05.2016 um 08:33 schrieb Peter Lieven:
> Am 25.05.2016 um 01:10 schrieb Eric Blake:
>> On 05/24/2016 02:40 AM, Peter Lieven wrote:
>>> until now the allocation map was used only as a hint if a cluster
>>> is allocated or not. If a block was not allocated (or Qemu had
>>> no info about the allocation status) a get_block_status call was
>>> issued to check the allocation status and possibly avoid
>>> a subsequent read of unallocated sectors. If a block known to be
>>> allocated the get_block_status call was omitted. In the other case
>>> a get_block_status call was issued before every read to avoid
>>> the necessity for a consistent allocation map. To avoid the
>>> potential overhead of calling get_block_status for each and
>>> every read request this took only place for the bigger requests.
>>>
>>> This patch enhances this mechanism to cache the allocation
>>> status and avoid calling get_block_status for blocks where
>>> the allocation status has been queried before. This allows
>>> for bypassing the read request even for smaller requests and
>>> additionally omits calling get_block_status for known to be
>>> unallocated blocks.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>> +static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
>>>   {
>>> -    if (iscsilun->allocationmap == NULL) {
>>> -        return;
>>> +    iscsi_allocmap_free(iscsilun);
>>> +
>>> +    iscsilun->allocmap_size =
>>> +        DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks, iscsilun),
>>> +                     iscsilun->cluster_sectors);
>>> +
>> Computes: ceiling( (num_blocks * block_size / 512) / (cluster_size /
>> 512) ); aka number of clusters.  But we don't independently track the
>> cluster size, so I don't see any simpler way of writing it, even if we
>> could be more efficient by not having to first scale through qemu's
>> sector size.
>>
>>> +    iscsilun->allocmap = bitmap_try_new(iscsilun->allocmap_size);
>>> +    if (!iscsilun->allocmap) {
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    if (open_flags & BDRV_O_NOCACHE) {
>>> +        /* in case that cache.direct = on all allocmap entries are
>>> +         * treated as invalid to force a relookup of the block
>>> +         * status on every read request */
>>> +        return 0;
>> Can we cache that we are opened BDRV_O_NOCACHE, so that we don't even
>> have to bother allocating allocmap when we know we are never changing
>> its bits?  In other words, can you swap this to be before the
>> bitmap_try_new()?
>
> The idea of the allocmap in cache.direct = on mode is that we can
> still speed up block jobs by skipping large unallocated areas. In this case
> the allocmap has only a hint character. If we don't know the status
> we issue a get_block_status request and verify the status. If its unallocated
> we return zeroes. If we new through an earlier get block status request
> that the area is allocated we can skip the useless get_block_status request.
> This is the old behaviour without this patch.
>
>>
>>> +    }
>>> +
>>> +    iscsilun->allocmap_valid = bitmap_try_new(iscsilun->allocmap_size);
>>> +    if (!iscsilun->allocmap_valid) {
>>> +        /* if we are under memory pressure free the allocmap as well */
>>> +        iscsi_allocmap_free(iscsilun);
>>> +        return -ENOMEM;
>>>       }
>>> -    bitmap_set(iscsilun->allocationmap,
>>> -               sector_num / iscsilun->cluster_sectors,
>>> -               DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));
>>> +
>>> +    return 0;
>>>   }
>>>   -static void iscsi_allocationmap_clear(IscsiLun *iscsilun, int64_t sector_num,
>>> -                                      int nb_sectors)
>>> +static void
>>> +iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
>>> +                      int nb_sectors, bool allocated, bool valid)
>>>   {
>>>       int64_t cluster_num, nb_clusters;
>>> -    if (iscsilun->allocationmap == NULL) {
>>> +
>>> +    if (iscsilun->allocmap == NULL) {
>>>           return;
>>>       }
>> Here, you are short-circuiting when there is no allocmap, but shouldn't
>> you also short-circuit if you are BDRV_O_NOCACHE?
>>
>> Should you assert(!(allocated && !valid)) [or by deMorgan's,
>> assert(!allocated || valid)], to make sure we are only tracking 3 states
>> rather than 4?
>
> sure, we thats a good enhancement altough allocated and invalid doesn't
> hurt.
>
>>
>>>       cluster_num = DIV_ROUND_UP(sector_num, iscsilun->cluster_sectors);
>>>       nb_clusters = (sector_num + nb_sectors) / iscsilun->cluster_sectors
>>>                     - cluster_num;
>>> -    if (nb_clusters > 0) {
>>> -        bitmap_clear(iscsilun->allocationmap, cluster_num, nb_clusters);
>>> +    if (allocated) {
>>> +        bitmap_set(iscsilun->allocmap,
>>> +                   sector_num / iscsilun->cluster_sectors,
>>> +                   DIV_ROUND_UP(nb_sectors, iscsilun->cluster_sectors));
>> This says that if we have a sub-cluster request, we can round out to
>> cluster alignment (if our covered part of the cluster is allocated,
>> presumably the rest is allocated as well)...
>>
>>> +    } else if (nb_clusters > 0) {
>>> +        bitmap_clear(iscsilun->allocmap, cluster_num, nb_clusters);
>> ...while this says if we are marking something clear, we have to round
>> in (while we can trim the aligned middle, we should not mark any
>> unaligned head or tail as trimmed, in case they remain allocated due to
>> the unvisited sectors).  Still, it may be worth comments for why your
>> rounding between the two calls is different.
>
> Okay
>
>>
>>> +    }
>>> +
>>> +    if (iscsilun->allocmap_valid == NULL) {
>>> +        return;
>>> +    }
>> When do we ever have allocmap set but allocmap_valid clear?  Isn't it
>> easier to assume that both maps are present (we are tracking status) or
>> absent (we are BDRV_O_NOCACHE)?
>
> Thats the current behaviour. See above.

Ping.

Peter