qemu-devel.nongnu.org archive mirror
From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: kwolf@redhat.com, jsnow@redhat.com, qemu-devel@nongnu.org,
	ehabkost@redhat.com, crosa@redhat.com
Subject: Re: [PATCH v3 4/6] util: implement seqcache
Date: Fri, 12 Mar 2021 16:13:36 +0100
Message-ID: <f53fc06c-38df-f9fe-e927-b4f1b9bd5263@redhat.com>
In-Reply-To: <f0acd8b3-4f43-1a37-b08c-27f710fb3a60@virtuozzo.com>

On 12.03.21 15:37, Vladimir Sementsov-Ogievskiy wrote:
> 12.03.2021 16:41, Max Reitz wrote:
>> On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:
>>> Implement cache for small sequential unaligned writes, so that they may
>>> be cached until we get a complete cluster and then write it.
>>>
>>> The cache is intended to be used for backup to qcow2 compressed target
>>> opened in O_DIRECT mode, but can be reused for any similar (even not
>>> block-layer related) task.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   include/qemu/seqcache.h |  42 +++++
>>>   util/seqcache.c         | 361 ++++++++++++++++++++++++++++++++++++++++
>>>   MAINTAINERS             |   6 +
>>>   util/meson.build        |   1 +
>>>   4 files changed, 410 insertions(+)
>>>   create mode 100644 include/qemu/seqcache.h
>>>   create mode 100644 util/seqcache.c
>>
>> Looks quite good to me, thanks.  Nice explanations, too. :)
>>
>> The only design question I have is whether there’s a reason you’re 
>> using a list again instead of a hash table.  I suppose we do need the 
>> list anyway because of the next_flush iterator, so using a hash table 
>> would only complicate the implementation, but still.
> 
> Yes, it seems correct for the flush iterator to go in the same order as 
> the writes come, so we need a list. We could add a hash table, but it 
> would only help on read. And for the compressed cache in qcow2 we try to 
> flush often enough that there should not be many clusters in the cache. 
> So I think a hash table can be added later if needed.

Sure.  The problem I see is that we’ll probably never reach the point of 
it really being needed. O:)

So I think it’s a question of now or never.
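
To make the trade-off concrete, here is a rough sketch of what "list plus hash table" could look like. It is not taken from the patch: DemoCache, DemoCluster, NB_BUCKETS and the demo_* functions are hypothetical names, and it uses a hand-rolled bucket array rather than QSIMPLEQ or GHashTable so that it compiles on its own. The list keeps write order for the flush iterator; the buckets only speed up lookup on read.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NB_BUCKETS 16   /* hypothetical bucket count */

typedef struct DemoCluster {
    int64_t offset;                 /* cluster-aligned start offset */
    struct DemoCluster *next;       /* write order, for the flush iterator */
    struct DemoCluster *hash_next;  /* bucket chain, for lookup only */
} DemoCluster;

typedef struct DemoCache {
    int64_t cluster_size;
    DemoCluster *head, *tail;           /* list in write order */
    DemoCluster *buckets[NB_BUCKETS];   /* index by cluster number */
} DemoCache;

static unsigned demo_bucket(const DemoCache *c, int64_t offset)
{
    assert(c->cluster_size > 0 && offset >= 0);
    return (unsigned)(offset / c->cluster_size) % NB_BUCKETS;
}

/* Append a cluster to the write-order list and register it in its bucket. */
static DemoCluster *demo_add(DemoCache *c, int64_t offset)
{
    DemoCluster *cl = calloc(1, sizeof(*cl));
    unsigned b = demo_bucket(c, offset);

    if (!cl) {
        return NULL;
    }
    cl->offset = offset;
    if (c->tail) {
        c->tail->next = cl;
    } else {
        c->head = cl;
    }
    c->tail = cl;
    cl->hash_next = c->buckets[b];
    c->buckets[b] = cl;
    return cl;
}

/* Find the cluster containing @offset by walking one bucket chain. */
static DemoCluster *demo_find(const DemoCache *c, int64_t offset)
{
    int64_t start = offset - offset % c->cluster_size;

    for (DemoCluster *cl = c->buckets[demo_bucket(c, start)]; cl;
         cl = cl->hash_next) {
        if (cl->offset == start) {
            return cl;
        }
    }
    return NULL;
}
```

With only a handful of clusters in the cache, a plain list walk is likely just as fast, which is the argument for leaving the index out.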

[...]

>>> + */
>>> +bool seqcache_get_next_flush(SeqCache *s, int64_t *offset, int64_t *bytes,
>>> +                             uint8_t **buf, bool *unfinished)
>>
>> Could be “uint8_t *const *buf”, I suppose.  Don’t know how much the 
>> callers would hate that, though.
> 
> Will do. And actually I wrote quite a big explanation but missed the fact 
> that the caller doesn't get ownership of buf; that should be mentioned.

Great, thanks.

>>> +{
>>> +    Cluster *req = s->next_flush;
>>> +
>>> +    if (s->next_flush) {
>>> +        *unfinished = false;
>>> +        req = s->next_flush;
>>> +        s->next_flush = QSIMPLEQ_NEXT(req, entry);
>>> +        if (s->next_flush == s->cur_write) {
>>> +            s->next_flush = NULL;
>>> +        }
>>> +    } else if (s->cur_write && *unfinished) {
>>> +        req = s->cur_write;
>>
>> I was wondering whether flushing an unfinished cluster wouldn’t kind 
>> of finalize it, but I suppose the problem with that would be that you 
>> can’t add data to a finished cluster, which wouldn’t be that great if 
>> you’re just flushing the cache without wanting to drop it all.
>>
>> (The problem I see is that flushing it later will mean all the data 
>> that already has been written here will have to be rewritten.  Not 
>> that bad, I suppose.)
> 
> Yes, that's all correct. There is also an additional strong reason: 
> qcow2 depends on the fact that clusters become "finished" by defined 
> rules: only when a cluster is really filled up to the end, or when qcow2 
> starts writing another cluster.
> 
> For "finished" clusters with an unaligned end we can safely align this 
> end up to some good alignment, writing a bit more data than needed. It's 
> safe because the tail of the cluster is never used. And we'll perform 
> better with an aligned write, avoiding RMW (read-modify-write).
> 
> But when flushing an "unfinished" cluster, we should write exactly what 
> we have in the cache, as a parallel write to the same cluster may happen, 
> which will continue the sequential process.

OK, thanks for the explanation.
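
For my own understanding, the rule above could be sketched roughly like this. It is not code from the series: demo_align_up() and demo_flush_bytes() are hypothetical helpers, and the alignment value is just a parameter here. A finished cluster may be padded up to the alignment (but never past its cluster boundary), whereas an unfinished one is written exactly as cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round @n up to a multiple of @align. */
static int64_t demo_align_up(int64_t n, int64_t align)
{
    assert(align > 0 && n >= 0);
    return (n + align - 1) / align * align;
}

/*
 * Hypothetical helper: number of bytes to submit when flushing cached
 * data that starts at @offset and currently holds @bytes bytes.
 */
static int64_t demo_flush_bytes(int64_t offset, int64_t bytes,
                                int64_t cluster_size, int64_t align,
                                bool unfinished)
{
    int64_t end = offset + bytes;
    int64_t cluster_end, aligned_end;

    assert(offset >= 0 && bytes > 0 && cluster_size > 0);
    cluster_end = (offset / cluster_size + 1) * cluster_size;
    assert(end <= cluster_end);

    if (unfinished) {
        /* More sequential data may still arrive: write exactly the cache. */
        return bytes;
    }

    /* Finished: the tail is never used, so padding the end is safe. */
    aligned_end = demo_align_up(end, align);
    if (aligned_end > cluster_end) {
        aligned_end = cluster_end;
    }
    return aligned_end - offset;
}
```

For example, with 64 KiB clusters and 512-byte alignment, 1000 cached bytes at offset 0 would be submitted as 1024 bytes once the cluster is finished, but as exactly 1000 bytes while it is still unfinished.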

Max


