[Qemu-devel] Re: [PATCH v2 5/7] qed: Table, L2 cache, and cluster functions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
	Avi Kivity <avi@redhat.com>,
	qemu-devel@nongnu.org, Christoph Hellwig <hch@lst.de>
Subject: [Qemu-devel] Re: [PATCH v2 5/7] qed: Table, L2 cache, and cluster functions
Date: Wed, 13 Oct 2010 14:41:36 +0100	[thread overview]
Message-ID: <20101013134135.GC8998@stefan-thinkpad.transitives.com> (raw)
In-Reply-To: <4CB47452.8050500@redhat.com>

On Tue, Oct 12, 2010 at 04:44:34PM +0200, Kevin Wolf wrote:
> Am 08.10.2010 17:48, schrieb Stefan Hajnoczi:
> > diff --git a/block/qed-cluster.c b/block/qed-cluster.c
> > new file mode 100644
> > index 0000000..af65e5a
> > --- /dev/null
> > +++ b/block/qed-cluster.c
> > @@ -0,0 +1,145 @@
> > +/*
> > + * QEMU Enhanced Disk Format Cluster functions
> > + *
> > + * Copyright IBM, Corp. 2010
> > + *
> > + * Authors:
> > + *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
> > + *  Anthony Liguori   <aliguori@us.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> 
> Hm, just noticed it here: COPYING is the text of the GPL, not LGPL. The
> same comment applies to all other QED files, too.

It should be COPYING.LIB, thanks for pointing this out.

> 
> > + *
> > + */
> > +
> > +#include "qed.h"
> > +
> > +/**
> > + * Count the number of contiguous data clusters
> > + *
> > + * @s:              QED state
> > + * @table:          L2 table
> > + * @index:          First cluster index
> > + * @n:              Maximum number of clusters
> > + * @offset:         Set to first cluster offset
> > + *
> > + * This function scans tables for contiguous allocated or free clusters.
> > + */
> > +static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
> > +                                                  QEDTable *table,
> > +                                                  unsigned int index,
> > +                                                  unsigned int n,
> > +                                                  uint64_t *offset)
> > +{
> > +    unsigned int end = MIN(index + n, s->table_nelems);
> > +    uint64_t last = table->offsets[index];
> > +    unsigned int i;
> > +
> > +    *offset = last;
> > +
> > +    for (i = index + 1; i < end; i++) {
> > +        if (last == 0) {
> > +            /* Counting free clusters */
> > +            if (table->offsets[i] != 0) {
> > +                break;
> > +            }
> > +        } else {
> > +            /* Counting allocated clusters */
> > +            if (table->offsets[i] != last + s->header.cluster_size) {
> > +                break;
> > +            }
> > +            last = table->offsets[i];
> > +        }
> > +    }
> > +    return i - index;
> > +}
> > +
> > +typedef struct {
> > +    BDRVQEDState *s;
> > +    uint64_t pos;
> > +    size_t len;
> > +
> > +    QEDRequest *request;
> > +
> > +    /* User callback */
> > +    QEDFindClusterFunc *cb;
> > +    void *opaque;
> > +} QEDFindClusterCB;
> > +
> > +static void qed_find_cluster_cb(void *opaque, int ret)
> > +{
> > +    QEDFindClusterCB *find_cluster_cb = opaque;
> > +    BDRVQEDState *s = find_cluster_cb->s;
> > +    QEDRequest *request = find_cluster_cb->request;
> > +    uint64_t offset = 0;
> > +    size_t len = 0;
> > +    unsigned int index;
> > +    unsigned int n;
> > +
> > +    if (ret) {
> > +        ret = QED_CLUSTER_ERROR;
> 
> Can ret be anything else here? If so, why would we return a more generic
> error value instead of passing down the original one?
> 
> [Okay, after having read more code, this is the place where we throw
> errno away. We shouldn't do that.]
> 
> I also wonder, if reading from the disk failed, is the errno value lost?
> 
> > +        goto out;
> > +    }
> > +
> > +    index = qed_l2_index(s, find_cluster_cb->pos);
> > +    n = qed_bytes_to_clusters(s,
> > +                              qed_offset_into_cluster(s, find_cluster_cb->pos) +
> > +                              find_cluster_cb->len);
> > +    n = qed_count_contiguous_clusters(s, request->l2_table->table,
> > +                                      index, n, &offset);
> > +
> > +    ret = offset ? QED_CLUSTER_FOUND : QED_CLUSTER_L2;
> > +    len = MIN(find_cluster_cb->len, n * s->header.cluster_size -
> > +              qed_offset_into_cluster(s, find_cluster_cb->pos));
> > +
> > +    if (offset && !qed_check_cluster_offset(s, offset)) {
> > +        ret = QED_CLUSTER_ERROR;
> > +        goto out;
> > +    }
> > +
> > +out:
> > +    find_cluster_cb->cb(find_cluster_cb->opaque, ret, offset, len);
> > +    qemu_free(find_cluster_cb);
> > +}
> > +
> > +/**
> > + * Find the offset of a data cluster
> > + *
> > + * @s:          QED state
> > + * @pos:        Byte position in device
> > + * @len:        Number of bytes
> > + * @cb:         Completion function
> > + * @opaque:     User data for completion function
> > + */
> 
> If we add header comments (which I think we should), we shouldn't do
> them only pro forma, but try to make them actually useful, i.e. describe
> all inputs and outputs.
> 
> I'm reading this code for the first time and all these callbacks are
> really confusing. What I know is that in all the state that I pass (s
> and request) _something_ changes and in the cb is called with _some_
> parameters of which I don't know what they mean.
> 
> So a good first step would adding a description of the arguments to cb.
> At least in qed_read_l2_table, which actually does directly change the
> state, we should additionally state that it returns the new L2 table in
> request->l2_table. Things like this are not obvious if you didn't write
> the code.

Right, I will improve the doc comments for v3.

> 
> > +void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
> > +                      size_t len, QEDFindClusterFunc *cb, void *opaque)
> > +{
> > +    QEDFindClusterCB *find_cluster_cb;
> > +    uint64_t l2_offset;
> > +
> > +    /* Limit length to L2 boundary.  Requests are broken up at the L2 boundary
> > +     * so that a request acts on one L2 table at a time.
> > +     */
> > +    len = MIN(len, (((pos >> s->l1_shift) + 1) << s->l1_shift) - pos);
> > +
> > +    l2_offset = s->l1_table->offsets[qed_l1_index(s, pos)];
> > +    if (!l2_offset) {
> > +        cb(opaque, QED_CLUSTER_L1, 0, len);
> > +        return;
> > +    }
> > +    if (!qed_check_table_offset(s, l2_offset)) {
> > +        cb(opaque, QED_CLUSTER_ERROR, 0, 0);
> > +        return;
> > +    }
> > +
> > +    find_cluster_cb = qemu_malloc(sizeof(*find_cluster_cb));
> > +    find_cluster_cb->s = s;
> > +    find_cluster_cb->pos = pos;
> > +    find_cluster_cb->len = len;
> > +    find_cluster_cb->cb = cb;
> > +    find_cluster_cb->opaque = opaque;
> > +    find_cluster_cb->request = request;
> > +
> > +    qed_read_l2_table(s, request, l2_offset,
> > +                      qed_find_cluster_cb, find_cluster_cb);
> > +}
> > diff --git a/block/qed-gencb.c b/block/qed-gencb.c
> > new file mode 100644
> > index 0000000..d389e12
> > --- /dev/null
> > +++ b/block/qed-gencb.c
> > @@ -0,0 +1,32 @@
> > +/*
> > + * QEMU Enhanced Disk Format
> > + *
> > + * Copyright IBM, Corp. 2010
> > + *
> > + * Authors:
> > + *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qed.h"
> > +
> > +void *gencb_alloc(size_t len, BlockDriverCompletionFunc *cb, void *opaque)
> > +{
> > +    GenericCB *gencb = qemu_malloc(len);
> > +    gencb->cb = cb;
> > +    gencb->opaque = opaque;
> > +    return gencb;
> > +}
> > +
> > +void gencb_complete(void *opaque, int ret)
> > +{
> > +    GenericCB *gencb = opaque;
> > +    BlockDriverCompletionFunc *cb = gencb->cb;
> > +    void *user_opaque = gencb->opaque;
> > +
> > +    qemu_free(gencb);
> > +    cb(user_opaque, ret);
> > +}
> > diff --git a/block/qed-l2-cache.c b/block/qed-l2-cache.c
> > new file mode 100644
> > index 0000000..3b2bf6e
> > --- /dev/null
> > +++ b/block/qed-l2-cache.c
> > @@ -0,0 +1,132 @@
> > +/*
> > + * QEMU Enhanced Disk Format L2 Cache
> > + *
> > + * Copyright IBM, Corp. 2010
> > + *
> > + * Authors:
> > + *  Anthony Liguori   <aliguori@us.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qed.h"
> > +
> > +/* Each L2 holds 2GB so this let's us fully cache a 100GB disk */
> > +#define MAX_L2_CACHE_SIZE 50
> > +
> > +/**
> > + * Initialize the L2 cache
> > + */
> > +void qed_init_l2_cache(L2TableCache *l2_cache,
> > +                       L2TableAllocFunc *alloc_l2_table,
> 
> What is this function pointer meant for? So far I can only see one call
> to qed_init_l2_cache(), so I guess this indirection is just in
> preparation for some future extension? Maybe add a comment?

I will review this function pointer and address with a comment: the
L2TableAllocFunc decouples qed-l2-cache.c from the qemu_blockalign() and
table sizing.  Maybe it's not worth having this if it just complicates
things.

> 
> > +                       void *alloc_l2_table_opaque)
> > +{
> > +    QTAILQ_INIT(&l2_cache->entries);
> > +    l2_cache->n_entries = 0;
> > +    l2_cache->alloc_l2_table = alloc_l2_table;
> > +    l2_cache->alloc_l2_table_opaque = alloc_l2_table_opaque;
> > +}
> > +
> > +/**
> > + * Free the L2 cache
> > + */
> > +void qed_free_l2_cache(L2TableCache *l2_cache)
> > +{
> > +    CachedL2Table *entry, *next_entry;
> > +
> > +    QTAILQ_FOREACH_SAFE(entry, &l2_cache->entries, node, next_entry) {
> > +        qemu_vfree(entry->table);
> > +        qemu_free(entry);
> > +    }
> > +}
> > +
> > +/**
> > + * Allocate an uninitialized entry from the cache
> > + *
> > + * The returned entry has a reference count of 1 and is owned by the caller.
> > + */
> > +CachedL2Table *qed_alloc_l2_cache_entry(L2TableCache *l2_cache)
> > +{
> > +    CachedL2Table *entry;
> > +
> > +    entry = qemu_mallocz(sizeof(*entry));
> > +    entry->table = l2_cache->alloc_l2_table(l2_cache->alloc_l2_table_opaque);
> > +    entry->ref++;
> > +
> > +    return entry;
> > +}
> 
> Hm, what references are counted by ref? Do you have more than one L2
> cache and an entry can be referenced by multiple of them?

I'll add comments describing how the cache works and is used.  I hope
this will make refcounts and caching clearer.

> 
> > +
> > +/**
> > + * Decrease an entry's reference count and free if necessary when the reference
> > + * count drops to zero.
> > + */
> > +void qed_unref_l2_cache_entry(L2TableCache *l2_cache, CachedL2Table *entry)
> > +{
> > +    if (!entry) {
> > +        return;
> > +    }
> > +
> > +    entry->ref--;
> > +    if (entry->ref == 0) {
> > +        qemu_vfree(entry->table);
> > +        qemu_free(entry);
> > +    }
> > +}
> 
> The l2_cache arguments looks unused. Do we need it?

It's not needed, will remove.

> 
> > +
> > +/**
> > + * Find an entry in the L2 cache.  This may return NULL and it's up to the
> > + * caller to satisfy the cache miss.
> > + *
> > + * For a cached entry, this function increases the reference count and returns
> > + * the entry.
> > + */
> > +CachedL2Table *qed_find_l2_cache_entry(L2TableCache *l2_cache, uint64_t offset)
> > +{
> > +    CachedL2Table *entry;
> > +
> > +    QTAILQ_FOREACH(entry, &l2_cache->entries, node) {
> > +        if (entry->offset == offset) {
> > +            entry->ref++;
> > +            return entry;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +/**
> > + * Commit an L2 cache entry into the cache.  This is meant to be used as part of
> > + * the process to satisfy a cache miss.  A caller would allocate an entry which
> > + * is not actually in the L2 cache and then once the entry was valid and
> > + * present on disk, the entry can be committed into the cache.
> > + *
> > + * Since the cache is write-through, it's important that this function is not
> > + * called until the entry is present on disk and the L1 has been updated to
> > + * point to the entry.
> > + *
> > + * N.B. This function steals a reference to the l2_table from the caller so the
> > + * caller must obtain a new reference by issuing a call to
> > + * qed_find_l2_cache_entry().
> > + */
> > +void qed_commit_l2_cache_entry(L2TableCache *l2_cache, CachedL2Table *l2_table)
> > +{
> > +    CachedL2Table *entry;
> > +
> > +    entry = qed_find_l2_cache_entry(l2_cache, l2_table->offset);
> > +    if (entry) {
> > +        qed_unref_l2_cache_entry(l2_cache, entry);
> 
> Maybe the qed_find_l2_cache_entry semantics isn't really the right one
> if we need to decrease the refcount here just because that function just
> increased it and we don't actually want that?
> 
> > +        qed_unref_l2_cache_entry(l2_cache, l2_table);
> > +        return;
> > +    }
> > +
> > +    if (l2_cache->n_entries >= MAX_L2_CACHE_SIZE) {
> > +        entry = QTAILQ_FIRST(&l2_cache->entries);
> > +        QTAILQ_REMOVE(&l2_cache->entries, entry, node);
> > +        l2_cache->n_entries--;
> > +        qed_unref_l2_cache_entry(l2_cache, entry);
> > +    }
> > +
> > +    l2_cache->n_entries++;
> > +    QTAILQ_INSERT_TAIL(&l2_cache->entries, l2_table, node);
> > +}
> 
> Okay, so the table has the right refcount because we steal a refcount
> from the caller, and if you don't reuse this, we explicitly unref it. Am
> I the only one to find this interface confusing?
> 
> > diff --git a/block/qed-table.c b/block/qed-table.c
> > new file mode 100644
> > index 0000000..ba6faf0
> > --- /dev/null
> > +++ b/block/qed-table.c
> > @@ -0,0 +1,316 @@
> > +/*
> > + * QEMU Enhanced Disk Format Table I/O
> > + *
> > + * Copyright IBM, Corp. 2010
> > + *
> > + * Authors:
> > + *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
> > + *  Anthony Liguori   <aliguori@us.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "trace.h"
> > +#include "qemu_socket.h" /* for EINPROGRESS on Windows */
> > +#include "qed.h"
> > +
> > +typedef struct {
> > +    GenericCB gencb;
> > +    BDRVQEDState *s;
> > +    QEDTable *table;
> > +
> > +    struct iovec iov;
> > +    QEMUIOVector qiov;
> > +} QEDReadTableCB;
> > +
> > +static void qed_read_table_cb(void *opaque, int ret)
> > +{
> > +    QEDReadTableCB *read_table_cb = opaque;
> > +    QEDTable *table = read_table_cb->table;
> > +    int noffsets = read_table_cb->iov.iov_len / sizeof(uint64_t);
> > +    int i;
> > +
> > +    /* Handle I/O error */
> > +    if (ret) {
> > +        goto out;
> > +    }
> > +
> > +    /* Byteswap offsets */
> > +    for (i = 0; i < noffsets; i++) {
> > +        table->offsets[i] = le64_to_cpu(table->offsets[i]);
> > +    }
> > +
> > +out:
> > +    /* Completion */
> > +    trace_qed_read_table_cb(read_table_cb->s, read_table_cb->table, ret);
> > +    gencb_complete(&read_table_cb->gencb, ret);
> > +}
> > +
> > +static void qed_read_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
> > +                           BlockDriverCompletionFunc *cb, void *opaque)
> > +{
> > +    QEDReadTableCB *read_table_cb = gencb_alloc(sizeof(*read_table_cb),
> > +                                                cb, opaque);
> > +    QEMUIOVector *qiov = &read_table_cb->qiov;
> > +    BlockDriverAIOCB *aiocb;
> > +
> > +    trace_qed_read_table(s, offset, table);
> > +
> > +    read_table_cb->s = s;
> > +    read_table_cb->table = table;
> > +    read_table_cb->iov.iov_base = table->offsets,
> > +    read_table_cb->iov.iov_len = s->header.cluster_size * s->header.table_size,
> > +
> > +    qemu_iovec_init_external(qiov, &read_table_cb->iov, 1);
> > +    aiocb = bdrv_aio_readv(s->bs->file, offset / BDRV_SECTOR_SIZE, qiov,
> > +                           read_table_cb->iov.iov_len / BDRV_SECTOR_SIZE,
> > +                           qed_read_table_cb, read_table_cb);
> > +    if (!aiocb) {
> > +        qed_read_table_cb(read_table_cb, -EIO);
> > +    }
> > +}
> > +
> > +typedef struct {
> > +    GenericCB gencb;
> > +    BDRVQEDState *s;
> > +    QEDTable *orig_table;
> > +    QEDTable *table;
> > +    bool flush;             /* flush after write? */
> > +
> > +    struct iovec iov;
> > +    QEMUIOVector qiov;
> > +} QEDWriteTableCB;
> > +
> > +static void qed_write_table_cb(void *opaque, int ret)
> > +{
> > +    QEDWriteTableCB *write_table_cb = opaque;
> > +
> > +    trace_qed_write_table_cb(write_table_cb->s,
> > +                              write_table_cb->orig_table, ret);
> > +
> > +    if (ret) {
> > +        goto out;
> > +    }
> > +
> > +    if (write_table_cb->flush) {
> > +        /* We still need to flush first */
> > +        write_table_cb->flush = false;
> > +        bdrv_aio_flush(write_table_cb->s->bs, qed_write_table_cb,
> > +                       write_table_cb);
> > +        return;
> > +    }
> > +
> > +out:
> > +    qemu_vfree(write_table_cb->table);
> > +    gencb_complete(&write_table_cb->gencb, ret);
> > +    return;
> > +}
> > +
> > +/**
> > + * Write out an updated part or all of a table
> > + *
> > + * @s:          QED state
> > + * @offset:     Offset of table in image file, in bytes
> > + * @table:      Table
> > + * @index:      Index of first element
> > + * @n:          Number of elements
> > + * @flush:      Whether or not to sync to disk
> > + * @cb:         Completion function
> > + * @opaque:     Argument for completion function
> > + */
> > +static void qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
> > +                            unsigned int index, unsigned int n, bool flush,
> > +                            BlockDriverCompletionFunc *cb, void *opaque)
> > +{
> > +    QEDWriteTableCB *write_table_cb;
> > +    BlockDriverAIOCB *aiocb;
> > +    unsigned int sector_mask = BDRV_SECTOR_SIZE / sizeof(uint64_t) - 1;
> > +    unsigned int start, end, i;
> > +    size_t len_bytes;
> > +
> > +    trace_qed_write_table(s, offset, table, index, n);
> > +
> > +    /* Calculate indices of the first and one after last elements */
> > +    start = index & ~sector_mask;
> > +    end = (index + n + sector_mask) & ~sector_mask;
> > +
> > +    len_bytes = (end - start) * sizeof(uint64_t);
> > +
> > +    write_table_cb = gencb_alloc(sizeof(*write_table_cb), cb, opaque);
> > +    write_table_cb->s = s;
> > +    write_table_cb->orig_table = table;
> > +    write_table_cb->flush = flush;
> > +    write_table_cb->table = qemu_blockalign(s->bs, len_bytes);
> > +    write_table_cb->iov.iov_base = write_table_cb->table->offsets;
> > +    write_table_cb->iov.iov_len = len_bytes;
> > +    qemu_iovec_init_external(&write_table_cb->qiov, &write_table_cb->iov, 1);
> > +
> > +    /* Byteswap table */
> > +    for (i = start; i < end; i++) {
> > +        uint64_t le_offset = cpu_to_le64(table->offsets[i]);
> > +        write_table_cb->table->offsets[i - start] = le_offset;
> > +    }
> > +
> > +    /* Adjust for offset into table */
> > +    offset += start * sizeof(uint64_t);
> > +
> > +    aiocb = bdrv_aio_writev(s->bs->file, offset / BDRV_SECTOR_SIZE,
> > +                            &write_table_cb->qiov,
> > +                            write_table_cb->iov.iov_len / BDRV_SECTOR_SIZE,
> > +                            qed_write_table_cb, write_table_cb);
> > +    if (!aiocb) {
> > +        qed_write_table_cb(write_table_cb, -EIO);
> > +    }
> > +}
> > +
> > +/**
> > + * Propagate return value from async callback
> > + */
> > +static void qed_sync_cb(void *opaque, int ret)
> > +{
> > +    *(int *)opaque = ret;
> > +}
> > +
> > +int qed_read_l1_table_sync(BDRVQEDState *s)
> > +{
> > +    int ret = -EINPROGRESS;
> > +
> > +    async_context_push();
> > +
> > +    qed_read_table(s, s->header.l1_table_offset,
> > +                   s->l1_table, qed_sync_cb, &ret);
> > +    while (ret == -EINPROGRESS) {
> > +        qemu_aio_wait();
> > +    }
> > +
> > +    async_context_pop();
> > +
> > +    return ret;
> > +}
> > +
> > +void qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n,
> > +                        BlockDriverCompletionFunc *cb, void *opaque)
> > +{
> > +    BLKDBG_EVENT(s->bs->file, BLKDBG_L1_UPDATE);
> > +    qed_write_table(s, s->header.l1_table_offset,
> > +                    s->l1_table, index, n, false, cb, opaque);
> > +}
> > +
> > +int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
> > +                            unsigned int n)
> > +{
> > +    int ret = -EINPROGRESS;
> > +
> > +    async_context_push();
> > +
> > +    qed_write_l1_table(s, index, n, qed_sync_cb, &ret);
> > +    while (ret == -EINPROGRESS) {
> > +        qemu_aio_wait();
> > +    }
> > +
> > +    async_context_pop();
> > +
> > +    return ret;
> > +}
> > +
> > +typedef struct {
> > +    GenericCB gencb;
> > +    BDRVQEDState *s;
> > +    uint64_t l2_offset;
> > +    QEDRequest *request;
> > +} QEDReadL2TableCB;
> > +
> > +static void qed_read_l2_table_cb(void *opaque, int ret)
> > +{
> > +    QEDReadL2TableCB *read_l2_table_cb = opaque;
> > +    QEDRequest *request = read_l2_table_cb->request;
> > +    BDRVQEDState *s = read_l2_table_cb->s;
> > +    CachedL2Table *l2_table = request->l2_table;
> > +
> > +    if (ret) {
> > +        /* can't trust loaded L2 table anymore */
> > +        qed_unref_l2_cache_entry(&s->l2_cache, l2_table);
> > +        request->l2_table = NULL;
> 
> Is decreasing the refcount by one and clearing request->l2_table enough?
> Didn't we destroy it for all references? Unless, of course, there is at
> most one reference, but then the refcount is useless.
> 
> Hm, or do we just increase the refcount before the cache entry is
> actually used, and we shouldn't do that? Not sure I understand the
> purpose of this refcount thing yet.
> 
> > +    } else {
> > +        l2_table->offset = read_l2_table_cb->l2_offset;
> > +
> > +        qed_commit_l2_cache_entry(&s->l2_cache, l2_table);
> > +
> > +        /* This is guaranteed to succeed because we just committed the entry
> > +         * to the cache.
> > +         */
> > +        request->l2_table = qed_find_l2_cache_entry(&s->l2_cache,
> > +                                                    l2_table->offset);
> > +        assert(request->l2_table != NULL);
> > +    }
> > +
> > +    gencb_complete(&read_l2_table_cb->gencb, ret);
> > +}
> > +
> > +void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
> > +                       BlockDriverCompletionFunc *cb, void *opaque)
> > +{
> > +    QEDReadL2TableCB *read_l2_table_cb;
> > +
> > +    qed_unref_l2_cache_entry(&s->l2_cache, request->l2_table);
> > +
> > +    /* Check for cached L2 entry */
> > +    request->l2_table = qed_find_l2_cache_entry(&s->l2_cache, offset);
> > +    if (request->l2_table) {
> > +        cb(opaque, 0);
> > +        return;
> > +    }
> > +
> > +    request->l2_table = qed_alloc_l2_cache_entry(&s->l2_cache);
> > +
> > +    read_l2_table_cb = gencb_alloc(sizeof(*read_l2_table_cb), cb, opaque);
> > +    read_l2_table_cb->s = s;
> > +    read_l2_table_cb->l2_offset = offset;
> > +    read_l2_table_cb->request = request;
> > +
> > +    BLKDBG_EVENT(s->bs->file, BLKDBG_L2_LOAD);
> > +    qed_read_table(s, offset, request->l2_table->table,
> > +                   qed_read_l2_table_cb, read_l2_table_cb);
> > +}
> > +
> > +int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
> > +{
> > +    int ret = -EINPROGRESS;
> > +
> > +    async_context_push();
> > +
> > +    qed_read_l2_table(s, request, offset, qed_sync_cb, &ret);
> > +    while (ret == -EINPROGRESS) {
> > +        qemu_aio_wait();
> > +    }
> > +
> > +    async_context_pop();
> > +    return ret;
> > +}
> > +
> > +void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
> > +                        unsigned int index, unsigned int n, bool flush,
> > +                        BlockDriverCompletionFunc *cb, void *opaque)
> > +{
> > +    BLKDBG_EVENT(s->bs->file, BLKDBG_L2_UPDATE);
> > +    qed_write_table(s, request->l2_table->offset,
> > +                    request->l2_table->table, index, n, flush, cb, opaque);
> > +}
> > +
> > +int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
> > +                            unsigned int index, unsigned int n, bool flush)
> > +{
> > +    int ret = -EINPROGRESS;
> > +
> > +    async_context_push();
> > +
> > +    qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret);
> > +    while (ret == -EINPROGRESS) {
> > +        qemu_aio_wait();
> > +    }
> > +
> > +    async_context_pop();
> > +    return ret;
> > +}
> > diff --git a/block/qed.c b/block/qed.c
> > index ea03798..6d7f4d7 100644
> > --- a/block/qed.c
> > +++ b/block/qed.c
> > @@ -139,6 +139,15 @@ static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
> >      return 0;
> >  }
> >  
> > +static QEDTable *qed_alloc_table(void *opaque)
> > +{
> > +    BDRVQEDState *s = opaque;
> > +
> > +    /* Honor O_DIRECT memory alignment requirements */
> > +    return qemu_blockalign(s->bs,
> > +                           s->header.cluster_size * s->header.table_size);
> > +}
> > +
> >  static int bdrv_qed_open(BlockDriverState *bs, int flags)
> >  {
> >      BDRVQEDState *s = bs->opaque;
> > @@ -207,11 +216,24 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
> >              }
> >          }
> >      }
> > +
> > +    s->l1_table = qed_alloc_table(s);
> > +    qed_init_l2_cache(&s->l2_cache, qed_alloc_table, s);
> > +
> > +    ret = qed_read_l1_table_sync(s);
> > +    if (ret) {
> > +        qed_free_l2_cache(&s->l2_cache);
> 
> Why not initializing the L2 cache only if the read succeeded?

The consistency check patch adds more code after the call to
qed_read_l1_table_sync() where we really need the L2 cache, so it
will be like this or we can add more goto labels, I don't think it
matters.

Stefan

next prev parent reply	other threads:[~2010-10-13 13:41 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-08 15:48 [Qemu-devel] [PATCH v2 0/7] qed: Add QEMU Enhanced Disk format Stefan Hajnoczi
2010-10-08 15:48 ` [Qemu-devel] [PATCH v2 1/7] qcow2: Make get_bits_from_size() common Stefan Hajnoczi
2010-10-08 18:01   ` [Qemu-devel] " Anthony Liguori
2010-10-08 15:48 ` [Qemu-devel] [PATCH v2 2/7] cutils: Add bytes_to_str() to format byte values Stefan Hajnoczi
2010-10-11 11:09   ` [Qemu-devel] " Kevin Wolf
2010-10-13  9:15   ` [Qemu-devel] " Markus Armbruster
2010-10-13  9:28     ` Kevin Wolf
2010-10-13 10:58       ` Stefan Hajnoczi
2010-10-13 10:25   ` [Qemu-devel] " Avi Kivity
2010-10-08 15:48 ` [Qemu-devel] [PATCH v2 3/7] docs: Add QED image format specification Stefan Hajnoczi
2010-10-10  9:20   ` [Qemu-devel] " Avi Kivity
2010-10-11 10:09     ` Stefan Hajnoczi
2010-10-11 13:04       ` Avi Kivity
2010-10-11 13:42         ` Stefan Hajnoczi
2010-10-11 13:44           ` Avi Kivity
2010-10-11 14:06             ` Stefan Hajnoczi
2010-10-11 14:12               ` Avi Kivity
2010-10-11 15:02             ` Anthony Liguori
2010-10-11 15:24               ` Avi Kivity
2010-10-11 15:41                 ` Anthony Liguori
2010-10-11 15:47                   ` Avi Kivity
2010-10-11 14:54         ` Anthony Liguori
2010-10-11 14:58           ` Avi Kivity
2010-10-11 15:49             ` Anthony Liguori
2010-10-11 16:02               ` Avi Kivity
2010-10-11 16:10                 ` Anthony Liguori
2010-10-12 10:25                   ` Avi Kivity
2010-10-11 13:58   ` Kevin Wolf
2010-10-11 15:30     ` Stefan Hajnoczi
2010-10-11 15:39       ` Avi Kivity
2010-10-11 15:46         ` Stefan Hajnoczi
2010-10-11 16:18           ` Anthony Liguori
2010-10-11 17:14             ` Anthony Liguori
2010-10-12  8:07               ` Kevin Wolf
2010-10-12 13:16                 ` Stefan Hajnoczi
2010-10-12 13:32                   ` Anthony Liguori
2010-10-11 15:50       ` Kevin Wolf
2010-10-08 15:48 ` [Qemu-devel] [PATCH v2 4/7] qed: Add QEMU Enhanced Disk image format Stefan Hajnoczi
2010-10-11 15:16   ` [Qemu-devel] " Kevin Wolf
2010-10-08 15:48 ` [Qemu-devel] [PATCH v2 5/7] qed: Table, L2 cache, and cluster functions Stefan Hajnoczi
2010-10-12 14:44   ` [Qemu-devel] " Kevin Wolf
2010-10-13 13:41     ` Stefan Hajnoczi [this message]
2010-10-08 15:48 ` [Qemu-devel] [PATCH v2 6/7] qed: Read/write support Stefan Hajnoczi
2010-10-10  9:10   ` [Qemu-devel] " Avi Kivity
2010-10-11 10:37     ` Stefan Hajnoczi
2010-10-11 13:10       ` Avi Kivity
2010-10-11 13:55         ` Stefan Hajnoczi
2010-10-11 14:57         ` Anthony Liguori
2010-10-12 15:08   ` Kevin Wolf
2010-10-12 15:22     ` Anthony Liguori
2010-10-12 15:39       ` Kevin Wolf
2010-10-12 15:59         ` Stefan Hajnoczi
2010-10-12 16:16           ` Anthony Liguori
2010-10-12 16:21             ` Avi Kivity
2010-10-13 12:13             ` Stefan Hajnoczi
2010-10-13 13:07               ` Kevin Wolf
2010-10-13 13:24                 ` Anthony Liguori
2010-10-13 13:50                   ` Avi Kivity
2010-10-13 14:07                     ` Stefan Hajnoczi
2010-10-13 14:08                       ` Anthony Liguori
2010-10-13 14:10                       ` Avi Kivity
2010-10-13 14:11                         ` Anthony Liguori
2010-10-13 14:16                           ` Avi Kivity
2010-10-13 14:53                             ` Anthony Liguori
2010-10-13 15:08                               ` Avi Kivity
2010-10-13 15:42                                 ` Anthony Liguori
2010-10-14 11:06                         ` Stefan Hajnoczi
2010-10-13 14:10                     ` Anthony Liguori
2010-10-08 15:48 ` [Qemu-devel] [PATCH v2 7/7] qed: Consistency check support Stefan Hajnoczi
2010-10-11 13:21 ` [Qemu-devel] Re: [PATCH v2 0/7] qed: Add QEMU Enhanced Disk format Kevin Wolf
2010-10-11 15:37   ` Stefan Hajnoczi
2010-10-16  7:51 ` [Qemu-devel] " Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101013134135.GC8998@stefan-thinkpad.transitives.com \
    --to=stefanha@linux.vnet.ibm.com \
    --cc=aliguori@us.ibm.com \
    --cc=avi@redhat.com \
    --cc=hch@lst.de \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.