Re: [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters.

From: Laurent Vivier <Laurent.Vivier@bull.net>
To: Kevin Wolf <kwolf@suse.de>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters.
Date: Mon, 11 Aug 2008 14:39:05 +0200
Message-ID: <1218458345.3871.12.camel@frecb07144>
In-Reply-To: <48A02C42.1040804@suse.de>

Hi Kevin,

BTW, I'm currently rewriting this patch...

On Monday, 11 August 2008 at 14:10 +0200, Kevin Wolf wrote:
> Laurent Vivier wrote:
> > Modify get_cluster_offset(), alloc_cluster_offset() and free_used_clusters()
> > to specify how many clusters we want.
> > 
> > Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
> > ---
> >  block-qcow2.c |  212 ++++++++++++++++++++++++++++++++++++++++++----------------
> >  1 file changed, 154 insertions(+), 58 deletions(-)
> > 
> > Index: qemu/block-qcow2.c
> > ===================================================================
> > --- qemu.orig/block-qcow2.c	2008-07-29 15:22:26.000000000 +0200
> > +++ qemu/block-qcow2.c	2008-07-29 15:22:28.000000000 +0200
> > @@ -575,32 +575,76 @@ static int l2_allocate(BlockDriverState 
> >      return 1;
> >  }
> >  
> > -static uint64_t get_cluster_offset(BlockDriverState *bs, uint64_t offset)
> > +static uint64_t get_cluster_offset(BlockDriverState *bs,
> > +                                   uint64_t offset, int *num)
> 
> I think you start to know what kind of comments I'll provide. So yes,
> here's another one of them: While it's intuitive what value I should
> pass for num, it's clearly not obvious what the function will return in
> it. Or even what the function is doing at all.
> 
> This is how I understand it: The returned num is the number of
> contiguous clusters that can be read with a single read operation, i.e.
> they are all sparse, come from a backing file or are physically
> contiguous in the image file.

Yes

> Add a comment which says this and I'll be happy.

OK (I'll cut&paste the lines above ;-) )
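
For illustration, the meaning of the returned num described above can be
sketched as a small standalone C function. This is only a minimal sketch,
not the code from the patch: count_contiguous() and OFLAG_COPIED are
hypothetical names, the L2 entries are assumed to be already converted to
host byte order, and the flag is assumed to be bit 63 as in qcow2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QCOW_OFLAG_COPIED (assumed to be bit 63). */
#define OFLAG_COPIED (1ULL << 63)

/*
 * Count how many consecutive L2 entries, starting at `start`, could be
 * served by a single read: either all of them are unallocated (zero), or
 * they are allocated at physically consecutive offsets in the image.
 * At most `max` entries are examined and the scan never leaves the table.
 */
static int count_contiguous(const uint64_t *l2, int l2_size, int start,
                            int max, uint64_t cluster_size)
{
    assert(start >= 0 && start < l2_size && max >= 1);

    uint64_t first = l2[start] & ~OFLAG_COPIED;
    int limit = l2_size - start;
    if (max < limit)
        limit = max;

    int n = 1;
    while (n < limit) {
        uint64_t next = l2[start + n] & ~OFLAG_COPIED;
        /* Unallocated runs expect 0; allocated runs expect the next
         * cluster to follow directly after the previous one. */
        uint64_t expected = first ? first + (uint64_t)n * cluster_size : 0;
        if (next != expected)
            break;
        n++;
    }
    return n;
}
```

With 64 KiB clusters and a table holding 0x10000, 0x20000, 0x30000,
0x50000, 0, 0, a scan from index 0 stops after three clusters because the
fourth one is not physically adjacent.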

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l1_index, l2_index, ret;
> > -    uint64_t l2_offset, *l2_table, cluster_offset;
> > +    uint64_t l2_offset, *l2_table, cluster_offset, next;
> > +    int l1_bits;
> > +    int index_in_cluster, nb_available, nb_needed;
> >  
> > -    l1_index = offset >> (s->l2_bits + s->cluster_bits);
> > +    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
> > +    nb_needed = *num + index_in_cluster;
> > +
> > +    l1_bits = s->l2_bits + s->cluster_bits;
> > +
> > +    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
> > +    nb_available = (nb_available >> 9) + index_in_cluster;
> 
> This could use a comment that nb_available is the remaining sectors in
> the L2 table (is it?) and that it is used in the following two
> conditions (the goto makes this non-obvious - at first, I thought that
> this value wouldn't be used at all)

nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));

is the number of bytes from offset to the end of the area covered by
one l1 entry

nb_available = (nb_available >> 9) + index_in_cluster;

is the number of sectors from the first sector of the cluster we want
to the last sector of the same l1 entry.

I do that because I don't want to manage the case where we run across an
l1 cache entry boundary.

I guess you want comments?
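
As a worked example, the two assignments can be put into a standalone
helper and evaluated for concrete values. This is a minimal sketch under
stated assumptions: sectors_left_in_l1_entry() is a hypothetical name,
sectors are 512 bytes, and offset is sector-aligned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of sectors between the start of the cluster containing `offset`
 * and the end of the area mapped by the L1 entry covering `offset`.
 * Mirrors the two nb_available assignments quoted above.
 */
static int sectors_left_in_l1_entry(uint64_t offset, int l2_bits,
                                    int cluster_bits)
{
    assert(cluster_bits >= 9);

    int l1_bits = l2_bits + cluster_bits;
    uint64_t l1_span = 1ULL << l1_bits;      /* bytes mapped by one L1 entry */

    /* Bytes from offset to the end of that area. */
    uint64_t bytes_left = l1_span - (offset & (l1_span - 1));

    int cluster_sectors = 1 << (cluster_bits - 9);
    int index_in_cluster = (int)((offset >> 9) & (cluster_sectors - 1));

    /* Adding index_in_cluster moves the starting point back to the first
     * sector of the cluster, which matches how nb_needed is computed. */
    return (int)(bytes_left >> 9) + index_in_cluster;
}
```

With 4 KiB clusters (cluster_bits = 12, l2_bits = 9) one L1 entry maps
2 MiB, i.e. 4096 sectors. An offset of 1536 (sector 3 of the first
cluster) gives 4093 + 3 = 4096, and an offset of 2093568 (sector 1 of
the last cluster) gives 7 + 1 = 8.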

> > +
> > +    cluster_offset = 0;
> > +
> > +    l1_index = offset >> l1_bits;
> >      if (l1_index >= s->l1_size)
> > -        return 0;
> > +        goto out;
> >  
> >      if (!s->l1_table[l1_index])
> > -        return 0;
> > +        goto out;
> >  
> >      ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
> >      if (ret == 0)
> > -        return 0;
> > +        goto out;
> 
> ret == 0 means that loading the L2 table failed. This is a real error,
> right? Isn't return 0 the right thing to do then?

Yes... :-P

> >  
> >      l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> >      cluster_offset = be64_to_cpu(l2_table[l2_index]);
> > +    nb_available = s->cluster_sectors;
> > +    l2_index++;
> > +
> > +    if (!cluster_offset) {
> > +
> > +       /* how many empty clusters ? */
> > +
> > +       while (nb_available < nb_needed && !l2_table[l2_index]) {
> > +           l2_index++;
> > +           nb_available += s->cluster_sectors;
> > +       }
> > +
> > +   } else {
> >  
> > -    return cluster_offset & ~QCOW_OFLAG_COPIED;
> > +       /* how many allocated clusters ? */
> > +
> > +       cluster_offset &= ~QCOW_OFLAG_COPIED;
> > +       while (nb_available < nb_needed) {
> > +           next = be64_to_cpu(l2_table[l2_index]) & ~QCOW_OFLAG_COPIED;
> > +           if (next != cluster_offset + (nb_available << 9))
> > +               break;
> > +           l2_index++;
> > +           nb_available += s->cluster_sectors;
> > +       }
> > +   }
> > +
> > +out:
> > +    if (nb_available > nb_needed)
> > +        nb_available = nb_needed;
> > +
> > +    *num = nb_available - index_in_cluster;
> > +
> > +    return cluster_offset;
> >  }
> >  
> >  static uint64_t free_used_clusters(BlockDriverState *bs, uint64_t offset,
> >  				   uint64_t **l2_table, uint64_t *l2_offset,
> > -                                   int *l2_index)
> > +                                   int *l2_index, int *nb_clusters)
> 
> You would save some ifs if you didn't allow nb_clusters to be NULL.
> Passing a local variable containing 1 should do the very same thing and
> seems to be less error prone. Otherwise, put a note here which says what
> passing NULL means.
> 

To follow your previous comments (patch 3), free_used_clusters() has
been removed from this patch...

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l1_index, ret;
> > @@ -629,21 +673,63 @@ static uint64_t free_used_clusters(Block
> >      *l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> >      cluster_offset = be64_to_cpu((*l2_table)[*l2_index]);
> >  
> > -    if (cluster_offset & QCOW_OFLAG_COPIED)
> > +    if (nb_clusters && *nb_clusters > s->l2_size - (*l2_index))
> > +            *nb_clusters = s->l2_size - (*l2_index);
> > +
> > +    if (!cluster_offset) {
> > +        if (nb_clusters) {
> > +            int i = 1;
> > +            while (i < *nb_clusters && (*l2_table)[(*l2_index) + i] == 0) {
> > +                i++;
> > +            }
> > +            *nb_clusters = i;
> > +        }
> > +        return 0;
> > +    }
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COPIED) {
> > +        if (nb_clusters) {
> > +            int i = 1;
> > +            uint64_t current;
> > +            while (i < *nb_clusters) {
> > +                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> > +                if (cluster_offset + (i << s->cluster_bits) != current)
> > +                    break;
> > +                i++;
> > +            }
> > +            *nb_clusters = i;
> > +        }
> >          return cluster_offset;
> > +    }
> >  
> > -    if (cluster_offset) {
> > -        /* free the cluster */
> > -        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> > -            int nb_csectors;
> > -            nb_csectors = ((cluster_offset >> s->csize_shift) &
> > -                           s->csize_mask) + 1;
> > -            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> > -                          nb_csectors * 512);
> > -        } else {
> > -            free_clusters(bs, cluster_offset, s->cluster_size);
> > +    /* free the cluster */
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> > +        int nb_csectors;
> > +        nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
> > +        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> > +                      nb_csectors * 512);
> > +        if (nb_clusters)
> > +            *nb_clusters = 1;
> > +        return 0;
> > +    }
> > +
> > +    if (nb_clusters) {
> > +        int i = 1;
> > +        uint64_t current;
> > +        while (i < *nb_clusters) {
> > +            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> > +            if (cluster_offset + (i << s->cluster_bits) != current)
> > +                break;
> > +            i++;
> >          }
> > +        *nb_clusters = i;
> > +        free_clusters(bs, cluster_offset, i << s->cluster_bits);
> > +        return 0;
> >      }
> > +
> > +    free_clusters(bs, cluster_offset, s->cluster_size);
> > +
> >      return 0;
> >  }
> >  
> > @@ -657,7 +743,8 @@ static uint64_t alloc_compressed_cluster
> >      int nb_csectors;
> >  
> >      cluster_offset = free_used_clusters(bs, offset,
> > -                                        &l2_table, &l2_offset, &l2_index);
> > +                                        &l2_table, &l2_offset, &l2_index,
> > +                                        NULL);
> >      if (cluster_offset & QCOW_OFLAG_COPIED)
> >          return cluster_offset & ~QCOW_OFLAG_COPIED;
> >  
> > @@ -683,63 +770,80 @@ static uint64_t alloc_compressed_cluster
> >  
> >  static uint64_t alloc_cluster_offset(BlockDriverState *bs,
> >                                       uint64_t offset,
> > -                                     int n_start, int n_end)
> > +                                     int n_start, int n_end,
> > +                                     int *num)
> 
> The interface between get_cluster_offset and alloc_cluster_offset is
> inconsistent. In the former function, the value passed in num is used to
> determine the number of clusters to get. In the latter, num is an output
> parameter whose value isn't used. This is confusing.

Yes, I know, I have to think about this.

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l2_index, ret;
> >      uint64_t l2_offset, *l2_table, cluster_offset;
> > +    int nb_available, nb_clusters, i;
> > +    uint64_t start_sect;
> >  
> > +    nb_clusters = ((n_end << 9) + s->cluster_size - 1) >>
> > +                  s->cluster_bits;
> >  
> >      cluster_offset = free_used_clusters(bs, offset,
> > -                                        &l2_table, &l2_offset, &l2_index);
> > -    if (cluster_offset & QCOW_OFLAG_COPIED)
> > -        return cluster_offset & ~QCOW_OFLAG_COPIED;
> > +                                        &l2_table, &l2_offset, &l2_index,
> > +                                        &nb_clusters);
> > +    nb_available = nb_clusters << (s->cluster_bits - 9);
> > +    if (nb_available > n_end)
> > +        nb_available = n_end;
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COPIED) {
> > +        cluster_offset &= ~QCOW_OFLAG_COPIED;
> > +        goto out;
> > +    }
> >  
> > -    /* allocate a new cluster */
> > +    /* allocate new clusters */
> >  
> > -    cluster_offset = alloc_clusters(bs, s->cluster_size);
> > +    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
> >  
> >      /* we must initialize the cluster content which won't be
> >         written */
> >  
> > -    if ((n_end - n_start) < s->cluster_sectors) {
> > -        uint64_t start_sect;
> > -
> > -        start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> > +    start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> > +    if (n_start) {
> >          ret = copy_sectors(bs, start_sect, cluster_offset, 0, n_start);
> >          if (ret < 0)
> >              return 0;
> > -        ret = copy_sectors(bs, start_sect,
> > -                           cluster_offset, n_end, s->cluster_sectors);
> > +    }
> > +
> > +    if (nb_available & (s->cluster_sectors - 1)) {
> > +        uint64_t end = nb_available & ~(uint64_t)(s->cluster_sectors - 1);
> > +        ret = copy_sectors(bs, start_sect + end,
> > +                           cluster_offset + (end << 9),
> > +                           nb_available - end,
> > +                           s->cluster_sectors);
> >          if (ret < 0)
> >              return 0;
> >      }
> >  
> >      /* update L2 table */
> >  
> > -    l2_table[l2_index] = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
> > +    for (i = 0; i < nb_clusters; i++)
> > +        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
> > +                                             (i << s->cluster_bits)) |
> > +                                             QCOW_OFLAG_COPIED);
> > +
> >      if (bdrv_pwrite(s->hd,
> >                      l2_offset + l2_index * sizeof(uint64_t),
> >                      l2_table + l2_index,
> > -                    sizeof(uint64_t)) != sizeof(uint64_t))
> > +                    nb_clusters * sizeof(uint64_t)) !=
> > +                    nb_clusters * sizeof(uint64_t))
> >          return 0;
> >  
> > +out:
> > +    *num = nb_available - n_start;
> >      return cluster_offset;
> >  }
> >  
> >  static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
> >                               int nb_sectors, int *pnum)
> >  {
> > -    BDRVQcowState *s = bs->opaque;
> > -    int index_in_cluster, n;
> >      uint64_t cluster_offset;
> >  
> > -    cluster_offset = get_cluster_offset(bs, sector_num << 9);
> > -    index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -    n = s->cluster_sectors - index_in_cluster;
> > -    if (n > nb_sectors)
> > -        n = nb_sectors;
> > -    *pnum = n;
> > +    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
> > +
> >      return (cluster_offset != 0);
> >  }
> >  
> > @@ -816,11 +920,9 @@ static int qcow_read(BlockDriverState *b
> >      uint64_t cluster_offset;
> >  
> >      while (nb_sectors > 0) {
> > -        cluster_offset = get_cluster_offset(bs, sector_num << 9);
> > +        n = nb_sectors;
> > +        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
> >          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -        n = s->cluster_sectors - index_in_cluster;
> > -        if (n > nb_sectors)
> > -            n = nb_sectors;
> >          if (!cluster_offset) {
> >              if (bs->backing_hd) {
> >                  /* read from the base image */
> > @@ -862,12 +964,10 @@ static int qcow_write(BlockDriverState *
> >  
> >      while (nb_sectors > 0) {
> >          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -        n = s->cluster_sectors - index_in_cluster;
> > -        if (n > nb_sectors)
> > -            n = nb_sectors;
> >          cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
> >                                                index_in_cluster,
> > -                                              index_in_cluster + n);
> > +                                              index_in_cluster + nb_sectors,
> > +                                              &n);
> >          if (!cluster_offset)
> >              return -1;
> >          if (s->crypt_method) {
> > @@ -940,11 +1040,9 @@ static void qcow_aio_read_cb(void *opaqu
> >      }
> >  
> >      /* prepare next AIO request */
> > -    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9);
> > +    acb->n = acb->nb_sectors;
> > +    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
> >      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> > -    acb->n = s->cluster_sectors - index_in_cluster;
> > -    if (acb->n > acb->nb_sectors)
> > -        acb->n = acb->nb_sectors;
> >  
> >      if (!acb->cluster_offset) {
> >          if (bs->backing_hd) {
> > @@ -1046,12 +1144,10 @@ static void qcow_aio_write_cb(void *opaq
> >      }
> >  
> >      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> > -    acb->n = s->cluster_sectors - index_in_cluster;
> > -    if (acb->n > acb->nb_sectors)
> > -        acb->n = acb->nb_sectors;
> >      cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
> >                                            index_in_cluster,
> > -                                          index_in_cluster + acb->n);
> > +                                          index_in_cluster + acb->nb_sectors,
> > +                                          &acb->n);
> >      if (!cluster_offset || (cluster_offset & 511) != 0) {
> >          ret = -EIO;
> >          goto fail;
> 
> In the writing functions, you can't just assign a big n, because
> s->cluster_data will be too small when processing encrypted data. As you
> said you fixed a segfault, I think you know this one already.

Yes, this is the cause of the segfault.
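
One possible way to avoid this can be sketched as follows. This is a
minimal sketch and not necessarily the fix used in the rewritten patch:
limit_sectors() is a hypothetical helper, and the encryption bounce
buffer (s->cluster_data in the code above) is assumed to hold exactly one
cluster. For encrypted images it restores the per-cluster bound that the
removed lines used to apply.

```c
#include <assert.h>

/*
 * Return how many sectors of a request may be processed in one step.
 * `n` is the sector count offered by the allocator and
 * `index_in_cluster` the position of the first sector in its cluster.
 * When `encrypted` is non-zero the data passes through a bounce buffer
 * assumed to hold one cluster, so the step is cut at the cluster end.
 */
static int limit_sectors(int n, int index_in_cluster, int cluster_sectors,
                         int encrypted)
{
    assert(n > 0 && cluster_sectors > 0);
    assert(index_in_cluster >= 0 && index_in_cluster < cluster_sectors);

    if (encrypted) {
        int room = cluster_sectors - index_in_cluster;
        if (n > room)
            n = room;
    }
    return n;
}
```

With 8 sectors per cluster and a write starting at sector 3 of a cluster,
an encrypted image would process at most 5 sectors per step, while an
unencrypted one keeps the full aggregated count.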

Regards,
Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "Perfection is achieved not when there is nothing left to
add, but when there is nothing left to take away." Saint Exupéry
