From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stefan Hajnoczi
Date: Mon, 27 Feb 2012 13:16:01 +0000
Message-Id: <1330348561-9340-1-git-send-email-stefanha@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [Qemu-devel] [PATCH] qed: do not evict in-use L2 table cache entries
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, qemu-stable@nongnu.org, Stefan Hajnoczi

The L2 table cache reduces QED metadata reads that would be required
when translating LBAs to offsets into the image file.  Since requests
execute in parallel it is possible to share an L2 table between multiple
requests.

There is a potential data corruption issue when an in-use L2 table is
evicted from the cache because the following situation occurs:

1. An allocating write performs an update to L2 table "A".
2. Another request needs L2 table "B" and causes table "A" to be
   evicted.
3. A new read request needs L2 table "A" but it is not cached.

As a result the L2 update from #1 can overlap with the L2 fetch from #3.
We must avoid doing overlapping I/O requests here since the worst case
outcome is that the L2 fetch completes before the L2 update and yields
stale data.  In that case we would effectively discard the L2 update and
lose data clusters!

Thanks to Benoît Canet for extensive testing and debugging which led to
the discovery of this bug.

Reported-by: Benoît Canet
Signed-off-by: Stefan Hajnoczi
---
Please include this in -stable once it has been merged into
qemu.git/master.

 block/qed-l2-cache.c |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/block/qed-l2-cache.c b/block/qed-l2-cache.c
index 02b81a2..e9b2aae 100644
--- a/block/qed-l2-cache.c
+++ b/block/qed-l2-cache.c
@@ -161,11 +161,25 @@ void qed_commit_l2_cache_entry(L2TableCache *l2_cache, CachedL2Table *l2_table)
         return;
     }
 
+    /* Evict an unused cache entry so we have space.  If all entries are in use
+     * we can grow the cache temporarily and we try to shrink back down later.
+     */
     if (l2_cache->n_entries >= MAX_L2_CACHE_SIZE) {
-        entry = QTAILQ_FIRST(&l2_cache->entries);
-        QTAILQ_REMOVE(&l2_cache->entries, entry, node);
-        l2_cache->n_entries--;
-        qed_unref_l2_cache_entry(entry);
+        CachedL2Table *next;
+        QTAILQ_FOREACH_SAFE(entry, &l2_cache->entries, node, next) {
+            if (entry->ref > 1) {
+                continue;
+            }
+
+            QTAILQ_REMOVE(&l2_cache->entries, entry, node);
+            l2_cache->n_entries--;
+            qed_unref_l2_cache_entry(entry);
+
+            /* Stop evicting when we've shrunk back to max size */
+            if (l2_cache->n_entries < MAX_L2_CACHE_SIZE) {
+                break;
+            }
+        }
     }
 
     l2_cache->n_entries++;
-- 
1.7.9