Message-ID: <54F0B708.10503@redhat.com>
Date: Fri, 27 Feb 2015 13:27:20 -0500
From: Max Reitz
References: <1425045947-9271-1-git-send-email-mreitz@redhat.com> <54F0AC87.3040707@redhat.com> <54F0B2C3.1010506@redhat.com>
In-Reply-To: <54F0B2C3.1010506@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2] block/vdi: Add locking for parallel requests
To: Paolo Bonzini, qemu-block@nongnu.org
Cc: Kevin Wolf, qemu-stable@nongnu.org, Stefan Hajnoczi, qemu-devel@nongnu.org

On 2015-02-27 at 13:09, Max Reitz wrote:
> On 2015-02-27 at 12:42, Paolo Bonzini wrote:
>>
>> On 27/02/2015 15:05, Max Reitz wrote:
>>> Concurrently modifying the bmap does not seem to be a good idea; this
>>> patch adds a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307
>>> for what can go wrong without.
>>>
>>> Cc: qemu-stable
>>> Signed-off-by: Max Reitz
>>> ---
>>> v2:
>>> - Make the mutex cover vdi_co_write() completely [Kevin]
>>> - Add a TODO comment [Kevin]
>> I think I know what the bug is. Suppose you have two concurrent writes
>> to a non-allocated block, one at 16K...32K (in bytes) and one at
>> 32K...48K. The first write is enlarged to contain zeros, the second is
>> not. Then you have two writes in flight:
>>
>>    0      zeros
>>    ...    zeros
>>    16K    data1
>>    ...    data1
>>    32K    zeros    data2
>>    ...    zeros    data2
>>    48K    zeros
>>    ...    zeros
>>    64K
>>
>> And the contents of 32K...48K are undefined.
>> If the above diagnosis is correct, I'm not even sure why Max's v1
>> patch worked...
>
> Maybe that's an issue, too; but the test case I sent out does 1 MB
> requests (and it fails), so this shouldn't matter there.

Considering that test case didn't work for Stefan (Weil), and that it
fails in a pretty strange way for me (no output from the qemu-io command
at all, and while most reads from raw were successful, all reads from vdi
failed, i.e. the pattern verification failed), maybe what it triggers is
something completely different. Indeed, when I do sub-MB writes, I get
sporadic errors which seem much more closely related to the original bug
report, so the issue you found is probably the real problem.

Also, my test case suddenly stopped reproducing the issue on my HDD and
now only does so on tmpfs. Weird.

Max

>> An optimized fix could be to use a CoRwLock, then:
>
> Yes, I'm actually already working on that.
>
> Max
>
>> - take it shared (read) around the write in the
>> "VDI_IS_ALLOCATED(bmap_entry)" path
>>
>> - take it exclusive (write) around the write in the
>> "!VDI_IS_ALLOCATED(bmap_entry)" path
>>
>> Paolo