* [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption @ 2014-03-10 22:44 Max Reitz 2014-03-10 22:44 ` [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() Max Reitz ` (4 more replies) 0 siblings, 5 replies; 11+ messages in thread From: Max Reitz @ 2014-03-10 22:44 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz This series fixes a call to a NULL function pointer during copy_sectors(), if a concurrent write request triggered the corruption prevention mechanism. Max Reitz (3): qcow2: Check bs->drv in copy_sectors() block: bs->drv may be NULL in bdrv_debug_resume() iotests: Test corruption during COW request block.c | 2 +- block/qcow2-cluster.c | 4 ++++ tests/qemu-iotests/060 | 26 ++++++++++++++++++++++++++ tests/qemu-iotests/060.out | 15 +++++++++++++++ 4 files changed, 46 insertions(+), 1 deletion(-) -- 1.9.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() 2014-03-10 22:44 [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Max Reitz @ 2014-03-10 22:44 ` Max Reitz 2014-03-10 23:16 ` Laszlo Ersek 2014-03-10 22:44 ` [Qemu-devel] [PATCH 2/3] block: bs->drv may be NULL in bdrv_debug_resume() Max Reitz ` (3 subsequent siblings) 4 siblings, 1 reply; 11+ messages in thread From: Max Reitz @ 2014-03-10 22:44 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz Before dereferencing bs->drv for a call to its member bdrv_co_readv(), copy_sectors() should check whether that pointer is indeed valid, since it may have been set to NULL by e.g. a concurrent write triggering the corruption prevention mechanism. Signed-off-by: Max Reitz <mreitz@redhat.com> --- To be precise, this still is a race condition. If bs->drv is set to NULL after the check and before the call to bdrv_co_readv(), QEMU will obviously still crash. However, in order to circumvent this behavior, we would probably have to re-lock s->lock, check bs->drv, take the function pointer to bdrv_co_readv() and then unlock s->lock before the function is called. I found this rather ugly and therefore this still has a very small chance of running into a race condition. Therefore, I'm asking for your opinion on this, whether we can really take this chance or should rather "do it right". In fact, if I were a reviewer, I'd probably reject this patch and request the solution with the function pointer (if there is no better solution), but I was afraid to send such an ugly patch. --- block/qcow2-cluster.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 36c1bed..9499df9 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs, BLKDBG_EVENT(bs->file, BLKDBG_COW_READ); + if (!bs->drv) { + return -ENOMEDIUM; + } + /* Call .bdrv_co_readv() directly instead of using the public block-layer * interface. This avoids double I/O throttling and request tracking, * which can lead to deadlock when block layer copy-on-read is enabled. -- 1.9.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() 2014-03-10 22:44 ` [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() Max Reitz @ 2014-03-10 23:16 ` Laszlo Ersek 2014-03-11 10:16 ` Kevin Wolf 0 siblings, 1 reply; 11+ messages in thread From: Laszlo Ersek @ 2014-03-10 23:16 UTC (permalink / raw) To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi On 03/10/14 23:44, Max Reitz wrote: > Before dereferencing bs->drv for a call to its member bdrv_co_readv(), > copy_sectors() should check whether that pointer is indeed valid, since > it may have been set to NULL by e.g. a concurrent write triggering the > corruption prevention mechanism. > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > To be precise, this still is a race condition. If bs->drv is set to NULL > after the check and before the call to bdrv_co_readv(), QEMU will > obviously still crash. However, in order to circumvent this behavior, we > would probably have to re-lock s->lock, check bs->drv, take the function > pointer to bdrv_co_readv() and then unlock s->lock before the function > is called. I found this rather ugly and therefore this still has a very > small chance of running into a race condition. > Therefore, I'm asking for your opinion on this, whether we can really > take this chance or should rather "do it right". In fact, if I were a > reviewer, I'd probably reject this patch and request the solution with > the function pointer (if there is no better solution), but I was afraid > to send such an ugly patch. > --- > block/qcow2-cluster.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c > index 36c1bed..9499df9 100644 > --- a/block/qcow2-cluster.c > +++ b/block/qcow2-cluster.c > @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs, > > BLKDBG_EVENT(bs->file, BLKDBG_COW_READ); > > + if (!bs->drv) { > + return -ENOMEDIUM; > + } > + > /* Call .bdrv_co_readv() directly instead of using the public block-layer > * interface. This avoids double I/O throttling and request tracking, > * which can lead to deadlock when block layer copy-on-read is enabled. > I can't answer your question nor review this patch -- instead, I have a question of my own: when you say "set to NULL by [...] the corruption prevention mechanism", do you mean qcow2_pre_write_overlap_check(): bs->drv = NULL; /* make BDS unusable */ If so: I thought that it was quite a bold move, but also that we'd find the SIGSEGVs sooner or later... :) Thanks Laszlo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() 2014-03-10 23:16 ` Laszlo Ersek @ 2014-03-11 10:16 ` Kevin Wolf 2014-03-11 14:04 ` Paolo Bonzini 2014-03-11 18:09 ` Max Reitz 0 siblings, 2 replies; 11+ messages in thread From: Kevin Wolf @ 2014-03-11 10:16 UTC (permalink / raw) To: Laszlo Ersek; +Cc: qemu-devel, Stefan Hajnoczi, Max Reitz Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben: > On 03/10/14 23:44, Max Reitz wrote: > > Before dereferencing bs->drv for a call to its member bdrv_co_readv(), > > copy_sectors() should check whether that pointer is indeed valid, since > > it may have been set to NULL by e.g. a concurrent write triggering the > > corruption prevention mechanism. > > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > --- > > To be precise, this still is a race condition. If bs->drv is set to NULL > > after the check and before the call to bdrv_co_readv(), QEMU will > > obviously still crash. However, in order to circumvent this behavior, we > > would probably have to re-lock s->lock, check bs->drv, take the function > > pointer to bdrv_co_readv() and then unlock s->lock before the function > > is called. I found this rather ugly and therefore this still has a very > > small chance of running into a race condition. > > Therefore, I'm asking for your opinion on this, whether we can really > > take this chance or should rather "do it right". In fact, if I were a > > reviewer, I'd probably reject this patch and request the solution with > > the function pointer (if there is no better solution), but I was afraid > > to send such an ugly patch. No, the code is fine. Remember that qcow2 is not threaded, we're talking about coroutines here. There is no way for the code to yield between your check and the protected place. > > block/qcow2-cluster.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c > > index 36c1bed..9499df9 100644 > > --- a/block/qcow2-cluster.c > > +++ b/block/qcow2-cluster.c > > @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs, > > > > BLKDBG_EVENT(bs->file, BLKDBG_COW_READ); > > > > + if (!bs->drv) { > > + return -ENOMEDIUM; > > + } > > + > > /* Call .bdrv_co_readv() directly instead of using the public block-layer > > * interface. This avoids double I/O throttling and request tracking, > > * which can lead to deadlock when block layer copy-on-read is enabled. > > > > I can't answer your question nor review this patch -- instead, I have a > question of my own: when you say "set to NULL by [...] the corruption > prevention mechanism", do you mean qcow2_pre_write_overlap_check(): > > bs->drv = NULL; /* make BDS unusable */ Yes, this is the place. > If so: I thought that it was quite a bold move, but also that we'd find > the SIGSEGVs sooner or later... :) In fact, if you use the block layer API, most functions check for bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we directly dereference the pointer without going through block.c (there's a good reason for this, see the comment, but it still makes it somewhat special). Kevin ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() 2014-03-11 10:16 ` Kevin Wolf @ 2014-03-11 14:04 ` Paolo Bonzini 2014-03-11 18:19 ` Max Reitz 2014-03-11 18:09 ` Max Reitz 1 sibling, 1 reply; 11+ messages in thread From: Paolo Bonzini @ 2014-03-11 14:04 UTC (permalink / raw) To: Kevin Wolf, Laszlo Ersek; +Cc: qemu-devel, Stefan Hajnoczi, Max Reitz Il 11/03/2014 11:16, Kevin Wolf ha scritto: > Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben: >> On 03/10/14 23:44, Max Reitz wrote: >>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(), >>> copy_sectors() should check whether that pointer is indeed valid, since >>> it may have been set to NULL by e.g. a concurrent write triggering the >>> corruption prevention mechanism. >>> >>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>> --- >>> To be precise, this still is a race condition. If bs->drv is set to NULL >>> after the check and before the call to bdrv_co_readv(), QEMU will >>> obviously still crash. However, in order to circumvent this behavior, we >>> would probably have to re-lock s->lock, check bs->drv, take the function >>> pointer to bdrv_co_readv() and then unlock s->lock before the function >>> is called. I found this rather ugly and therefore this still has a very >>> small chance of running into a race condition. >>> Therefore, I'm asking for your opinion on this, whether we can really >>> take this chance or should rather "do it right". In fact, if I were a >>> reviewer, I'd probably reject this patch and request the solution with >>> the function pointer (if there is no better solution), but I was afraid >>> to send such an ugly patch. > > No, the code is fine. Remember that qcow2 is not threaded, we're talking > about coroutines here. There is no way for the code to yield between > your check and the protected place. > >>> block/qcow2-cluster.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c >>> index 36c1bed..9499df9 100644 >>> --- a/block/qcow2-cluster.c >>> +++ b/block/qcow2-cluster.c >>> @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs, >>> >>> BLKDBG_EVENT(bs->file, BLKDBG_COW_READ); >>> >>> + if (!bs->drv) { >>> + return -ENOMEDIUM; >>> + } >>> + >>> /* Call .bdrv_co_readv() directly instead of using the public block-layer >>> * interface. This avoids double I/O throttling and request tracking, >>> * which can lead to deadlock when block layer copy-on-read is enabled. >>> >> >> I can't answer your question nor review this patch -- instead, I have a >> question of my own: when you say "set to NULL by [...] the corruption >> prevention mechanism", do you mean qcow2_pre_write_overlap_check(): >> >> bs->drv = NULL; /* make BDS unusable */ > > Yes, this is the place. > >> If so: I thought that it was quite a bold move, but also that we'd find >> the SIGSEGVs sooner or later... :) > > In fact, if you use the block layer API, most functions check for > bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we > directly dereference the pointer without going through block.c (there's > a good reason for this, see the comment, but it still makes it somewhat > special). But why not call qcow2_co_readv directly? Paolo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() 2014-03-11 14:04 ` Paolo Bonzini @ 2014-03-11 18:19 ` Max Reitz 0 siblings, 0 replies; 11+ messages in thread From: Max Reitz @ 2014-03-11 18:19 UTC (permalink / raw) To: Paolo Bonzini, Kevin Wolf, Laszlo Ersek; +Cc: qemu-devel, Stefan Hajnoczi On 11.03.2014 15:04, Paolo Bonzini wrote: > Il 11/03/2014 11:16, Kevin Wolf ha scritto: >> Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben: >>> On 03/10/14 23:44, Max Reitz wrote: >>>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(), >>>> copy_sectors() should check whether that pointer is indeed valid, >>>> since >>>> it may have been set to NULL by e.g. a concurrent write triggering the >>>> corruption prevention mechanism. >>>> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>>> --- >>>> To be precise, this still is a race condition. If bs->drv is set to >>>> NULL >>>> after the check and before the call to bdrv_co_readv(), QEMU will >>>> obviously still crash. However, in order to circumvent this >>>> behavior, we >>>> would probably have to re-lock s->lock, check bs->drv, take the >>>> function >>>> pointer to bdrv_co_readv() and then unlock s->lock before the function >>>> is called. I found this rather ugly and therefore this still has a >>>> very >>>> small chance of running into a race condition. >>>> Therefore, I'm asking for your opinion on this, whether we can really >>>> take this chance or should rather "do it right". In fact, if I were a >>>> reviewer, I'd probably reject this patch and request the solution with >>>> the function pointer (if there is no better solution), but I was >>>> afraid >>>> to send such an ugly patch. >> >> No, the code is fine. Remember that qcow2 is not threaded, we're talking >> about coroutines here. There is no way for the code to yield between >> your check and the protected place. >> >>>> block/qcow2-cluster.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c >>>> index 36c1bed..9499df9 100644 >>>> --- a/block/qcow2-cluster.c >>>> +++ b/block/qcow2-cluster.c >>>> @@ -380,6 +380,10 @@ static int coroutine_fn >>>> copy_sectors(BlockDriverState *bs, >>>> >>>> BLKDBG_EVENT(bs->file, BLKDBG_COW_READ); >>>> >>>> + if (!bs->drv) { >>>> + return -ENOMEDIUM; >>>> + } >>>> + >>>> /* Call .bdrv_co_readv() directly instead of using the public >>>> block-layer >>>> * interface. This avoids double I/O throttling and request >>>> tracking, >>>> * which can lead to deadlock when block layer copy-on-read is >>>> enabled. >>>> >>> >>> I can't answer your question nor review this patch -- instead, I have a >>> question of my own: when you say "set to NULL by [...] the corruption >>> prevention mechanism", do you mean qcow2_pre_write_overlap_check(): >>> >>> bs->drv = NULL; /* make BDS unusable */ >> >> Yes, this is the place. >> >>> If so: I thought that it was quite a bold move, but also that we'd find >>> the SIGSEGVs sooner or later... :) >> >> In fact, if you use the block layer API, most functions check for >> bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we >> directly dereference the pointer without going through block.c (there's >> a good reason for this, see the comment, but it still makes it somewhat >> special). > > But why not call qcow2_co_readv directly? qcow2_co_readv() would probably actually perform the operation, as far as I can see, since it does not check whether the medium is corrupt (this is only checked when opening the block device). This is the reason why bs->drv is set to NULL, so noone is able to read from the block device anymore, even without checking for corruption (additionally, on version 2 images, there is not even a way to indicate corruption other than setting bs->drv to NULL). Normally, every read operation would pass through bdrv_co_readv(), which will (have to) check whether bs->drv is valid or NULL - since this code decides to skip that and call the driver function directly, we therefore have to check bs->drv anyway, since it is the only way of telling us whether the image may actually be accessed right here. We could change the bs->drv->bdrv_co_readv() call to qcow2_co_readv(), but then people will probably ask why there is a bs->drv check right before. Max ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() 2014-03-11 10:16 ` Kevin Wolf 2014-03-11 14:04 ` Paolo Bonzini @ 2014-03-11 18:09 ` Max Reitz 1 sibling, 0 replies; 11+ messages in thread From: Max Reitz @ 2014-03-11 18:09 UTC (permalink / raw) To: Kevin Wolf, Laszlo Ersek; +Cc: qemu-devel, Stefan Hajnoczi On 11.03.2014 11:16, Kevin Wolf wrote: > Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben: >> On 03/10/14 23:44, Max Reitz wrote: >>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(), >>> copy_sectors() should check whether that pointer is indeed valid, since >>> it may have been set to NULL by e.g. a concurrent write triggering the >>> corruption prevention mechanism. >>> >>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>> --- >>> To be precise, this still is a race condition. If bs->drv is set to NULL >>> after the check and before the call to bdrv_co_readv(), QEMU will >>> obviously still crash. However, in order to circumvent this behavior, we >>> would probably have to re-lock s->lock, check bs->drv, take the function >>> pointer to bdrv_co_readv() and then unlock s->lock before the function >>> is called. I found this rather ugly and therefore this still has a very >>> small chance of running into a race condition. >>> Therefore, I'm asking for your opinion on this, whether we can really >>> take this chance or should rather "do it right". In fact, if I were a >>> reviewer, I'd probably reject this patch and request the solution with >>> the function pointer (if there is no better solution), but I was afraid >>> to send such an ugly patch. > No, the code is fine. Remember that qcow2 is not threaded, we're talking > about coroutines here. There is no way for the code to yield between > your check and the protected place. Ah, okay, that makes sense. Thank you. :-) Max >>> block/qcow2-cluster.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c >>> index 36c1bed..9499df9 100644 >>> --- a/block/qcow2-cluster.c >>> +++ b/block/qcow2-cluster.c >>> @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs, >>> >>> BLKDBG_EVENT(bs->file, BLKDBG_COW_READ); >>> >>> + if (!bs->drv) { >>> + return -ENOMEDIUM; >>> + } >>> + >>> /* Call .bdrv_co_readv() directly instead of using the public block-layer >>> * interface. This avoids double I/O throttling and request tracking, >>> * which can lead to deadlock when block layer copy-on-read is enabled. >>> >> I can't answer your question nor review this patch -- instead, I have a >> question of my own: when you say "set to NULL by [...] the corruption >> prevention mechanism", do you mean qcow2_pre_write_overlap_check(): >> >> bs->drv = NULL; /* make BDS unusable */ > Yes, this is the place. > >> If so: I thought that it was quite a bold move, but also that we'd find >> the SIGSEGVs sooner or later... :) > In fact, if you use the block layer API, most functions check for > bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we > directly dereference the pointer without going through block.c (there's > a good reason for this, see the comment, but it still makes it somewhat > special). > > Kevin ^ permalink raw reply [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH 2/3] block: bs->drv may be NULL in bdrv_debug_resume() 2014-03-10 22:44 [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Max Reitz 2014-03-10 22:44 ` [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() Max Reitz @ 2014-03-10 22:44 ` Max Reitz 2014-03-10 22:44 ` [Qemu-devel] [PATCH 3/3] iotests: Test corruption during COW request Max Reitz ` (2 subsequent siblings) 4 siblings, 0 replies; 11+ messages in thread From: Max Reitz @ 2014-03-10 22:44 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz Currently, bdrv_debug_resume() requires every bs->drv in the BDS stack to be NULL until a bs->drv with an implementation of bdrv_debug_resume() is found. For a normal function, this would be fine, but this is a function for debugging purposes and should therefore allow intermediate BDS not to have a driver (i.e., be "ejected"). Otherwise, it is hard to debug such situations. Signed-off-by: Max Reitz <mreitz@redhat.com> --- block.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block.c b/block.c index f1ef4b0..fb487f2 100644 --- a/block.c +++ b/block.c @@ -4055,7 +4055,7 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag) int bdrv_debug_resume(BlockDriverState *bs, const char *tag) { - while (bs && bs->drv && !bs->drv->bdrv_debug_resume) { + while (bs && (!bs->drv || !bs->drv->bdrv_debug_resume)) { bs = bs->file; } -- 1.9.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH 3/3] iotests: Test corruption during COW request 2014-03-10 22:44 [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Max Reitz 2014-03-10 22:44 ` [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() Max Reitz 2014-03-10 22:44 ` [Qemu-devel] [PATCH 2/3] block: bs->drv may be NULL in bdrv_debug_resume() Max Reitz @ 2014-03-10 22:44 ` Max Reitz 2014-03-11 10:21 ` [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Kevin Wolf 2014-03-11 13:13 ` Stefan Hajnoczi 4 siblings, 0 replies; 11+ messages in thread From: Max Reitz @ 2014-03-10 22:44 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz Extend test file 060 by a test case for corruption occuring concurrently to a COW request. QEMU should not crash but rather return an appropriate error message. Signed-off-by: Max Reitz <mreitz@redhat.com> --- tests/qemu-iotests/060 | 26 ++++++++++++++++++++++++++ tests/qemu-iotests/060.out | 15 +++++++++++++++ 2 files changed, 41 insertions(+) diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060 index af8ed9f..f0116aa 100755 --- a/tests/qemu-iotests/060 +++ b/tests/qemu-iotests/060 @@ -138,6 +138,32 @@ $QEMU_IMG snapshot -a foo "$TEST_IMG" _check_test_img $QEMU_IO -c "$OPEN_RO" -c "read -P 1 0 512" | _filter_qemu_io +echo +echo "=== Testing overlap while COW is in flight ===" +echo +# compat=0.10 is required in order to make the following discard actually +# unallocate the sector rather than make it a zero sector - we want COW, after +# all. +IMGOPTS='compat=0.10' _make_test_img 1G +# Write two clusters, the second one enforces creation of an L2 table after +# the first data cluster. +$QEMU_IO -c 'write 0k 64k' -c 'write 512M 64k' "$TEST_IMG" | _filter_qemu_io +# Discard the first cluster. This cluster will soon enough be reallocated and +# used for COW. +$QEMU_IO -c 'discard 0k 64k' "$TEST_IMG" | _filter_qemu_io +# Now, corrupt the image by marking the second L2 table cluster as free. +poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c +# Start a write operation requiring COW on the image stopping it right before +# doing the read; then, trigger the corruption prevention by writing anything to +# any unallocated cluster, leading to an attempt to overwrite the second L2 +# table. Finally, resume the COW write and see it fail (but not crash). +echo "open -o file.driver=blkdebug $TEST_IMG +break cow_read 0 +aio_write 0k 1k +wait_break 0 +write 64k 64k +resume 0" | $QEMU_IO | _filter_qemu_io + # success, all done echo "*** done" rm -f $seq.full diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out index 6c7bdbb..a517948 100644 --- a/tests/qemu-iotests/060.out +++ b/tests/qemu-iotests/060.out @@ -78,4 +78,19 @@ read 512/512 bytes at offset 0 No errors were found on the image. read 512/512 bytes at offset 0 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) + +=== Testing overlap while COW is in flight === + +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 +wrote 65536/65536 bytes at offset 0 +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 65536/65536 bytes at offset 536870912 +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +discard 65536/65536 bytes at offset 0 +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +qcow2: Preventing invalid write on metadata (overlaps with active L2 table); image marked as corrupt. +blkdebug: Suspended request '0' +write failed: Input/output error +blkdebug: Resuming request '0' +aio_write failed: No medium found *** done -- 1.9.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption 2014-03-10 22:44 [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Max Reitz ` (2 preceding siblings ...) 2014-03-10 22:44 ` [Qemu-devel] [PATCH 3/3] iotests: Test corruption during COW request Max Reitz @ 2014-03-11 10:21 ` Kevin Wolf 2014-03-11 13:13 ` Stefan Hajnoczi 4 siblings, 0 replies; 11+ messages in thread From: Kevin Wolf @ 2014-03-11 10:21 UTC (permalink / raw) To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi Am 10.03.2014 um 23:44 hat Max Reitz geschrieben: > This series fixes a call to a NULL function pointer during > copy_sectors(), if a concurrent write request triggered the corruption > prevention mechanism. > > > Max Reitz (3): > qcow2: Check bs->drv in copy_sectors() > block: bs->drv may be NULL in bdrv_debug_resume() > iotests: Test corruption during COW request > > block.c | 2 +- > block/qcow2-cluster.c | 4 ++++ > tests/qemu-iotests/060 | 26 ++++++++++++++++++++++++++ > tests/qemu-iotests/060.out | 15 +++++++++++++++ > 4 files changed, 46 insertions(+), 1 deletion(-) Reviewed-by: Kevin Wolf <kwolf@redhat.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption 2014-03-10 22:44 [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Max Reitz ` (3 preceding siblings ...) 2014-03-11 10:21 ` [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Kevin Wolf @ 2014-03-11 13:13 ` Stefan Hajnoczi 4 siblings, 0 replies; 11+ messages in thread From: Stefan Hajnoczi @ 2014-03-11 13:13 UTC (permalink / raw) To: Max Reitz; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi On Mon, Mar 10, 2014 at 11:44:06PM +0100, Max Reitz wrote: > This series fixes a call to a NULL function pointer during > copy_sectors(), if a concurrent write request triggered the corruption > prevention mechanism. > > > Max Reitz (3): > qcow2: Check bs->drv in copy_sectors() > block: bs->drv may be NULL in bdrv_debug_resume() > iotests: Test corruption during COW request > > block.c | 2 +- > block/qcow2-cluster.c | 4 ++++ > tests/qemu-iotests/060 | 26 ++++++++++++++++++++++++++ > tests/qemu-iotests/060.out | 15 +++++++++++++++ > 4 files changed, 46 insertions(+), 1 deletion(-) > > -- > 1.9.0 > > Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-03-11 18:21 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-03-10 22:44 [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Max Reitz 2014-03-10 22:44 ` [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() Max Reitz 2014-03-10 23:16 ` Laszlo Ersek 2014-03-11 10:16 ` Kevin Wolf 2014-03-11 14:04 ` Paolo Bonzini 2014-03-11 18:19 ` Max Reitz 2014-03-11 18:09 ` Max Reitz 2014-03-10 22:44 ` [Qemu-devel] [PATCH 2/3] block: bs->drv may be NULL in bdrv_debug_resume() Max Reitz 2014-03-10 22:44 ` [Qemu-devel] [PATCH 3/3] iotests: Test corruption during COW request Max Reitz 2014-03-11 10:21 ` [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Kevin Wolf 2014-03-11 13:13 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).