* [Qemu-devel] [PATCH] blkdebug: fix "once" rule
@ 2015-02-05 23:15 John Snow
2015-02-05 23:43 ` Max Reitz
0 siblings, 1 reply; 4+ messages in thread
From: John Snow @ 2015-02-05 23:15 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, John Snow, mreitz
The blkdebug scripts are currently engineered so that when a debug event
occurs, a prefilter browses a master list of parsed rules for a certain
event and adds them to an "active list" of rules to be used for the
forthcoming action, provided the events and state numbers match.
Then, once the request is received, the last active rule is used to inject
an error if certain parameters match.
This active list is cleared every time the prefilter injects a new rule
for the first time during a debug event.
The "once" rule currently causes the error injection, if it is triggered,
to only clear the active list. This is insufficient for preventing future
injections of the same rule.
This patch /deletes/ the rule from the list that the prefilter browses,
so it is gone for good.
Lastly, this impacts iotests 026. Several ENOSPC tests that used "once"
can be seen to have output that shows multiple failure messages. After
this patch, the error messages tend to be smaller and less severe, but
the injection can still be seen to be working.
Patch the expected output to expect the smaller error messages.
Signed-off-by: John Snow <jsnow@redhat.com>
---
block/blkdebug.c | 9 +++++----
tests/qemu-iotests/026.out | 24 ++++--------------------
2 files changed, 9 insertions(+), 24 deletions(-)
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 9ce35cd..d30c44c 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -473,14 +473,15 @@ static BlockAIOCB *inject_error(BlockDriverState *bs,
struct BlkdebugAIOCB *acb;
QEMUBH *bh;
- if (rule->options.inject.once) {
- QSIMPLEQ_INIT(&s->active_rules);
- }
-
if (rule->options.inject.immediately) {
return NULL;
}
+ if (rule->options.inject.once) {
+ QSIMPLEQ_INIT(&s->active_rules);
+ remove_rule(rule);
+ }
+
acb = qemu_aio_get(&blkdebug_aiocb_info, bs, cb, opaque);
acb->ret = -error;
diff --git a/tests/qemu-iotests/026.out b/tests/qemu-iotests/026.out
index 524f7ee..5e964fb 100644
--- a/tests/qemu-iotests/026.out
+++ b/tests/qemu-iotests/026.out
@@ -140,19 +140,13 @@ No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: l2_update; errno: 5; imm: off; once: on; write
-Failed to flush the L2 table cache: Input/output error
write failed: Input/output error
-
-127 leaked clusters were found on the image.
-This means waste of disk space, but no harm to data.
+No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: l2_update; errno: 5; imm: off; once: on; write -b
-Failed to flush the L2 table cache: Input/output error
write failed: Input/output error
-
-127 leaked clusters were found on the image.
-This means waste of disk space, but no harm to data.
+No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: l2_update; errno: 5; imm: off; once: off; write
@@ -174,19 +168,13 @@ This means waste of disk space, but no harm to data.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: l2_update; errno: 28; imm: off; once: on; write
-Failed to flush the L2 table cache: No space left on device
write failed: No space left on device
-
-127 leaked clusters were found on the image.
-This means waste of disk space, but no harm to data.
+No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: l2_update; errno: 28; imm: off; once: on; write -b
-Failed to flush the L2 table cache: No space left on device
write failed: No space left on device
-
-127 leaked clusters were found on the image.
-This means waste of disk space, but no harm to data.
+No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: l2_update; errno: 28; imm: off; once: off; write
@@ -356,13 +344,11 @@ No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: refblock_update_part; errno: 5; imm: off; once: on; write
-Failed to flush the refcount block cache: Input/output error
write failed: Input/output error
No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: refblock_update_part; errno: 5; imm: off; once: on; write -b
-Failed to flush the refcount block cache: Input/output error
write failed: Input/output error
No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
@@ -382,13 +368,11 @@ No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: refblock_update_part; errno: 28; imm: off; once: on; write
-Failed to flush the refcount block cache: No space left on device
write failed: No space left on device
No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
Event: refblock_update_part; errno: 28; imm: off; once: on; write -b
-Failed to flush the refcount block cache: No space left on device
write failed: No space left on device
No errors were found on the image.
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
--
1.9.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [PATCH] blkdebug: fix "once" rule
2015-02-05 23:15 [Qemu-devel] [PATCH] blkdebug: fix "once" rule John Snow
@ 2015-02-05 23:43 ` Max Reitz
2015-02-06 0:28 ` John Snow
0 siblings, 1 reply; 4+ messages in thread
From: Max Reitz @ 2015-02-05 23:43 UTC (permalink / raw)
To: John Snow, qemu-devel; +Cc: kwolf, pbonzini
On 2015-02-05 at 18:15, John Snow wrote:
> The blkdebug scripts are currently engineered so that when a debug event
> occurs, a prefilter browses a master list of parsed rules for a certain
> event and adds them to an "active list" of rules to be used for the
> forthcoming action, provided the events and state numbers match.
>
> Then, once the request is received, the last active rule is used to inject
> an error if certain parameters match.
>
> This active list is cleared every time the prefilter injects a new rule
> for the first time during a debug event.
>
> The "once" rule currently causes the error injection, if it is triggered,
> to only clear the active list. This is insufficient for preventing future
> injections of the same rule.
>
> This patch /deletes/ the rule from the list that the prefilter browses,
> so it is gone for good.
>
> Lastly, this impacts iotests 026. Several ENOSPC tests that used "once"
> can be seen to have output that shows multiple failure messages. After
> this patch, the error messages tend to be smaller and less severe, but
> the injection can still be seen to be working.
>
> Patch the expected output to expect the smaller error messages.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
> block/blkdebug.c | 9 +++++----
> tests/qemu-iotests/026.out | 24 ++++--------------------
> 2 files changed, 9 insertions(+), 24 deletions(-)
>
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 9ce35cd..d30c44c 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -473,14 +473,15 @@ static BlockAIOCB *inject_error(BlockDriverState *bs,
> struct BlkdebugAIOCB *acb;
> QEMUBH *bh;
>
> - if (rule->options.inject.once) {
> - QSIMPLEQ_INIT(&s->active_rules);
> - }
> -
> if (rule->options.inject.immediately) {
> return NULL;
> }
>
> + if (rule->options.inject.once) {
> + QSIMPLEQ_INIT(&s->active_rules);
This will still delete all other currently active rules, which I don't
think is what is intended.
Case in point why this is not really right:
$ ./qemu-img create -f qcow2 test.qcow2 64M
Formatting 'test.qcow2', fmt=qcow2 size=67108864 encryption=off
cluster_size=65536 lazy_refcounts=off
$ ./qemu-io -c 'aio_write 0 64k'
"json:{'driver':'qcow2','file':{'driver':'blkdebug','image':{'driver':'file','filename':'test.qcow2'},'inject-error':[{'event':'write_aio'},{'event':'write_aio','state':42,'once':true}],'set-state':[{'event':'cluster_alloc','new_state':42}]}}"
aio_write failed: Input/output error
Failed to flush the L2 table cache: Input/output error
Failed to flush the refcount block cache: Input/output error
So, here we get three errors; the non-once rule stayed active.
Now let's just turn the rules around:
$ ./qemu-img create -f qcow2 test.qcow2 64M
Formatting 'test.qcow2', fmt=qcow2 size=67108864 encryption=off
cluster_size=65536 lazy_refcounts=off
$ ./qemu-io -c 'aio_write 0 64k'
"json:{'driver':'qcow2','file':{'driver':'blkdebug','image':{'driver':'file','filename':'test.qcow2'},'inject-error':[{'event':'write_aio','state':42,'once':true},{'event':'write_aio'}],'set-state':[{'event':'cluster_alloc','new_state':42}]}}"
aio_write failed: Input/output error
Hmmm...
So what happens is that in the first case, the non-"once" rule had
precedence, so the list of active rules was not changed. In the second
case, however, the "once" rule had precedence, therefore all active
rules (including the non-"once" rule) were deleted.
I think it's fine that the "once" rule does not get deleted in the first
case, but whether the non-"once" rule stays active should not be
influenced by the "once" rule.
Note that this issue was not introduced by this patch, but I think this
patch should fix it. :-)
Max
> + remove_rule(rule);
> + }
> +
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [PATCH] blkdebug: fix "once" rule
2015-02-05 23:43 ` Max Reitz
@ 2015-02-06 0:28 ` John Snow
2015-02-06 13:22 ` Max Reitz
0 siblings, 1 reply; 4+ messages in thread
From: John Snow @ 2015-02-06 0:28 UTC (permalink / raw)
To: Max Reitz, qemu-devel; +Cc: kwolf, pbonzini
On 02/05/2015 06:43 PM, Max Reitz wrote:
> On 2015-02-05 at 18:15, John Snow wrote:
>> The blkdebug scripts are currently engineered so that when a debug event
>> occurs, a prefilter browses a master list of parsed rules for a certain
>> event and adds them to an "active list" of rules to be used for the
>> forthcoming action, provided the events and state numbers match.
>>
>> Then, once the request is received, the last active rule is used to
>> inject
>> an error if certain parameters match.
>>
>> This active list is cleared every time the prefilter injects a new rule
>> for the first time during a debug event.
>>
>> The "once" rule currently causes the error injection, if it is triggered,
>> to only clear the active list. This is insufficient for preventing future
>> injections of the same rule.
>>
>> This patch /deletes/ the rule from the list that the prefilter browses,
>> so it is gone for good.
>>
>> Lastly, this impacts iotests 026. Several ENOSPC tests that used "once"
>> can be seen to have output that shows multiple failure messages. After
>> this patch, the error messages tend to be smaller and less severe, but
>> the injection can still be seen to be working.
>>
>> Patch the expected output to expect the smaller error messages.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>> block/blkdebug.c | 9 +++++----
>> tests/qemu-iotests/026.out | 24 ++++--------------------
>> 2 files changed, 9 insertions(+), 24 deletions(-)
>>
>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>> index 9ce35cd..d30c44c 100644
>> --- a/block/blkdebug.c
>> +++ b/block/blkdebug.c
>> @@ -473,14 +473,15 @@ static BlockAIOCB *inject_error(BlockDriverState
>> *bs,
>> struct BlkdebugAIOCB *acb;
>> QEMUBH *bh;
>> - if (rule->options.inject.once) {
>> - QSIMPLEQ_INIT(&s->active_rules);
>> - }
>> -
>> if (rule->options.inject.immediately) {
>> return NULL;
>> }
>> + if (rule->options.inject.once) {
>> + QSIMPLEQ_INIT(&s->active_rules);
>
> This will still delete all other currently active rules, which I don't
> think is what is intended.
>
> Case in point why this is not really right:
>
> $ ./qemu-img create -f qcow2 test.qcow2 64M
> Formatting 'test.qcow2', fmt=qcow2 size=67108864 encryption=off
> cluster_size=65536 lazy_refcounts=off
> $ ./qemu-io -c 'aio_write 0 64k'
> "json:{'driver':'qcow2','file':{'driver':'blkdebug','image':{'driver':'file','filename':'test.qcow2'},'inject-error':[{'event':'write_aio'},{'event':'write_aio','state':42,'once':true}],'set-state':[{'event':'cluster_alloc','new_state':42}]}}"
>
> aio_write failed: Input/output error
> Failed to flush the L2 table cache: Input/output error
> Failed to flush the refcount block cache: Input/output error
>
> So, here we get three errors; the non-once rule stayed active.
>
> Now let's just turn the rules around:
>
> $ ./qemu-img create -f qcow2 test.qcow2 64M
> Formatting 'test.qcow2', fmt=qcow2 size=67108864 encryption=off
> cluster_size=65536 lazy_refcounts=off
> $ ./qemu-io -c 'aio_write 0 64k'
> "json:{'driver':'qcow2','file':{'driver':'blkdebug','image':{'driver':'file','filename':'test.qcow2'},'inject-error':[{'event':'write_aio','state':42,'once':true},{'event':'write_aio'}],'set-state':[{'event':'cluster_alloc','new_state':42}]}}"
>
> aio_write failed: Input/output error
>
> Hmmm...
>
> So what happens is that in the first case, the non-"once" rule had
> precedence, so the list of active rules was not changed. In the second
> case, however, the "once" rule had precedence, therefore all active
> rules (including the non-"once" rule) were deleted.
>
> I think it's fine that the "once" rule does not get deleted in the first
> case, but whether the non-"once" rule stays active should not be
> influenced by the "once" rule.
>
> Note that this issue was not introduced by this patch, but I think this
> patch should fix it. :-)
I was really trying to avoid touching how weird the blkdebug rules look
and just make my own test happier, but you've forced my hand, villain!
> Max
>
>> + remove_rule(rule);
>> + }
>> +
>
OK, this gets sort of weird, because we need to think about the
lifetimes of certain errors, and what we actually want to have happen.
Here's what your first script does:
blkdebug_debug_event -- [reftable_load]
blkdebug_debug_event -- [l2_load]
blkdebug_debug_event -- [cluster_alloc]
Setting new_state: 42
blkdebug_debug_event -- [refblock_alloc]
blkdebug_debug_event -- [refblock_load]
blkdebug_debug_event -- [write_aio]
Clearing active_rules ...
Adding a rule for [write_aio]
Adding a rule for [write_aio]
blkdebug_debug_event -- [pwritev]
An error is being injected for aio_writev.
blkdebug_debug_event -- [pwritev_done]
aio_write failed: Input/output error
blkdebug_debug_event -- [flush_to_os]
An error is being injected for aio_flush.
An error is being injected for aio_flush.
blkdebug_debug_event -- [refblock_update_part]
blkdebug_debug_event -- [pwritev]
An error is being injected for aio_writev.
blkdebug_debug_event -- [pwritev_done]
Failed to flush the L2 table cache: Input/output error
Failed to flush the refcount block cache: Input/output error
An error is being injected for aio_flush.
The prefilter clears active rules when the write_aio debug event comes
in, and adds the two write_aio rules.
It then injects an error after the pwritev event.
We then try to flush twice, but we still have that rule in our active
filters, so maddeningly this and all future calls to any of readv,
writev or flush will fail.
That's bonkers.
The second script does this:
blkdebug_debug_event -- [reftable_load]
blkdebug_debug_event -- [l2_load]
blkdebug_debug_event -- [cluster_alloc]
Setting new_state: 42
blkdebug_debug_event -- [refblock_alloc]
blkdebug_debug_event -- [refblock_load]
blkdebug_debug_event -- [write_aio]
Clearing active_rules ...
Adding a rule for [write_aio]
Adding a rule for [write_aio]
blkdebug_debug_event -- [pwritev]
An error is being injected for aio_writev.
run_once rule hit; clearing active list and run_once rule.
blkdebug_debug_event -- [pwritev_done]
aio_write failed: Input/output error
blkdebug_debug_event -- [flush_to_os]
blkdebug_debug_event -- [refblock_update_part]
blkdebug_debug_event -- [pwritev]
blkdebug_debug_event -- [pwritev_done]
blkdebug_debug_event -- [flush_to_disk]
Again, we clear the active rules only when a new injection is found
after a fresh debug event.
This time, though, on the first pwritev, we use the "once" error and
delete both it and its perennial cousin from the active list.
Then, no errors are encountered for the rest of the run because we
actually don't hit another write_aio event for the rest of execution.
So both outputs are behaving kind of poorly.
I think what we need is:
(1) To only delete the error actually injected from the active_rules
list if the once flag was set, and
(2) To decide when a sane point to clear the active list is, because
otherwise, the active_rules which gets populated for e.g. aio_write is
going to spill over into aio_flush if nothing stops it.
The only current "clear active" list point is when a new event comes in
for which we have an injection rule for. Otherwise they just persist.
I could clear all active requests on every event, but a lot of tests
that use "write_aio" to perhaps catch things that happen under "pwritev"
would begin to fail; but perhaps that's more correct/accurate anyway.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [PATCH] blkdebug: fix "once" rule
2015-02-06 0:28 ` John Snow
@ 2015-02-06 13:22 ` Max Reitz
0 siblings, 0 replies; 4+ messages in thread
From: Max Reitz @ 2015-02-06 13:22 UTC (permalink / raw)
To: John Snow, qemu-devel; +Cc: kwolf, pbonzini
On 2015-02-05 at 19:28, John Snow wrote:
>
>
> On 02/05/2015 06:43 PM, Max Reitz wrote:
>> On 2015-02-05 at 18:15, John Snow wrote:
>>> The blkdebug scripts are currently engineered so that when a debug
>>> event
>>> occurs, a prefilter browses a master list of parsed rules for a certain
>>> event and adds them to an "active list" of rules to be used for the
>>> forthcoming action, provided the events and state numbers match.
>>>
>>> Then, once the request is received, the last active rule is used to
>>> inject
>>> an error if certain parameters match.
>>>
>>> This active list is cleared every time the prefilter injects a new rule
>>> for the first time during a debug event.
>>>
>>> The "once" rule currently causes the error injection, if it is
>>> triggered,
>>> to only clear the active list. This is insufficient for preventing
>>> future
>>> injections of the same rule.
>>>
>>> This patch /deletes/ the rule from the list that the prefilter browses,
>>> so it is gone for good.
>>>
>>> Lastly, this impacts iotests 026. Several ENOSPC tests that used "once"
>>> can be seen to have output that shows multiple failure messages. After
>>> this patch, the error messages tend to be smaller and less severe, but
>>> the injection can still be seen to be working.
>>>
>>> Patch the expected output to expect the smaller error messages.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>> block/blkdebug.c | 9 +++++----
>>> tests/qemu-iotests/026.out | 24 ++++--------------------
>>> 2 files changed, 9 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>>> index 9ce35cd..d30c44c 100644
>>> --- a/block/blkdebug.c
>>> +++ b/block/blkdebug.c
>>> @@ -473,14 +473,15 @@ static BlockAIOCB *inject_error(BlockDriverState
>>> *bs,
>>> struct BlkdebugAIOCB *acb;
>>> QEMUBH *bh;
>>> - if (rule->options.inject.once) {
>>> - QSIMPLEQ_INIT(&s->active_rules);
>>> - }
>>> -
>>> if (rule->options.inject.immediately) {
>>> return NULL;
>>> }
>>> + if (rule->options.inject.once) {
>>> + QSIMPLEQ_INIT(&s->active_rules);
>>
>> This will still delete all other currently active rules, which I don't
>> think is what is intended.
>>
>> Case in point why this is not really right:
>>
>> $ ./qemu-img create -f qcow2 test.qcow2 64M
>> Formatting 'test.qcow2', fmt=qcow2 size=67108864 encryption=off
>> cluster_size=65536 lazy_refcounts=off
>> $ ./qemu-io -c 'aio_write 0 64k'
>> "json:{'driver':'qcow2','file':{'driver':'blkdebug','image':{'driver':'file','filename':'test.qcow2'},'inject-error':[{'event':'write_aio'},{'event':'write_aio','state':42,'once':true}],'set-state':[{'event':'cluster_alloc','new_state':42}]}}"
>>
>>
>> aio_write failed: Input/output error
>> Failed to flush the L2 table cache: Input/output error
>> Failed to flush the refcount block cache: Input/output error
>>
>> So, here we get three errors; the non-once rule stayed active.
>>
>> Now let's just turn the rules around:
>>
>> $ ./qemu-img create -f qcow2 test.qcow2 64M
>> Formatting 'test.qcow2', fmt=qcow2 size=67108864 encryption=off
>> cluster_size=65536 lazy_refcounts=off
>> $ ./qemu-io -c 'aio_write 0 64k'
>> "json:{'driver':'qcow2','file':{'driver':'blkdebug','image':{'driver':'file','filename':'test.qcow2'},'inject-error':[{'event':'write_aio','state':42,'once':true},{'event':'write_aio'}],'set-state':[{'event':'cluster_alloc','new_state':42}]}}"
>>
>>
>> aio_write failed: Input/output error
>>
>> Hmmm...
>>
>> So what happens is that in the first case, the non-"once" rule had
>> precedence, so the list of active rules was not changed. In the second
>> case, however, the "once" rule had precedence, therefore all active
>> rules (including the non-"once" rule) were deleted.
>>
>> I think it's fine that the "once" rule does not get deleted in the first
>> case, but whether the non-"once" rule stays active should not be
>> influenced by the "once" rule.
>>
>> Note that this issue was not introduced by this patch, but I think this
>> patch should fix it. :-)
>
> I was really trying to avoid touching how weird the blkdebug rules
> look and just make my own test happier, but you've forced my hand,
> villain!
>
>> Max
>>
>>> + remove_rule(rule);
>>> + }
>>> +
>>
>
> OK, this gets sort of weird, because we need to think about the
> lifetimes of certain errors, and what we actually want to have happen.
>
> Here's what your first script does:
>
> blkdebug_debug_event -- [reftable_load]
> blkdebug_debug_event -- [l2_load]
> blkdebug_debug_event -- [cluster_alloc]
> Setting new_state: 42
> blkdebug_debug_event -- [refblock_alloc]
> blkdebug_debug_event -- [refblock_load]
> blkdebug_debug_event -- [write_aio]
> Clearing active_rules ...
> Adding a rule for [write_aio]
> Adding a rule for [write_aio]
> blkdebug_debug_event -- [pwritev]
> An error is being injected for aio_writev.
> blkdebug_debug_event -- [pwritev_done]
> aio_write failed: Input/output error
> blkdebug_debug_event -- [flush_to_os]
> An error is being injected for aio_flush.
> An error is being injected for aio_flush.
> blkdebug_debug_event -- [refblock_update_part]
> blkdebug_debug_event -- [pwritev]
> An error is being injected for aio_writev.
> blkdebug_debug_event -- [pwritev_done]
> Failed to flush the L2 table cache: Input/output error
> Failed to flush the refcount block cache: Input/output error
> An error is being injected for aio_flush.
>
>
> The prefilter clears active rules when the write_aio debug event comes
> in, and adds the two write_aio rules.
>
> It then injects an error after the pwritev event.
> We then try to flush twice, but we still have that rule in our active
> filters, so maddeningly this and all future calls to any of readv,
> writev or flush will fail.
>
> That's bonkers.
>
>
> The second script does this:
>
> blkdebug_debug_event -- [reftable_load]
> blkdebug_debug_event -- [l2_load]
> blkdebug_debug_event -- [cluster_alloc]
> Setting new_state: 42
> blkdebug_debug_event -- [refblock_alloc]
> blkdebug_debug_event -- [refblock_load]
> blkdebug_debug_event -- [write_aio]
> Clearing active_rules ...
> Adding a rule for [write_aio]
> Adding a rule for [write_aio]
> blkdebug_debug_event -- [pwritev]
> An error is being injected for aio_writev.
> run_once rule hit; clearing active list and run_once rule.
> blkdebug_debug_event -- [pwritev_done]
> aio_write failed: Input/output error
> blkdebug_debug_event -- [flush_to_os]
> blkdebug_debug_event -- [refblock_update_part]
> blkdebug_debug_event -- [pwritev]
> blkdebug_debug_event -- [pwritev_done]
> blkdebug_debug_event -- [flush_to_disk]
>
> Again, we clear the active rules only when a new injection is found
> after a fresh debug event.
> This time, though, on the first pwritev, we use the "once" error and
> delete both it and its perennial cousin from the active list.
>
> Then, no errors are encountered for the rest of the run because we
> actually don't hit another write_aio event for the rest of execution.
>
> So both outputs are behaving kind of poorly.
>
>
>
>
> I think what we need is:
>
> (1) To only delete the error actually injected from the active_rules
> list if the once flag was set, and
Right.
> (2) To decide when a sane point to clear the active list is, because
> otherwise, the active_rules which gets populated for e.g. aio_write is
> going to spill over into aio_flush if nothing stops it.
I don't think we need to clear it. If "once" isn't set, the rule just
stays active after the event. If you want a rule to be deactivated at a
certain point in time, you set its "state" to 1 and then add a set-state
entry for transitioning to a different state when you want it to be
deactivated.
The idea was, as far as I know, that once a non-"once" rule is becoming
active, all subsequent requests fail (like if your drive just crashed).
> The only current "clear active" list point is when a new event comes
> in for which we have an injection rule for. Otherwise they just persist.
Hm, you're right. That opposes my previous statement. I have no idea why
the rules get deleted if a new event comes in (and only if there are
rules for that event!).
I can only imagine that the assumption is that you only want to deal
with events that you defined errors for. If there are other events
(imagine a new event is added to the qcow2 code), you just don't want to
deal with them at all which is why the rules remain active in that case;
but since you obviously do know about the events you have errors for,
it's fine to scratch the list of active rules if they occur.
So, since I have no idea how this is meant to be, I'd just leave it as
it is now, that is, let the active rules spill over into "unknown"
events (events without any error specified for them); and just do (1).
Max
> I could clear all active requests on every event, but a lot of tests
> that use "write_aio" to perhaps catch things that happen under
> "pwritev" would begin to fail; but perhaps that's more
> correct/accurate anyway.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-02-06 13:22 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-05 23:15 [Qemu-devel] [PATCH] blkdebug: fix "once" rule John Snow
2015-02-05 23:43 ` Max Reitz
2015-02-06 0:28 ` John Snow
2015-02-06 13:22 ` Max Reitz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).