qemu-devel.nongnu.org archive mirror
* qemu iotest 161 and make check
@ 2022-02-10  7:57 Christian Borntraeger
  2022-02-10 14:47 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2022-02-10  7:57 UTC (permalink / raw)
  To: qemu-devel, qemu block, qemu-s390x

Hello,

I see spurious failures of iotest 161 in our CI, but only when I run
make check with parallelism (-j).
I have not yet figured out which other test case could interfere.

@@ -34,6 +34,8 @@
  *** Commit and then change an option on the backing file

  Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
+qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
+Is another process using the image [TEST_DIR/t.IMGFMT.base]?
  Formatting 'TEST_DIR/t.IMGFMT.int', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.int backing_fmt=IMGFMT
  { 'execute': 'qmp_capabilities' }


any ideas?

Christian



* Re: qemu iotest 161 and make check
  2022-02-10  7:57 qemu iotest 161 and make check Christian Borntraeger
@ 2022-02-10 14:47 ` Vladimir Sementsov-Ogievskiy
  2022-02-10 14:51   ` Christian Borntraeger
  2022-02-14  9:08   ` Christian Borntraeger
  0 siblings, 2 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-10 14:47 UTC (permalink / raw)
  To: Christian Borntraeger, qemu-devel, qemu block, qemu-s390x

10.02.2022 10:57, Christian Borntraeger wrote:
> Hello,
> 
> I do see spurious failures of 161 in our CI, but only when I use
> make check with parallelism (-j).
> I have not yet figured out which other testcase could interfere
> 
> @@ -34,6 +34,8 @@
>   *** Commit and then change an option on the backing file
> 
>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
> +Is another process using the image [TEST_DIR/t.IMGFMT.base]?
>   Formatting 'TEST_DIR/t.IMGFMT.int', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.int backing_fmt=IMGFMT
>   { 'execute': 'qmp_capabilities' }
> 
> 
> any ideas?
> 

Hmm, interesting. Is it always 161 and always exactly this diff?

First, this spot in 161 is nothing unusual: we just create an image, as in many other tests.

Second, why would _make_test_img trigger "Failed to get write lock"? It should just create an image. Hmm, it probably also starts the QSD (qemu-storage-daemon) if the protocol is fuse, so that QSD start may be what fails. Is that the case? Which image format and protocol are used in the test run?

In any case, tests running in parallel should not break each other, since each test has its own TEST_DIR and SOCK_DIR.

-- 
Best regards,
Vladimir



* Re: qemu iotest 161 and make check
  2022-02-10 14:47 ` Vladimir Sementsov-Ogievskiy
@ 2022-02-10 14:51   ` Christian Borntraeger
  2022-02-10 17:13     ` Thomas Huth
  2022-02-14  9:08   ` Christian Borntraeger
  1 sibling, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2022-02-10 14:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu block, qemu-s390x



On 10.02.22 at 15:47, Vladimir Sementsov-Ogievskiy wrote:
> 10.02.2022 10:57, Christian Borntraeger wrote:
>> Hello,
>>
>> I do see spurious failures of 161 in our CI, but only when I use
>> make check with parallelism (-j).
>> I have not yet figured out which other testcase could interfere
>>
>> @@ -34,6 +34,8 @@
>>   *** Commit and then change an option on the backing file
>>
>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
>> +Is another process using the image [TEST_DIR/t.IMGFMT.base]?
>>   Formatting 'TEST_DIR/t.IMGFMT.int', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.int backing_fmt=IMGFMT
>>   { 'execute': 'qmp_capabilities' }
>>
>>
>> any ideas?
>>
> 
> Hmm, interesting.. Is it always 161 and always exactly this diff?

It's always 161 and only 161. I would need to check whether it's always the same error.

> 
> First, this place in 161 is usual: we just create and image, like in many other tests.
> 
> Second, why _make_test_img trigger "Failed to get write lock"? It should just create an image. Hmm. And probably starts QSD if protocol is fuse. So, that start of QSD may probably fail.. Is that the case? What is image format and protocol used in test run?
> 
> But anyway, tests running in parallel should not break each other as each test has own TEST_DIR and SOCK_DIR..
  



* Re: qemu iotest 161 and make check
  2022-02-10 14:51   ` Christian Borntraeger
@ 2022-02-10 17:13     ` Thomas Huth
  2022-02-10 17:44       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Huth @ 2022-02-10 17:13 UTC (permalink / raw)
  To: Christian Borntraeger, Vladimir Sementsov-Ogievskiy, qemu-devel,
	qemu block, qemu-s390x

On 10/02/2022 15.51, Christian Borntraeger wrote:
> 
> 
> Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy:
>> 10.02.2022 10:57, Christian Borntraeger wrote:
>>> Hello,
>>>
>>> I do see spurious failures of 161 in our CI, but only when I use
>>> make check with parallelism (-j).
>>> I have not yet figured out which other testcase could interfere
>>>
>>> @@ -34,6 +34,8 @@
>>>   *** Commit and then change an option on the backing file
>>>
>>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
>>> +Is another process using the image [TEST_DIR/t.IMGFMT.base]?
>>>   Formatting 'TEST_DIR/t.IMGFMT.int', fmt=IMGFMT size=1048576 
>>> backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>>>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 
>>> backing_file=TEST_DIR/t.IMGFMT.int backing_fmt=IMGFMT
>>>   { 'execute': 'qmp_capabilities' }
>>>
>>>
>>> any ideas?
>>>
>>
>> Hmm, interesting.. Is it always 161 and always exactly this diff?
> 
> Its always 161 and only 161. I would need to check if its always the same 
> error.
> 
>>
>> First, this place in 161 is usual: we just create and image, like in many 
>> other tests.
>>
>> Second, why _make_test_img trigger "Failed to get write lock"? It should 
>> just create an image. Hmm. And probably starts QSD if protocol is fuse. 
>> So, that start of QSD may probably fail.. Is that the case? What is image 
>> format and protocol used in test run?
>>
>> But anyway, tests running in parallel should not break each other as each 
>> test has own TEST_DIR and SOCK_DIR..

Unless you run into the issue that Hanna described here:

  https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg01735.html

  Thomas





* Re: qemu iotest 161 and make check
  2022-02-10 17:13     ` Thomas Huth
@ 2022-02-10 17:44       ` Vladimir Sementsov-Ogievskiy
  2022-02-21 10:27         ` Christian Borntraeger
  0 siblings, 1 reply; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-10 17:44 UTC (permalink / raw)
  To: Thomas Huth, Christian Borntraeger, qemu-devel, qemu block,
	qemu-s390x

10.02.2022 20:13, Thomas Huth wrote:
> On 10/02/2022 15.51, Christian Borntraeger wrote:
>>
>>
>> Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy:
>>> 10.02.2022 10:57, Christian Borntraeger wrote:
>>>> Hello,
>>>>
>>>> I do see spurious failures of 161 in our CI, but only when I use
>>>> make check with parallelism (-j).
>>>> I have not yet figured out which other testcase could interfere
>>>>
>>>> @@ -34,6 +34,8 @@
>>>>   *** Commit and then change an option on the backing file
>>>>
>>>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>>>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
>>>> +Is another process using the image [TEST_DIR/t.IMGFMT.base]?
>>>>   Formatting 'TEST_DIR/t.IMGFMT.int', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>>>>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.int backing_fmt=IMGFMT
>>>>   { 'execute': 'qmp_capabilities' }
>>>>
>>>>
>>>> any ideas?
>>>>
>>>
>>> Hmm, interesting.. Is it always 161 and always exactly this diff?
>>
>> Its always 161 and only 161. I would need to check if its always the same error.
>>
>>>
>>> First, this place in 161 is usual: we just create and image, like in many other tests.
>>>
>>> Second, why _make_test_img trigger "Failed to get write lock"? It should just create an image. Hmm. And probably starts QSD if protocol is fuse. So, that start of QSD may probably fail.. Is that the case? What is image format and protocol used in test run?
>>>
>>> But anyway, tests running in parallel should not break each other as each test has own TEST_DIR and SOCK_DIR..
> 
> Unless you run into the issue that Hanna described here:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg01735.html
> 

Yes, we can't execute the same test several times (for different formats) in parallel. But that applies to any test, not only 161.

And I don't think we currently run the same test several times in parallel anywhere, do we? In tests/check-block.sh we have a sequential loop over $format_list.

-- 
Best regards,
Vladimir



* Re: qemu iotest 161 and make check
  2022-02-10 14:47 ` Vladimir Sementsov-Ogievskiy
  2022-02-10 14:51   ` Christian Borntraeger
@ 2022-02-14  9:08   ` Christian Borntraeger
  1 sibling, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2022-02-14  9:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu block, qemu-s390x



On 10.02.22 at 15:47, Vladimir Sementsov-Ogievskiy wrote:
> 10.02.2022 10:57, Christian Borntraeger wrote:
>> Hello,
>>
>> I do see spurious failures of 161 in our CI, but only when I use
>> make check with parallelism (-j).
>> I have not yet figured out which other testcase could interfere
>>
>> @@ -34,6 +34,8 @@
>>   *** Commit and then change an option on the backing file
>>
>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
>> +Is another process using the image [TEST_DIR/t.IMGFMT.base]?
>>   Formatting 'TEST_DIR/t.IMGFMT.int', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.int backing_fmt=IMGFMT
>>   { 'execute': 'qmp_capabilities' }
>>
>>
>> any ideas?
>>
> 
> Hmm, interesting.. Is it always 161 and always exactly this diff?

It seems to always be this diff.



* Re: qemu iotest 161 and make check
  2022-02-10 17:44       ` Vladimir Sementsov-Ogievskiy
@ 2022-02-21 10:27         ` Christian Borntraeger
  2022-03-31  7:44           ` Christian Borntraeger
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2022-02-21 10:27 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Thomas Huth, qemu-devel, qemu block,
	qemu-s390x, Paolo Bonzini


On 10.02.22 at 18:44, Vladimir Sementsov-Ogievskiy wrote:
> 10.02.2022 20:13, Thomas Huth wrote:
>> On 10/02/2022 15.51, Christian Borntraeger wrote:
>>>
>>>
>>> Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy:
>>>> 10.02.2022 10:57, Christian Borntraeger wrote:
>>>>> Hello,
>>>>>
>>>>> I do see spurious failures of 161 in our CI, but only when I use
>>>>> make check with parallelism (-j).
>>>>> I have not yet figured out which other testcase could interfere
>>>>>
>>>>> @@ -34,6 +34,8 @@
>>>>>   *** Commit and then change an option on the backing file
>>>>>
>>>>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>>>>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
>>>>> +Is another process using the image [TEST_DIR/t.IMGFMT.base]?
>>>>>   Formatting 'TEST_DIR/t.IMGFMT.int', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>>>>>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.int backing_fmt=IMGFMT
>>>>>   { 'execute': 'qmp_capabilities' }
>>>>>
>>>>>
>>>>> any ideas?
>>>>>
>>>>
>>>> Hmm, interesting.. Is it always 161 and always exactly this diff?
>>>
>>> Its always 161 and only 161. I would need to check if its always the same error.
>>>
>>>>
>>>> First, this place in 161 is usual: we just create and image, like in many other tests.
>>>>
>>>> Second, why _make_test_img trigger "Failed to get write lock"? It should just create an image. Hmm. And probably starts QSD if protocol is fuse. So, that start of QSD may probably fail.. Is that the case? What is image format and protocol used in test run?
>>>>
>>>> But anyway, tests running in parallel should not break each other as each test has own TEST_DIR and SOCK_DIR..
>>
>> Unless you run into the issue that Hanna described here:
>>
>>   https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg01735.html
>>
> 
> Yes, we can't execute same test several times (for different formats) in parallel.. But that's about any test, not only 161.
> 
> And I don't think that it's currently possible that we run same test in parallel several times somewhere, do we? In tests/check-block.sh we have a sequential loop through $format_list ..

FWIW, I was able to bisect this and it came in with

bcda7b178fde7797f476e3b066fe5fc76bfa1c43 is the first bad commit
commit bcda7b178fde7797f476e3b066fe5fc76bfa1c43
Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date:   Thu Dec 23 19:39:33 2021 +0100

     check-block.sh: passthrough -jN flag of make to -j N flag of check
     
     This improves performance of running iotests during "make -jN check".
     
     Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
     Message-Id: <20211223183933.1497037-1-vsementsov@virtuozzo.com>
     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

  tests/check-block.sh | 9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)



With

make check-block -j 100

it reproduced pretty quickly for me.



* Re: qemu iotest 161 and make check
  2022-02-21 10:27         ` Christian Borntraeger
@ 2022-03-31  7:44           ` Christian Borntraeger
  2022-03-31  8:25             ` Christian Borntraeger
  2022-03-31  9:59             ` Li Zhang
  0 siblings, 2 replies; 12+ messages in thread
From: Christian Borntraeger @ 2022-03-31  7:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Thomas Huth, qemu-devel, qemu block,
	qemu-s390x, Paolo Bonzini



On 21.02.22 at 11:27, Christian Borntraeger wrote:
> 
> Am 10.02.22 um 18:44 schrieb Vladimir Sementsov-Ogievskiy:
>> 10.02.2022 20:13, Thomas Huth wrote:
>>> On 10/02/2022 15.51, Christian Borntraeger wrote:
>>>>
>>>>
>>>> Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy:
>>>>> 10.02.2022 10:57, Christian Borntraeger wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I do see spurious failures of 161 in our CI, but only when I use
>>>>>> make check with parallelism (-j).
>>>>>> I have not yet figured out which other testcase could interfere
>>>>>>
>>>>>> @@ -34,6 +34,8 @@
>>>>>>   *** Commit and then change an option on the backing file
>>>>>>
>>>>>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>>>>>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock

FWIW, qemu_lock_fd_test returns -11 (-EAGAIN),
and raw_check_lock_bytes emits this error.

Is this just some overload situation that we do not recover from because we do not handle EAGAIN specially?



* Re: qemu iotest 161 and make check
  2022-03-31  7:44           ` Christian Borntraeger
@ 2022-03-31  8:25             ` Christian Borntraeger
  2022-10-27  5:54               ` Christian Borntraeger
  2022-03-31  9:59             ` Li Zhang
  1 sibling, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2022-03-31  8:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Thomas Huth, qemu-devel, qemu block,
	qemu-s390x, Paolo Bonzini



On 31.03.22 at 09:44, Christian Borntraeger wrote:
> 
> 
> Am 21.02.22 um 11:27 schrieb Christian Borntraeger:
>>
>> Am 10.02.22 um 18:44 schrieb Vladimir Sementsov-Ogievskiy:
>>> 10.02.2022 20:13, Thomas Huth wrote:
>>>> On 10/02/2022 15.51, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy:
>>>>>> 10.02.2022 10:57, Christian Borntraeger wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I do see spurious failures of 161 in our CI, but only when I use
>>>>>>> make check with parallelism (-j).
>>>>>>> I have not yet figured out which other testcase could interfere
>>>>>>>
>>>>>>> @@ -34,6 +34,8 @@
>>>>>>>   *** Commit and then change an option on the backing file
>>>>>>>
>>>>>>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>>>>>>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
> 
> FWIW, qemu_lock_fd_test returns -11 (EAGAIN)
> and raw_check_lock_bytes spits this error.


And it's coming from here (ret, the fcntl return value, is 0):

int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
{
     int ret;
     struct flock fl = {
         .l_whence = SEEK_SET,
         .l_start  = start,
         .l_len    = len,
         .l_type   = exclusive ? F_WRLCK : F_RDLCK,
     };
     qemu_probe_lock_ops();
     ret = fcntl(fd, fcntl_op_getlk, &fl);
     if (ret == -1) {
         return -errno;
     } else {
         return fl.l_type == F_UNLCK ? 0 : -EAGAIN;  /* <----- the -EAGAIN comes from here */
     }
}
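
To make the behaviour of this probe concrete, here is a minimal sketch (not QEMU code) that reproduces the same return-value logic with a plain POSIX F_GETLK instead of QEMU's fcntl_op_getlk. The names probe_lock and demo_probe, the /tmp path, and the LOCK_PROBE_DEMO_MAIN macro are made up for illustration. A child process takes a write lock on byte 0; the parent probes once while the child holds the lock and once after the child has exited and been reaped.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for qemu_lock_fd_test(), using plain POSIX F_GETLK. */
static int probe_lock(int fd, off_t start, off_t len, int exclusive)
{
    struct flock fl = {
        .l_whence = SEEK_SET,
        .l_start  = start,
        .l_len    = len,
        .l_type   = exclusive ? F_WRLCK : F_RDLCK,
    };

    if (fcntl(fd, F_GETLK, &fl) == -1) {
        return -errno;
    }
    /* F_GETLK rewrites fl: F_UNLCK means no conflicting lock was found. */
    return fl.l_type == F_UNLCK ? 0 : -EAGAIN;
}

/* Probe byte 0 of path while a child holds a write lock, and after it exits. */
static int demo_probe(const char *path, int *while_locked, int *after_exit)
{
    int ready[2], done[2];
    char c = 0;
    pid_t pid;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0 || pipe(ready) < 0 || pipe(done) < 0) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: take the write lock, signal readiness, then wait for EOF. */
        struct flock fl = {
            .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 1,
        };
        int cfd = open(path, O_RDWR);

        close(ready[0]);
        close(done[1]);
        if (cfd < 0 || fcntl(cfd, F_SETLK, &fl) == -1 ||
            write(ready[1], &c, 1) != 1) {
            _exit(1);
        }
        if (read(done[0], &c, 1) < 0) {
            _exit(1);
        }
        _exit(0);
    }
    close(ready[1]);
    close(done[0]);
    if (read(ready[0], &c, 1) != 1) {   /* child could not take the lock */
        close(done[1]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    *while_locked = probe_lock(fd, 0, 1, 1);  /* lock held by the child */
    close(done[1]);                           /* EOF lets the child exit */
    waitpid(pid, NULL, 0);
    *after_exit = probe_lock(fd, 0, 1, 1);    /* child gone, lock released */
    close(ready[0]);
    close(fd);
    unlink(path);
    return 0;
}

#ifdef LOCK_PROBE_DEMO_MAIN
int main(void)
{
    int while_locked = 0, after_exit = 0;
    int rc = demo_probe("/tmp/lock-probe-demo.img", &while_locked, &after_exit);

    assert(rc == 0);
    printf("while locked: %d, after exit: %d\n", while_locked, after_exit);
    return 0;
}
#endif
```

Building with -DLOCK_PROBE_DEMO_MAIN gives a standalone program. With traditional per-process locks the probe should report -EAGAIN only while the child is alive, so an -EAGAIN seen after the locking process is gone points at something else still holding the lock.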

> 
> 
> Is this just some overload situation that we do not recover because we do not handle EAGAIN any special.



* Re: qemu iotest 161 and make check
  2022-03-31  7:44           ` Christian Borntraeger
  2022-03-31  8:25             ` Christian Borntraeger
@ 2022-03-31  9:59             ` Li Zhang
  1 sibling, 0 replies; 12+ messages in thread
From: Li Zhang @ 2022-03-31  9:59 UTC (permalink / raw)
  To: Christian Borntraeger, Vladimir Sementsov-Ogievskiy, Thomas Huth,
	qemu-devel, qemu block, qemu-s390x, Paolo Bonzini

On 3/31/22 09:44, Christian Borntraeger wrote:
> 
> 
> Am 21.02.22 um 11:27 schrieb Christian Borntraeger:
>>
>> Am 10.02.22 um 18:44 schrieb Vladimir Sementsov-Ogievskiy:
>>> 10.02.2022 20:13, Thomas Huth wrote:
>>>> On 10/02/2022 15.51, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy:
>>>>>> 10.02.2022 10:57, Christian Borntraeger wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I do see spurious failures of 161 in our CI, but only when I use
>>>>>>> make check with parallelism (-j).
>>>>>>> I have not yet figured out which other testcase could interfere
>>>>>>>
>>>>>>> @@ -34,6 +34,8 @@
>>>>>>>   *** Commit and then change an option on the backing file
>>>>>>>
>>>>>>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>>>>>>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
> 
> FWIW, qemu_lock_fd_test returns -11 (EAGAIN)
> and raw_check_lock_bytes spits this error.
>

I also run into this issue on S390 when running the test cases.
I think this "write" lock error is reported when different processes
are using the same image.

> 
> Is this just some overload situation that we do not recover because we 
> do not handle EAGAIN any special.
> 





* Re: qemu iotest 161 and make check
  2022-03-31  8:25             ` Christian Borntraeger
@ 2022-10-27  5:54               ` Christian Borntraeger
  2022-12-05 13:49                 ` Christian Borntraeger
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2022-10-27  5:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Thomas Huth, qemu-devel, qemu block,
	qemu-s390x, Paolo Bonzini



On 31.03.22 at 10:25, Christian Borntraeger wrote:
> 
> 
> Am 31.03.22 um 09:44 schrieb Christian Borntraeger:
>>
>>
>> Am 21.02.22 um 11:27 schrieb Christian Borntraeger:
>>>
>>> Am 10.02.22 um 18:44 schrieb Vladimir Sementsov-Ogievskiy:
>>>> 10.02.2022 20:13, Thomas Huth wrote:
>>>>> On 10/02/2022 15.51, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy:
>>>>>>> 10.02.2022 10:57, Christian Borntraeger wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I do see spurious failures of 161 in our CI, but only when I use
>>>>>>>> make check with parallelism (-j).
>>>>>>>> I have not yet figured out which other testcase could interfere
>>>>>>>>
>>>>>>>> @@ -34,6 +34,8 @@
>>>>>>>>   *** Commit and then change an option on the backing file
>>>>>>>>
>>>>>>>>   Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
>>>>>>>> +qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock
>>
>> FWIW, qemu_lock_fd_test returns -11 (EAGAIN)
>> and raw_check_lock_bytes spits this error.
> 
> 
> And its coming from here (ret is 0)
> 
> int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
> {
>      int ret;
>      struct flock fl = {
>          .l_whence = SEEK_SET,
>          .l_start  = start,
>          .l_len    = len,
>          .l_type   = exclusive ? F_WRLCK : F_RDLCK,
>      };
>      qemu_probe_lock_ops();
>      ret = fcntl(fd, fcntl_op_getlk, &fl);
>      if (ret == -1) {
>          return -errno;
>      } else {
> ----->        return fl.l_type == F_UNLCK ? 0 : -EAGAIN;
>      }
> }
> 
>>
>>
>> Is this just some overload situation that we do not recover because we do not handle EAGAIN any special.

I restarted my investigation. It looks like the file lock from QEMU is not fully cleaned up when the process is gone.
Something like
diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 0f1fecc68e..b28a6c187c 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -403,4 +403,5 @@ _cleanup_qemu()
          unset QEMU_IN[$i]
          unset QEMU_OUT[$i]
      done
+    sleep 0.5
  }


makes the problem go away.

It looks like we use the OFD (open file description) variant of the file lock, so any clone, fork, etc. that inherits the file descriptor will keep the lock alive.
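
A minimal sketch of that OFD behaviour, assuming Linux (it is not taken from QEMU): the parent takes an OFD write lock, forks, and closes its own descriptor, yet a probe through a fresh open() still sees the lock until the child that inherited the descriptor has exited. The names ofd_probe and demo_ofd_inherit, the /tmp path, and the OFD_DEMO_MAIN macro are hypothetical. The numeric fallback values for F_OFD_GETLK and F_OFD_SETLK are the Linux ones and are only used if the headers do not expose them.

```c
#define _GNU_SOURCE             /* glibc exposes F_OFD_* only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: Linux command numbers, used only if the headers hide them. */
#ifndef F_OFD_GETLK
#define F_OFD_GETLK 36
#define F_OFD_SETLK 37
#endif

/* Probe byte 0 of path for a conflicting write lock via a fresh descriptor. */
static int ofd_probe(const char *path)
{
    struct flock fl = {
        .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 1,
    };
    int fd = open(path, O_RDWR);
    int ret;

    if (fd < 0) {
        return -errno;
    }
    ret = fcntl(fd, F_OFD_GETLK, &fl);
    ret = (ret == -1) ? -errno : (fl.l_type == F_UNLCK ? 0 : -EAGAIN);
    close(fd);
    return ret;
}

/* Show that a forked child keeps the parent's OFD lock alive. */
static int demo_ofd_inherit(const char *path, int *parent_closed, int *child_gone)
{
    struct flock fl = {
        .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 1,
    };
    int hold[2];
    char c = 0;
    pid_t pid;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0 || pipe(hold) < 0) {
        return -1;
    }
    if (fcntl(fd, F_OFD_SETLK, &fl) == -1) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: keep the inherited fd open until the pipe reports EOF. */
        close(hold[1]);
        if (read(hold[0], &c, 1) < 0) {
            _exit(1);
        }
        _exit(0);
    }
    close(hold[0]);
    close(fd);                          /* parent drops its own reference */
    *parent_closed = ofd_probe(path);   /* child still references the lock */
    close(hold[1]);                     /* EOF lets the child exit */
    waitpid(pid, NULL, 0);
    *child_gone = ofd_probe(path);      /* last reference is gone */
    unlink(path);
    return 0;
}

#ifdef OFD_DEMO_MAIN
int main(void)
{
    int parent_closed = 0, child_gone = 0;
    int rc = demo_ofd_inherit("/tmp/ofd-inherit-demo.img",
                              &parent_closed, &child_gone);

    assert(rc == 0);
    printf("parent closed: %d, child gone: %d\n", parent_closed, child_gone);
    return 0;
}
#endif
```

Building with -DOFD_DEMO_MAIN gives a standalone program. If the hypothesis above holds, the iotest failure would correspond to the first probe: the process that took the lock is gone, but some other process still holds a copy of the descriptor.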

So I tested the following:

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 0f1fecc68e..01bdb05575 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -388,7 +388,7 @@ _cleanup_qemu()
                  kill -KILL ${QEMU_PID} 2>/dev/null
              fi
              if [ -n "${QEMU_PID}" ]; then
-                wait ${QEMU_PID} 2>/dev/null # silent kill
+                wait 2>/dev/null # silent kill
              fi
          fi


This also helps. I am still trying to find out which clone/fork happens here.



* Re: qemu iotest 161 and make check
  2022-10-27  5:54               ` Christian Borntraeger
@ 2022-12-05 13:49                 ` Christian Borntraeger
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2022-12-05 13:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Thomas Huth, qemu-devel, qemu block,
	qemu-s390x, Paolo Bonzini



On 27.10.22 at 07:54, Christian Borntraeger wrote:
[...]
> diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
> index 0f1fecc68e..01bdb05575 100644
> --- a/tests/qemu-iotests/common.qemu
> +++ b/tests/qemu-iotests/common.qemu
> @@ -388,7 +388,7 @@ _cleanup_qemu()
>                   kill -KILL ${QEMU_PID} 2>/dev/null
>               fi
>               if [ -n "${QEMU_PID}" ]; then
> -                wait ${QEMU_PID} 2>/dev/null # silent kill
> +                wait 2>/dev/null # silent kill
>               fi
>           fi
> 
> 
> And this also helps. Still trying to find out what clone/fork happens here.

As a new piece of information: the problem only exists on Ubuntu;
I cannot reproduce it with Fedora or RHEL. I also changed
the kernel, and it is not the cause. As soon as I add tracing,
the different timing also makes the problem go away.


