* [Qemu-devel] vhost-user-test failure
@ 2016-09-23 15:36 Eduardo Habkost
2016-09-23 15:41 ` Michael S. Tsirkin
0 siblings, 1 reply; 8+ messages in thread
From: Eduardo Habkost @ 2016-09-23 15:36 UTC
To: qemu-devel, Michael S. Tsirkin
Hi,
I hit a weird vhost-user-test failure on travis-ci recently, on a
branch where I didn't touch any vhost-related code. From a quick
look at the code, it looks like the vhost-user code is unhappy to
see a disconnected socket.
I wasn't able to reproduce it. It seems to be a hard-to-reproduce
race between the vhost-user code and socket reconnection.
The failure can be seen at:
https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
Error output:
**
ERROR:tests/vhost-user-test.c:715:test_reconnect: child process (/i386/vhost-user/reconnect/subprocess [23792]) failed unexpectedly
qemu-system-i386: Failed to set msg fds.
qemu-system-i386: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-system-i386: Failed to set msg fds.
qemu-system-i386: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
GTester: last random seed: R02S2892f6ad84bd5d03acd54cb75f444243
make: *** [check-qtest-i386] Error 1
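
A note for readers decoding the log above: the "(11)" is an errno value, and on Linux errno 11 is EAGAIN, whose message is "Resource temporarily unavailable". The sketch below only shows the usual way that code arises (a read on a non-blocking socket that has no data yet). It is an illustration under that assumption and does not claim to be the path QEMU took in this failure; the helper name is hypothetical.

```python
import errno
import os
import socket


def errno_from_empty_nonblocking_read():
    """Return the errno raised by reading an empty non-blocking socket."""
    reader, writer = socket.socketpair()
    try:
        reader.setblocking(False)
        try:
            # Nothing has been written to the peer, so there is no data to read.
            reader.recv(1)
        except BlockingIOError as exc:
            return exc.errno
        return None
    finally:
        reader.close()
        writer.close()


code = errno_from_empty_nonblocking_read()
if code is not None:
    # Print the numeric value, its symbolic name, and the libc message.
    print(code, errno.errorcode[code], os.strerror(code))
```

The numeric value is platform-specific, so compare against `errno.EAGAIN` rather than a literal 11 when writing portable checks.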
--
Eduardo
* Re: [Qemu-devel] vhost-user-test failure
2016-09-23 15:36 [Qemu-devel] vhost-user-test failure Eduardo Habkost
@ 2016-09-23 15:41 ` Michael S. Tsirkin
2016-09-23 17:40 ` Maxime Coquelin
0 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2016-09-23 15:41 UTC
To: Eduardo Habkost; +Cc: qemu-devel, maxime.coquelin
On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
> Hi,
>
> I hit a weird vhost-user-test failure on travis-ci recently, on a
> branch where I didn't touch any vhost-related code. From a quick
> look at the code, it looks like the vhost-user code is unhappy to
> see a disconnected socket.
>
> I wasn't able to reproduce it. It seems to be a hard to reproduce
> race between vhost-user code and socket reconnection.
>
> The failure can be seen at:
>
> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
Maxime looked at something similar. Any idea?
> Error output:
>
> **
> ERROR:tests/vhost-user-test.c:715:test_reconnect: child process (/i386/vhost-user/reconnect/subprocess [23792]) failed unexpectedly
> qemu-system-i386: Failed to set msg fds.
> qemu-system-i386: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> qemu-system-i386: Failed to set msg fds.
> qemu-system-i386: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> GTester: last random seed: R02S2892f6ad84bd5d03acd54cb75f444243
> make: *** [check-qtest-i386] Error 1
>
> --
> Eduardo
* Re: [Qemu-devel] vhost-user-test failure
2016-09-23 15:41 ` Michael S. Tsirkin
@ 2016-09-23 17:40 ` Maxime Coquelin
2016-09-24 17:42 ` Maxime Coquelin
0 siblings, 1 reply; 8+ messages in thread
From: Maxime Coquelin @ 2016-09-23 17:40 UTC
To: Michael S. Tsirkin, Eduardo Habkost; +Cc: qemu-devel
On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
>> Hi,
>>
>> I hit a weird vhost-user-test failure on travis-ci recently, on a
>> branch where I didn't touch any vhost-related code. From a quick
>> look at the code, it looks like the vhost-user code is unhappy to
>> see a disconnected socket.
>>
>> I wasn't able to reproduce it. It seems to be a hard to reproduce
>> race between vhost-user code and socket reconnection.
>>
>> The failure can be seen at:
>>
>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
>
> Maxime looked at something similar. Any idea?
No, not really.
Marc-André contributed a lot to these tests, so I'm adding him in cc: in
case he has an idea.
I will have a look in the meantime.
Maxime
>
>> Error output:
>>
>> **
>> ERROR:tests/vhost-user-test.c:715:test_reconnect: child process (/i386/vhost-user/reconnect/subprocess [23792]) failed unexpectedly
>> qemu-system-i386: Failed to set msg fds.
>> qemu-system-i386: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
>> qemu-system-i386: Failed to set msg fds.
>> qemu-system-i386: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
>> GTester: last random seed: R02S2892f6ad84bd5d03acd54cb75f444243
>> make: *** [check-qtest-i386] Error 1
>>
>> --
>> Eduardo
* Re: [Qemu-devel] vhost-user-test failure
2016-09-23 17:40 ` Maxime Coquelin
@ 2016-09-24 17:42 ` Maxime Coquelin
2016-09-25 20:55 ` Marc-André Lureau
0 siblings, 1 reply; 8+ messages in thread
From: Maxime Coquelin @ 2016-09-24 17:42 UTC
To: Michael S. Tsirkin, Eduardo Habkost; +Cc: qemu-devel, Marc-André Lureau
This time with Marc-André in cc:...
On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
>
>
> On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
>> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
>>> Hi,
>>>
>>> I hit a weird vhost-user-test failure on travis-ci recently, on a
>>> branch where I didn't touch any vhost-related code. From a quick
>>> look at the code, it looks like the vhost-user code is unhappy to
>>> see a disconnected socket.
>>>
>>> I wasn't able to reproduce it. It seems to be a hard to reproduce
>>> race between vhost-user code and socket reconnection.
>>>
>>> The failure can be seen at:
>>>
>>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
>>
>> Maxime looked at something similar. Any idea?
> No, not really.
> Marc-André contributed a lot to these tests, I add him in cc: in case
> he has an idea.
>
> I will have a look in the mean time.
>
> Maxime
>
>>
>>> Error output:
>>>
>>> **
>>> ERROR:tests/vhost-user-test.c:715:test_reconnect: child process
>>> (/i386/vhost-user/reconnect/subprocess [23792]) failed unexpectedly
>>> qemu-system-i386: Failed to set msg fds.
>>> qemu-system-i386: vhost VQ 0 ring restore failed: -1: Resource
>>> temporarily unavailable (11)
>>> qemu-system-i386: Failed to set msg fds.
>>> qemu-system-i386: vhost VQ 1 ring restore failed: -1: Resource
>>> temporarily unavailable (11)
>>> GTester: last random seed: R02S2892f6ad84bd5d03acd54cb75f444243
>>> make: *** [check-qtest-i386] Error 1
>>>
>>> --
>>> Eduardo
* Re: [Qemu-devel] vhost-user-test failure
2016-09-24 17:42 ` Maxime Coquelin
@ 2016-09-25 20:55 ` Marc-André Lureau
2016-09-26 12:13 ` Eduardo Habkost
0 siblings, 1 reply; 8+ messages in thread
From: Marc-André Lureau @ 2016-09-25 20:55 UTC
To: Eduardo Habkost
Cc: Michael S. Tsirkin, qemu-devel, Marc-André Lureau,
Maxime Coquelin
Hi
----- Original Message -----
> This time with Marc-André in cc:...
>
> On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
> >
> >
> > On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
> >> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
> >>> Hi,
> >>>
> >>> I hit a weird vhost-user-test failure on travis-ci recently, on a
> >>> branch where I didn't touch any vhost-related code. From a quick
> >>> look at the code, it looks like the vhost-user code is unhappy to
> >>> see a disconnected socket.
> >>>
> >>> I wasn't able to reproduce it. It seems to be a hard to reproduce
> >>> race between vhost-user code and socket reconnection.
> >>>
> >>> The failure can be seen at:
> >>>
> >>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
> >>
> >> Maxime looked at something similar. Any idea?
> > No, not really.
> > Marc-André contributed a lot to these tests, I add him in cc: in case
> > he has an idea.
> >
> > I will have a look in the mean time.
> >
I am unable to reproduce locally (over 500 iterations), and I
have no clue what's going on: the warnings there aren't the
problem (that's the main reason why we use the subprocess, to
silence those). Do you have a local reproducer, or is it only on
travis? AFAIK, there are no other reports of this test failing.
Are you sure it's not related to changes on your branch?
thanks
* Re: [Qemu-devel] vhost-user-test failure
2016-09-25 20:55 ` Marc-André Lureau
@ 2016-09-26 12:13 ` Eduardo Habkost
2016-09-26 12:52 ` Maxime Coquelin
0 siblings, 1 reply; 8+ messages in thread
From: Eduardo Habkost @ 2016-09-26 12:13 UTC
To: Marc-André Lureau
Cc: Michael S. Tsirkin, qemu-devel, Marc-André Lureau,
Maxime Coquelin
On Sun, Sep 25, 2016 at 04:55:53PM -0400, Marc-André Lureau wrote:
> Hi
>
> ----- Original Message -----
> > This time with Marc-André in cc:...
> >
> > On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
> > >
> > >
> > > On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
> > >> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
> > >>> Hi,
> > >>>
> > >>> I hit a weird vhost-user-test failure on travis-ci recently, on a
> > >>> branch where I didn't touch any vhost-related code. From a quick
> > >>> look at the code, it looks like the vhost-user code is unhappy to
> > >>> see a disconnected socket.
> > >>>
> > >>> I wasn't able to reproduce it. It seems to be a hard to reproduce
> > >>> race between vhost-user code and socket reconnection.
> > >>>
> > >>> The failure can be seen at:
> > >>>
> > >>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
> > >>
> > >> Maxime looked at something similar. Any idea?
> > > No, not really.
> > > Marc-André contributed a lot to these tests, I add him in cc: in case
> > > he has an idea.
> > >
> > > I will have a look in the mean time.
> > >
>
> I am unable to reproduce locally (over 500x iterations), and I
> have no clue what's going on: the warnings there aren't the
> problem (that's the main reason why we use the subprocess, to
> silence those). Do you have a local reproducer or is it only on
> travis? Afaik, there are no other reports of this test failing,
> are you sure it's not related to changes on your branch?
I don't have a local reproducer, I could only see it once on
travis-ci. Maybe it is not possible to reproduce it if the
machine isn't loaded enough to make the right thread/process be
delayed.
I am pretty sure it's not related to my changes. Below is the
diffstat between master and the commit that was being tested. All
the changes were limited to x86 CPUID code (which shouldn't
affect qtest code at all).
$ git diff --stat master...8de32e0
include/hw/i386/pc.h | 7 +-
include/sysemu/cpus.h | 5 +-
target-i386/cpu.c | 567 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------
target-i386/cpu.h | 15 +++-
target-ppc/translate_init.c | 3 +-
tests/Makefile.include | 2 +
tests/test-x86-cpuid-compat.c | 171 ++++++++++++++++++++++++++++++++++++++++
7 files changed, 516 insertions(+), 254 deletions(-)
--
Eduardo
* Re: [Qemu-devel] vhost-user-test failure
2016-09-26 12:13 ` Eduardo Habkost
@ 2016-09-26 12:52 ` Maxime Coquelin
2016-09-26 14:07 ` Maxime Coquelin
0 siblings, 1 reply; 8+ messages in thread
From: Maxime Coquelin @ 2016-09-26 12:52 UTC
To: Eduardo Habkost, Marc-André Lureau
Cc: Michael S. Tsirkin, qemu-devel, Marc-André Lureau
Hi,
On 09/26/2016 02:13 PM, Eduardo Habkost wrote:
> On Sun, Sep 25, 2016 at 04:55:53PM -0400, Marc-André Lureau wrote:
>> Hi
>>
>> ----- Original Message -----
>>> This time with Marc-André in cc:...
>>>
>>> On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
>>>>> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I hit a weird vhost-user-test failure on travis-ci recently, on a
>>>>>> branch where I didn't touch any vhost-related code. From a quick
>>>>>> look at the code, it looks like the vhost-user code is unhappy to
>>>>>> see a disconnected socket.
>>>>>>
>>>>>> I wasn't able to reproduce it. It seems to be a hard to reproduce
>>>>>> race between vhost-user code and socket reconnection.
>>>>>>
>>>>>> The failure can be seen at:
>>>>>>
>>>>>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
>>>>>
>>>>> Maxime looked at something similar. Any idea?
>>>> No, not really.
>>>> Marc-André contributed a lot to these tests, I add him in cc: in case
>>>> he has an idea.
>>>>
>>>> I will have a look in the mean time.
>>>>
>>
>> I am unable to reproduce locally (over 500x iterations), and I
>> have no clue what's going on: the warnings there aren't the
>> problem (that's the main reason why we use the subprocess, to
>> silence those). Do you have a local reproducer or is it only on
>> travis? Afaik, there are no other reports of this test failing,
>> are you sure it's not related to changes on your branch?
>
> I don't have a local reproducer, I could only see it once on
> travis-ci. Maybe it is not possible to reproduce it if the
> machine isn't loaded enough to make the right thread/process be
> delayed.
I'm also trying to reproduce it.
Interestingly, launching the test with strace, I reproduce another
problem systematically:
$> strace -o /tmp/vut -ff ./tests/vhost-user-test
/x86_64/vhost-user/read-guest-mem: OK
/x86_64/vhost-user/migrate: Vhost user backend fails to broadcast fake RARP
OK
/x86_64/vhost-user/reconnect: OK
I'll try to load the CPU randomly when executing the test.
Regards,
Maxime
* Re: [Qemu-devel] vhost-user-test failure
2016-09-26 12:52 ` Maxime Coquelin
@ 2016-09-26 14:07 ` Maxime Coquelin
0 siblings, 0 replies; 8+ messages in thread
From: Maxime Coquelin @ 2016-09-26 14:07 UTC
To: Eduardo Habkost, Marc-André Lureau
Cc: Michael S. Tsirkin, qemu-devel, Marc-André Lureau
On 09/26/2016 02:52 PM, Maxime Coquelin wrote:
> Hi,
>
> On 09/26/2016 02:13 PM, Eduardo Habkost wrote:
>> On Sun, Sep 25, 2016 at 04:55:53PM -0400, Marc-André Lureau wrote:
>>> Hi
>>>
>>> ----- Original Message -----
>>>> This time with Marc-André in cc:...
>>>>
>>>> On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
>>>>>
>>>>>
>>>>> On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
>>>>>> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I hit a weird vhost-user-test failure on travis-ci recently, on a
>>>>>>> branch where I didn't touch any vhost-related code. From a quick
>>>>>>> look at the code, it looks like the vhost-user code is unhappy to
>>>>>>> see a disconnected socket.
>>>>>>>
>>>>>>> I wasn't able to reproduce it. It seems to be a hard to reproduce
>>>>>>> race between vhost-user code and socket reconnection.
>>>>>>>
>>>>>>> The failure can be seen at:
>>>>>>>
>>>>>>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
>>>>>>
>>>>>> Maxime looked at something similar. Any idea?
>>>>> No, not really.
>>>>> Marc-André contributed a lot to these tests, I add him in cc: in case
>>>>> he has an idea.
>>>>>
>>>>> I will have a look in the mean time.
>>>>>
>>>
>>> I am unable to reproduce locally (over 500x iterations), and I
>>> have no clue what's going on: the warnings there aren't the
>>> problem (that's the main reason why we use the subprocess, to
>>> silence those). Do you have a local reproducer or is it only on
>>> travis? Afaik, there are no other reports of this test failing,
>>> are you sure it's not related to changes on your branch?
>>
>> I don't have a local reproducer, I could only see it once on
>> travis-ci. Maybe it is not possible to reproduce it if the
>> machine isn't loaded enough to make the right thread/process be
>> delayed.
>
> I'm also trying to reproduce it.
> Interestingly, launching the test with strace, I reproduce another
> problem systematically:
> $> strace -o /tmp/vut -ff ./tests/vhost-user-test
> /x86_64/vhost-user/read-guest-mem: OK
> /x86_64/vhost-user/migrate: Vhost user backend fails to broadcast fake RARP
> OK
> /x86_64/vhost-user/reconnect: OK
>
> I'll try to load the CPU randomly when executing the test.
FYI, I reproduced it once over ~200 runs while stressing the CPUs:
/x86_64/vhost-user/read-guest-mem: OK
/x86_64/vhost-user/migrate: OK
/x86_64/vhost-user/reconnect: **
ERROR:/home/max/projects/src/mainline/qemu/tests/vhost-user-test.c:715:test_reconnect:
child process (/x86_64/vhost-user/reconnect/subprocess [8797]) failed
unexpectedly
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Resource
temporarily unavailable (11)
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 1 ring restore failed: -1: Resource
temporarily unavailable (11)
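
For anyone who wants to attempt the same kind of reproduction (many runs while the CPUs are stressed), a loop along the following lines may be a starting point. This is a minimal sketch, not the procedure used for the ~200 runs above: the function name, the busy-loop load generator, and the default counts are all assumptions made for illustration, and the thread does not say how the CPUs were actually loaded.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=200, burners=4):
    """Run cmd repeatedly under CPU load.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeds.
    """
    # Hypothetical load generator: child processes spinning in a busy loop.
    load = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(burners)
    ]
    try:
        for attempt in range(1, max_runs + 1):
            result = subprocess.run(cmd)
            if result.returncode != 0:
                return attempt
        return None
    finally:
        # Always stop the load generators, even if a run raises.
        for proc in load:
            proc.kill()
            proc.wait()
```

A call such as `run_until_failure(["./tests/vhost-user-test"])` would mirror the binary path used in the strace invocation earlier in the thread; any environment variables the test needs are not shown in the thread and would have to be set by the caller.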
I'll continue the investigation.
Maxime