* [Qemu-devel] vhost-user-test failure
From: Eduardo Habkost @ 2016-09-23 15:36 UTC
To: qemu-devel, Michael S. Tsirkin

Hi,

I hit a weird vhost-user-test failure on travis-ci recently, on a
branch where I didn't touch any vhost-related code. From a quick
look at the code, it looks like the vhost-user code is unhappy to
see a disconnected socket.

I wasn't able to reproduce it. It seems to be a hard to reproduce
race between vhost-user code and socket reconnection.

The failure can be seen at:

https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239

Error output:

**
ERROR:tests/vhost-user-test.c:715:test_reconnect: child process (/i386/vhost-user/reconnect/subprocess [23792]) failed unexpectedly
qemu-system-i386: Failed to set msg fds.
qemu-system-i386: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-system-i386: Failed to set msg fds.
qemu-system-i386: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
GTester: last random seed: R02S2892f6ad84bd5d03acd54cb75f444243
make: *** [check-qtest-i386] Error 1

--
Eduardo
* Re: [Qemu-devel] vhost-user-test failure
From: Michael S. Tsirkin @ 2016-09-23 15:41 UTC
To: Eduardo Habkost
Cc: qemu-devel, maxime.coquelin

On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
> Hi,
>
> I hit a weird vhost-user-test failure on travis-ci recently, on a
> branch where I didn't touch any vhost-related code. From a quick
> look at the code, it looks like the vhost-user code is unhappy to
> see a disconnected socket.
>
> I wasn't able to reproduce it. It seems to be a hard to reproduce
> race between vhost-user code and socket reconnection.
>
> The failure can be seen at:
>
> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239

Maxime looked at something similar. Any idea?

> Error output:
>
> **
> ERROR:tests/vhost-user-test.c:715:test_reconnect: child process (/i386/vhost-user/reconnect/subprocess [23792]) failed unexpectedly
> qemu-system-i386: Failed to set msg fds.
> qemu-system-i386: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> qemu-system-i386: Failed to set msg fds.
> qemu-system-i386: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> GTester: last random seed: R02S2892f6ad84bd5d03acd54cb75f444243
> make: *** [check-qtest-i386] Error 1
>
> --
> Eduardo
* Re: [Qemu-devel] vhost-user-test failure
From: Maxime Coquelin @ 2016-09-23 17:40 UTC
To: Michael S. Tsirkin, Eduardo Habkost
Cc: qemu-devel

On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
>> Hi,
>>
>> I hit a weird vhost-user-test failure on travis-ci recently, on a
>> branch where I didn't touch any vhost-related code. From a quick
>> look at the code, it looks like the vhost-user code is unhappy to
>> see a disconnected socket.
>>
>> I wasn't able to reproduce it. It seems to be a hard to reproduce
>> race between vhost-user code and socket reconnection.
>>
>> The failure can be seen at:
>>
>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
>
> Maxime looked at something similar. Any idea?
No, not really.
Marc-André contributed a lot to these tests, I add him in cc: in case
he has an idea.

I will have a look in the meantime.

Maxime
>
>> Error output:
>>
>> **
>> ERROR:tests/vhost-user-test.c:715:test_reconnect: child process (/i386/vhost-user/reconnect/subprocess [23792]) failed unexpectedly
>> qemu-system-i386: Failed to set msg fds.
>> qemu-system-i386: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
>> qemu-system-i386: Failed to set msg fds.
>> qemu-system-i386: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
>> GTester: last random seed: R02S2892f6ad84bd5d03acd54cb75f444243
>> make: *** [check-qtest-i386] Error 1
>>
>> --
>> Eduardo
* Re: [Qemu-devel] vhost-user-test failure
From: Maxime Coquelin @ 2016-09-24 17:42 UTC
To: Michael S. Tsirkin, Eduardo Habkost
Cc: qemu-devel, Marc-André Lureau

This time with Marc-André in cc:...

On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
>
>
> On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
>> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
>>> Hi,
>>>
>>> I hit a weird vhost-user-test failure on travis-ci recently, on a
>>> branch where I didn't touch any vhost-related code. From a quick
>>> look at the code, it looks like the vhost-user code is unhappy to
>>> see a disconnected socket.
>>>
>>> I wasn't able to reproduce it. It seems to be a hard to reproduce
>>> race between vhost-user code and socket reconnection.
>>>
>>> The failure can be seen at:
>>>
>>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
>>
>> Maxime looked at something similar. Any idea?
> No, not really.
> Marc-André contributed a lot to these tests, I add him in cc: in case
> he has an idea.
>
> I will have a look in the meantime.
>
> Maxime
>
>>
>>> Error output:
>>>
>>> **
>>> ERROR:tests/vhost-user-test.c:715:test_reconnect: child process
>>> (/i386/vhost-user/reconnect/subprocess [23792]) failed unexpectedly
>>> qemu-system-i386: Failed to set msg fds.
>>> qemu-system-i386: vhost VQ 0 ring restore failed: -1: Resource
>>> temporarily unavailable (11)
>>> qemu-system-i386: Failed to set msg fds.
>>> qemu-system-i386: vhost VQ 1 ring restore failed: -1: Resource
>>> temporarily unavailable (11)
>>> GTester: last random seed: R02S2892f6ad84bd5d03acd54cb75f444243
>>> make: *** [check-qtest-i386] Error 1
>>>
>>> --
>>> Eduardo
* Re: [Qemu-devel] vhost-user-test failure
From: Marc-André Lureau @ 2016-09-25 20:55 UTC
To: Eduardo Habkost
Cc: Michael S. Tsirkin, qemu-devel, Marc-André Lureau, Maxime Coquelin

Hi

----- Original Message -----
> This time with Marc-André in cc:...
>
> On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
> >
> >
> > On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
> >> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
> >>> Hi,
> >>>
> >>> I hit a weird vhost-user-test failure on travis-ci recently, on a
> >>> branch where I didn't touch any vhost-related code. From a quick
> >>> look at the code, it looks like the vhost-user code is unhappy to
> >>> see a disconnected socket.
> >>>
> >>> I wasn't able to reproduce it. It seems to be a hard to reproduce
> >>> race between vhost-user code and socket reconnection.
> >>>
> >>> The failure can be seen at:
> >>>
> >>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
> >>
> >> Maxime looked at something similar. Any idea?
> > No, not really.
> > Marc-André contributed a lot to these tests, I add him in cc: in case
> > he has an idea.
> >
> > I will have a look in the meantime.
> >

I am unable to reproduce locally (over 500x iterations), and I
have no clue what's going on: the warnings there aren't the
problem (that's the main reason why we use the subprocess, to
silence those). Do you have a local reproducer or is it only on
travis? AFAIK, there are no other reports of this test failing,
are you sure it's not related to changes on your branch?

thanks
* Re: [Qemu-devel] vhost-user-test failure
From: Eduardo Habkost @ 2016-09-26 12:13 UTC
To: Marc-André Lureau
Cc: Michael S. Tsirkin, qemu-devel, Marc-André Lureau, Maxime Coquelin

On Sun, Sep 25, 2016 at 04:55:53PM -0400, Marc-André Lureau wrote:
> Hi
>
> ----- Original Message -----
> > This time with Marc-André in cc:...
> >
> > On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
> > >
> > >
> > > On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
> > >> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
> > >>> Hi,
> > >>>
> > >>> I hit a weird vhost-user-test failure on travis-ci recently, on a
> > >>> branch where I didn't touch any vhost-related code. From a quick
> > >>> look at the code, it looks like the vhost-user code is unhappy to
> > >>> see a disconnected socket.
> > >>>
> > >>> I wasn't able to reproduce it. It seems to be a hard to reproduce
> > >>> race between vhost-user code and socket reconnection.
> > >>>
> > >>> The failure can be seen at:
> > >>>
> > >>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
> > >>
> > >> Maxime looked at something similar. Any idea?
> > > No, not really.
> > > Marc-André contributed a lot to these tests, I add him in cc: in case
> > > he has an idea.
> > >
> > > I will have a look in the meantime.
> > >
>
> I am unable to reproduce locally (over 500x iterations), and I
> have no clue what's going on: the warnings there aren't the
> problem (that's the main reason why we use the subprocess, to
> silence those). Do you have a local reproducer or is it only on
> travis? AFAIK, there are no other reports of this test failing,
> are you sure it's not related to changes on your branch?

I don't have a local reproducer, I could only see it once on
travis-ci. Maybe it is not possible to reproduce it if the
machine isn't loaded enough to make the right thread/process be
delayed.

I am pretty sure it's not related to my changes. Below is the
diffstat between master and the commit that was being tested. All
the changes were limited to x86 CPUID code (which shouldn't
affect qtest code at all).

$ git diff --stat master...8de32e0
 include/hw/i386/pc.h          |   7 +-
 include/sysemu/cpus.h         |   5 +-
 target-i386/cpu.c             | 567 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------------------
 target-i386/cpu.h             |  15 +++-
 target-ppc/translate_init.c   |   3 +-
 tests/Makefile.include        |   2 +
 tests/test-x86-cpuid-compat.c | 171 ++++++++++++++++++++++++++++++++++++++++
 7 files changed, 516 insertions(+), 254 deletions(-)

--
Eduardo
* Re: [Qemu-devel] vhost-user-test failure
From: Maxime Coquelin @ 2016-09-26 12:52 UTC
To: Eduardo Habkost, Marc-André Lureau
Cc: Michael S. Tsirkin, qemu-devel, Marc-André Lureau

Hi,

On 09/26/2016 02:13 PM, Eduardo Habkost wrote:
> On Sun, Sep 25, 2016 at 04:55:53PM -0400, Marc-André Lureau wrote:
>> Hi
>>
>> ----- Original Message -----
>>> This time with Marc-André in cc:...
>>>
>>> On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
>>>>> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I hit a weird vhost-user-test failure on travis-ci recently, on a
>>>>>> branch where I didn't touch any vhost-related code. From a quick
>>>>>> look at the code, it looks like the vhost-user code is unhappy to
>>>>>> see a disconnected socket.
>>>>>>
>>>>>> I wasn't able to reproduce it. It seems to be a hard to reproduce
>>>>>> race between vhost-user code and socket reconnection.
>>>>>>
>>>>>> The failure can be seen at:
>>>>>>
>>>>>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
>>>>>
>>>>> Maxime looked at something similar. Any idea?
>>>> No, not really.
>>>> Marc-André contributed a lot to these tests, I add him in cc: in case
>>>> he has an idea.
>>>>
>>>> I will have a look in the meantime.
>>>>
>>
>> I am unable to reproduce locally (over 500x iterations), and I
>> have no clue what's going on: the warnings there aren't the
>> problem (that's the main reason why we use the subprocess, to
>> silence those). Do you have a local reproducer or is it only on
>> travis? AFAIK, there are no other reports of this test failing,
>> are you sure it's not related to changes on your branch?
>
> I don't have a local reproducer, I could only see it once on
> travis-ci. Maybe it is not possible to reproduce it if the
> machine isn't loaded enough to make the right thread/process be
> delayed.

I'm also trying to reproduce it.
Interestingly, launching the test with strace, I reproduce another
problem systematically:
$> strace -o /tmp/vut -ff ./tests/vhost-user-test
/x86_64/vhost-user/read-guest-mem: OK
/x86_64/vhost-user/migrate: Vhost user backend fails to broadcast fake RARP
OK
/x86_64/vhost-user/reconnect: OK

I'll try to load the CPU randomly when executing the test.

Regards,
Maxime
* Re: [Qemu-devel] vhost-user-test failure
From: Maxime Coquelin @ 2016-09-26 14:07 UTC
To: Eduardo Habkost, Marc-André Lureau
Cc: Michael S. Tsirkin, qemu-devel, Marc-André Lureau

On 09/26/2016 02:52 PM, Maxime Coquelin wrote:
> Hi,
>
> On 09/26/2016 02:13 PM, Eduardo Habkost wrote:
>> On Sun, Sep 25, 2016 at 04:55:53PM -0400, Marc-André Lureau wrote:
>>> Hi
>>>
>>> ----- Original Message -----
>>>> This time with Marc-André in cc:...
>>>>
>>>> On 09/23/2016 07:40 PM, Maxime Coquelin wrote:
>>>>>
>>>>>
>>>>> On 09/23/2016 05:41 PM, Michael S. Tsirkin wrote:
>>>>>> On Fri, Sep 23, 2016 at 12:36:12PM -0300, Eduardo Habkost wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I hit a weird vhost-user-test failure on travis-ci recently, on a
>>>>>>> branch where I didn't touch any vhost-related code. From a quick
>>>>>>> look at the code, it looks like the vhost-user code is unhappy to
>>>>>>> see a disconnected socket.
>>>>>>>
>>>>>>> I wasn't able to reproduce it. It seems to be a hard to reproduce
>>>>>>> race between vhost-user code and socket reconnection.
>>>>>>>
>>>>>>> The failure can be seen at:
>>>>>>>
>>>>>>> https://travis-ci.org/ehabkost/qemu-hacks/jobs/162077239
>>>>>>
>>>>>> Maxime looked at something similar. Any idea?
>>>>> No, not really.
>>>>> Marc-André contributed a lot to these tests, I add him in cc: in case
>>>>> he has an idea.
>>>>>
>>>>> I will have a look in the meantime.
>>>>>
>>>
>>> I am unable to reproduce locally (over 500x iterations), and I
>>> have no clue what's going on: the warnings there aren't the
>>> problem (that's the main reason why we use the subprocess, to
>>> silence those). Do you have a local reproducer or is it only on
>>> travis? AFAIK, there are no other reports of this test failing,
>>> are you sure it's not related to changes on your branch?
>>
>> I don't have a local reproducer, I could only see it once on
>> travis-ci. Maybe it is not possible to reproduce it if the
>> machine isn't loaded enough to make the right thread/process be
>> delayed.
>
> I'm also trying to reproduce it.
> Interestingly, launching the test with strace, I reproduce another
> problem systematically:
> $> strace -o /tmp/vut -ff ./tests/vhost-user-test
> /x86_64/vhost-user/read-guest-mem: OK
> /x86_64/vhost-user/migrate: Vhost user backend fails to broadcast fake RARP
> OK
> /x86_64/vhost-user/reconnect: OK
>
> I'll try to load the CPU randomly when executing the test.

FYI, I reproduced it once over ~200 runs while stressing the CPUs:

/x86_64/vhost-user/read-guest-mem: OK
/x86_64/vhost-user/migrate: OK
/x86_64/vhost-user/reconnect: **
ERROR:/home/max/projects/src/mainline/qemu/tests/vhost-user-test.c:715:test_reconnect: child process (/x86_64/vhost-user/reconnect/subprocess [8797]) failed unexpectedly
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)

I'll continue the investigation.

Maxime
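For anyone who wants to try the same approach (many runs of the test while the CPUs are kept busy), a loop along the following lines might help. It is only a rough sketch, not what was actually used in this thread: the helper names repeat_until_fail, start_cpu_load and stop_cpu_load are made up for illustration and are not part of QEMU, and the busy loops are just one assumed way of generating load.

```shell
#!/bin/sh
# Hypothetical helpers for chasing a rare test failure; none of these
# names exist in the QEMU tree.

# Run a command up to MAX times and stop at the first non-zero exit status.
# Usage: repeat_until_fail MAX command [args...]
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on run $i of $max"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs"
    return 0
}

# Start N background busy loops to keep the CPUs loaded (assumed load
# generator; any other stress tool could be used instead).
start_cpu_load() {
    load_pids=""
    n=$1
    while [ "$n" -gt 0 ]; do
        ( while :; do :; done ) &
        load_pids="$load_pids $!"
        n=$((n - 1))
    done
}

# Stop the busy loops started by start_cpu_load.
stop_cpu_load() {
    [ -n "$load_pids" ] && kill $load_pids 2>/dev/null
    load_pids=""
}

# Example session, assuming a built tree and whatever environment the
# test binary needs (not shown in this thread):
#   start_cpu_load 4
#   repeat_until_fail 200 ./tests/vhost-user-test
#   stop_cpu_load
```

The loop stops at the first failing run so that the output of that run stays at the end of the terminal; the number of busy loops would need tuning to the machine.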