* [Qemu-devel] Consistency of iotests 093 and 136
From: Max Reitz @ 2019-01-23 17:00 UTC
To: Qemu-block; +Cc: qemu-devel@nongnu.org, Alberto Garcia

Hi,

093 and 136 seem really flaky to me. I can reproduce that by running:

$ dd if=/dev/urandom of=/dev/null

in as many shells as I have CPU cores, and then running the tests:

$ while TEST_DIR=/tmp/t0 ./check -T -raw 93; do :; done

or

$ while TEST_DIR=/tmp/t0 ./check -T -raw 136; do :; done

which usually fail after one or two iterations.

The exact failures vary, but for 093 it's usually something that ends
with:

[...]
    self.assertTrue(check_limit(params['iops'], rd_iops + wr_iops))
AssertionError: False is not true

Or:

[...]
    self.assertTrue(check_limit(params['iops_rd'], rd_iops))
AssertionError: False is not true

etc. -- so the 10% error range doesn't seem to be enough, I'd say. But
will just increasing it solve the problem?

And for 136 it's usually (always?):

[...]
  File "136", line 278, in do_test_stats
    self.check_values()
  File "136", line 204, in check_values
    self.assertLess(0, stats['idle_time_ns'])
AssertionError: 0 not less than 0

Any ideas on making these more reliable?

Max
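The 093 failures above come from a check of the form "observed rate is within 10% of the configured limit". The helper below is a hypothetical sketch in that spirit, not the test's actual check_limit(); the names `within_limit`, `seconds`, `ndrives` and `tolerance` are assumptions. It makes the trade-off behind Max's question explicit: widening the window absorbs scheduling noise, but a measurement that is systematically too low (for example, one taken before the work has been accounted for) fails at any sensible tolerance.

```python
def within_limit(limit: float, observed: float, seconds: float,
                 ndrives: int = 1, tolerance: float = 0.10) -> bool:
    """Return True if `observed` operations (or bytes) over `seconds`
    are within +/- `tolerance` of what `limit` per second allows,
    assuming the limit is shared evenly by `ndrives` drives."""
    if limit == 0:
        # Treat a limit of 0 as "not throttled": nothing to check.
        return True
    expected = seconds * limit / ndrives
    return expected * (1 - tolerance) < observed < expected * (1 + tolerance)


# A 100 IOPS limit measured over 1 second of (virtual) time:
print(within_limit(100, 97, 1.0))                  # True
print(within_limit(100, 85, 1.0))                  # False at 10%
print(within_limit(100, 85, 1.0, tolerance=0.2))   # True at 20%
print(within_limit(100, 0, 1.0, tolerance=0.5))    # False: no window fits 0
```

Raising `tolerance` only helps if the error is random jitter around the expected value; it cannot fix a shortfall caused by reading the counters too early.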
* Re: [Qemu-devel] Consistency of iotests 093 and 136
From: Alberto Garcia @ 2019-01-24 10:11 UTC
To: Max Reitz, Qemu-block; +Cc: qemu-devel@nongnu.org

On Wed 23 Jan 2019 06:00:49 PM CET, Max Reitz wrote:
> Hi,
>
> 093 and 136 seem really flaky to me. I can reproduce that by running:

That's interesting, I can make 093 fail quite easily now (I haven't
tested the other one yet), but I don't think this happened
earlier. I'll try to figure out what's going on.

Berto
* Re: [Qemu-devel] Consistency of iotests 093 and 136
From: Alberto Garcia @ 2019-01-24 14:34 UTC
To: Max Reitz, Qemu-block; +Cc: qemu-devel@nongnu.org, Peter Xu

On Thu 24 Jan 2019 11:11:06 AM CET, Alberto Garcia wrote:
> On Wed 23 Jan 2019 06:00:49 PM CET, Max Reitz wrote:
>> Hi,
>>
>> 093 and 136 seem really flaky to me. I can reproduce that by running:
>
> That's interesting, I can make 093 fail quite easily now (I haven't
> tested the other one yet), but I don't think this happened
> earlier. I'll try to figure out what's going on.

I bisected this and it seems that 093 started to fail after this:

8258292e monitor: Remove "x-oob", offer capability "oob" unconditionally

I'm not familiar with that option so I need to investigate.

Berto
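Bisecting a flaky failure needs a verdict that is itself reliable, since a single passing run says little. A minimal sketch, assuming the bisection is driven by `git bisect run`, which treats exit status 0 as good, 125 as "skip this commit", other statuses from 1 to 127 as bad, and anything else as a reason to abort. The thread does not say how the bisection was done; `bisect_verdict`, `main` and the attempt count are illustrative assumptions. In practice each step would also have to rebuild QEMU and keep the CPU load from Max's reproducer running.

```python
import subprocess
import sys


def bisect_verdict(cmd, attempts=20):
    """Map repeated runs of a flaky test to a `git bisect run` status.

    Returns 1 ("bad") as soon as one run fails, and 0 ("good") only if
    all `attempts` runs pass. More attempts make a false "good" less
    likely, at the cost of a slower bisection.
    """
    for _ in range(attempts):
        if subprocess.run(cmd).returncode != 0:
            return 1
    return 0


def main(argv):
    """Entry point: `argv` is the test command to repeat."""
    if not argv:
        print("usage: verdict.py TEST-COMMAND [ARGS...]", file=sys.stderr)
        return 128  # outside 0..127: makes `git bisect run` abort
    return bisect_verdict(argv)
```

Saved in a script (the name verdict.py is hypothetical) that ends with `sys.exit(main(sys.argv[1:]))`, it would be passed to `git bisect run` followed by the test command, e.g. the `./check -T -raw 93` invocation above.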
* Re: [Qemu-devel] Consistency of iotests 093 and 136
From: Eric Blake @ 2019-01-24 18:07 UTC
To: Alberto Garcia, Max Reitz, Qemu-block
Cc: qemu-devel@nongnu.org, Peter Xu, Markus Armbruster

On 1/24/19 8:34 AM, Alberto Garcia wrote:
> On Thu 24 Jan 2019 11:11:06 AM CET, Alberto Garcia wrote:
>> On Wed 23 Jan 2019 06:00:49 PM CET, Max Reitz wrote:
>>> Hi,
>>>
>>> 093 and 136 seem really flaky to me. I can reproduce that by running:
>>
>> That's interesting, I can make 093 fail quite easily now (I haven't
>> tested the other one yet), but I don't think this happened
>> earlier. I'll try to figure out what's going on.
>
> I bisected this and it seems that 093 started to fail after this:
>
> 8258292e monitor: Remove "x-oob", offer capability "oob" unconditionally
>
> I'm not familiar with that option so I need to investigate.

We've got several tests failing after making x-oob unconditional; here's
another thread:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05587.html

Could it be that the test was using some sort of QMP command as an
attempt to synchronize state, but the OOB handling is now making it not
a reliable sync point?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] Consistency of iotests 093 and 136
From: Alberto Garcia @ 2019-01-28 15:18 UTC
To: Eric Blake, Max Reitz, Qemu-block
Cc: qemu-devel@nongnu.org, Peter Xu, Markus Armbruster

On Thu 24 Jan 2019 07:07:47 PM CET, Eric Blake wrote:
>>>> 093 and 136 seem really flaky to me. I can reproduce that by
>>>> running:
>>>
>>> That's interesting, I can make 093 fail quite easily now (I haven't
>>> tested the other one yet), but I don't think this happened
>>> earlier. I'll try to figure out what's going on.
>>
>> I bisected this and it seems that 093 started to fail after this:
>>
>> 8258292e monitor: Remove "x-oob", offer capability "oob" unconditionally
>>
>> I'm not familiar with that option so I need to investigate.
>
> We've got several tests failing after making x-oob unconditional; here's
> another thread:
>
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05587.html
>
> Could it be that the test was using some sort of QMP command as an
> attempt to synchronize state, but the OOB handling is now making it not
> a reliable sync point?

093 submits several I/O requests using aio_read and aio_write with
hmp_qemu_io(), then advances the clock using clock_step and finally
calls query-blockstats to see how much of the I/O has been completed
(it's an I/O throttling test).

The expectation is that by the time query-blockstats is called all
submitted I/O requests have been processed (up to the amount allowed by
the throttling limits).

Are the QMP (hmp_qemu_io, query-blockstats) and qtest (clock_step)
sockets maybe running in different threads?

Berto
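To make the expectation concrete, here is a toy model of the three-step sequence described above. It is an assumption-laden sketch, not QEMU's throttling code (which is more involved and also supports bursts): `ToyThrottledDrive`, `run` and their methods are hypothetical stand-ins for aio_read/aio_write via hmp_qemu_io(), qtest's clock_step and query-blockstats. It shows that the reported count only matches the limit if the steps take effect in the order they were sent.

```python
class ToyThrottledDrive:
    """Toy model, not QEMU code: submitted requests wait until virtual
    time advances, and at most `iops_limit` of them complete per second.
    Bursts and bucket capacity are ignored."""

    def __init__(self, iops_limit: int):
        self.iops_limit = iops_limit
        self.pending = 0      # submitted but not yet completed
        self.completed = 0    # what a stats query would report
        self.budget = 0.0     # operations allowed but not yet used

    def submit(self, n: int) -> None:
        """Stand-in for aio_read/aio_write issued through hmp_qemu_io()."""
        self.pending += n

    def clock_step(self, seconds: float) -> None:
        """Stand-in for qtest clock_step: release what the limit allows."""
        self.budget += self.iops_limit * seconds
        released = min(self.pending, int(self.budget))
        self.pending -= released
        self.completed += released
        self.budget -= released

    def query(self) -> int:
        """Stand-in for the operation counts in query-blockstats."""
        return self.completed


def run(order, submitted=100, iops_limit=50, seconds=1.0):
    """Apply the three steps in `order`; return what the query observed."""
    drive = ToyThrottledDrive(iops_limit)
    observed = None
    for step in order:
        if step == "submit":
            drive.submit(submitted)
        elif step == "clock_step":
            drive.clock_step(seconds)
        elif step == "query":
            observed = drive.query()
    return observed


print(run(["submit", "clock_step", "query"]))   # 50: the expected order
print(run(["submit", "query", "clock_step"]))   # 0: query took effect too early
```

Whether QEMU can actually let the query take effect ahead of the clock step's work is exactly the open question in this message; the model only shows what the test would see if it did.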
* Re: [Qemu-devel] Consistency of iotests 093 and 136 2019-01-28 15:18 ` Alberto Garcia @ 2019-01-28 18:38 ` Markus Armbruster 2019-01-29 10:03 ` Alberto Garcia 0 siblings, 1 reply; 8+ messages in thread From: Markus Armbruster @ 2019-01-28 18:38 UTC (permalink / raw) To: Alberto Garcia Cc: Eric Blake, Max Reitz, Qemu-block, qemu-devel@nongnu.org, Peter Xu Alberto Garcia <berto@igalia.com> writes: > On Thu 24 Jan 2019 07:07:47 PM CET, Eric Blake wrote: >>>>> 093 and 136 seem really flaky to me. I can reproduce that by >>>>> running: >>>> >>>> That's interesting, I can make 093 fail quite easily now (I haven't >>>> tested the other one yet), but I don't think this happened >>>> earlier. I'll try to figure out what's going on. >>> >>> I bisected this and it seems that 093 started to fail after this: >>> >>> 8258292e monitor: Remove "x-oob", offer capability "oob" unconditionally >>> >>> I'm not familiar with that option so I need to investigate. >> >> We've got several tests failing after making x-oob unconditional; here's >> another thread: >> >> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05587.html >> >> Could it be that the test was using some sort of QMP command as an >> attempt to synchronize state, but the OOB handling is now making it not >> a reliable sync point? > > 093 submits several I/O requests using aio_read and aio_write with > hmp_qemu_io(), then advances the clock using clock_step and finally > calls query-blockstats to see how much of the I/O has been completed > (it's an I/O throttling test). > > The expectation is that by the time query-blockstats is called all > submitted I/O requests have been processed (up to the amount allowed by > the throttling limits). Assumptions like "when we see the reply to QMP command X, surely the main loop has completed doing Y" are problematic. When possible, rely on something more direct, such as a query command that shows you whether Y has been completed. 
Mind, I'm not claiming the assumption you described is invalid.

> Are the QMP (hmp_qemu_io, query-blockstats) and qtest (clock_step)
> sockets maybe running in different threads?

First order approximation: the QMP *monitors* run in the monitor I/O
thread (as of commit 8258292e18c), but the QMP *commands* still run in
the main loop.

The QMP monitor suspends itself after reading a command and sending it
to the main loop. The main loop resumes the monitor after sending the
reply.

Before this change, the QMP monitors also ran in the main loop, and
executed each command right after reading it. The monitor suspend /
resume described above is designed to minimize observable differences
in behavior.

More exact version, may not be relevant to you now.

The QMP monitors run in the monitor I/O thread when the underlying
character device can support that. The character devices you
typically want to use with QMP, such as socket, all can. Some
character devices can't, e.g. ringbuf, spice, braille, MUX.

Monitors offer capability "oob" when running in the I/O thread. For
instance:

    $ qemu-system-x86_64 -nodefaults -S -display none -qmp-pretty stdio
    {
        "QMP": {
            "version": {
                "qemu": {
                    "micro": 50,
                    "minor": 1,
                    "major": 3
                },
                "package": "v3.1.0-1200-g9dd0d8111f"
            },
            "capabilities": [
    -->         "oob"
            ]
        }
    }

QMP commands still run in the main loop. The monitor reads commands
and sends them to the main loop. The main loop executes them one after
the other, and sends replies. The number of commands in flight is
limited. Unless the client accepted capability "oob", the limit is
one.

Clients that accepted capability "oob" can execute oob-capable
commands out of band. The monitor executes them right away, jumping
the queue. The only commands that can be executed out of band so far
are migrate-recover and migrate-pause.

See docs/interop/qmp-spec.txt for more detailed information.
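The behavior Markus describes can be condensed into a small model. This is a toy sketch under stated assumptions, not QEMU's monitor code: `ToyQmpMonitor` and its methods are hypothetical, the queue limit used once "oob" is accepted is an arbitrary placeholder (the message only says the number is limited), and every oob-capable command is treated as if the client had sent it out of band.

```python
from collections import deque


class ToyQmpMonitor:
    """Toy model, not QEMU code: the monitor thread reads commands and
    queues them for the main loop, which executes them in order.
    Oob-capable commands are assumed to be sent out of band."""

    # The only oob-capable commands at the time, per the message above.
    OOB_CAPABLE = {"migrate-recover", "migrate-pause"}

    def __init__(self, oob_accepted: bool, queue_limit: int = 8):
        self.oob_accepted = oob_accepted
        # Without "oob" at most one command is in flight; the limit used
        # when "oob" was accepted is an arbitrary assumption here.
        self.limit = queue_limit if oob_accepted else 1
        self.main_loop_queue = deque()
        self.executed = []

    def receive(self, cmd: str) -> bool:
        """Monitor I/O thread reads `cmd`. Returns False if the monitor
        is suspended and the command has to wait."""
        if self.oob_accepted and cmd in self.OOB_CAPABLE:
            self.executed.append(cmd)  # executed right away, jumping the queue
            return True
        if len(self.main_loop_queue) >= self.limit:
            return False
        self.main_loop_queue.append(cmd)
        return True

    def main_loop_step(self) -> None:
        """Main loop executes the oldest queued command and replies,
        which frees a slot (i.e. resumes the monitor)."""
        if self.main_loop_queue:
            self.executed.append(self.main_loop_queue.popleft())
```

In this model, commands on a single monitor still run in arrival order unless one jumps the queue out of band. It says nothing about ordering relative to a different channel such as the qtest socket, which is what Berto's question is about.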
* Re: [Qemu-devel] Consistency of iotests 093 and 136
From: Alberto Garcia @ 2019-01-29 10:03 UTC
To: Markus Armbruster
Cc: Eric Blake, Max Reitz, Qemu-block, qemu-devel@nongnu.org, Peter Xu

On Mon 28 Jan 2019 07:38:08 PM CET, Markus Armbruster wrote:
>> 093 submits several I/O requests using aio_read and aio_write with
>> hmp_qemu_io(), then advances the clock using clock_step and finally
>> calls query-blockstats to see how much of the I/O has been completed
>> (it's an I/O throttling test).
>>
>> The expectation is that by the time query-blockstats is called all
>> submitted I/O requests have been processed (up to the amount allowed
>> by the throttling limits).
>
> Assumptions like "when we see the reply to QMP command X, surely the
> main loop has completed doing Y" are problematic. When possible, rely
> on something more direct, such as a query command that shows you
> whether Y has been completed.

Right, but how to do that for aio_read / aio_write ?

Berto
* Re: [Qemu-devel] Consistency of iotests 093 and 136
From: Markus Armbruster @ 2019-01-29 12:11 UTC
To: Alberto Garcia
Cc: Peter Xu, qemu-devel@nongnu.org, Qemu-block, Max Reitz

Alberto Garcia <berto@igalia.com> writes:

> On Mon 28 Jan 2019 07:38:08 PM CET, Markus Armbruster wrote:
>
>>> 093 submits several I/O requests using aio_read and aio_write with
>>> hmp_qemu_io(), then advances the clock using clock_step and finally
>>> calls query-blockstats to see how much of the I/O has been completed
>>> (it's an I/O throttling test).
>>>
>>> The expectation is that by the time query-blockstats is called all
>>> submitted I/O requests have been processed (up to the amount allowed
>>> by the throttling limits).
>>
>> Assumptions like "when we see the reply to QMP command X, surely the
>> main loop has completed doing Y" are problematic. When possible, rely
>> on something more direct, such as a query command that shows you
>> whether Y has been completed.
>
> Right, but how to do that for aio_read / aio_write ?

Fair question. What exactly do you need to wait for?
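Markus's suggestion, relying on a query that directly reports whether the work is done instead of on the timing of an unrelated reply, usually takes the shape of a polling loop with a deadline. A minimal, generic sketch, assuming the test can express "done" as a predicate. In 093 such a predicate might compare counters from query-blockstats against what the throttle should have released, but which condition is right is precisely what Markus is asking. `wait_until` is a hypothetical helper, not part of the iotests framework.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.01,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Returns True on success and False on timeout, so the caller can turn
    a timeout into an explicit test failure instead of asserting on a
    measurement taken too early. `clock` and `sleep` are injectable to
    keep the helper testable.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

This only helps if the awaited state can be reached without further action from the test. Assuming, as the test's use of clock_step suggests, that throttled requests are released only when virtual time advances, a poll like this can wait only for work that the last clock_step has already made eligible.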