* runaway avocado @ 2020-10-26 22:35 Peter Maydell 2020-10-26 22:43 ` Philippe Mathieu-Daudé 2021-02-05 19:23 ` Peter Maydell 0 siblings, 2 replies; 11+ messages in thread From: Peter Maydell @ 2020-10-26 22:35 UTC (permalink / raw) To: QEMU Developers; +Cc: Alex Bennée So, I somehow ended up with this process still running on my local machine after a (probably failed) 'make check-acceptance': petmay01 13710 99.7 3.7 2313448 1235780 pts/16 Sl 16:10 378:00 ./qemu-system-aarch64 -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmp5szft2yi/qemu-13290-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmp5szft2yi/qemu-13290-console.sock,server,nowait -serial chardev:console -icount shift=7,rr=record,rrfile=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin,rrsnapshot=init -net none -drive file=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/disk.qcow2,if=none -kernel /home/petmay01/avocado/data/cache/by_location/a00ac4ae676ef0322126abd2f7d38f50cc9cbc95/vmlinuz -cpu cortex-a53 and it was continuing to log to a deleted file /var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin which was steadily eating my disk space and got up to nearly 100GB in used disk (invisible to du, of course, since it was an unlinked file) before I finally figured out what was going on and killed it about six hours later... Any suggestions for how we might improve the robustness of the relevant test ? thanks -- PMM ^ permalink raw reply [flat|nested] 11+ messages in thread
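Why "invisible to du": the report above depends on a POSIX behaviour where an unlinked file keeps its data blocks for as long as some process holds it open. A minimal sketch of that behaviour follows, assuming a POSIX system; the function name is hypothetical and nothing here is taken from QEMU or Avocado.

```python
import os
import tempfile


def write_to_unlinked_file(num_bytes):
    """Write to a file after unlinking it; return (link_count, size, path_exists)."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)                  # the name is gone, the inode is not
        os.write(fd, b"x" * num_bytes)   # writes still succeed and consume disk space
        info = os.fstat(fd)
        return info.st_nlink, info.st_size, os.path.exists(path)
    finally:
        os.close(fd)                     # only after this can the blocks be freed


if __name__ == "__main__":
    print(write_to_unlinked_file(4096))
```

Because no directory entry points at the data any more, tools that walk the directory tree cannot count it, while the filesystem's free space keeps shrinking until the writer exits or closes the descriptor.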
* Re: runaway avocado 2020-10-26 22:35 runaway avocado Peter Maydell @ 2020-10-26 22:43 ` Philippe Mathieu-Daudé 2020-10-27 0:28 ` Cleber Rosa 2021-02-05 19:23 ` Peter Maydell 1 sibling, 1 reply; 11+ messages in thread From: Philippe Mathieu-Daudé @ 2020-10-26 22:43 UTC (permalink / raw) To: Peter Maydell, avocado-devel, Cleber Rosa, Eduardo Habkost Cc: Alex Bennée, QEMU Developers, Pavel Dovgalyuk Cc'ing avocado-devel@ On 10/26/20 11:35 PM, Peter Maydell wrote: > So, I somehow ended up with this process still running on my > local machine after a (probably failed) 'make check-acceptance': > > petmay01 13710 99.7 3.7 2313448 1235780 pts/16 Sl 16:10 378:00 > ./qemu-system-aarch64 -display none -vga none -chardev > socket,id=mon,path=/var/tmp/tmp5szft2yi/qemu-13290-monitor.sock -mon > chardev=mon,mode=control -machine virt -chardev > socket,id=console,path=/var/tmp/tmp5szft2yi/qemu-13290-console.sock,server,nowait > -serial chardev:console -icount > shift=7,rr=record,rrfile=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin,rrsnapshot=init > -net none -drive > file=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/disk.qcow2,if=none > -kernel /home/petmay01/avocado/data/cache/by_location/a00ac4ae676ef0322126abd2f7d38f50cc9cbc95/vmlinuz > -cpu cortex-a53 > > and it was continuing to log to a deleted file > /var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin > > which was steadily eating my disk space and got up to nearly 100GB > in used disk (invisible to du, of course, since it was an unlinked > file) before I finally figured out what was going on and killed it > about six hours later... > > Any suggestions for how we might improve the robustness of the > relevant test ? 
> > thanks > -- PMM > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: runaway avocado 2020-10-26 22:43 ` Philippe Mathieu-Daudé @ 2020-10-27 0:28 ` Cleber Rosa 2020-12-07 20:45 ` John Snow 0 siblings, 1 reply; 11+ messages in thread From: Cleber Rosa @ 2020-10-27 0:28 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: Peter Maydell, Eduardo Habkost, QEMU Developers, avocado-devel, Pavel Dovgalyuk, Alex Bennée [-- Attachment #1: Type: text/plain, Size: 2813 bytes --] On Mon, Oct 26, 2020 at 11:43:36PM +0100, Philippe Mathieu-Daudé wrote: > Cc'ing avocado-devel@ > > On 10/26/20 11:35 PM, Peter Maydell wrote: > > So, I somehow ended up with this process still running on my > > local machine after a (probably failed) 'make check-acceptance': > > > > petmay01 13710 99.7 3.7 2313448 1235780 pts/16 Sl 16:10 378:00 > > ./qemu-system-aarch64 -display none -vga none -chardev > > socket,id=mon,path=/var/tmp/tmp5szft2yi/qemu-13290-monitor.sock -mon > > chardev=mon,mode=control -machine virt -chardev > > socket,id=console,path=/var/tmp/tmp5szft2yi/qemu-13290-console.sock,server,nowait > > -serial chardev:console -icount > > shift=7,rr=record,rrfile=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin,rrsnapshot=init > > -net none -drive > > file=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/disk.qcow2,if=none > > -kernel /home/petmay01/avocado/data/cache/by_location/a00ac4ae676ef0322126abd2f7d38f50cc9cbc95/vmlinuz > > -cpu cortex-a53 > > > > and it was continuing to log to a deleted file > > /var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin > > > > which was steadily eating my disk space and got up to nearly 100GB > > in used disk (invisible to du, of course, since it was an unlinked > > file) before I finally figured out what was going on and killed it > > about six hours 
later... > > Ouch! > > Any suggestions for how we might improve the robustness of the > > relevant test ? > > While this test may be less robust/reliable than others, the core issue is that the automatic shutdown of the QEMU "vms" can be improved. My best guess is that this specific test ended in ERROR, and (or because?) the tearDown() method failed to end these processes. All tests can be improved at once by adding a second, even more forceful round of shutdown. Currently the process gets, in the worst case scenario, a SIGKILL. But, in addition to that, an upper layer above the test could be given the responsibility to look for and clean up resources initiated by a test. The Avocado job has hooks for running callbacks right before its own process exits, but, with the new Avocado architecture (AKA "N(ext) Runner") this should probably be implemented as async cleanup actions that begin right after a test ends. I'll give the "second more forceful round of shutdown" approach some testing, and in addition to that, open an issue to track the upper layer resource cleanup on Avocado. Thanks, - Cleber. > > thanks > > -- PMM > > > > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
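The escalating shutdown discussed above can be sketched as a generic pattern: ask the process to stop, wait for a bounded time, then force it. This is an illustration only, not the code in QEMU's machine.py or in Avocado; `stop_process` and its `grace` parameter are hypothetical names.

```python
import subprocess
import sys


def stop_process(proc, grace=3.0):
    """Stop proc: SIGTERM first, then SIGKILL if it is still alive after `grace` seconds."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the child so no zombie is left behind
    return proc.returncode


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(600)"])
    print("exit status:", stop_process(child, grace=1.0))
```

The important property is that every wait is bounded, so a tearDown() built this way cannot hang and cannot return while the child is still running. It does not cover the case where the test process itself dies before tearDown() runs, which is what the "upper layer" cleanup would be for.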
* Re: runaway avocado 2020-10-27 0:28 ` Cleber Rosa @ 2020-12-07 20:45 ` John Snow 0 siblings, 0 replies; 11+ messages in thread From: John Snow @ 2020-12-07 20:45 UTC (permalink / raw) To: Cleber Rosa, Philippe Mathieu-Daudé Cc: Peter Maydell, Eduardo Habkost, QEMU Developers, avocado-devel, Pavel Dovgalyuk, Alex Bennée On 10/26/20 8:28 PM, Cleber Rosa wrote: > On Mon, Oct 26, 2020 at 11:43:36PM +0100, Philippe Mathieu-Daudé wrote: >> Cc'ing avocado-devel@ >> >> On 10/26/20 11:35 PM, Peter Maydell wrote: >>> So, I somehow ended up with this process still running on my >>> local machine after a (probably failed) 'make check-acceptance': >>> >>> petmay01 13710 99.7 3.7 2313448 1235780 pts/16 Sl 16:10 378:00 >>> ./qemu-system-aarch64 -display none -vga none -chardev >>> socket,id=mon,path=/var/tmp/tmp5szft2yi/qemu-13290-monitor.sock -mon >>> chardev=mon,mode=control -machine virt -chardev >>> socket,id=console,path=/var/tmp/tmp5szft2yi/qemu-13290-console.sock,server,nowait >>> -serial chardev:console -icount >>> shift=7,rr=record,rrfile=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin,rrsnapshot=init >>> -net none -drive >>> file=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/disk.qcow2,if=none >>> -kernel /home/petmay01/avocado/data/cache/by_location/a00ac4ae676ef0322126abd2f7d38f50cc9cbc95/vmlinuz >>> -cpu cortex-a53 >>> >>> and it was continuing to log to a deleted file >>> /var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin >>> >>> which was steadily eating my disk space and got up to nearly 100GB >>> in used disk (invisible to du, of course, since it was an unlinked >>> file) before I finally figured out what was going on and killed it >>> about six hours later... >>> > > Ouch! 
> >>> Any suggestions for how we might improve the robustness of the >>> relevant test ? >>> > > While this test may be less robust/reliable than others, the core > issue is that the automatic shutdown of the QEMU "vms" can be > improved. My best guess is that this specific test ended in ERROR, > and (or because?) the tearDown() method failed to end these processes. > > All tests can be improved at once by adding a second, even more > forceful round of shutdown. Currently the process gets, in the worst > case scenario, a SIGKILL. > > But, in addition to that, an upper layer above the test could be given > the responsibility to look for and clean up resources initiated by a > test. The Avocado job has hooks for running callbacks right before > its own process exits, but, with the new Avocado architecture (AKA "N(ext) > Runner") this should probably be implemented as async cleanup actions > that begin right after a test ends. > > I'll give the "second more forceful round of shutdown" approach some > testing, and in addition to that, open an issue to track the upper > layer resource cleanup on Avocado. > machine.py should have a timeout that it adheres to, unless it was disabled explicitly -- then I guess it can't help you. --js ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: runaway avocado 2020-10-26 22:35 runaway avocado Peter Maydell 2020-10-26 22:43 ` Philippe Mathieu-Daudé @ 2021-02-05 19:23 ` Peter Maydell 2021-02-11 17:25 ` Cleber Rosa 1 sibling, 1 reply; 11+ messages in thread From: Peter Maydell @ 2021-02-05 19:23 UTC (permalink / raw) To: QEMU Developers; +Cc: Alex Bennée On Mon, 26 Oct 2020 at 22:35, Peter Maydell <peter.maydell@linaro.org> wrote: > > So, I somehow ended up with this process still running on my > local machine after a (probably failed) 'make check-acceptance': > > petmay01 13710 99.7 3.7 2313448 1235780 pts/16 Sl 16:10 378:00 > ./qemu-system-aarch64 -display none -vga none -chardev > socket,id=mon,path=/var/tmp/tmp5szft2yi/qemu-13290-monitor.sock -mon > chardev=mon,mode=control -machine virt -chardev > socket,id=console,path=/var/tmp/tmp5szft2yi/qemu-13290-console.sock,server,nowait > -serial chardev:console -icount > shift=7,rr=record,rrfile=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin,rrsnapshot=init > -net none -drive > file=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/disk.qcow2,if=none > -kernel /home/petmay01/avocado/data/cache/by_location/a00ac4ae676ef0322126abd2f7d38f50cc9cbc95/vmlinuz > -cpu cortex-a53 > > and it was continuing to log to a deleted file > /var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin > > which was steadily eating my disk space and got up to nearly 100GB > in used disk (invisible to du, of course, since it was an unlinked > file) before I finally figured out what was going on and killed it > about six hours later... Just got hit by this test framework bug again :-( Same thing, runaway avocado record-and-replay test ate all my disk space. -- PMM ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: runaway avocado 2021-02-05 19:23 ` Peter Maydell @ 2021-02-11 17:25 ` Cleber Rosa 2021-02-11 17:37 ` Peter Maydell 0 siblings, 1 reply; 11+ messages in thread From: Cleber Rosa @ 2021-02-11 17:25 UTC (permalink / raw) To: Peter Maydell; +Cc: Alex Bennée, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 2487 bytes --] On Fri, Feb 05, 2021 at 07:23:22PM +0000, Peter Maydell wrote: > On Mon, 26 Oct 2020 at 22:35, Peter Maydell <peter.maydell@linaro.org> wrote: > > > > So, I somehow ended up with this process still running on my > > local machine after a (probably failed) 'make check-acceptance': > > > > petmay01 13710 99.7 3.7 2313448 1235780 pts/16 Sl 16:10 378:00 > > ./qemu-system-aarch64 -display none -vga none -chardev > > socket,id=mon,path=/var/tmp/tmp5szft2yi/qemu-13290-monitor.sock -mon > > chardev=mon,mode=control -machine virt -chardev > > socket,id=console,path=/var/tmp/tmp5szft2yi/qemu-13290-console.sock,server,nowait > > -serial chardev:console -icount > > shift=7,rr=record,rrfile=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin,rrsnapshot=init > > -net none -drive > > file=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/disk.qcow2,if=none > > -kernel /home/petmay01/avocado/data/cache/by_location/a00ac4ae676ef0322126abd2f7d38f50cc9cbc95/vmlinuz > > -cpu cortex-a53 > > > > and it was continuing to log to a deleted file > > /var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin > > > > which was steadily eating my disk space and got up to nearly 100GB > > in used disk (invisible to du, of course, since it was an unlinked > > file) before I finally figured out what was going on and killed it > > about six hours later... 
> > Just got hit by this test framework bug again :-( Same thing, > runaway avocado record-and-replay test ate all my disk space. > > -- PMM > Hi Peter, I'm sorry this caused you trouble again. IIUC, this specific issue was caused by a runaway QEMU. Granted, it was started by an Avocado test. I've opened a bug report to look into the possibilities to mitigate or prevent this from happening again: https://bugs.launchpad.net/qemu/+bug/1915431 The bug report contains a bit more context on why Avocado does not try to kill all processes started by a test by default. BTW, we've been working with Pavel on identifying issues with replay/reverse features that are causing test failures. So far, I've seen a couple of issues that may be related to this runaway QEMU writing to the replay.bin file. Regards, - Cleber. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: runaway avocado 2021-02-11 17:25 ` Cleber Rosa @ 2021-02-11 17:37 ` Peter Maydell 2021-02-11 18:47 ` Cleber Rosa 0 siblings, 1 reply; 11+ messages in thread From: Peter Maydell @ 2021-02-11 17:37 UTC (permalink / raw) To: Cleber Rosa; +Cc: Alex Bennée, QEMU Developers On Thu, 11 Feb 2021 at 17:25, Cleber Rosa <crosa@redhat.com> wrote: > IIUC, this specific issue was caused by a runaway QEMU. Granted, it was > started by an Avocado test. I've opened a bug report to look into the > possibilities to mitigate or prevent this from happening again: I wonder if we could have avocado run all our acceptance cases under a 'ulimit -f' setting that restricts the amount of disk space they can use? That would restrict the damage that could be done by any runaways. A CPU usage limit might also be good. thanks -- PMM ^ permalink raw reply [flat|nested] 11+ messages in thread
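The 'ulimit -f' idea above maps onto the RLIMIT_FSIZE resource limit, which Python exposes through the standard `resource` module. The sketch below shows one way a test runner could apply it to a child process; it is an assumption about how this might be wired up, not something Avocado does today, and `run_with_file_size_limit` is a hypothetical helper.

```python
import os
import resource
import subprocess
import sys
import tempfile


def run_with_file_size_limit(argv, max_file_bytes, **kwargs):
    """Run argv with RLIMIT_FSIZE capped, the limit that `ulimit -f` controls."""

    def apply_limit():
        # Runs in the child between fork() and exec(); the limit is inherited
        # by every process the child starts afterwards.
        resource.setrlimit(resource.RLIMIT_FSIZE, (max_file_bytes, max_file_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit, **kwargs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "big.bin")
        code = "import sys; open(sys.argv[1], 'wb').write(b'x' * 1000000)"
        result = run_with_file_size_limit(
            [sys.executable, "-B", "-c", code, target], 4096)
        print("exit status:", result.returncode)
        print("file size:", os.path.getsize(target))
```

Two caveats: RLIMIT_FSIZE caps the size of each file a process writes rather than its total disk usage, which would still have stopped a single growing replay.bin; and `preexec_fn` is documented as unsafe when the parent process is multi-threaded. A CPU cap could be set in the same place with `resource.RLIMIT_CPU`.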
* Re: runaway avocado 2021-02-11 17:37 ` Peter Maydell @ 2021-02-11 18:47 ` Cleber Rosa 2021-02-11 19:21 ` Peter Maydell 0 siblings, 1 reply; 11+ messages in thread From: Cleber Rosa @ 2021-02-11 18:47 UTC (permalink / raw) To: Peter Maydell; +Cc: Alex Bennée, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 1359 bytes --] On Thu, Feb 11, 2021 at 05:37:20PM +0000, Peter Maydell wrote: > On Thu, 11 Feb 2021 at 17:25, Cleber Rosa <crosa@redhat.com> wrote: > > IIUC, this specific issue was caused by a runaway QEMU. Granted, it was > > started by an Avocado test. I've opened a bug report to look into the > > possibilities to mitigate or prevent this from happening again: > > I wonder if we could have avocado run all our acceptance cases > under a 'ulimit -f' setting that restricts the amount of disk > space they can use? That would restrict the damage that could > be done by any runaways. A CPU usage limit might also be good. > > thanks > -- PMM > To me that sounds a lot like Linux cgroups. I can see either someone setting up cgroups and having Avocado run in them (then all tests inherit from this common parent), or alternatively Avocado setting up cgroups for each of the tests. The former seems simpler and effective wrt protecting system resources. I can see a use case for the latter when tests actually want to verify a behavior when certain resources are constrained. We can have a script setting up a cgroup as part of a gitlab-ci.{yml,d} job for the jobs that will run on the non-shared GitLab runners (such as the s390 and aarch64 machines owned by the QEMU project). Does this sound like a solution? Thanks, - Cleber. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: runaway avocado 2021-02-11 18:47 ` Cleber Rosa @ 2021-02-11 19:21 ` Peter Maydell 2021-02-11 23:59 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 11+ messages in thread From: Peter Maydell @ 2021-02-11 19:21 UTC (permalink / raw) To: Cleber Rosa; +Cc: Alex Bennée, QEMU Developers On Thu, 11 Feb 2021 at 18:47, Cleber Rosa <crosa@redhat.com> wrote: > On Thu, Feb 11, 2021 at 05:37:20PM +0000, Peter Maydell wrote: > > I wonder if we could have avocado run all our acceptance cases > > under a 'ulimit -f' setting that restricts the amount of disk > > space they can use? That would restrict the damage that could > > be done by any runaways. A CPU usage limit might also be good. > To me that sounds a lot like Linux cgroups. ...except that ulimits are a well-established mechanism that is straightforward, works for any user and is cross-platform for most Unixes, whereas cgroups are complicated, Linux specific, and AIUI require root access to set them up and configure them. > We can have a script setting up a cgroup as part of a > gitlab-ci.{yml,d} job for the jobs that will run on the non-shared > GitLab runners (such as the s390 and aarch64 machines owned by the > QEMU project). > > Does this sound like a solution? We want a solution that works for anybody running "make check-acceptance" in any situation, not just for the CI runners. thanks -- PMM ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: runaway avocado 2021-02-11 19:21 ` Peter Maydell @ 2021-02-11 23:59 ` Philippe Mathieu-Daudé 2021-02-12 2:31 ` Cleber Rosa 0 siblings, 1 reply; 11+ messages in thread From: Philippe Mathieu-Daudé @ 2021-02-11 23:59 UTC (permalink / raw) To: Cleber Rosa Cc: Lukáš Doktor, Peter Maydell, Yonggang Luo, Alex Bennée, QEMU Developers On 2/11/21 8:21 PM, Peter Maydell wrote: > On Thu, 11 Feb 2021 at 18:47, Cleber Rosa <crosa@redhat.com> wrote: >> On Thu, Feb 11, 2021 at 05:37:20PM +0000, Peter Maydell wrote: >>> I wonder if we could have avocado run all our acceptance cases >>> under a 'ulimit -f' setting that restricts the amount of disk >>> space they can use? That would restrict the damage that could >>> be done by any runaways. A CPU usage limit might also be good. > >> To me that sounds a lot like Linux cgroups. > > ...except that ulimits are a well-established mechanism that > is straightforward, works for any user and is cross-platform > for most Unixes, whereas cgroups are complicated, Linux specific, > and AIUI require root access to set them up and configure them. I agree with Peter, being POSIX compliant is better than restricting to (recent) Linux. But also note we have users interested in running tests for Windows builds. See the Cirrus-CI. > >> We can have a script setting up a cgroup as part of a >> gitlab-ci.{yml,d} job for the jobs that will run on the non-shared >> GitLab runners (such as the s390 and aarch64 machines owned by the >> QEMU project). >> >> Does this sound like a solution? > > We want a solution that works for anybody running > "make check-acceptance" in any situation, not just for > the CI runners. Indeed. Public CI time being limited, I expect users to run tests elsewhere. We don't mind about data loss on CI runners. FWIW similar complaint last year: https://www.mail-archive.com/qemu-devel@nongnu.org/msg672277.html Regards, Phil. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: runaway avocado 2021-02-11 23:59 ` Philippe Mathieu-Daudé @ 2021-02-12 2:31 ` Cleber Rosa 0 siblings, 0 replies; 11+ messages in thread From: Cleber Rosa @ 2021-02-12 2:31 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: Lukáš Doktor, Peter Maydell, Yonggang Luo, Alex Bennée, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 3100 bytes --] On Fri, Feb 12, 2021 at 12:59:23AM +0100, Philippe Mathieu-Daudé wrote: > On 2/11/21 8:21 PM, Peter Maydell wrote: > > On Thu, 11 Feb 2021 at 18:47, Cleber Rosa <crosa@redhat.com> wrote: > >> On Thu, Feb 11, 2021 at 05:37:20PM +0000, Peter Maydell wrote: > >>> I wonder if we could have avocado run all our acceptance cases > >>> under a 'ulimit -f' setting that restricts the amount of disk > >>> space they can use? That would restrict the damage that could > >>> be done by any runaways. A CPU usage limit might also be good. > > > >> To me that sounds a lot like Linux cgroups. > > > > ...except that ulimits are a well-established mechanism that > > is straightforward, works for any user and is cross-platform > > for most Unixes, whereas cgroups are complicated, Linux specific, > > and AIUI require root access to set them up and configure them. > > I agree with Peter, being POSIX compliant is better than > restricting to (recent) Linux. But also note we have users interested > in running tests for Windows builds. See the Cirrus-CI. > Sure, I feel like cgroups are more comprehensive, but they definitely have the drawbacks you both listed. > > > >> We can have a script setting up a cgroup as part of a > >> gitlab-ci.{yml,d} job for the jobs that will run on the non-shared > >> GitLab runners (such as the s390 and aarch64 machines owned by the > >> QEMU project). > >> > >> Does this sound like a solution? > > > > We want a solution that works for anybody running > > "make check-acceptance" in any situation, not just for > > the CI runners. > > Indeed.
Public CI time being limited, I expect users to run tests > elsewhere. We don't mind about data loss on CI runners. > That was kind of my point. We want to use all the resources the GitLab CI shared runners give us, so extra limit enforcements make no sense to me. On my personal machines, I also prefer to have faster test turnarounds, so putting extra limits is not beneficial to me. YMMV, so my opinion is that this should be opt-in, *not* enabled by default. My initial take on this is that we can have a few pre-defined scripts that set those limits. Users get to activate those profiles by name if, say, a given environment variable is set. Something like:
    RESOURCE_LIMIT_PROFILE=low_cpu_4g_files
    # Run the named limit profile script, forwarding the original arguments
    if [ -n "$RESOURCE_LIMIT_PROFILE" ]; then
        ./scripts/limit-resources/"$RESOURCE_LIMIT_PROFILE" "$@"
    fi
> FWIW similar complaint last year: > https://www.mail-archive.com/qemu-devel@nongnu.org/msg672277.html > The specific issue of Avocado's cache size should be addressed in this development cycle, and a solution should be available in 86.0. It's being tracked here: https://github.com/avocado-framework/avocado/issues/4311 Now, in Peter's case, it was QEMU writing to a replay.bin file, and I don't see a practical way that Avocado could limit the overall disk space usage by whatever gets run in a test unless disk quotas are set. Not sure if this belongs in a test framework though. Cheers, - Cleber. > Regards, > > Phil. > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
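The opt-in profile idea above could also be expressed inside a Python test runner instead of wrapper scripts. The sketch below is purely illustrative: the profile table, its numbers, and both function names are hypothetical, and only the RESOURCE_LIMIT_PROFILE variable and the low_cpu_4g_files name come from the message.

```python
import os
import resource

# Hypothetical profile table: the profile name mirrors the example in the
# message above, the numbers are made up for illustration.
PROFILES = {
    "low_cpu_4g_files": {
        resource.RLIMIT_CPU: 600,              # seconds of CPU time
        resource.RLIMIT_FSIZE: 4 * 1024 ** 3,  # maximum bytes per file
    },
}


def limits_from_environment(environ=os.environ):
    """Return the rlimits for the selected profile, or {} when none is set (opt-in)."""
    name = environ.get("RESOURCE_LIMIT_PROFILE", "")
    if not name:
        return {}
    if name not in PROFILES:
        raise ValueError(f"unknown resource limit profile: {name!r}")
    return PROFILES[name]


def apply_limits(limits):
    """Lower the soft limits of the current process, never above the hard limit."""
    for which, value in limits.items():
        _soft, hard = resource.getrlimit(which)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)
        resource.setrlimit(which, (value, hard))
```

With no variable set nothing changes, which matches the "not enabled by default" position. Calling `apply_limits(limits_from_environment())` before the tests start would make every child process inherit the limits; an unknown profile name raises an error so that a typo does not silently run without limits.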