* [xen-4.12-testing test] 169199: regressions - FAIL
@ 2022-04-07  8:45 osstest service owner
  2022-04-08  7:01 ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: osstest service owner @ 2022-04-07  8:45 UTC (permalink / raw)
  To: xen-devel

flight 169199 xen-4.12-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169199/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 168480

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-vhd     12 debian-di-install fail in 169184 pass in 169199
 test-amd64-i386-xl-vhd       19 guest-localmigrate/x10     fail pass in 169184

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qcow2    19 guest-localmigrate/x10  fail blocked in 168480
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop             fail like 168480
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail  like 168480
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168480
 test-armhf-armhf-libvirt     16 saverestore-support-check    fail  like 168480
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop            fail like 168480
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop            fail like 168480
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop             fail like 168480
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop            fail like 168480
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168480
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop             fail like 168480
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop            fail like 168480
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop             fail like 168480
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-amd64-i386-xl-pvshim    14 guest-start                  fail   never pass
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check    fail  never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail never pass

version targeted for testing:
 xen                  c633ec9451e76015c409bd5119ffcb0f2e61fe8b
baseline version:
 xen                  944afa38d9339a67f0164d07fb7ac8a54e9a4c60

Last test of basis   168480  2022-03-08 18:07:22 Z   29 days
Testing same since   169184  2022-04-05 14:06:03 Z    1 days    2 attempts

------------------------------------------------------------
People who touched revisions under test:
  Jan Beulich <jbeulich@suse.com>
  Roger Pau Monné <roger.pau@citrix.com>

jobs:
 build-amd64-xsm                                              pass    
 build-arm64-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-arm64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-arm64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-arm64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-arm64-arm64-xl                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            pass    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        pass    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         pass    
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm                 fail    
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm                  pass    
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm                 pass    
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm                  pass    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-arm64-arm64-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-arm64-arm64-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-amd64-xl-pvhv2-amd                                pass    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-qemuu-freebsd11-amd64                       pass    
 test-amd64-amd64-qemuu-freebsd12-amd64                       pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-amd64-amd64-xl-qemut-ws16-amd64                         fail    
 test-amd64-i386-xl-qemut-ws16-amd64                          fail    
 test-amd64-amd64-xl-qemuu-ws16-amd64                         fail    
 test-amd64-i386-xl-qemuu-ws16-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  pass    
 test-amd64-amd64-xl-credit1                                  pass    
 test-arm64-arm64-xl-credit1                                  pass    
 test-armhf-armhf-xl-credit1                                  pass    
 test-amd64-amd64-xl-credit2                                  pass    
 test-arm64-arm64-xl-credit2                                  pass    
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict        pass    
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict         pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-amd64-xl-pvhv2-intel                              pass    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-libvirt                                     pass    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-livepatch                                   pass    
 test-amd64-i386-livepatch                                    pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                pass    
 test-amd64-i386-libvirt-pair                                 pass    
 test-amd64-amd64-xl-pvshim                                   pass    
 test-amd64-i386-xl-pvshim                                    fail    
 test-amd64-amd64-pygrub                                      pass    
 test-armhf-armhf-libvirt-qcow2                               pass    
 test-amd64-amd64-xl-qcow2                                    fail    
 test-arm64-arm64-libvirt-raw                                 pass    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-libvirt-raw                                  pass    
 test-amd64-amd64-xl-rtds                                     pass    
 test-armhf-armhf-xl-rtds                                     pass    
 test-arm64-arm64-xl-seattle                                  pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow             pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow              pass    
 test-amd64-amd64-xl-shadow                                   pass    
 test-amd64-i386-xl-shadow                                    pass    
 test-arm64-arm64-xl-thunderx                                 pass    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-arm64-arm64-xl-vhd                                      pass    
 test-armhf-armhf-xl-vhd                                      pass    
 test-amd64-i386-xl-vhd                                       fail    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 375 lines long.)



* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-07  8:45 [xen-4.12-testing test] 169199: regressions - FAIL osstest service owner
@ 2022-04-08  7:01 ` Jan Beulich
  2022-04-08  8:09   ` Roger Pau Monné
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2022-04-08  7:01 UTC (permalink / raw)
  To: xen-devel
  Cc: osstest service owner, Andrew Cooper, George Dunlap, Julien Grall,
	Stefano Stabellini, Wei Liu, Dario Faggioli

On 07.04.2022 10:45, osstest service owner wrote:
> flight 169199 xen-4.12-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/169199/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 168480

While the subsequent flight passed, I thought I'd still look into
the logs here since the earlier flight had failed too. The state of
the machine when the debug keys were issued is somewhat odd (and
similar to the earlier failure's): 11 of the 56 CPUs try to
acquire (apparently) Dom0's event lock, from evtchn_move_pirqs().
All other CPUs are idle. The test failed because the sole guest
didn't reboot in time. Whether the failure is actually connected to
this apparent lock contention is unclear, though.

One can further see that really all about 70 ECS_PIRQ ports are
bound to vCPU 0 (which makes me wonder about lack of balancing
inside Dom0 itself, but that's unrelated). This means that all
other vCPU-s have nothing at all to do in evtchn_move_pirqs().
Since this moving of pIRQ-s is an optimization (the value of which
has been put under question in the past, iirc), I wonder whether we
shouldn't add a check to the function for the list being empty
prior to actually acquiring the lock. I guess I'll make a patch and
post it as RFC.
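
A minimal sketch of the "check for an empty list before taking the lock" idea, written as a toy Python model rather than as Xen code. The `Domain` class, the per-vCPU `pirq_ports` lists and `move_pirqs()` are hypothetical stand-ins; only the 56 vCPUs and the roughly 70 ports bound to vCPU 0 come from the description above.

```python
import threading


class Domain:
    """Toy stand-in for a domain with one event lock shared by all vCPUs."""

    def __init__(self, nr_vcpus):
        self.event_lock = threading.Lock()
        self.lock_acquisitions = 0
        # Hypothetical per-vCPU list of pIRQ-bound event channel ports.
        self.pirq_ports = {vcpu: [] for vcpu in range(nr_vcpus)}
        self.marked = []


def move_pirqs(domain, vcpu):
    """Mark this vCPU's pIRQ ports for moving, skipping the lock if none."""
    # Fast path: no ports bound to this vCPU, so the lock is never touched.
    if not domain.pirq_ports[vcpu]:
        return
    with domain.event_lock:
        domain.lock_acquisitions += 1
        for port in domain.pirq_ports[vcpu]:
            domain.marked.append(port)


dom0 = Domain(nr_vcpus=56)
dom0.pirq_ports[0] = list(range(70))  # all ports bound to vCPU 0

for vcpu in range(56):
    move_pirqs(dom0, vcpu)

print(dom0.lock_acquisitions)
```

In this model only vCPU 0 ever takes the lock. Whether an unlocked emptiness check is safe in the real function depends on how the list is updated, which this sketch does not attempt to model.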

And of course in a mostly idle system the other aspect here (again)
is: Why are vCPU-s moved across pCPU-s in the first place? I've
observed (and reported) such seemingly over-aggressive vCPU
migration before, most recently in the context of putting together
'x86: make "dom0_nodes=" work with credit2'. Is there anything that
can be done about this in credit2?

A final, osstest-related question is: Does it make sense to run Dom0
on 56 vCPU-s, one each per pCPU? The bigger a system, the less
useful it looks to me to actually also have a Dom0 as big, when the
purpose of the system is to run guests, not meaningful other
workloads in Dom0. While this is Xen's default (i.e. in the absence
of command line options restricting Dom0), I don't think it's
representing typical use of Xen in the field.

Jan




* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08  7:01 ` Jan Beulich
@ 2022-04-08  8:09   ` Roger Pau Monné
  2022-04-08  9:25     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2022-04-08  8:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, osstest service owner, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Wei Liu, Dario Faggioli

On Fri, Apr 08, 2022 at 09:01:11AM +0200, Jan Beulich wrote:
> On 07.04.2022 10:45, osstest service owner wrote:
> > flight 169199 xen-4.12-testing real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/169199/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 168480
> 
> While the subsequent flight passed, I thought I'd still look into
> the logs here since the earlier flight had failed too. The state of
> the machine when the debug keys were issued is somewhat odd (and
> similar to the earlier failure's): 11 of the 56 CPUs try to
> acquire (apparently) Dom0's event lock, from evtchn_move_pirqs().
> All other CPUs are idle. The test failed because the sole guest
> didn't reboot in time. Whether the failure is actually connected to
> this apparent lock contention is unclear, though.
> 
> One can further see that really all about 70 ECS_PIRQ ports are
> bound to vCPU 0 (which makes me wonder about lack of balancing
> inside Dom0 itself, but that's unrelated). This means that all
> other vCPU-s have nothing at all to do in evtchn_move_pirqs().
> Since this moving of pIRQ-s is an optimization (the value of which
> has been put under question in the past, iirc), I wonder whether we
> shouldn't add a check to the function for the list being empty
> prior to actually acquiring the lock. I guess I'll make a patch and
> post it as RFC.

Seems good to me.

I think a better model would be to migrate the PIRQs when fired, or
even better when EOI is performed?  So that Xen doesn't pointlessly
migrate PIRQs for vCPUs that aren't running.

> And of course in a mostly idle system the other aspect here (again)
> is: Why are vCPU-s moved across pCPU-s in the first place? I've
> observed (and reported) such seemingly over-aggressive vCPU
> migration before, most recently in the context of putting together
> 'x86: make "dom0_nodes=" work with credit2'. Is there anything that
> can be done about this in credit2?
> 
> A final, osstest-related question is: Does it make sense to run Dom0
> on 56 vCPU-s, one each per pCPU? The bigger a system, the less
> useful it looks to me to actually also have a Dom0 as big, when the
> purpose of the system is to run guests, not meaningful other
> workloads in Dom0. While this is Xen's default (i.e. in the absence
> of command line options restricting Dom0), I don't think it's
> representing typical use of Xen in the field.

I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
uses 16 for example.

Albeit not having such parameter has likely led you into figuring out
this issue, so it might not be so bad.  I agree however it's likely
better to test scenarios closer to real world usage.

Thanks, Roger.



* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08  8:09   ` Roger Pau Monné
@ 2022-04-08  9:25     ` Jan Beulich
  2022-04-08 11:01       ` Roger Pau Monné
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2022-04-08  9:25 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, osstest service owner, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Wei Liu, Dario Faggioli

On 08.04.2022 10:09, Roger Pau Monné wrote:
> On Fri, Apr 08, 2022 at 09:01:11AM +0200, Jan Beulich wrote:
>> On 07.04.2022 10:45, osstest service owner wrote:
>>> flight 169199 xen-4.12-testing real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/169199/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>  test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 168480
>>
>> While the subsequent flight passed, I thought I'd still look into
>> the logs here since the earlier flight had failed too. The state of
>> the machine when the debug keys were issued is somewhat odd (and
>> similar to the earlier failure's): 11 of the 56 CPUs try to
>> acquire (apparently) Dom0's event lock, from evtchn_move_pirqs().
>> All other CPUs are idle. The test failed because the sole guest
>> didn't reboot in time. Whether the failure is actually connected to
>> this apparent lock contention is unclear, though.
>>
>> One can further see that really all about 70 ECS_PIRQ ports are
>> bound to vCPU 0 (which makes me wonder about lack of balancing
>> inside Dom0 itself, but that's unrelated). This means that all
>> other vCPU-s have nothing at all to do in evtchn_move_pirqs().
>> Since this moving of pIRQ-s is an optimization (the value of which
>> has been put under question in the past, iirc), I wonder whether we
>> shouldn't add a check to the function for the list being empty
>> prior to actually acquiring the lock. I guess I'll make a patch and
>> post it as RFC.
> 
> Seems good to me.
> 
> I think a better model would be to migrate the PIRQs when fired, or
> even better when EOI is performed?  So that Xen doesn't pointlessly
> migrate PIRQs for vCPUs that aren't running.

Well, what the function does is mark the IRQ for migration only
(IRQ_MOVE_PENDING on x86). IRQs will only ever be migrated in the
process of finishing the handling of an actual instance of the
IRQ, as otherwise it's not safe / race-free.
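
The "mark now, migrate when an instance finishes" behaviour can be sketched as a toy Python model. The `Irq` class and both function names are hypothetical; the sketch only illustrates the ordering described above, not Xen's actual data structures.

```python
class Irq:
    """Toy IRQ: tracks the servicing CPU and an optional pending move."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.move_pending = None  # requested target CPU, if any


def request_move(irq, target_cpu):
    """Only record the request; the IRQ stays where it is for now."""
    irq.move_pending = target_cpu


def finish_handling(irq):
    """Called at the end of handling one instance of the IRQ."""
    if irq.move_pending is not None:
        # The move happens only here, once handling has completed.
        irq.cpu = irq.move_pending
        irq.move_pending = None


irq = Irq(cpu=0)
request_move(irq, 5)
cpu_after_request = irq.cpu   # still the old CPU
finish_handling(irq)
cpu_after_finish = irq.cpu    # moved once an instance was handled
print(cpu_after_request, cpu_after_finish)
```

An IRQ that never fires is therefore never moved in this model, however often a move is requested.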

>> And of course in a mostly idle system the other aspect here (again)
>> is: Why are vCPU-s moved across pCPU-s in the first place? I've
>> observed (and reported) such seemingly over-aggressive vCPU
>> migration before, most recently in the context of putting together
>> 'x86: make "dom0_nodes=" work with credit2'. Is there anything that
>> can be done about this in credit2?
>>
>> A final, osstest-related question is: Does it make sense to run Dom0
>> on 56 vCPU-s, one each per pCPU? The bigger a system, the less
>> useful it looks to me to actually also have a Dom0 as big, when the
>> purpose of the system is to run guests, not meaningful other
>> workloads in Dom0. While this is Xen's default (i.e. in the absence
>> of command line options restricting Dom0), I don't think it's
>> representing typical use of Xen in the field.
> 
> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
> uses 16 for example.

I'm afraid a fixed number won't do, the more that iirc there are
systems with just a few cores in the pool (and you don't want to
over-commit by default). While for extreme cases it may not suffice,
I would like to suggest to consider using ceil(sqrt(nr_cpus)). But
of course this requires that osstest has a priori knowledge of how
many (usable) CPUs each system (pair) has, to be able to form such
a system-dependent command line option.
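
For illustration, ceil(sqrt(nr_cpus)) can be computed with integer arithmetic only. The function name `dom0_vcpus` is made up for this sketch; the sample CPU counts other than 56 are arbitrary.

```python
import math


def dom0_vcpus(nr_cpus):
    """Return ceil(sqrt(nr_cpus)) without going through floating point."""
    if nr_cpus < 1:
        raise ValueError("nr_cpus must be at least 1")
    root = math.isqrt(nr_cpus)  # floor of the square root
    return root if root * root == nr_cpus else root + 1


for n in (4, 8, 16, 56):
    print(n, dom0_vcpus(n))
```

For the 56-CPU machine discussed here this gives 8 vCPUs, and the result never exceeds nr_cpus, so it cannot over-commit by itself.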

> Albeit not having such parameter has likely led you into figuring out
> this issue, so it might not be so bad.  I agree however it's likely
> better to test scenarios closer to real world usage.

True. One might conclude that we need both then. But of course that
would make each flight yet more resource hungry.

Jan




* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08  9:25     ` Jan Beulich
@ 2022-04-08 11:01       ` Roger Pau Monné
  2022-04-08 11:08         ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2022-04-08 11:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, osstest service owner, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Wei Liu, Dario Faggioli

On Fri, Apr 08, 2022 at 11:25:28AM +0200, Jan Beulich wrote:
> On 08.04.2022 10:09, Roger Pau Monné wrote:
> > On Fri, Apr 08, 2022 at 09:01:11AM +0200, Jan Beulich wrote:
> >> On 07.04.2022 10:45, osstest service owner wrote:
> >>> flight 169199 xen-4.12-testing real [real]
> >>> http://logs.test-lab.xenproject.org/osstest/logs/169199/
> >>>
> >>> Regressions :-(
> >>>
> >>> Tests which did not succeed and are blocking,
> >>> including tests which could not be run:
> >>>  test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 168480
> >>
> >> While the subsequent flight passed, I thought I'd still look into
> >> the logs here since the earlier flight had failed too. The state of
> >> the machine when the debug keys were issued is somewhat odd (and
> >> similar to the earlier failure's): 11 of the 56 CPUs try to
> >> acquire (apparently) Dom0's event lock, from evtchn_move_pirqs().
> >> All other CPUs are idle. The test failed because the sole guest
> >> didn't reboot in time. Whether the failure is actually connected to
> >> this apparent lock contention is unclear, though.
> >>
> >> One can further see that really all about 70 ECS_PIRQ ports are
> >> bound to vCPU 0 (which makes me wonder about lack of balancing
> >> inside Dom0 itself, but that's unrelated). This means that all
> >> other vCPU-s have nothing at all to do in evtchn_move_pirqs().
> >> Since this moving of pIRQ-s is an optimization (the value of which
> >> has been put under question in the past, iirc), I wonder whether we
> >> shouldn't add a check to the function for the list being empty
> >> prior to actually acquiring the lock. I guess I'll make a patch and
> >> post it as RFC.
> > 
> > Seems good to me.
> > 
> > I think a better model would be to migrate the PIRQs when fired, or
> > even better when EOI is performed?  So that Xen doesn't pointlessly
> > migrate PIRQs for vCPUs that aren't running.
> 
> Well, what the function does is mark the IRQ for migration only
> (IRQ_MOVE_PENDING on x86). IRQs will only ever be migrated in the
> process of finishing the handling of an actual instance of the
> IRQ, as otherwise it's not safe / race-free.

Oh, OK, so then it doesn't seem to be that different from what I had
in mind.

> >> And of course in a mostly idle system the other aspect here (again)
> >> is: Why are vCPU-s moved across pCPU-s in the first place? I've
> >> observed (and reported) such seemingly over-aggressive vCPU
> >> migration before, most recently in the context of putting together
> >> 'x86: make "dom0_nodes=" work with credit2'. Is there anything that
> >> can be done about this in credit2?
> >>
> >> A final, osstest-related question is: Does it make sense to run Dom0
> >> on 56 vCPU-s, one each per pCPU? The bigger a system, the less
> >> useful it looks to me to actually also have a Dom0 as big, when the
> >> purpose of the system is to run guests, not meaningful other
> >> workloads in Dom0. While this is Xen's default (i.e. in the absence
> >> of command line options restricting Dom0), I don't think it's
> >> representing typical use of Xen in the field.
> > 
> > I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
> > uses 16 for example.
> 
> I'm afraid a fixed number won't do, the more that iirc there are
> systems with just a few cores in the pool (and you don't want to
> over-commit by default).

But this won't over-commit: it would just assign dom0 16 vCPUs at
most; if the system has fewer than 16 pCPUs, that's what would be
assigned to dom0.

> While for extreme cases it may not suffice,
> I would like to suggest to consider using ceil(sqrt(nr_cpus)). But
> of course this requires that osstest has a priori knowledge of how
> many (usable) CPUs each system (pair) has, to be able to form such
> a system-dependent command line option.

Well, we could get this number when installing Xen, because at that
point the system is started and running plain Linux (so can see the
real topology). No need for osstest to have a priori knowledge.
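
A rough sketch of how such an install-time step might form the option, assuming a Python helper run on the host while it still boots plain Linux. `dom0_option()` is hypothetical and is not osstest code; `os.cpu_count()` reports the logical CPUs the OS sees, which may differ from what Xen later treats as usable.

```python
import math
import os


def dom0_option(nr_cpus=None):
    """Build a dom0_max_vcpus= fragment from the CPU count seen by Linux."""
    if nr_cpus is None:
        # os.cpu_count() can return None; fall back to a single CPU then.
        nr_cpus = os.cpu_count() or 1
    root = math.isqrt(nr_cpus)
    vcpus = root if root * root == nr_cpus else root + 1  # ceil(sqrt(n))
    return f"dom0_max_vcpus={vcpus}"


print(dom0_option())    # depends on the machine running the sketch
print(dom0_option(56))
```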

> > Albeit not having such parameter has likely led you into figuring out
> > this issue, so it might not be so bad.  I agree however it's likely
> > better to test scenarios closer to real world usage.
> 
> True. One might conclude that we need both then. But of course that
> would make each flight yet more resource hungry.

Yes, let's focus on real-world uses first.

Thanks, Roger.



* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 11:01       ` Roger Pau Monné
@ 2022-04-08 11:08         ` Julien Grall
  2022-04-08 11:16           ` Roger Pau Monné
  2022-04-08 11:26           ` Andrew Cooper
  0 siblings, 2 replies; 15+ messages in thread
From: Julien Grall @ 2022-04-08 11:08 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: xen-devel, osstest service owner, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Wei Liu, Dario Faggioli

Hi,

On 08/04/2022 12:01, Roger Pau Monné wrote:
>>> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
>>> uses 16 for example.
>>
>> I'm afraid a fixed number won't do, the more that iirc there are
>> systems with just a few cores in the pool (and you don't want to
>> over-commit by default).
> 
> But this won't over-commit: it would just assign dom0 16 vCPUs at
> most; if the system has fewer than 16 pCPUs, that's what would be
> assigned to dom0.

AFAICT, this is not the case on Arm. If you ask for 16 vCPUs, then you
will get that number even if there are only 8 pCPUs.

In fact, the documentation of dom0_max_vcpus suggests that the number
of vCPUs can be more than the number of pCPUs.

Cheers,

-- 
Julien Grall



* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 11:08         ` Julien Grall
@ 2022-04-08 11:16           ` Roger Pau Monné
  2022-04-08 11:24             ` Julien Grall
  2022-04-08 12:01             ` Jan Beulich
  2022-04-08 11:26           ` Andrew Cooper
  1 sibling, 2 replies; 15+ messages in thread
From: Roger Pau Monné @ 2022-04-08 11:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jan Beulich, xen-devel, osstest service owner, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Wei Liu, Dario Faggioli

On Fri, Apr 08, 2022 at 12:08:02PM +0100, Julien Grall wrote:
> Hi,
> 
> On 08/04/2022 12:01, Roger Pau Monné wrote:
> > > > I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
> > > > uses 16 for example.
> > > 
> > > I'm afraid a fixed number won't do, the more that iirc there are
> > > systems with just a few cores in the pool (and you don't want to
> > > over-commit by default).
> > 
> > But this won't over commit, it would just assign dom0 16 vCPUs at
> > most, if the system has less than 16 vCPUs that's what would be
> > assigned to dom0.
> 
> AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you will get
> that number even if there are 8 pCPUs.
> 
> In fact, the documentation of dom0_max_vcpus suggests that the numbers of
> vCPUs can be more than the number of pCPUs.

It was my understanding that you could only achieve that by using the
min-max nomenclature, so in order to force 16 vCPUs always you would
have to use:

dom0_max_vcpus=16-16

Otherwise the usage of '_max_' in the option name is pointless, and it
should instead be dom0_vcpus.

Anyway, I could use:

dom0_max_vcpus=1-16

Which is unambiguous and should get us 1 vCPU at least, or 16 vCPUs at
most.
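
A minimal sketch of the min-max reading described here, assuming the
range form simply clamps the number of pCPUs into [min, max]. The
function name and the sample pCPU counts are hypothetical and only
illustrate the arithmetic; this is not Xen's parser.

```python
def effective_dom0_vcpus(pcpus: int, lo: int, hi: int) -> int:
    """Clamp the pCPU count into [lo, hi] (assumed min-max semantics)."""
    if lo < 1 or lo > hi:
        raise ValueError("expected 1 <= lo <= hi")
    # At least `lo` vCPUs, at most `hi`, otherwise one vCPU per pCPU.
    return max(lo, min(hi, pcpus))


# dom0_max_vcpus=1-16 on hypothetical hosts with 1, 8 and 64 pCPUs.
for pcpus in (1, 8, 64):
    print(pcpus, effective_dom0_vcpus(pcpus, 1, 16))
```

Under this reading, 16-16 always yields 16 vCPUs, while 1-16 follows
the pCPU count up to the cap of 16.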

But given Jan's suggestion we might want to go for something more
complex?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 11:16           ` Roger Pau Monné
@ 2022-04-08 11:24             ` Julien Grall
  2022-04-08 15:26               ` Roger Pau Monné
  2022-04-08 12:01             ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Julien Grall @ 2022-04-08 11:24 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, osstest service owner, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Wei Liu, Dario Faggioli

Hi Roger,

On 08/04/2022 12:16, Roger Pau Monné wrote:
> On Fri, Apr 08, 2022 at 12:08:02PM +0100, Julien Grall wrote:
>> Hi,
>>
>> On 08/04/2022 12:01, Roger Pau Monné wrote:
>>>>> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
>>>>> uses 16 for example.
>>>>
>>>> I'm afraid a fixed number won't do, the more that iirc there are
>>>> systems with just a few cores in the pool (and you don't want to
>>>> over-commit by default).
>>>
>>> But this won't over commit, it would just assign dom0 16 vCPUs at
>>> most, if the system has less than 16 vCPUs that's what would be
>>> assigned to dom0.
>>
>> AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you will get
>> that number even if there are 8 pCPUs.
>>
>> In fact, the documentation of dom0_max_vcpus suggests that the numbers of
>> vCPUs can be more than the number of pCPUs.
> 
> It was my understanding that you could only achieve that by using the
> min-max nomenclature, so in order to force 16 vCPUs always you would
> have to use:
> 
> dom0_max_vcpus=16-16
> 
> Otherwise the usage of '_max_' in the option name is pointless, and it
> should instead be dom0_vcpus.
> 
> Anyway, I could use:
> 
> dom0_max_vcpus=1-16
> 
> Which is unambiguous and should get us 1 vCPU at least, or 16vCPUs at
> most.

Unfortunately, Arm doesn't support the min-max nomenclature.

> 
> But given Jans suggestion we might want to go for something more
> complex?

I think we already have some knowledge about each HW (i.e. grub vs 
uboot) in Osstest. So I think it would be fine to extend the knowledge 
and add the number of CPUs.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 11:08         ` Julien Grall
  2022-04-08 11:16           ` Roger Pau Monné
@ 2022-04-08 11:26           ` Andrew Cooper
  2022-04-08 11:56             ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2022-04-08 11:26 UTC (permalink / raw)
  To: Julien Grall, Roger Pau Monne, Jan Beulich
  Cc: xen-devel@lists.xenproject.org, osstest service owner,
	George Dunlap, Stefano Stabellini, Wei Liu, Dario Faggioli

On 08/04/2022 12:08, Julien Grall wrote:
> Hi,
>
> On 08/04/2022 12:01, Roger Pau Monné wrote:
>>>> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
>>>> uses 16 for example.
>>>
>>> I'm afraid a fixed number won't do, the more that iirc there are
>>> systems with just a few cores in the pool (and you don't want to
>>> over-commit by default).
>>
>> But this won't over commit, it would just assign dom0 16 vCPUs at
>> most, if the system has less than 16 vCPUs that's what would be
>> assigned to dom0.
>
> AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you
> will get that number even if there are 8 pCPUs.
>
> In fact, the documentation of dom0_max_vcpus suggests that the numbers
> of vCPUs can be more than the number of pCPUs.

XenServer uses dom0_max_vcpus=1-16 so we don't oversubscribe (even if
CPUs get turned off in firmware), but top out at 16.

It is possible to use this option to create more vcpus, but whether dom0
decides to do anything with them is up to dom0.  Linux won't go any
further than it can see CPUs listed in the ACPI tables (and yes, this is
a host/guest layering violation for PV dom0 where dom0 sees the system
ACPI tables.)

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 11:26           ` Andrew Cooper
@ 2022-04-08 11:56             ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2022-04-08 11:56 UTC (permalink / raw)
  To: Andrew Cooper, Julien Grall, Roger Pau Monne
  Cc: xen-devel@lists.xenproject.org, osstest service owner,
	George Dunlap, Stefano Stabellini, Wei Liu, Dario Faggioli

On 08.04.2022 13:26, Andrew Cooper wrote:
> On 08/04/2022 12:08, Julien Grall wrote:
>> Hi,
>>
>> On 08/04/2022 12:01, Roger Pau Monné wrote:
>>>>> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
>>>>> uses 16 for example.
>>>>
>>>> I'm afraid a fixed number won't do, the more that iirc there are
>>>> systems with just a few cores in the pool (and you don't want to
>>>> over-commit by default).
>>>
>>> But this won't over commit, it would just assign dom0 16 vCPUs at
>>> most, if the system has less than 16 vCPUs that's what would be
>>> assigned to dom0.
>>
>> AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you
>> will get that number even if there are 8 pCPUs.

Same on x86, afaict.

>> In fact, the documentation of dom0_max_vcpus suggests that the numbers
>> of vCPUs can be more than the number of pCPUs.
> 
> XenServer uses dom0_max_vcpus=1-16 so we dont oversubscribe (even if
> CPUs get turned off in firmware), but top out at 16.
> 
> It is possible to use this option to create more vcpus, but whether dom0
> decides to do anything with them is up to dom0.  Linux won't go any
> further than it can see CPUs listed in the ACPI tables (and yes, this is
> a host/guest laying violation for PV dom0 where dom0 sees the system
> ACPI tables.)

That changed not so long ago: Linux will now use all vCPU-s
supplied by Xen. Since I was able to over-size Dom0 with XenoLinux,
I wanted to have the ability also with the upstream version.

Jan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 11:16           ` Roger Pau Monné
  2022-04-08 11:24             ` Julien Grall
@ 2022-04-08 12:01             ` Jan Beulich
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2022-04-08 12:01 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, osstest service owner, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Wei Liu, Dario Faggioli, Julien Grall

On 08.04.2022 13:16, Roger Pau Monné wrote:
> On Fri, Apr 08, 2022 at 12:08:02PM +0100, Julien Grall wrote:
>> On 08/04/2022 12:01, Roger Pau Monné wrote:
>>>>> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
>>>>> uses 16 for example.
>>>>
>>>> I'm afraid a fixed number won't do, the more that iirc there are
>>>> systems with just a few cores in the pool (and you don't want to
>>>> over-commit by default).
>>>
>>> But this won't over commit, it would just assign dom0 16 vCPUs at
>>> most, if the system has less than 16 vCPUs that's what would be
>>> assigned to dom0.
>>
>> AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you will get
>> that number even if there are 8 pCPUs.
>>
>> In fact, the documentation of dom0_max_vcpus suggests that the numbers of
>> vCPUs can be more than the number of pCPUs.
> 
> It was my understanding that you could only achieve that by using the
> min-max nomenclature, so in order to force 16 vCPUs always you would
> have to use:
> 
> dom0_max_vcpus=16-16
> 
> Otherwise the usage of '_max_' in the option name is pointless, and it
> should instead be dom0_vcpus.

I disagree: Unlike for DomU there's no way to keep a "reserve" of vCPU-s
for Dom0, except by offlining some once Dom0 runs. Hence this "max" in
the name is quite applicable.

Jan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 11:24             ` Julien Grall
@ 2022-04-08 15:26               ` Roger Pau Monné
  2022-04-08 16:11                 ` Andrew Cooper
  2022-04-11 10:59                 ` Julien Grall
  0 siblings, 2 replies; 15+ messages in thread
From: Roger Pau Monné @ 2022-04-08 15:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jan Beulich, xen-devel, osstest service owner, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Wei Liu, Dario Faggioli

On Fri, Apr 08, 2022 at 12:24:27PM +0100, Julien Grall wrote:
> Hi Roger,
> 
> On 08/04/2022 12:16, Roger Pau Monné wrote:
> > On Fri, Apr 08, 2022 at 12:08:02PM +0100, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 08/04/2022 12:01, Roger Pau Monné wrote:
> > > > > > I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
> > > > > > uses 16 for example.
> > > > > 
> > > > > I'm afraid a fixed number won't do, the more that iirc there are
> > > > > systems with just a few cores in the pool (and you don't want to
> > > > > over-commit by default).
> > > > 
> > > > But this won't over commit, it would just assign dom0 16 vCPUs at
> > > > most, if the system has less than 16 vCPUs that's what would be
> > > > assigned to dom0.
> > > 
> > > AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you will get
> > > that number even if there are 8 pCPUs.
> > > 
> > > In fact, the documentation of dom0_max_vcpus suggests that the numbers of
> > > vCPUs can be more than the number of pCPUs.
> > 
> > It was my understanding that you could only achieve that by using the
> > min-max nomenclature, so in order to force 16 vCPUs always you would
> > have to use:
> > 
> > dom0_max_vcpus=16-16
> > 
> > Otherwise the usage of '_max_' in the option name is pointless, and it
> > should instead be dom0_vcpus.
> > 
> > Anyway, I could use:
> > 
> > dom0_max_vcpus=1-16
> > 
> > Which is unambiguous and should get us 1 vCPU at least, or 16vCPUs at
> > most.
> 
> Unfortunately, Arm doesn't support the min-max nomenclature.

Hm, can we update the command line document then?

There's no mention that the min-max nomenclature is only available to
x86. I assume it's not possible to share the logic here so that both
Arm and x86 parse the option in the same way?

> > 
> > But given Jans suggestion we might want to go for something more
> > complex?
> 
> I think we already have some knowledge about each HW (i.e. grub vs uboot) in
> Osstest. So I think it would be fine to extend the knowledge and add the
> number of CPUs.

I don't think we need to store this information anywhere. Since we
first install plain Debian and then install Xen, we can always fetch
the number of physical CPUs when running plain Linux and use that to
calculate the number to give to dom0?

Jan suggested using ceil(sqrt(nr_cpus)).
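
A rough sketch of that calculation, assuming it runs on the plain
Linux install before Xen is booted. The function name is hypothetical,
and os.cpu_count() reports logical CPUs as seen by the OS, which is
only an approximation of the pCPU count Xen would see.

```python
import math
import os


def dom0_vcpus_from_pcpus(nr_cpus: int) -> int:
    """Return ceil(sqrt(nr_cpus)) using integer arithmetic only."""
    if nr_cpus < 1:
        raise ValueError("nr_cpus must be at least 1")
    root = math.isqrt(nr_cpus)  # floor of the square root
    # Round up unless nr_cpus is a perfect square.
    return root if root * root == nr_cpus else root + 1


# Logical CPUs visible to the running kernel; fall back to 1 if unknown.
nr_cpus = os.cpu_count() or 1
print(f"dom0_max_vcpus={dom0_vcpus_from_pcpus(nr_cpus)}")
```

With this formula 8 CPUs give 3 vCPUs and 96 CPUs give 10, so dom0
grows slowly with the size of the host.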

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 15:26               ` Roger Pau Monné
@ 2022-04-08 16:11                 ` Andrew Cooper
  2022-04-08 16:26                   ` Stefano Stabellini
  2022-04-11 10:59                 ` Julien Grall
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2022-04-08 16:11 UTC (permalink / raw)
  To: Roger Pau Monne, Julien Grall
  Cc: Jan Beulich, xen-devel@lists.xenproject.org,
	osstest service owner, George Dunlap, Stefano Stabellini, Wei Liu,
	Dario Faggioli

On 08/04/2022 16:26, Roger Pau Monne wrote:
> On Fri, Apr 08, 2022 at 12:24:27PM +0100, Julien Grall wrote:
>> Hi Roger,
>>
>> On 08/04/2022 12:16, Roger Pau Monné wrote:
>>> On Fri, Apr 08, 2022 at 12:08:02PM +0100, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 08/04/2022 12:01, Roger Pau Monné wrote:
>>>>>>> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
>>>>>>> uses 16 for example.
>>>>>> I'm afraid a fixed number won't do, the more that iirc there are
>>>>>> systems with just a few cores in the pool (and you don't want to
>>>>>> over-commit by default).
>>>>> But this won't over commit, it would just assign dom0 16 vCPUs at
>>>>> most, if the system has less than 16 vCPUs that's what would be
>>>>> assigned to dom0.
>>>> AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you will get
>>>> that number even if there are 8 pCPUs.
>>>>
>>>> In fact, the documentation of dom0_max_vcpus suggests that the numbers of
>>>> vCPUs can be more than the number of pCPUs.
>>> It was my understanding that you could only achieve that by using the
>>> min-max nomenclature, so in order to force 16 vCPUs always you would
>>> have to use:
>>>
>>> dom0_max_vcpus=16-16
>>>
>>> Otherwise the usage of '_max_' in the option name is pointless, and it
>>> should instead be dom0_vcpus.
>>>
>>> Anyway, I could use:
>>>
>>> dom0_max_vcpus=1-16
>>>
>>> Which is unambiguous and should get us 1 vCPU at least, or 16vCPUs at
>>> most.
>> Unfortunately, Arm doesn't support the min-max nomenclature.
> Hm, can we update the command line document then?
>
> There's no mention that the min-max nomenclature is only available to
> x86. I assume it's not possible to share the logic here so that both
> Arm and x86 parse the option in the same way?

TBH, this especially wants moving to common code.  It's atrocious UX to
have per-arch variations on the syntax for "how many vcpus".

>>> But given Jans suggestion we might want to go for something more
>>> complex?
>> I think we already have some knowledge about each HW (i.e. grub vs uboot) in
>> Osstest. So I think it would be fine to extend the knowledge and add the
>> number of CPUs.
> We don't need to store this information anywhere I think. Since we
> first install plain Debian and then install Xen we can always fetch
> the number of physical CPUs when running plain Linux and use that to
> calculate the amount to give to dom0?
>
> Jan suggested using ceil(sqrt(nr_cpus)).

I'm going to play devil's advocate here.

Our CI system has demonstrated that the default behaviour in Xen is
broken.  And we're saying "let's bodge around it in the CI system to not
use the default behaviour".


The only user-friendly way of resolving this is to fix the default and
leave the CI alone.

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 16:11                 ` Andrew Cooper
@ 2022-04-08 16:26                   ` Stefano Stabellini
  0 siblings, 0 replies; 15+ messages in thread
From: Stefano Stabellini @ 2022-04-08 16:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monne, Julien Grall, Jan Beulich,
	xen-devel@lists.xenproject.org, osstest service owner,
	George Dunlap, Stefano Stabellini, Wei Liu, Dario Faggioli

[-- Attachment #1: Type: text/plain, Size: 2262 bytes --]

On Fri, 8 Apr 2022, Andrew Cooper wrote:
> On 08/04/2022 16:26, Roger Pau Monne wrote:
> > On Fri, Apr 08, 2022 at 12:24:27PM +0100, Julien Grall wrote:
> >> Hi Roger,
> >>
> >> On 08/04/2022 12:16, Roger Pau Monné wrote:
> >>> On Fri, Apr 08, 2022 at 12:08:02PM +0100, Julien Grall wrote:
> >>>> Hi,
> >>>>
> >>>> On 08/04/2022 12:01, Roger Pau Monné wrote:
> >>>>>>> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
> >>>>>>> uses 16 for example.
> >>>>>> I'm afraid a fixed number won't do, the more that iirc there are
> >>>>>> systems with just a few cores in the pool (and you don't want to
> >>>>>> over-commit by default).
> >>>>> But this won't over commit, it would just assign dom0 16 vCPUs at
> >>>>> most, if the system has less than 16 vCPUs that's what would be
> >>>>> assigned to dom0.
> >>>> AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you will get
> >>>> that number even if there are 8 pCPUs.
> >>>>
> >>>> In fact, the documentation of dom0_max_vcpus suggests that the numbers of
> >>>> vCPUs can be more than the number of pCPUs.
> >>> It was my understanding that you could only achieve that by using the
> >>> min-max nomenclature, so in order to force 16 vCPUs always you would
> >>> have to use:
> >>>
> >>> dom0_max_vcpus=16-16
> >>>
> >>> Otherwise the usage of '_max_' in the option name is pointless, and it
> >>> should instead be dom0_vcpus.
> >>>
> >>> Anyway, I could use:
> >>>
> >>> dom0_max_vcpus=1-16
> >>>
> >>> Which is unambiguous and should get us 1 vCPU at least, or 16vCPUs at
> >>> most.
> >> Unfortunately, Arm doesn't support the min-max nomenclature.
> > Hm, can we update the command line document then?
> >
> > There's no mention that the min-max nomenclature is only available to
> > x86. I assume it's not possible to share the logic here so that both
> > Arm and x86 parse the option in the same way?
> 
> TBH, this especially wants moving to common code.  It's atrocious UX to
> have per-arch variations on the syntax for "how many vcpus".

In my view, it would be OK to share the code, but I would also want to
retain the current behavior when e.g. dom0_max_vcpus=2 is specified.
Otherwise we break existing ARM tooling. (It is actually used by Yocto
and others.)
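
One way a shared parser could keep that behaviour, sketched under the
assumption that a plain integer means "exactly N vCPUs" and a range is
clamped to the pCPU count. The function name is hypothetical, omitted
bounds are not handled, and this is not the existing x86 or Arm code.

```python
def parse_dom0_max_vcpus(value: str, pcpus: int) -> int:
    """Return the dom0 vCPU count for a dom0_max_vcpus= value (sketch)."""
    if "-" in value:
        # Range form "min-max": follow the pCPU count within the bounds.
        lo_text, hi_text = value.split("-", 1)
        lo, hi = int(lo_text), int(hi_text)
        if lo < 1 or lo > hi:
            raise ValueError(f"bad range: {value!r}")
        return max(lo, min(hi, pcpus))
    # Plain integer: keep today's meaning, exactly that many vCPUs.
    count = int(value)
    if count < 1:
        raise ValueError(f"bad count: {value!r}")
    return count


print(parse_dom0_max_vcpus("2", 8))     # plain integer form
print(parse_dom0_max_vcpus("1-16", 8))  # range form
```

Here dom0_max_vcpus=2 still means 2 vCPUs regardless of the host, so
existing command lines would keep working.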

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xen-4.12-testing test] 169199: regressions - FAIL
  2022-04-08 15:26               ` Roger Pau Monné
  2022-04-08 16:11                 ` Andrew Cooper
@ 2022-04-11 10:59                 ` Julien Grall
  1 sibling, 0 replies; 15+ messages in thread
From: Julien Grall @ 2022-04-11 10:59 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, osstest service owner, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Wei Liu, Dario Faggioli

Hi,

On 08/04/2022 16:26, Roger Pau Monné wrote:
> On Fri, Apr 08, 2022 at 12:24:27PM +0100, Julien Grall wrote:
>> Hi Roger,
>>
>> On 08/04/2022 12:16, Roger Pau Monné wrote:
>>> On Fri, Apr 08, 2022 at 12:08:02PM +0100, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 08/04/2022 12:01, Roger Pau Monné wrote:
>>>>>>> I could add a suitable dom0_max_vcpus parameter to osstest.  XenServer
>>>>>>> uses 16 for example.
>>>>>>
>>>>>> I'm afraid a fixed number won't do, the more that iirc there are
>>>>>> systems with just a few cores in the pool (and you don't want to
>>>>>> over-commit by default).
>>>>>
>>>>> But this won't over commit, it would just assign dom0 16 vCPUs at
>>>>> most, if the system has less than 16 vCPUs that's what would be
>>>>> assigned to dom0.
>>>>
>>>> AFAICT, this is not the case on Arm. If you ask 16 vCPUs, then you will get
>>>> that number even if there are 8 pCPUs.
>>>>
>>>> In fact, the documentation of dom0_max_vcpus suggests that the numbers of
>>>> vCPUs can be more than the number of pCPUs.
>>>
>>> It was my understanding that you could only achieve that by using the
>>> min-max nomenclature, so in order to force 16 vCPUs always you would
>>> have to use:
>>>
>>> dom0_max_vcpus=16-16
>>>
>>> Otherwise the usage of '_max_' in the option name is pointless, and it
>>> should instead be dom0_vcpus.
>>>
>>> Anyway, I could use:
>>>
>>> dom0_max_vcpus=1-16
>>>
>>> Which is unambiguous and should get us 1 vCPU at least, or 16vCPUs at
>>> most.
>>
>> Unfortunately, Arm doesn't support the min-max nomenclature.
> 
> Hm, can we update the command line document then?
> 
> There's no mention that the min-max nomenclature is only available to
> x86. I assume it's not possible to share the logic here so that both
> Arm and x86 parse the option in the same way?

Looking at the x86 implementation, I think we can re-use everything but 
the pv_shim and NUMA bits.

> We don't need to store this information anywhere I think. Since we
> first install plain Debian and then install Xen we can always fetch
> the number of physical CPUs when running plain Linux and use that to
> calculate the amount to give to dom0?

You will need to check how that works with U-Boot. I can't remember
whether the script is loaded via tftp or stored on the local disk.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 15+ messages in thread

Thread overview: 15+ messages
2022-04-07  8:45 [xen-4.12-testing test] 169199: regressions - FAIL osstest service owner
2022-04-08  7:01 ` Jan Beulich
2022-04-08  8:09   ` Roger Pau Monné
2022-04-08  9:25     ` Jan Beulich
2022-04-08 11:01       ` Roger Pau Monné
2022-04-08 11:08         ` Julien Grall
2022-04-08 11:16           ` Roger Pau Monné
2022-04-08 11:24             ` Julien Grall
2022-04-08 15:26               ` Roger Pau Monné
2022-04-08 16:11                 ` Andrew Cooper
2022-04-08 16:26                   ` Stefano Stabellini
2022-04-11 10:59                 ` Julien Grall
2022-04-08 12:01             ` Jan Beulich
2022-04-08 11:26           ` Andrew Cooper
2022-04-08 11:56             ` Jan Beulich
