From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: "Cédric Le Goater" <clg@kaod.org>,
"Richard Henderson" <richard.henderson@linaro.org>,
qemu-devel <qemu-devel@nongnu.org>,
clombard@linux.vnet.ibm.com
Subject: Re: powernv gitlab ci regression
Date: Tue, 21 Dec 2021 18:33:36 -0300 [thread overview]
Message-ID: <8ab3b465-dfec-8579-2b96-e0383f9bcc6f@gmail.com> (raw)
In-Reply-To: <b454cba4-7254-fbe6-0235-8a0778f2938d@gmail.com>
On 12/21/21 06:44, Daniel Henrique Barboza wrote:
>
>
> On 12/21/21 05:20, Cédric Le Goater wrote:
>> On 12/21/21 03:37, Daniel Henrique Barboza wrote:
>>> Hey,
>>>
>>> On 12/20/21 18:35, Richard Henderson wrote:
>>>> Hi guys,
>>>>
>>>> Somewhere within
>>>>
>>>>> Merge tag 'pull-ppc-20211217' of https://github.com/legoater/qemu into staging
>>>>> ppc 7.0 queue:
>>>>>
>>>>> * General cleanup for Mac machines (Peter)
>>>>> * Fixes for FPU exceptions (Lucas)
>>>>> * Support for new ISA31 instructions (Matheus)
>>>>> * Fixes for ivshmem (Daniel)
>>>>> * Cleanups for PowerNV PHB (Christophe and Cedric)
>>>>> * Updates of PowerNV and pSeries documentation (Leonardo and Daniel)
>>>>> * Fixes for PowerNV (Daniel)
>>>>> * Large cleanup of FPU implementation (Richard)
>>>>> * Removal of SoftTLBs support for PPC74x CPUs (Fabiano)
>>>>> * Fixes for exception models in MPCx and 60x CPUs (Fabiano)
>>>>> * Removal of 401/403 CPUs (Cedric)
>>>>> * Deprecation of taihu machine (Thomas)
>>>>> * Large rework of PPC405 machine (Cedric)
>>>>> * Fixes for VSX instructions (Victor and Matheus)
>>>>> * Fix for e6500 CPU (Fabiano)
>>>>> * Initial support for PMU (Daniel)
>>>>
>>>> is something that has caused a timeout regression in avocado-system-centos:
>>>>
>>>>> (047/171) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '047-tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8', 'logdir': '/builds/qemu-project/qemu/build/tests/results/job-2021-12-17T19.23-... (90.46 s)
>>>>> (048/171) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '048-tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9', 'logdir': '/builds/qemu-project/qemu/build/tests/results/job-2021-12-17T19.23-... (90.55 s)
>>>>
>>>> See e.g. https://gitlab.com/qemu-project/qemu/-/jobs/1898304074
>>>
>>> Thanks for letting us know. I bisected it and the culprit is this patch:
>>>
>>>
>>> commit 4db3907a40a087e2cc1839d19a3642539d36610b
>>> Author: Daniel Henrique Barboza <danielhb413@gmail.com>
>>> Date: Fri Dec 17 17:57:18 2021 +0100
>>>
>>> target/ppc: enable PMU instruction count
>>>
>>>
>>> This is a patch where I added instruction count in the ppc64 PMU. After this patch the
>>> performance of these 2 tests are degraded to the point where we're hitting timeouts in
>>> gitlab (didn't hit timeouts in my machine but the performance is noticeable worse).
>>>
>>> I'll need to see the serial console of the VM booting up to evaluate if there's some kernel
>>> module during boot time that is using the PMU and causing the delay. I'll also take a look
>>> into improving the performance as well (e.g. using more TCG code and avoid calling helpers).\
Just rewrote the fore-mentioned patch using TCG ops. Here's some numbers running the tests on
my local machine:
- using current master:
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (71.00 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (69.57 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (76.04 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (72.62 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (76.50 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (73.58 s)
- after my TCG Ops rewrite to count instructions:
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (39.97 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (40.19 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (41.76 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (40.88 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (41.49 s)
(1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (42.04 s)
Also, there's a high possibility that the code I wrote is not optimized since I'm not well
versed with TCG ops/code. I expect that after a couple of reviews from Richard we might be able
to bring down those numbers even further.
I'll clean this up and send for review.
Thanks,
Daniel
>>
>> Run with :
>>
>> build/tests/venv/bin/avocado --show=app,console run -t machine:powernv9 build/tests/avocado/boot_linux_console.py
>>
>> * 6.2
>>
>> ...
>> console: [ 1.559904] PCI: CLS 0 bytes, default 128
>> /console: [ 8.830245] Initialise system trusted keyrings
>> console: [ 8.832347] Key type blacklist registered
>> console: [ 8.834558] workingset: timestamp_bits=54 max_order=14 bucket_order=0
>> console: [ 9.073051] integrity: Platform Keyring initialized
>> console: [ 9.073586] Key type asymmetric registered
>> console: [ 9.074025] Asymmetric key parser 'x509' registered
>> console: [ 9.075359] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
>> console: [ 9.095115] IPMI message handler: version 39.2
>> console: [ 9.096161] ipmi device interface
>> console: [ 9.514308] ipmi-powernv ibm,opal:ipmi: IPMI message handler: Found new BMC (man_id: 0x000000, prod_id: 0x0000, dev_id: 0x20)
>> -console: [ 10.171273] IPMI Watchdog: driver initialized
>> \console: [ 10.974462] hvc0: raw protocol on /ibm,opal/consoles/serial@0 (boot console)
>> console: [ 10.975059] hvc0: No interrupts property, using OPAL event
>> console: [ 10.989699] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
>> console: [ 11.156033] brd: module loaded
>> console: [ 11.235965] loop: module loaded
>> console: [ 11.249922] libphy: Fixed MDIO Bus: probed
>> console: [ 11.254128] i2c /dev entries driver
>> console: [ 11.255782] powernv-cpufreq: ibm,pstate-min node not found
>> console: [ 11.256134] powernv-cpufreq: Platform driver disabled. System does not support PState control
>> console: [ 11.273326] ipip: IPv4 and MPLS over IPv4 tunneling driver
>> console: [ 11.303989] NET: Registered protocol family 10
>> console: [ 11.323651] Segment Routing with IPv6
>> console: [ 11.325267] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
>> console: [ 11.335866] NET: Registered protocol family 17
>> console: [ 11.336900] Key type dns_resolver registered
>> console: [ 11.337358] secvar-sysfs: secvar: failed to retrieve secvar operations.
>> console: [ 11.337877] drmem: No dynamic reconfiguration memory found
>> console: [ 11.341767] Loading compiled-in X.509 certificates
>> console: [ 11.362272] Loaded X.509 cert 'Build time autogenerated kernel key: 987b64c96d830fe42d02bbf502e028ebe85c2b4e'
>> console: [ 11.667162] Key type encrypted registered
>> console: [ 11.674616] ima: No TPM chip found, activating TPM-bypass!
>> console: [ 11.676949] ima: Allocated hash algorithm: sha256
>> console: [ 11.682967] Secure boot mode disabled
>> console: [ 11.683726] Trusted boot mode disabled
>> console: [ 11.684075] ima: No architecture policies found
>> console: [ 11.748319] Freeing unused kernel memory: 13696K
>> console: [ 11.748717] This architecture does not have kernel memory protection.
>> console: [ 11.750290] Run /init as init process
>> /console: [ 13.712943] udevd[74]: starting version 3.2.9
>> console: [ 13.731186] random: udevd: uninitialized urandom read (16 bytes read)
>> console: [ 13.735595] random: udevd: uninitialized urandom read (16 bytes read)
>> console: [ 13.737907] random: udevd: uninitialized urandom read (16 bytes read)
>> -console: [ 13.832821] udevd[75]: starting eudev-3.2.9
>> |console: [ 16.333618] PTP clock support registered
>> console: [ 16.601330] e1000e: Intel(R) PRO/1000 Network Driver
>> console: [ 16.601791] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>> console: [ 16.607410] e1000e 0001:02:03.0: enabling device (0100 -> 0102)
>> console: [ 16.617706] e1000e 0001:02:03.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
>> console: [ 16.697120] usbcore: registered new interface driver usbfs
>> console: [ 16.699742] usbcore: registered new interface driver hub
>> console: [ 16.702753] usbcore: registered new device driver usb
>> console: [ 16.762288] nvme nvme0: pci function 0002:01:00.0
>> console: [ 16.763998] nvme 0002:01:00.0: enabling device (0100 -> 0102)
>> /console: [ 16.881654] rtc-opal opal-rtc: registered as rtc0
>> console: [ 17.032286] rtc-opal opal-rtc: setting system clock to 2021-12-21T03:38:58 UTC (1640057938)
>> console: [ 17.052213] nvme nvme0: 1/0/0 default/read/poll queues
>> console: [ 17.055242] e1000e 0001:02:03.0 0001:02:03.0 (uninitialized): registered PHC clock
>> console: [ 17.213376] e1000e 0001:02:03.0 eth0: (PCI Express:2.5GT/s:Width x1) 52:54:00:12:34:57
>> console: [ 17.213982] e1000e 0001:02:03.0 eth0: Intel(R) PRO/1000 Network Connection
>> console: [ 17.214655] e1000e 0001:02:03.0 eth0: MAC: 3, PHY: 8, PBA No: 000000-000
>> console: [ 17.399217] xhci_hcd 0001:02:02.0: xHCI Host Controller
>> console: [ 17.400904] xhci_hcd 0001:02:02.0: new USB bus registered, assigned bus number 1
>> console: [ 17.408566] xhci_hcd 0001:02:02.0: hcc params 0x00080001 hci version 0x100 quirks 0x0000000000000014
>> -console: [ 17.855122] hub 1-0:1.0: USB hub found
>> console: [ 17.901064] hub 1-0:1.0: 4 ports detected
>> console: [ 17.997599] xhci_hcd 0001:02:02.0: xHCI Host Controller
>> console: [ 17.998280] xhci_hcd 0001:02:02.0: new USB bus registered, assigned bus number 2
>> console: [ 17.999035] xhci_hcd 0001:02:02.0: Host supports USB 3.0 SuperSpeed
>> console: [ 18.104390] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
>> console: [ 18.311117] hub 2-0:1.0: USB hub found
>> console: [ 18.481076] hub 2-0:1.0: 4 ports detected
>> console: [ 18.676328] 1 fixed-partitions partitions found on MTD device flash@0
>> console: [ 18.676887] Creating 1 MTD partitions on "flash@0":
>> PASS (42.00 s)
>> RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>> JOB TIME : 42.60 s
>>
>>
>> * 7.0
>>
>> ....
>> console: [ 3.669714] PCI: CLS 0 bytes, default 128
>> \console: [ 38.778763] Initialise system trusted keyrings
>> console: [ 38.781487] Key type blacklist registered
>> console: [ 38.786515] workingset: timestamp_bits=54 max_order=14 bucket_order=0
>> |console: [ 39.406270] integrity: Platform Keyring initialized
>> console: [ 39.407161] Key type asymmetric registered
>> console: [ 39.407879] Asymmetric key parser 'x509' registered
>> console: [ 39.411112] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
>> console: [ 39.458660] IPMI message handler: version 39.2
>> console: [ 39.461141] ipmi device interface
>> console: [ 39.864578] ipmi-powernv ibm,opal:ipmi: IPMI message handler: Found new BMC (man_id: 0x000000, prod_id: 0x0000, dev_id: 0x20)
>> /console: [ 40.518598] IPMI Watchdog: driver initialized
>> \console: [ 42.679606] hvc0: raw protocol on /ibm,opal/consoles/serial@0 (boot console)
>> console: [ 42.680490] hvc0: No interrupts property, using OPAL event
>> console: [ 42.718661] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
>> console: [ 43.131333] brd: module loaded
>> |console: [ 43.331465] loop: module loaded
>> console: [ 43.366052] libphy: Fixed MDIO Bus: probed
>> console: [ 43.375750] i2c /dev entries driver
>> console: [ 43.380135] powernv-cpufreq: ibm,pstate-min node not found
>> console: [ 43.380691] powernv-cpufreq: Platform driver disabled. System does not support PState control
>> console: [ 43.423981] ipip: IPv4 and MPLS over IPv4 tunneling driver
>> console: [ 43.497912] NET: Registered protocol family 10
>> console: [ 43.540218] Segment Routing with IPv6
>> console: [ 43.542407] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
>> console: [ 43.568856] NET: Registered protocol family 17
>> console: [ 43.570360] Key type dns_resolver registered
>> console: [ 43.571289] secvar-sysfs: secvar: failed to retrieve secvar operations.
>> console: [ 43.572075] drmem: No dynamic reconfiguration memory found
>> console: [ 43.579066] Loading compiled-in X.509 certificates
>> console: [ 43.621261] Loaded X.509 cert 'Build time autogenerated kernel key: 987b64c96d830fe42d02bbf502e028ebe85c2b4e'
>> console: [ 44.052308] Key type encrypted registered
>> console: [ 44.061477] ima: No TPM chip found, activating TPM-bypass!
>> console: [ 44.064700] ima: Allocated hash algorithm: sha256
>> console: [ 44.072905] Secure boot mode disabled
>> console: [ 44.074908] Trusted boot mode disabled
>> console: [ 44.075445] ima: No architecture policies found
>> /console: [ 44.157077] Freeing unused kernel memory: 13696K
>> console: [ 44.157843] This architecture does not have kernel memory protection.
>> console: [ 44.160154] Run /init as init process
>> |console: [ 47.520497] udevd[74]: starting version 3.2.9
>> console: [ 47.555948] random: udevd: uninitialized urandom read (16 bytes read)
>> console: [ 47.563916] random: udevd: uninitialized urandom read (16 bytes read)
>> console: [ 47.567528] random: udevd: uninitialized urandom read (16 bytes read)
>> console: [ 47.729266] udevd[75]: starting eudev-3.2.9
>> /console: [ 49.004222] urandom_read: 5 callbacks suppressed
>> console: [ 49.004347] random: udevd: uninitialized urandom read (16 bytes read)
>> |console: [ 59.758967] PTP clock support registered
>> /console: [ 60.749571] e1000e: Intel(R) PRO/1000 Network Driver
>> console: [ 60.750176] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>> console: [ 60.763411] e1000e 0001:02:03.0: enabling device (0100 -> 0102)
>> console: [ 60.775134] e1000e 0001:02:03.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
>> console: [ 60.840369] nvme nvme0: pci function 0002:01:00.0
>> console: [ 60.842333] nvme 0002:01:00.0: enabling device (0100 -> 0102)
>> console: [ 61.001290] usbcore: registered new interface driver usbfs
>> console: [ 61.008092] usbcore: registered new interface driver hub
>> console: [ 61.013320] usbcore: registered new device driver usb
>> -console: [ 61.260003] e1000e 0001:02:03.0 0001:02:03.0 (uninitialized): registered PHC clock
>> console: [ 61.439568] e1000e 0001:02:03.0 eth0: (PCI Express:2.5GT/s:Width x1) 52:54:00:12:34:57
>> console: [ 61.440440] e1000e 0001:02:03.0 eth0: Intel(R) PRO/1000 Network Connection
>> console: [ 61.441473] e1000e 0001:02:03.0 eth0: MAC: 3, PHY: 8, PBA No: 000000-000
>> console: [ 61.453243] nvme nvme0: 1/0/0 default/read/poll queues
>> console: [ 61.852525] rtc-opal opal-rtc: registered as rtc0
>> console: [ 62.041247] rtc-opal opal-rtc: setting system clock to 2021-12-21T03:39:09 UTC (1640057949)
>> console: [ 62.175046] xhci_hcd 0001:02:02.0: xHCI Host Controller
>> console: [ 62.176904] xhci_hcd 0001:02:02.0: new USB bus registered, assigned bus number 1
>> console: [ 62.188065] xhci_hcd 0001:02:02.0: hcc params 0x00080001 hci version 0x100 quirks 0x0000000000000014
>> \console: [ 62.849997] hub 1-0:1.0: USB hub found
>> console: [ 62.998795] hub 1-0:1.0: 4 ports detected
>> |console: [ 63.232067] xhci_hcd 0001:02:02.0: xHCI Host Controller
>> console: [ 63.233192] xhci_hcd 0001:02:02.0: new USB bus registered, assigned bus number 2
>> console: [ 63.234507] xhci_hcd 0001:02:02.0: Host supports USB 3.0 SuperSpeed
>> console: [ 63.468360] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
>> console: [ 63.786654] hub 2-0:1.0: USB hub found
>> console: [ 63.908912] hub 2-0:1.0: 4 ports detected
>> /console: [ 65.069424] 1 fixed-partitions partitions found on MTD device flash@0
>> console: [ 65.070221] Creating 1 MTD partitions on "flash@0":
>> PASS (89.13 s)
>> RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>> JOB TIME : 89.77 s
>>
>>
>> Emulation is twice as slow. That's not good.
>>
>>
>>> It might be the case that the performance gain is enough to make these tests happy again,
>>> although my initial guess is that there's something during boot that is starting the PMU and
>>> leaving it running.
>>
>> MMCR0 is set to 0 when kernel is started. I guess the modeling is wrong in
>> pmc_is_inactive() or pmc_get_event().
>
>
> Well that's not great. As far as the ISA goes, MMCR0 fully cleared indicates that the
> PMU is counting instruction and cycles in PMCs 5 and 6. In fact, the initial MMCR0
> value is explicitly set to 0x80000000 (frozen bit set) in cpu_init.c because of that.
> The programming note of section 10.2 of PowerISA 3.1 also mentions it:
>
> ----
> The Performance Monitor can be effectively disabled (i.e., put
> into a state in which Performance Monitor SPRs are not altered
> and Performance Monitor exceptions do not occur) by setting
> MMCR0 to 0x0000_0000_8000_0000.
> --
>
> Anyway, I just tested a new assumption where if all the PMU control registers (MMCR0,
> MMCR1, MMCR2 and MMCRA) are all cleared at the same time the PMU is not running. The idea
> is that this condition matches what we might see in an early boot. This fixed the problem
> with the avocado tests. I need to test this new condition with the pseries PMU and EBB test
> suits, but I have a guess that it'll be fine there as well - the Perf driver is always
> writing something in these registers when using the PMU.
>
>
> Meanwhile I'll see if I can get in touch with the Perf kernel folks. Setting MMCR0 to
> zero at kernel start, if you follow ISA 3.1 to the letter, will start event counting
> because all the frozen bits are cleared.
>
>
> Thanks,
>
>
> Daniel
>
>
>>
>> Thanks,
>>
>> C.
next prev parent reply other threads:[~2021-12-21 21:35 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-20 21:35 powernv gitlab ci regression Richard Henderson
2021-12-21 2:37 ` Daniel Henrique Barboza
2021-12-21 8:20 ` Cédric Le Goater
2021-12-21 9:44 ` Daniel Henrique Barboza
2021-12-21 21:33 ` Daniel Henrique Barboza [this message]
2021-12-21 22:30 ` Cédric Le Goater
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8ab3b465-dfec-8579-2b96-e0383f9bcc6f@gmail.com \
--to=danielhb413@gmail.com \
--cc=clg@kaod.org \
--cc=clombard@linux.vnet.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).