* further issues with MGA G200 graphics chipset
@ 2026-04-22 23:55 Jacob Keller
  2026-04-23 0:05 ` David Airlie
  2026-04-23 7:44 ` Thomas Zimmermann
  0 siblings, 2 replies; 20+ messages in thread
From: Jacob Keller @ 2026-04-22 23:55 UTC (permalink / raw)
  To: Jocelyn Falempe, Thomas Zimmermann, airlied@redhat.com
  Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen

Hello,

You may recall the issues I recently reported and submitted a fix for in
the mgag200 DRM driver from [1].

[1]:
https://lore.kernel.org/all/20260202-jk-mgag200-fix-bad-udelay-v2-1-ce1e9665987d@intel.com/

I recently have been running into another issue with the mgag200
graphics driver on a similar platform. I noticed occasional spikes where
Tx timestamps from the ice driver were delayed, very similar behavior to
what was going on with the original bug report. However, this was on a
system running v6.12.76, which contains my MGA G200 usleep fix.

I analyzed the data with perf and have discovered what looks like
another issue where the mgag200 polling routine is causing us issues.

Here's a perf report which captures the cycles samples between the start of a Tx timestamp request and the point where we report it to the stack: > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] ret_from_fork_asm > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] ret_from_fork > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] kthread > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] worker_thread > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] process_one_work > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] output_poll_execute > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_client_dev_hotplug > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_fbdev_shmem_client_hotplug > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_fb_helper_hotplug_event > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_client_modeset_probe > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_helper_probe_single_connector_modes > + 89.87% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_vga_bmc_connector_helper_get_modes > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_connector_helper_get_modes > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_edid_read > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_edid_read_custom > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] _drm_do_get_edid > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] edid_block_read > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_do_probe_ddc_edid > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] i2c_transfer > + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __i2c_transfer > + 89.87% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] bit_xfer > - 59.65% 59.65% kworker/65:1-ev [kernel.kallsyms] [k] delay_halt_tpause > ret_from_fork_asm > ret_from_fork > kthread > worker_thread > process_one_work > output_poll_execute > drm_client_dev_hotplug > drm_fbdev_shmem_client_hotplug > drm_fb_helper_hotplug_event > 
drm_client_modeset_probe > drm_helper_probe_single_connector_modes > mgag200_vga_bmc_connector_helper_get_modes > drm_connector_helper_get_modes > drm_edid_read > drm_edid_read_custom > _drm_do_get_edid > edid_block_read > drm_do_probe_ddc_edid > i2c_transfer > __i2c_transfer > + bit_xfer > + 59.65% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __udelay > + 59.65% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __const_udelay > + 51.11% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] sclhi > + 30.22% 30.22% kworker/65:1-ev [kernel.kallsyms] [k] ioread8 > + 7.30% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] delay_halt > + 7.30% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] acknak > + 7.29% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_ddc_algo_bit_data_setscl > + 5.02% 0.00% swapper [kernel.kallsyms] [k] secondary_startup_64 > + 5.02% 0.00% swapper [kernel.kallsyms] [k] start_secondary > + 5.02% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry > + 5.02% 0.00% swapper [kernel.kallsyms] [k] do_idle > + 3.60% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle > + 3.60% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter > + 3.53% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state > + 2.57% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_ddc_algo_bit_data_setsda > + 2.14% 0.00% perf [unknown] [k] 0xffffffffffffffff > + 2.14% 0.00% perf perf [.] __cmd_record.constprop.0 > + 2.14% 0.00% perf [kernel.kallsyms] [k] entry_SYSCALL_64 > + 2.14% 0.00% perf [kernel.kallsyms] [k] do_syscall_64 > + 2.14% 0.00% perf [kernel.kallsyms] [k] x64_sys_call > + 2.06% 2.06% swapper [kernel.kallsyms] [k] intel_idle > + 1.31% 0.42% perf [kernel.kallsyms] [k] do_sys_poll > + 1.31% 0.00% perf perf [.] fdarray__poll > + 1.31% 0.00% perf libc.so.6 [.] __poll > + 1.31% 0.00% perf [kernel.kallsyms] [k] __x64_sys_poll > + 1.06% 0.00% systemd-journal systemd-journald [.] 0x00005d6bb7cb3f64 > + 1.06% 0.00% systemd-journal libc.so.6 [.] __libc_start_main > + 1.06% 0.00% systemd-journal libc.so.6 [.] 
0x00007d6ce3a2a1c9 > + 1.06% 0.00% systemd-journal systemd-journald [.] 0x00005d6bb7cb389e > + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] sd_event_run > + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] sd_event_dispatch > + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] 0x00007d6ce409d413 > + 1.00% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] i2c_stop > + 0.83% 0.00% perf [kernel.kallsyms] [k] perf_poll > + 0.83% 0.00% perf perf [.] record__mmap_read_evlist > As you can see, in this case we are spending +60% of the cycles in delay_halt_tpause which is part of the bit_xfer function for implementing i2c. I also occasionally see these messages coming on dmesg: > Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: work_for_cpu_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: work_for_cpu_fn hogged CPU 
for >10000us 5 times, consider switching to WQ_UNBOUND > Apr 20 23:14:44 1762811 kernel: workqueue: work_for_cpu_fn hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND > Apr 20 23:14:45 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND > Apr 20 23:15:15 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND > Apr 20 23:15:25 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND > Apr 20 23:15:46 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND > Apr 20 23:16:27 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND > Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND > Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND > Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND > Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND > Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND > Apr 20 23:17:49 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND > Apr 20 23:20:33 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND > Apr 20 23:26:00 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND > Apr 20 23:36:56 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 131 times, consider switching to 
WQ_UNBOUND > Apr 20 23:58:46 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND > Apr 21 00:34:27 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 515 times, consider switching to WQ_UNBOUND > Apr 21 00:42:28 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 515 times, consider switching to WQ_UNBOUND > Apr 21 02:09:51 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 1027 times, consider switching to WQ_UNBOUND > Apr 21 03:27:40 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 1027 times, consider switching to WQ_UNBOUND > Apr 21 05:04:37 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 2051 times, consider switching to WQ_UNBOUND > Apr 21 08:09:39 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND > Apr 21 08:10:07 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND > Apr 21 08:10:10 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND > Apr 21 08:10:21 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND > Apr 21 09:14:18 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 2051 times, consider switching to WQ_UNBOUND > Apr 21 10:54:08 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 4099 times, consider switching to WQ_UNBOUND > Apr 21 21:11:47 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 4099 times, consider switching to WQ_UNBOUND > Apr 21 22:33:11 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 8195 times, consider switching to WQ_UNBOUND > Apr 22 20:31:04 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 8195 times, consider 
switching to WQ_UNBOUND
> Apr 22 21:51:17 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 16387 times, consider switching to WQ_UNBOUND

These all appear to be workqueue warnings about functions that are
hogging CPU. If I look carefully, it looks like they are all possibly
related to the same mgag200 driver. At the very least
output_poll_execute is certainly related to the mgag200 stall.

I do not understand exactly what is causing the driver to get stuck,
it's something in the i2c routine for reading the EDID block.

I also see this being printed:

EDID block 0 (tag 0x00) checksum is invalid, remainder is 125

It appears to print quite consistently every few seconds. I guess this
might possibly be related to a bad EDID block on the mgag200 device?
What does this even mean?

I am not sure how I'd go about verifying this, or root causing what is
going wrong.

It looks like we print the message as part of _drm_do_get_edid(), and
this definitely is called as part of the mgag200 routines:

> - 33.33% 33.33% kworker/64:1-ev [kernel.kallsyms] [k] _drm_do_get_edid
> ret_from_fork_asm
> ret_from_fork
> kthread
> worker_thread
> process_one_work
> output_poll_execute
> drm_client_dev_hotplug
> drm_fbdev_shmem_client_hotplug
> drm_fb_helper_hotplug_event
> drm_client_modeset_probe
> drm_helper_probe_single_connector_modes
> mgag200_vga_bmc_connector_helper_get_modes
> drm_connector_helper_get_modes
> drm_edid_read
> drm_edid_read_custom
> _drm_do_get_edid

This makes me think that we're reading a bad EDID.
I enabled drm.debug setting to get more data: > Apr 22 23:47:11 1762811 kernel: EDID block 0 (tag 0x00) checksum is invalid, remainder is 125 > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:connector_bad_edid] [CONNECTOR:36:VGA-1] EDID is invalid: > Apr 22 23:47:11 1762811 kernel: [00] BAD 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x720": 60 74250 1280 1390 1430 1650 720 725 730 750 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x768": 60 68250 1280 1328 1360 1440 768 771 778 790 0x40 0x9 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x768": 60 79500 1280 1344 1472 1664 768 771 778 798 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x800": 60 71000 1280 1328 1360 1440 800 803 809 823 0x40 0x9 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x800": 60 83500 1280 1352 1480 1680 800 803 809 831 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x960": 60 
108000 1280 1376 1488 1800 960 961 964 1000 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x1024": 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1360x768": 60 85500 1360 1424 1536 1792 768 771 777 795 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1366x768": 60 85500 1366 1436 1579 1792 768 771 774 798 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1366x768": 60 72000 1366 1380 1436 1500 768 769 772 800 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1400x1050": 60 101000 1400 1448 1480 1560 1050 1053 1057 1080 0x40 0x9 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1400x1050": 60 121750 1400 1488 1632 1864 1050 1053 1057 1089 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1440x900": 60 88750 1440 1488 1520 1600 900 903 909 926 0x40 0x9 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1440x900": 60 106500 1440 1520 1672 1904 900 903 909 934 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1600x900": 60 108000 1600 1624 1704 1800 900 901 904 1000 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1600x1200": 60 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1680x1050": 60 119000 1680 
1728 1760 1840 1050 1053 1059 1080 0x40 0x9 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1680x1050": 60 146250 1680 1784 1960 2240 1050 1053 1059 1089 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1792x1344": 60 204750 1792 1920 2120 2448 1344 1345 1348 1394 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1856x1392": 60 218250 1856 1952 2176 2528 1392 1393 1396 1439 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1080": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x40 0xa (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1200": 60 154000 1920 1968 2000 2080 1200 1203 1209 1235 0x40 0x9 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1200": 60 193250 1920 2056 2256 2592 1200 1203 1209 1245 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1440": 60 234000 1920 2048 2256 2600 1440 1441 1444 1500 0x40 0x6 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "2048x1152": 60 162000 2048 2074 2154 2250 1152 1153 1156 1200 0x40 0x5 (VIRTUAL_X) > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:36:VGA-1] probed modes: > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x48 0xa > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "800x600": 60 40000 800 840 968 1056 
600 601 605 628 0x40 0x5
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "800x600": 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "848x480": 60 33750 848 864 976 1088 480 486 494 517 0x40 0x5
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] enabled? yes
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] Not using firmware configuration
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] looking for cmdline mode
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] looking for preferred mode, tile 0
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] Found mode 1024x768
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] picking CRTCs for 1024x768 config
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CRTC:34:crtc-0] desired mode 1024x768 set (0,0)
> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_dev_hotplug] fbdev: ret=0

Does anyone have any idea what's going wrong here? A Google search seems
to imply this is reading the EDID data from the VGA cable...

I'm also curious if it's possible to stop polling for so long with udelay
in the i2c logic somehow? I am not very familiar with i2c, but it is
frustrating that this driver is causing yet another stall that is
impacting timing-sensitive data. Even if in this case it's due to a
faulty cable, it is frustrating that such a result causes the PTP
failures. Would switching to WQ_UNBOUND be helpful here at all?

Thanks,
Jake
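
For reference, the "remainder is 125" value can be reproduced from the block dumped in the drm.debug output. This is a minimal sketch, assuming the printed value is 0x100 minus the sum of the first 127 bytes (mod 256), that is, the checksum byte the block would need in order to be valid. That formula is an assumption and is not stated anywhere in this thread; the function names are hypothetical.

```python
# Sketch: reproduce "remainder is 125" from the hex dump in the drm.debug log.
# Assumption (not confirmed in this thread): the printed value is
# (0x100 - sum(first 127 bytes)) mod 256, i.e. the checksum byte that
# would make the whole 128-byte block sum to 0 mod 256.

EDID_BLOCK_LEN = 128


def expected_checksum(block: bytes) -> int:
    """Checksum byte that would make the block's byte sum 0 mod 256."""
    if len(block) != EDID_BLOCK_LEN:
        raise ValueError("an EDID block is 128 bytes")
    return (0x100 - sum(block[:-1])) & 0xFF


def block_is_valid(block: bytes) -> bool:
    """A block is valid when all 128 bytes sum to 0 mod 256."""
    return len(block) == EDID_BLOCK_LEN and sum(block) % 256 == 0


# The dumped block: the 8-byte EDID header followed by 120 bytes of 0xff.
dumped = bytes.fromhex("00ffffffffffff00") + b"\xff" * 120

print(expected_checksum(dumped))  # 125
print(dumped[-1])                 # 255, the checksum byte actually read
print(block_is_valid(dumped))     # False
```

Under that assumption the dump and the message agree: every payload byte is 0xff, so the last byte cannot match the required checksum. A payload of all 0xff is consistent with nothing answering on the DDC bus, although that is an interpretation rather than a confirmed diagnosis.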
* Re: further issues with MGA G200 graphics chipset
  2026-04-22 23:55 further issues with MGA G200 graphics chipset Jacob Keller
@ 2026-04-23 0:05 ` David Airlie
  2026-04-23 21:39 ` Jacob Keller
  2026-04-23 7:44 ` Thomas Zimmermann
  1 sibling, 1 reply; 20+ messages in thread
From: David Airlie @ 2026-04-23 0:05 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Jocelyn Falempe, Thomas Zimmermann, dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen

>
> These all appear to be workqueue warnings about functions that are
> hogging CPU. If I look carefully, it looks like they are all possibly
> related to the same mgag200 driver. At the very least
> output_poll_execute is certainly related to the mgag200 stall.
>
> I do not understand exactly what is causing the driver to get stuck,
> it's something in the i2c routine for reading the EDID block.
>
> I also see this being printed:
>
> EDID block 0 (tag 0x00) checksum is invalid, remainder is 125
>
> It appears to print quite consistently every few seconds. I guess this
> might possibly be related to a bad EDID block on the mgag200 device?
> What does this even mean?
>

It sounds like the polling is having trouble with the i2c bus even if
there is no cable plugged in, probably cheaped out on some pull
up/down resistors on the VGA connector.

Does adding drm_kms_helper.poll=0 to the command line help?

Dave.

> I am not sure how I'd go about verifying this, or root causing what is
> going wrong.
> > It looks like we print the message as part of _drm_do_get_edid(), and > this definitely is called as part of the mgag200 routines: > > > - 33.33% 33.33% kworker/64:1-ev [kernel.kallsyms] [k] _drm_do_get_edid > > ret_from_fork_asm > > ret_from_fork > > kthread > > worker_thread > > process_one_work > > output_poll_execute > > drm_client_dev_hotplug > > drm_fbdev_shmem_client_hotplug > > drm_fb_helper_hotplug_event > > drm_client_modeset_probe > > drm_helper_probe_single_connector_modes > > mgag200_vga_bmc_connector_helper_get_modes > > drm_connector_helper_get_modes > > drm_edid_read > > drm_edid_read_custom > > _drm_do_get_edid > > This makes me think that we're reading a bad EDID. I enabled drm.debug > setting to get more data: > > > Apr 22 23:47:11 1762811 kernel: EDID block 0 (tag 0x00) checksum is invalid, remainder is 125 > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:connector_bad_edid] [CONNECTOR:36:VGA-1] EDID is invalid: > > Apr 22 23:47:11 1762811 kernel: [00] BAD 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff > > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x720": 60 74250 1280 1390 1430 1650 720 725 730 750 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: 
"1280x768": 60 68250 1280 1328 1360 1440 768 771 778 790 0x40 0x9 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x768": 60 79500 1280 1344 1472 1664 768 771 778 798 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x800": 60 71000 1280 1328 1360 1440 800 803 809 823 0x40 0x9 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x800": 60 83500 1280 1352 1480 1680 800 803 809 831 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x960": 60 108000 1280 1376 1488 1800 960 961 964 1000 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x1024": 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1360x768": 60 85500 1360 1424 1536 1792 768 771 777 795 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1366x768": 60 85500 1366 1436 1579 1792 768 771 774 798 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1366x768": 60 72000 1366 1380 1436 1500 768 769 772 800 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1400x1050": 60 101000 1400 1448 1480 1560 1050 1053 1057 1080 0x40 0x9 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1400x1050": 60 121750 1400 1488 1632 1864 1050 1053 1057 1089 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: 
"1440x900": 60 88750 1440 1488 1520 1600 900 903 909 926 0x40 0x9 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1440x900": 60 106500 1440 1520 1672 1904 900 903 909 934 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1600x900": 60 108000 1600 1624 1704 1800 900 901 904 1000 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1600x1200": 60 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1680x1050": 60 119000 1680 1728 1760 1840 1050 1053 1059 1080 0x40 0x9 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1680x1050": 60 146250 1680 1784 1960 2240 1050 1053 1059 1089 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1792x1344": 60 204750 1792 1920 2120 2448 1344 1345 1348 1394 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1856x1392": 60 218250 1856 1952 2176 2528 1392 1393 1396 1439 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1080": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x40 0xa (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1200": 60 154000 1920 1968 2000 2080 1200 1203 1209 1235 0x40 0x9 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1200": 60 193250 1920 2056 2256 2592 1200 1203 1209 1245 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: 
[drm:drm_mode_prune_invalid] Rejected mode: "1920x1440": 60 234000 1920 2048 2256 2600 1440 1441 1444 1500 0x40 0x6 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "2048x1152": 60 162000 2048 2074 2154 2250 1152 1153 1156 1200 0x40 0x5 (VIRTUAL_X) > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:36:VGA-1] probed modes: > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x48 0xa > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "800x600": 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5 > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "800x600": 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5 > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "848x480": 60 33750 848 864 976 1088 480 486 494 517 0x40 0x5 > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa > > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] enabled? 
yes
> > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] Not using firmware configuration
> > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] looking for cmdline mode
> > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] looking for preferred mode, tile 0
> > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] Found mode 1024x768
> > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] picking CRTCs for 1024x768 config
> > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CRTC:34:crtc-0] desired mode 1024x768 set (0,0)
> > Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_dev_hotplug] fbdev: ret=0
>
> Does anyone have any idea what's going wrong here? A Google search seems
> to imply this is reading the EDID data from the VGA cable...
>
> I'm also curious if it's possible to stop polling for so long with udelay
> in the i2c logic somehow? I am not very familiar with i2c, but it is
> frustrating that this driver is causing yet another stall that is
> impacting timing-sensitive data. Even if in this case it's due to a
> faulty cable, it is frustrating that such a result causes the PTP
> failures. Would switching to WQ_UNBOUND be helpful here at all?
>
> Thanks,
> Jake
>
* Re: further issues with MGA G200 graphics chipset
  2026-04-23 0:05 ` David Airlie
@ 2026-04-23 21:39 ` Jacob Keller
  0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2026-04-23 21:39 UTC (permalink / raw)
  To: David Airlie
  Cc: Jocelyn Falempe, Thomas Zimmermann, dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen

On 4/22/2026 5:05 PM, David Airlie wrote:
>>
>> These all appear to be workqueue warnings about functions that are
>> hogging CPU. If I look carefully, it looks like they are all possibly
>> related to the same mgag200 driver. At the very least
>> output_poll_execute is certainly related to the mgag200 stall.
>>
>> I do not understand exactly what is causing the driver to get stuck,
>> it's something in the i2c routine for reading the EDID block.
>>
>> I also see this being printed:
>>
>> EDID block 0 (tag 0x00) checksum is invalid, remainder is 125
>>
>> It appears to print quite consistently every few seconds. I guess this
>> might possibly be related to a bad EDID block on the mgag200 device?
>> What does this even mean?
>>
>
> It sounds like the polling is having trouble with the i2c bus even if
> there is no cable plugged in, probably cheaped out on some pull
> up/down resistors on the VGA connector.
>
> Does adding drm_kms_helper.poll=0 to the command line help?
>
> Dave.
>

This looks like it is a global parameter for all users of
drm_kms_helper. Would it be feasible to have an mgag200-specific
parameter made available?

I am testing this out now, but if it helps, it would be good to be able
to disable polling only for mgag200 on the off chance that some system
has another device which depends on its functionality. I guess that may
not be super common, so maybe it's not a big deal...

Thanks,
Jake
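
When testing drm_kms_helper.poll=0, it can help to confirm at runtime that the setting actually took effect. This is a minimal sketch, assuming the parameter is exposed at /sys/module/drm_kms_helper/parameters/poll and reads back as Y or N. Both the path and the value format are assumptions to verify on the running kernel, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the parameter behind drm_kms_helper.poll=...;
# verify the path on the target kernel before relying on it.
POLL_PARAM = Path("/sys/module/drm_kms_helper/parameters/poll")


def parse_bool_param(text: str) -> bool:
    """Parse a boolean module parameter as read from sysfs ("Y"/"N")."""
    value = text.strip().upper()
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    raise ValueError(f"unexpected parameter value: {text!r}")


def polling_enabled(path: Path = POLL_PARAM) -> bool:
    """Return True if connector polling is enabled (reading may need root)."""
    return parse_bool_param(path.read_text())


if __name__ == "__main__":
    try:
        print("connector polling enabled:", polling_enabled())
    except OSError as err:
        # Missing file or insufficient permissions on this system.
        print("could not read", POLL_PARAM, "-", err)
```

This only reports the global setting; it does not provide the per-driver control asked about above.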
* Re: further issues with MGA G200 graphics chipset
  2026-04-22 23:55 further issues with MGA G200 graphics chipset Jacob Keller
  2026-04-23  0:05 ` David Airlie
@ 2026-04-23  7:44 ` Thomas Zimmermann
  2026-04-23 16:35   ` Jacob Keller
  1 sibling, 1 reply; 20+ messages in thread
From: Thomas Zimmermann @ 2026-04-23  7:44 UTC
To: Jacob Keller, Jocelyn Falempe, airlied@redhat.com
Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen

Hi

On 23.04.26 at 01:55, Jacob Keller wrote:
> Hello,
>
> You may recall the issues I recently reported and submitted a fix for in
> the mgag200 DRM driver from [1].
>
> [1]:
> https://lore.kernel.org/all/20260202-jk-mgag200-fix-bad-udelay-v2-1-ce1e9665987d@intel.com/
>
> I recently have been running into another issue with the mgag200
> graphics driver on a similar platform. I noticed occasional spikes where
> Tx timestamps from the ice driver were delayed, very similar behavior to
> what was going on with the original bug report. However, this was on a
> system running v6.12.76, which contains my MGA G200 usleep fix.
>
> I analyzed the data with perf and have discovered what looks like
> another issue where the mgag200 polling routine is causing us issues.
> > Here's a perf report which captures the cycles samples between the start > of a Tx timestamp request and the point where we report it to the stack: > >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] ret_from_fork_asm >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] ret_from_fork >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] kthread >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] worker_thread >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] process_one_work >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] output_poll_execute >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_client_dev_hotplug >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_fbdev_shmem_client_hotplug >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_fb_helper_hotplug_event >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_client_modeset_probe >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_helper_probe_single_connector_modes >> + 89.87% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_vga_bmc_connector_helper_get_modes >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_connector_helper_get_modes >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_edid_read >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_edid_read_custom >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] _drm_do_get_edid >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] edid_block_read >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_do_probe_ddc_edid >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] i2c_transfer >> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __i2c_transfer >> + 89.87% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] bit_xfer >> - 59.65% 59.65% kworker/65:1-ev [kernel.kallsyms] [k] delay_halt_tpause >> ret_from_fork_asm >> ret_from_fork >> kthread >> worker_thread >> process_one_work >> output_poll_execute >> drm_client_dev_hotplug >> 
drm_fbdev_shmem_client_hotplug >> drm_fb_helper_hotplug_event >> drm_client_modeset_probe >> drm_helper_probe_single_connector_modes >> mgag200_vga_bmc_connector_helper_get_modes >> drm_connector_helper_get_modes >> drm_edid_read >> drm_edid_read_custom >> _drm_do_get_edid >> edid_block_read >> drm_do_probe_ddc_edid >> i2c_transfer >> __i2c_transfer >> + bit_xfer >> + 59.65% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __udelay >> + 59.65% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __const_udelay >> + 51.11% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] sclhi >> + 30.22% 30.22% kworker/65:1-ev [kernel.kallsyms] [k] ioread8 >> + 7.30% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] delay_halt >> + 7.30% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] acknak >> + 7.29% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_ddc_algo_bit_data_setscl >> + 5.02% 0.00% swapper [kernel.kallsyms] [k] secondary_startup_64 >> + 5.02% 0.00% swapper [kernel.kallsyms] [k] start_secondary >> + 5.02% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry >> + 5.02% 0.00% swapper [kernel.kallsyms] [k] do_idle >> + 3.60% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle >> + 3.60% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter >> + 3.53% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state >> + 2.57% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_ddc_algo_bit_data_setsda >> + 2.14% 0.00% perf [unknown] [k] 0xffffffffffffffff >> + 2.14% 0.00% perf perf [.] __cmd_record.constprop.0 >> + 2.14% 0.00% perf [kernel.kallsyms] [k] entry_SYSCALL_64 >> + 2.14% 0.00% perf [kernel.kallsyms] [k] do_syscall_64 >> + 2.14% 0.00% perf [kernel.kallsyms] [k] x64_sys_call >> + 2.06% 2.06% swapper [kernel.kallsyms] [k] intel_idle >> + 1.31% 0.42% perf [kernel.kallsyms] [k] do_sys_poll >> + 1.31% 0.00% perf perf [.] fdarray__poll >> + 1.31% 0.00% perf libc.so.6 [.] __poll >> + 1.31% 0.00% perf [kernel.kallsyms] [k] __x64_sys_poll >> + 1.06% 0.00% systemd-journal systemd-journald [.] 
0x00005d6bb7cb3f64 >> + 1.06% 0.00% systemd-journal libc.so.6 [.] __libc_start_main >> + 1.06% 0.00% systemd-journal libc.so.6 [.] 0x00007d6ce3a2a1c9 >> + 1.06% 0.00% systemd-journal systemd-journald [.] 0x00005d6bb7cb389e >> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] sd_event_run >> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] sd_event_dispatch >> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] 0x00007d6ce409d413 >> + 1.00% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] i2c_stop >> + 0.83% 0.00% perf [kernel.kallsyms] [k] perf_poll >> + 0.83% 0.00% perf perf [.] record__mmap_read_evlist >> > As you can see, in this case we are spending +60% of the cycles in > delay_halt_tpause which is part of the bit_xfer function for > implementing i2c. That's from the DDC's i2c channel, which we poll on regular intervals when we update the connector status. Dave's suggestion should at least mitigate the problem. > > I also occasionally see these messages coming on dmesg: >> Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 
1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: work_for_cpu_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: work_for_cpu_fn hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:44 1762811 kernel: workqueue: work_for_cpu_fn hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND >> Apr 20 23:14:45 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND >> Apr 20 23:15:15 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND >> Apr 20 23:15:25 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND >> Apr 20 23:15:46 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND >> Apr 20 23:16:27 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND >> Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND >> Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND >> Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND >> Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND >> Apr 20 23:16:45 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND >> Apr 20 23:17:49 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND >> Apr 20 23:20:33 1762811 kernel: workqueue: 
output_poll_execute hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND >> Apr 20 23:26:00 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND >> Apr 20 23:36:56 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND >> Apr 20 23:58:46 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND >> Apr 21 00:34:27 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 515 times, consider switching to WQ_UNBOUND >> Apr 21 00:42:28 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 515 times, consider switching to WQ_UNBOUND >> Apr 21 02:09:51 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 1027 times, consider switching to WQ_UNBOUND >> Apr 21 03:27:40 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 1027 times, consider switching to WQ_UNBOUND >> Apr 21 05:04:37 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 2051 times, consider switching to WQ_UNBOUND >> Apr 21 08:09:39 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND >> Apr 21 08:10:07 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND >> Apr 21 08:10:10 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND >> Apr 21 08:10:21 1762811 kernel: workqueue: vmstat_update hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND >> Apr 21 09:14:18 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 2051 times, consider switching to WQ_UNBOUND >> Apr 21 10:54:08 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 4099 times, consider switching to WQ_UNBOUND >> Apr 21 21:11:47 1762811 kernel: 
workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 4099 times, consider switching to WQ_UNBOUND >> Apr 21 22:33:11 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 8195 times, consider switching to WQ_UNBOUND >> Apr 22 20:31:04 1762811 kernel: workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 8195 times, consider switching to WQ_UNBOUND >> Apr 22 21:51:17 1762811 kernel: workqueue: output_poll_execute hogged CPU for >10000us 16387 times, consider switching to WQ_UNBOUND > These all appear to be workqueue warnings about functions that are > hogging CPU. If I look carefully, it looks like they are all possibly > related to the same mgag200 driver. At the very least > output_poll_execute is certainly related to the mgag200 stall. Polling the DDC involves acquiring locks so that it does not interfere with display updates. These errors about drm_fb_helper_damage_work() are fallout. The function most likely waits for the DDC polling to finish. > > I do noot understand exactly what is causing the driver to get stuck, > its something in the i2c routine for reading the EDID block. > > I also see this being printed: > > EDID block 0 (tag 0x00) checksum is invalid, remainder is 125 > > It appears to print quite consistently every few seconds. I guess this > might be possibly related to a bad EDID block on the mgag200 device? > What does this even mean? The monitor's EDID is wrong. This is likely another fallout from the issue. > > I am not sure how I'd go about verifying this, or root causing what is > going wrong. 
> > It looks like we print the message as part of _drm_do_get_edid(), and > this definitely is called as part of the mgag200 routines: > >> - 33.33% 33.33% kworker/64:1-ev [kernel.kallsyms] [k] _drm_do_get_edid >> ret_from_fork_asm >> ret_from_fork >> kthread >> worker_thread >> process_one_work >> output_poll_execute >> drm_client_dev_hotplug >> drm_fbdev_shmem_client_hotplug >> drm_fb_helper_hotplug_event >> drm_client_modeset_probe >> drm_helper_probe_single_connector_modes >> mgag200_vga_bmc_connector_helper_get_modes >> drm_connector_helper_get_modes >> drm_edid_read >> drm_edid_read_custom >> _drm_do_get_edid > This makes me think that we're reading a bad EDID. I enabled drm.debug > setting to get more data: > >> Apr 22 23:47:11 1762811 kernel: EDID block 0 (tag 0x00) checksum is invalid, remainder is 125 >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:connector_bad_edid] [CONNECTOR:36:VGA-1] EDID is invalid: >> Apr 22 23:47:11 1762811 kernel: [00] BAD 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff >> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff This EDID has a correct identifier in the first 8 bytes and the rest is garbage. 
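The "remainder is 125" in the log can be reproduced from the dumped bytes. A minimal sketch, assuming the number the kernel prints is the checksum byte that would make the block valid, that is, the two's complement of the sum of the first 127 bytes modulo 256. That reading is an assumption, but it matches the logged value; the function names are illustrative and not kernel API:

```python
EDID_BLOCK_LEN = 128


def expected_checksum(block):
    """Checksum byte that would make the block sum to 0 modulo 256.

    Computed over the first 127 bytes; the 128th byte is where the
    checksum itself is stored.
    """
    return (-sum(block[:EDID_BLOCK_LEN - 1])) & 0xFF


def block_is_valid(block):
    """An EDID block is valid when all 128 bytes sum to 0 modulo 256."""
    return len(block) == EDID_BLOCK_LEN and sum(block) % 256 == 0


# The block as dumped in the log: the 8-byte header 00 ff ff ff ff ff ff 00
# followed by 120 bytes of 0xff.
dumped = bytes([0x00] + [0xFF] * 6 + [0x00] + [0xFF] * 120)

if __name__ == "__main__":
    print("valid:", block_is_valid(dumped))
    print("expected checksum:", expected_checksum(dumped))
```

With the dumped block, the stored last byte is 0xff, while the byte that would make the block valid is 125, so the check fails.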
>> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x720": 60 74250 1280 1390 1430 1650 720 725 730 750 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x768": 60 68250 1280 1328 1360 1440 768 771 778 790 0x40 0x9 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x768": 60 79500 1280 1344 1472 1664 768 771 778 798 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x800": 60 71000 1280 1328 1360 1440 800 803 809 823 0x40 0x9 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x800": 60 83500 1280 1352 1480 1680 800 803 809 831 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x960": 60 108000 1280 1376 1488 1800 960 961 964 1000 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1280x1024": 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1360x768": 60 85500 1360 1424 1536 1792 768 771 777 795 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1366x768": 60 85500 1366 1436 1579 1792 768 771 774 798 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1366x768": 60 72000 1366 1380 1436 1500 768 769 772 800 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1400x1050": 60 101000 1400 1448 1480 1560 1050 1053 1057 1080 0x40 0x9 (VIRTUAL_X) >> Apr 22 
23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1400x1050": 60 121750 1400 1488 1632 1864 1050 1053 1057 1089 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1440x900": 60 88750 1440 1488 1520 1600 900 903 909 926 0x40 0x9 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1440x900": 60 106500 1440 1520 1672 1904 900 903 909 934 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1600x900": 60 108000 1600 1624 1704 1800 900 901 904 1000 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1600x1200": 60 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1680x1050": 60 119000 1680 1728 1760 1840 1050 1053 1059 1080 0x40 0x9 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1680x1050": 60 146250 1680 1784 1960 2240 1050 1053 1059 1089 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1792x1344": 60 204750 1792 1920 2120 2448 1344 1345 1348 1394 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1856x1392": 60 218250 1856 1952 2176 2528 1392 1393 1396 1439 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1080": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x40 0xa (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1200": 60 154000 1920 1968 2000 2080 1200 1203 1209 1235 0x40 0x9 
(VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1200": 60 193250 1920 2056 2256 2592 1200 1203 1209 1245 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "1920x1440": 60 234000 1920 2048 2256 2600 1440 1441 1444 1500 0x40 0x6 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_mode_prune_invalid] Rejected mode: "2048x1152": 60 162000 2048 2074 2154 2250 1152 1153 1156 1200 0x40 0x5 (VIRTUAL_X) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:36:VGA-1] probed modes: >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x48 0xa >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "800x600": 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5 >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "800x600": 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5 >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "848x480": 60 33750 848 864 976 1088 480 486 494 517 0x40 0x5 >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_helper_probe_single_connector_modes] Probed mode: "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] enabled? 
yes >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] Not using firmware configuration >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] looking for cmdline mode >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] looking for preferred mode, tile 0 >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CONNECTOR:36:VGA-1] Found mode 1024x768 >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] picking CRTCs for 1024x768 config >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_modeset_probe] [CRTC:34:crtc-0] desired mode 1024x768 set (0,0) >> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_dev_hotplug] fbdev: ret=0 > Does anyone have any idea whats going wrong here? A google search seems > to imply this is reading the EDID data from the VGA cable... The HW is probably broken. > > I'm also curious if its possible to stop polling for so long with udelay > in the i2c logic somehow? I am not very familiar with i2c, but it is > frustrating that this driver is causing yet another stall that is > impacting timing sensitive data. Even if in this case its due to a > faulty cable.. it is frustrating that such result causes the PTP > failures. Would switching to WQ_UNBOUND be helpful here at all? Try Dave's suggestion to avoid polling. The driver won't be able to detect changes to the connector status, though. Best regards Thomas > > Thanks, > Jake -- -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: further issues with MGA G200 graphics chipset 2026-04-23 7:44 ` Thomas Zimmermann @ 2026-04-23 16:35 ` Jacob Keller 2026-04-23 19:22 ` Jocelyn Falempe 0 siblings, 1 reply; 20+ messages in thread From: Jacob Keller @ 2026-04-23 16:35 UTC (permalink / raw) To: Thomas Zimmermann, Jocelyn Falempe, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 4/23/2026 12:44 AM, Thomas Zimmermann wrote: > Hi > > Am 23.04.26 um 01:55 schrieb Jacob Keller: >> Hello, >> >> You may recall the issues I recently reported and submitted a fix for in >> the mgag200 DRM driver from [1]. >> >> [1]: >> https://lore.kernel.org/all/20260202-jk-mgag200-fix-bad-udelay-v2-1- >> ce1e9665987d@intel.com/ >> >> I recently have been running into another issue with the mgag200 >> graphics driver on a similar platform. I noticed occasional spikes where >> Tx timestamps from the ice driver were delayed, very similar behavior to >> what was going on with the original bug report. However, this was on a >> system running v6.12.76, which contains my MGA G200 usleep fix. >> >> I analyzed the data with perf and have discovered what looks like >> another issue where the mgag200 polling routine is causing us issues. 
>> >> Here's a perf report which captures the cycles samples between the start >> of a Tx timestamp request and the point where we report it to the stack: >> >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> ret_from_fork_asm >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> ret_from_fork >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> kthread >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> worker_thread >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> process_one_work >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> output_poll_execute >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_client_dev_hotplug >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_fbdev_shmem_client_hotplug >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_fb_helper_hotplug_event >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_client_modeset_probe >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_helper_probe_single_connector_modes >>> + 89.87% 0.00% kworker/65:1-ev [mgag200] [k] >>> mgag200_vga_bmc_connector_helper_get_modes >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_connector_helper_get_modes >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_edid_read >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_edid_read_custom >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> _drm_do_get_edid >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> edid_block_read >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> drm_do_probe_ddc_edid >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> i2c_transfer >>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> __i2c_transfer >>> + 89.87% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] >>> bit_xfer >>> - 59.65% 59.65% kworker/65:1-ev [kernel.kallsyms] [k] >>> delay_halt_tpause >>> ret_from_fork_asm >>> ret_from_fork >>> kthread 
>>> worker_thread >>> process_one_work >>> output_poll_execute >>> drm_client_dev_hotplug >>> drm_fbdev_shmem_client_hotplug >>> drm_fb_helper_hotplug_event >>> drm_client_modeset_probe >>> drm_helper_probe_single_connector_modes >>> mgag200_vga_bmc_connector_helper_get_modes >>> drm_connector_helper_get_modes >>> drm_edid_read >>> drm_edid_read_custom >>> _drm_do_get_edid >>> edid_block_read >>> drm_do_probe_ddc_edid >>> i2c_transfer >>> __i2c_transfer >>> + bit_xfer >>> + 59.65% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> __udelay >>> + 59.65% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> __const_udelay >>> + 51.11% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] >>> sclhi >>> + 30.22% 30.22% kworker/65:1-ev [kernel.kallsyms] [k] >>> ioread8 >>> + 7.30% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] >>> delay_halt >>> + 7.30% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] >>> acknak >>> + 7.29% 0.00% kworker/65:1-ev [mgag200] [k] >>> mgag200_ddc_algo_bit_data_setscl >>> + 5.02% 0.00% swapper [kernel.kallsyms] [k] >>> secondary_startup_64 >>> + 5.02% 0.00% swapper [kernel.kallsyms] [k] >>> start_secondary >>> + 5.02% 0.00% swapper [kernel.kallsyms] [k] >>> cpu_startup_entry >>> + 5.02% 0.00% swapper [kernel.kallsyms] [k] >>> do_idle >>> + 3.60% 0.00% swapper [kernel.kallsyms] [k] >>> call_cpuidle >>> + 3.60% 0.00% swapper [kernel.kallsyms] [k] >>> cpuidle_enter >>> + 3.53% 0.00% swapper [kernel.kallsyms] [k] >>> cpuidle_enter_state >>> + 2.57% 0.00% kworker/65:1-ev [mgag200] [k] >>> mgag200_ddc_algo_bit_data_setsda >>> + 2.14% 0.00% perf [unknown] [k] >>> 0xffffffffffffffff >>> + 2.14% 0.00% perf perf [.] >>> __cmd_record.constprop.0 >>> + 2.14% 0.00% perf [kernel.kallsyms] [k] >>> entry_SYSCALL_64 >>> + 2.14% 0.00% perf [kernel.kallsyms] [k] >>> do_syscall_64 >>> + 2.14% 0.00% perf [kernel.kallsyms] [k] >>> x64_sys_call >>> + 2.06% 2.06% swapper [kernel.kallsyms] [k] >>> intel_idle >>> + 1.31% 0.42% perf [kernel.kallsyms] [k] >>> do_sys_poll >>> + 1.31% 0.00% perf perf [.] 
>>> fdarray__poll >>> + 1.31% 0.00% perf libc.so.6 [.] >>> __poll >>> + 1.31% 0.00% perf [kernel.kallsyms] [k] >>> __x64_sys_poll >>> + 1.06% 0.00% systemd-journal systemd-journald [.] >>> 0x00005d6bb7cb3f64 >>> + 1.06% 0.00% systemd-journal libc.so.6 [.] >>> __libc_start_main >>> + 1.06% 0.00% systemd-journal libc.so.6 [.] >>> 0x00007d6ce3a2a1c9 >>> + 1.06% 0.00% systemd-journal systemd-journald [.] >>> 0x00005d6bb7cb389e >>> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] >>> sd_event_run >>> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] >>> sd_event_dispatch >>> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] >>> 0x00007d6ce409d413 >>> + 1.00% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] >>> i2c_stop >>> + 0.83% 0.00% perf [kernel.kallsyms] [k] >>> perf_poll >>> + 0.83% 0.00% perf perf [.] >>> record__mmap_read_evlist >>> >> As you can see, in this case we are spending +60% of the cycles in >> delay_halt_tpause which is part of the bit_xfer function for >> implementing i2c. > > That's from the DDC's i2c channel, which we poll on regular intervals > when we update the connector status. Dave's suggestion should at least > mitigate the problem. > Right. > > Polling the DDC involves acquiring locks so that it does not interfere > with display updates. These errors about drm_fb_helper_damage_work() are > fallout. The function most likely waits for the DDC polling to finish. Makes sense. I'm still wondering if it makes sense to convert to WQ_UNBOUND so that the task doesn't get bound to CPU and (hopefully?) doesn't cause other critical processes like IRQs to get stuck when they *happen* to be bound to the same CPU? I'm not entirely sure. It seems crazy to me that this simple background polling thread stalls my IRQ from executing for 30 milliseconds, but that appears to be what is happening. 
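A rough back-of-the-envelope estimate of how long a bit-banged EDID read can busy-wait, to put the 30 millisecond figure in context. Every number here is an assumption rather than something taken from this thread: nine clocks per byte (eight data bits plus the acknowledge bit), about two delay periods per clock, a 10 microsecond delay period, and three addressing bytes in front of the 128 data bytes. Start and stop conditions, MMIO access time, and clock stretching are ignored:

```python
def bitbang_busy_wait_us(data_bytes, udelay_us, overhead_bytes=3):
    """Estimate the time spent in delay loops for one bit-banged I2C read.

    data_bytes     -- payload bytes read (128 for one EDID block)
    udelay_us      -- assumed delay period in microseconds
    overhead_bytes -- assumed address/offset bytes sent before the payload
    """
    clocks_per_byte = 9    # 8 data bits + 1 acknowledge bit
    delays_per_clock = 2   # roughly one delay per clock half-period
    total_bytes = data_bytes + overhead_bytes
    return total_bytes * clocks_per_byte * delays_per_clock * udelay_us


if __name__ == "__main__":
    estimate_us = bitbang_busy_wait_us(data_bytes=128, udelay_us=10)
    print(f"estimated busy-wait per EDID block: {estimate_us / 1000:.1f} ms")
```

Under these assumptions the delay loops alone come to roughly 24 ms per 128-byte block, which is the same order of magnitude as the stall described above. The real figure depends on the driver's actual delay setting and on how long each register access takes.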
I am guessing that refactoring the i2c-bit-algo to allow usleep is not really possible either, so we can't make this part of the logic actually sleep instead of busy-waiting.. :( >> >> I do noot understand exactly what is causing the driver to get stuck, >> its something in the i2c routine for reading the EDID block. >> >> I also see this being printed: >> >> EDID block 0 (tag 0x00) checksum is invalid, remainder is 125 >> >> It appears to print quite consistently every few seconds. I guess this >> might be possibly related to a bad EDID block on the mgag200 device? >> What does this even mean? > > The monitor's EDID is wrong. This is likely another fallout from the issue. > It turns out that the platform doesn't even seem to have a physical VGA port. This makes me suspect Dave's point about a cheap resistor is quite plausible. >> >> I am not sure how I'd go about verifying this, or root causing what is >> going wrong. >> >> It looks like we print the message as part of _drm_do_get_edid(), and >> this definitely is called as part of the mgag200 routines: >> >>> - 33.33% 33.33% kworker/64:1-ev [kernel.kallsyms] [k] >>> _drm_do_get_edid >>> ret_from_fork_asm >>> ret_from_fork >>> kthread >>> worker_thread >>> process_one_work >>> output_poll_execute >>> drm_client_dev_hotplug >>> drm_fbdev_shmem_client_hotplug >>> drm_fb_helper_hotplug_event >>> drm_client_modeset_probe >>> drm_helper_probe_single_connector_modes >>> mgag200_vga_bmc_connector_helper_get_modes >>> drm_connector_helper_get_modes >>> drm_edid_read >>> drm_edid_read_custom >>> _drm_do_get_edid >> This makes me think that we're reading a bad EDID. 
I enabled drm.debug >> setting to get more data: >> >>> Apr 22 23:47:11 1762811 kernel: EDID block 0 (tag 0x00) checksum is >>> invalid, remainder is 125 >>> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: >>> [drm:connector_bad_edid] [CONNECTOR:36:VGA-1] EDID is invalid: >>> Apr 22 23:47:11 1762811 kernel: [00] BAD 00 ff ff ff ff ff >>> ff 00 ff ff ff ff ff ff ff ff >>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff >>> ff ff ff ff ff ff ff ff ff ff >>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff >>> ff ff ff ff ff ff ff ff ff ff >>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff >>> ff ff ff ff ff ff ff ff ff ff >>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff >>> ff ff ff ff ff ff ff ff ff ff >>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff >>> ff ff ff ff ff ff ff ff ff ff >>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff >>> ff ff ff ff ff ff ff ff ff ff >>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff >>> ff ff ff ff ff ff ff ff ff ff > > This EDID has a correct identifier in the first 8 bytes and the rest is > garbage. > Yep. >>> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: >>> [drm:drm_client_dev_hotplug] fbdev: ret=0 >> Does anyone have any idea whats going wrong here? A google search seems >> to imply this is reading the EDID data from the VGA cable... > > The HW is probably broken. > Right. I thought we had a KVM dongle plugged into the VGA port, but further inspection shows that there doesn't even appear to be a physical VGA port on the system, and the mgag200 is only used for its BMC connection! (We have a mini display port to VGA adapter in use, and I've asked the team to swap that out just to confirm its not related) >> >> I'm also curious if its possible to stop polling for so long with udelay >> in the i2c logic somehow? 
>> I am not very familiar with i2c, but it is
>> frustrating that this driver is causing yet another stall that is
>> impacting timing-sensitive data. Even if in this case it's due to a
>> faulty cable, it is frustrating that such a result causes the PTP
>> failures. Would switching to WQ_UNBOUND be helpful here at all?
>
> Try Dave's suggestion to avoid polling. The driver won't be able to
> detect changes to the connector status, though.
>

That's fine. I don't think we're even using the device. It looks like it
might only be in use for BMC, and the VGA connection isn't actually
physically available, so there are no changes to detect.

Is this polling really only to detect when VGA is enabled? Would it make
sense to only poll on platforms which actually *have* that VGA connection?

I'd like a solution where we don't have to go to each individual
customer and have them ban the mgag200 driver or set some kernel
parameter like drm_kms_helper.poll=0 to prevent issues. If the VGA
connector isn't even available to *be* plugged in, then it doesn't make
sense to constantly poll to check if it was...

Many system admins likely aren't even aware of the device's existence,
and it ends up causing stall issues like this, which for timing-sensitive
tasks results in service disruption.

It is unpleasant that the mere *existence* of the device+driver causes
such problems.

> Best regards
> Thomas
>
>>
>> Thanks,
>> Jake
>
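The "remainder is 125" figure in the log above can be reproduced by hand. The sketch below is a userspace illustration, not the kernel's code: it assumes the logged value is the checksum byte the block would need so that all 128 bytes sum to zero modulo 256, and it rebuilds the block from the dump (valid 8-byte header, 0xff everywhere else). The function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EDID_BLOCK_LEN 128

/*
 * Checksum byte that would make all 128 bytes of the block sum to 0
 * modulo 256. Only the first 127 bytes are summed.
 */
static uint8_t edid_expected_checksum(const uint8_t block[EDID_BLOCK_LEN])
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < EDID_BLOCK_LEN - 1; i++)
		sum += block[i];	/* uint8_t arithmetic wraps modulo 256 */

	return (uint8_t)(0x100 - sum);
}

/* A block passes when its last byte equals the expected checksum. */
static int edid_block_checksum_ok(const uint8_t block[EDID_BLOCK_LEN])
{
	return block[EDID_BLOCK_LEN - 1] == edid_expected_checksum(block);
}

/*
 * Rebuild the block shown in the log: bytes 0 and 7 of the header are
 * 0x00, every other byte (including the checksum byte) reads 0xff.
 */
static void fill_block_from_log(uint8_t block[EDID_BLOCK_LEN])
{
	memset(block, 0xff, EDID_BLOCK_LEN);
	block[0] = 0x00;
	block[7] = 0x00;
}
```

With the block from the dump, the first 127 bytes contain 125 bytes of 0xff, so the sum is 125 * 255 = 31875, which is 131 modulo 256, and 256 - 131 = 125. The stored last byte is 0xff, so the check fails, which is consistent with a bus that reads back all ones after the header.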
* Re: further issues with MGA G200 graphics chipset 2026-04-23 16:35 ` Jacob Keller @ 2026-04-23 19:22 ` Jocelyn Falempe 2026-04-23 19:42 ` Jacob Keller 0 siblings, 1 reply; 20+ messages in thread From: Jocelyn Falempe @ 2026-04-23 19:22 UTC (permalink / raw) To: Jacob Keller, Thomas Zimmermann, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 23/04/2026 18:35, Jacob Keller wrote: > On 4/23/2026 12:44 AM, Thomas Zimmermann wrote: >> Hi >> >> Am 23.04.26 um 01:55 schrieb Jacob Keller: >>> Hello, >>><snip>>>> I'm also curious if its possible to stop polling for so long with udelay >>> in the i2c logic somehow? I am not very familiar with i2c, but it is >>> frustrating that this driver is causing yet another stall that is >>> impacting timing sensitive data. Even if in this case its due to a >>> faulty cable.. it is frustrating that such result causes the PTP >>> failures. Would switching to WQ_UNBOUND be helpful here at all? >> >> Try Dave's suggestion to avoid polling. The driver won't be able to >> detect changes to the connector status, though. >> > > That's fine. I don't think we're even using the device. It looks like it > might only be in use for BMC, and the VGA connection isn't actually > physically available, so there are no changes to detect. > > Is this polling really only to detect when VGA is enabled? Would it make > sense to only poll on platforms which actually *have* that VGA connection? > Polling was introduced with https://patchwork.freedesktop.org/series/131977/ The driver needs to know if a VGA monitor is connected or not, to provide the right available resolutions to the userspace. Otherwise you can set a high resolution that works from the BMC, but then connecting a VGA monitor will not work, as the driver won't notice that something has been connected. The mgag200 doesn't have an IRQ or a register to check if something is connected on the VGA port, so the driver uses the i2c and tries to read the EDID. 
Unfortunately, there is no way to know reliably if a VGA connector is present. It's possible to disable polling on some machines using DMI quirks, but I don't think this approach will scale. > > I'd like a solution where we don't have to go to each individual > customer and have them ban the mgag200 driver or set some kernel > parameter like drm_kms_helper.poll=0 to prevent issues. If the VGA > connector isn't even available to *be* plugged in, then it doesn't make > sense to constantly poll to check if it was... > > Many system admins likely aren't even aware of the devices existence, > and it ends up causing stall issues like this, which for timing > sensitive tasks results in service disruption. > > It is unpleasant that the mere *existence* of the device+driver causes > such problems. > >> Best regards >> Thomas >> >>> >>> Thanks, >>> Jake >> > Best regards, -- Jocelyn ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: further issues with MGA G200 graphics chipset 2026-04-23 19:22 ` Jocelyn Falempe @ 2026-04-23 19:42 ` Jacob Keller 2026-04-23 21:02 ` David Airlie 2026-04-24 6:20 ` Thomas Zimmermann 0 siblings, 2 replies; 20+ messages in thread From: Jacob Keller @ 2026-04-23 19:42 UTC (permalink / raw) To: Jocelyn Falempe, Thomas Zimmermann, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 4/23/2026 12:22 PM, Jocelyn Falempe wrote: > On 23/04/2026 18:35, Jacob Keller wrote: >> On 4/23/2026 12:44 AM, Thomas Zimmermann wrote: >>> Hi >>> >>> Am 23.04.26 um 01:55 schrieb Jacob Keller: >>>> Hello, >>>><snip>>>> I'm also curious if its possible to stop polling for so > long with udelay >>>> in the i2c logic somehow? I am not very familiar with i2c, but it is >>>> frustrating that this driver is causing yet another stall that is >>>> impacting timing sensitive data. Even if in this case its due to a >>>> faulty cable.. it is frustrating that such result causes the PTP >>>> failures. Would switching to WQ_UNBOUND be helpful here at all? >>> >>> Try Dave's suggestion to avoid polling. The driver won't be able to >>> detect changes to the connector status, though. >>> >> >> That's fine. I don't think we're even using the device. It looks like it >> might only be in use for BMC, and the VGA connection isn't actually >> physically available, so there are no changes to detect. >> >> Is this polling really only to detect when VGA is enabled? Would it make >> sense to only poll on platforms which actually *have* that VGA > connection? >> > Polling was introduced with https://patchwork.freedesktop.org/ > series/131977/ > > The driver needs to know if a VGA monitor is connected or not, to > provide the right available resolutions to the userspace. > Otherwise you can set a high resolution that works from the BMC, but > then connecting a VGA monitor will not work, as the driver won't notice > that something has been connected. 
>
> The mgag200 doesn't have an IRQ or a register to check if something is
> connected on the VGA port, so the driver uses the i2c and tries to read
> the EDID.
>
> Unfortunately, there is no way to know reliably if a VGA connector is
> present. It's possible to disable polling on some machines using DMI
> quirks, but I don't think this approach will scale.
>

Timing-sensitive setups like mine must have system admins who know to
manually disable mgag200 or disable polling. Many users won't be aware
of this. If the polling were not intrusive, this would not be an issue.
But...

Faulty hardware (perhaps just a cheap pull-down resistor on the VGA
connection, as Dave Airlie suggests) means that any such affected
platform has a polling routine that causes significant issues for any
timing-sensitive application.

Right now, I am stuck in a situation where I have to fight to
reach every customer who uses one of these platforms and confirm they
either disable polling or ban the module so it won't even load.

This is frustrating, as it is unlikely I'll reach everyone.

I doubt that I'm the only one with users who are affected by mysterious
performance or timing problems related to this. While it's true that not
*every* instance of the device is problematic (at least not now that we
fixed the other issue with the udelay...), many systems using the
controller *are* negatively impacted even with the timing fix, as I have
now seen...

Unfortunately, I also have no better idea than a DMI quirk table to
record known platforms that include the controller but don't have a
physical VGA connection exposed.

Thus, I'm wondering what else we can do? Using WQ_UNBOUND might help
somewhat? I have no idea if it's safe to sleep instead of spin while
reading the i2c connections... As far as I can tell the non-atomic
version has nothing that *strictly* prevents sleep, but maybe i2c
access has tighter timing requirements than what usleep_range can
fulfill? I am not sure...

I'd just really like to not have to worry about going to every single
user and asking them to unload and ban a driver for these big server
platforms...
* Re: further issues with MGA G200 graphics chipset 2026-04-23 19:42 ` Jacob Keller @ 2026-04-23 21:02 ` David Airlie 2026-04-23 21:18 ` Jacob Keller 2026-04-24 6:16 ` Thomas Zimmermann 2026-04-24 6:20 ` Thomas Zimmermann 1 sibling, 2 replies; 20+ messages in thread From: David Airlie @ 2026-04-23 21:02 UTC (permalink / raw) To: Jacob Keller Cc: Jocelyn Falempe, Thomas Zimmermann, dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On Fri, Apr 24, 2026 at 5:42 AM Jacob Keller <jacob.e.keller@intel.com> wrote: > > On 4/23/2026 12:22 PM, Jocelyn Falempe wrote: > > On 23/04/2026 18:35, Jacob Keller wrote: > >> On 4/23/2026 12:44 AM, Thomas Zimmermann wrote: > >>> Hi > >>> > >>> Am 23.04.26 um 01:55 schrieb Jacob Keller: > >>>> Hello, > >>>><snip>>>> I'm also curious if its possible to stop polling for so > > long with udelay > >>>> in the i2c logic somehow? I am not very familiar with i2c, but it is > >>>> frustrating that this driver is causing yet another stall that is > >>>> impacting timing sensitive data. Even if in this case its due to a > >>>> faulty cable.. it is frustrating that such result causes the PTP > >>>> failures. Would switching to WQ_UNBOUND be helpful here at all? > >>> > >>> Try Dave's suggestion to avoid polling. The driver won't be able to > >>> detect changes to the connector status, though. > >>> > >> > >> That's fine. I don't think we're even using the device. It looks like it > >> might only be in use for BMC, and the VGA connection isn't actually > >> physically available, so there are no changes to detect. > >> > >> Is this polling really only to detect when VGA is enabled? Would it make > >> sense to only poll on platforms which actually *have* that VGA > > connection? > >> > > Polling was introduced with https://patchwork.freedesktop.org/ > > series/131977/ > > > > The driver needs to know if a VGA monitor is connected or not, to > > provide the right available resolutions to the userspace. 
> > Otherwise you can set a high resolution that works from the BMC, but > > then connecting a VGA monitor will not work, as the driver won't notice > > that something has been connected. > > > > The mgag200 doesn't have an IRQ or a register to check if something is > > connected on the VGA port, so the driver uses the i2c and tries to read > > the EDID. > > > > Unfortunately, there is no way to know reliably if a VGA connector is > > present. It's possible to disable polling on some machines using DMI > > quirks, but I don't think this approach will scale. > > > > Timing sensitive setups like mine must have system admins who know to > manually disable mgag200 or disable polling. Many users won't be aware > of this. If the polling were not intrusive, this would not be an issue. > But.... > > Faulty hardware (perhaps just a cheap pull down resistor on the VGA > connection as Dave Airlie suggests) means that any such affected > platform has a polling routine that causes significant issues on any > timing sensitive applications. We could write a patch to just say if we see 10 bogus EDID polls we just give up and loudly say in the logs. This might break some crash-cart plugins in some data centers though, I don't think we have contracts in Matrox or the server vendors who make the hw to say how they recommend finding this info. It might be in ACPI or dmidecodes. Dave. > > Right now, I am stuck in a situation which means that I have to fight to > reach every customer who uses one of these platforms and confirm they > either disable polling or ban the module so it won't even load. > > This is frustrating, as it is unlikely I'll reach everyone. > > I doubt that I'm the only one with users who are affected by mysterious > performance or timing problems related to this. 
While its true that not > *every* instance of the device is problematic (at least not now that we > fixed the other issue with the udelay...), but many systems using the > controller *are* negatively impacted even with the timing fix, as I have > now seen... > > Unfortunately, I also have no better idea than a DMI quirk table to > record known platforms that include the controller but don't have a > physical VGA connection exposed. > > Thus, I'm wondering what else we can do? Using WQ_UNBOUND might help > somewhat? I have no idea if its safe to sleep instead of spin while > reading the i2c connections... As far as I can tell the non-atomic > version has nothing that *strictly* prevents sleep.. but maybe i2c > access has tighter timing requirements than what usleep_range can > fulfill? I am not sure... > > I'd just really like to not have to worry about going to every single > user and asking them to unload and ban a driver for these big server > platforms... > ^ permalink raw reply [flat|nested] 20+ messages in thread
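The "give up after 10 bogus EDID polls" idea from this message could look roughly like the sketch below. This is only an illustration of the bookkeeping, not driver code: the structure and function names are hypothetical, and treating the limit as 10 *consecutive* bad reads (with a good read resetting the count) is an assumption that the message does not spell out.

```c
#include <stdbool.h>

#define MAX_BAD_EDID_POLLS 10	/* threshold taken from the suggestion above */

/* Hypothetical per-connector bookkeeping, not an existing driver structure. */
struct edid_poll_state {
	unsigned int bad_reads;		/* current streak of invalid EDID reads */
	bool polling_disabled;		/* set once the streak hits the limit */
};

/*
 * Record the result of one poll. Returns true if polling should continue,
 * false once the connector has been given up on.
 */
static bool edid_poll_record(struct edid_poll_state *st, bool edid_valid)
{
	if (st->polling_disabled)
		return false;

	if (edid_valid) {
		st->bad_reads = 0;	/* assumption: a good read resets the streak */
		return true;
	}

	if (++st->bad_reads >= MAX_BAD_EDID_POLLS) {
		/* A real driver would log a loud one-time warning here. */
		st->polling_disabled = true;
		return false;
	}

	return true;
}
```

Once `polling_disabled` is set, a monitor plugged in later would no longer be noticed, which is the crash-cart concern raised in the same message.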
* Re: further issues with MGA G200 graphics chipset 2026-04-23 21:02 ` David Airlie @ 2026-04-23 21:18 ` Jacob Keller 2026-04-24 6:16 ` Thomas Zimmermann 1 sibling, 0 replies; 20+ messages in thread From: Jacob Keller @ 2026-04-23 21:18 UTC (permalink / raw) To: David Airlie Cc: Jocelyn Falempe, Thomas Zimmermann, dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 4/23/2026 2:02 PM, David Airlie wrote: > On Fri, Apr 24, 2026 at 5:42 AM Jacob Keller <jacob.e.keller@intel.com> wrote: >> >> On 4/23/2026 12:22 PM, Jocelyn Falempe wrote: >>> On 23/04/2026 18:35, Jacob Keller wrote: >>>> On 4/23/2026 12:44 AM, Thomas Zimmermann wrote: >>>>> Hi >>>>> >>>>> Am 23.04.26 um 01:55 schrieb Jacob Keller: >>>>>> Hello, >>>>>> <snip>>>> I'm also curious if its possible to stop polling for so >>> long with udelay >>>>>> in the i2c logic somehow? I am not very familiar with i2c, but it is >>>>>> frustrating that this driver is causing yet another stall that is >>>>>> impacting timing sensitive data. Even if in this case its due to a >>>>>> faulty cable.. it is frustrating that such result causes the PTP >>>>>> failures. Would switching to WQ_UNBOUND be helpful here at all? >>>>> >>>>> Try Dave's suggestion to avoid polling. The driver won't be able to >>>>> detect changes to the connector status, though. >>>>> >>>> >>>> That's fine. I don't think we're even using the device. It looks like it >>>> might only be in use for BMC, and the VGA connection isn't actually >>>> physically available, so there are no changes to detect. >>>> >>>> Is this polling really only to detect when VGA is enabled? Would it make >>>> sense to only poll on platforms which actually *have* that VGA >>> connection? >>>> >>> Polling was introduced with https://patchwork.freedesktop.org/ >>> series/131977/ >>> >>> The driver needs to know if a VGA monitor is connected or not, to >>> provide the right available resolutions to the userspace. 
>>> Otherwise you can set a high resolution that works from the BMC, but >>> then connecting a VGA monitor will not work, as the driver won't notice >>> that something has been connected. >>> >>> The mgag200 doesn't have an IRQ or a register to check if something is >>> connected on the VGA port, so the driver uses the i2c and tries to read >>> the EDID. >>> >>> Unfortunately, there is no way to know reliably if a VGA connector is >>> present. It's possible to disable polling on some machines using DMI >>> quirks, but I don't think this approach will scale. >>> >> >> Timing sensitive setups like mine must have system admins who know to >> manually disable mgag200 or disable polling. Many users won't be aware >> of this. If the polling were not intrusive, this would not be an issue. >> But.... >> >> Faulty hardware (perhaps just a cheap pull down resistor on the VGA >> connection as Dave Airlie suggests) means that any such affected >> platform has a polling routine that causes significant issues on any >> timing sensitive applications. > > We could write a patch to just say if we see 10 bogus EDID polls we > just give up and loudly say in the logs. > That would certainly be a better situation for me... > This might break some crash-cart plugins in some data centers though, > I don't think we have contracts in Matrox or the server vendors who > make the hw to say how they recommend finding this info. > But I could see this being a problem for data centers who previously saw "no issue" and now see "this device is causing a problem", especially if that problem is really non-existent? > It might be in ACPI or dmidecodes. > I can try checking if anything obvious shows up in dmidecodes for the device. > Dave. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: further issues with MGA G200 graphics chipset 2026-04-23 21:02 ` David Airlie 2026-04-23 21:18 ` Jacob Keller @ 2026-04-24 6:16 ` Thomas Zimmermann 1 sibling, 0 replies; 20+ messages in thread From: Thomas Zimmermann @ 2026-04-24 6:16 UTC (permalink / raw) To: David Airlie, Jacob Keller Cc: Jocelyn Falempe, dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen Hi Am 23.04.26 um 23:02 schrieb David Airlie: [...] >> Faulty hardware (perhaps just a cheap pull down resistor on the VGA >> connection as Dave Airlie suggests) means that any such affected >> platform has a polling routine that causes significant issues on any >> timing sensitive applications. > We could write a patch to just say if we see 10 bogus EDID polls we > just give up and loudly say in the logs. I don't think we should do that. The fallout might just backfire as well. Best regards Thomas > > This might break some crash-cart plugins in some data centers though, > I don't think we have contracts in Matrox or the server vendors who > make the hw to say how they recommend finding this info. > > It might be in ACPI or dmidecodes. > > Dave. > > >> Right now, I am stuck in a situation which means that I have to fight to >> reach every customer who uses one of these platforms and confirm they >> either disable polling or ban the module so it won't even load. >> >> This is frustrating, as it is unlikely I'll reach everyone. >> >> I doubt that I'm the only one with users who are affected by mysterious >> performance or timing problems related to this. While its true that not >> *every* instance of the device is problematic (at least not now that we >> fixed the other issue with the udelay...), but many systems using the >> controller *are* negatively impacted even with the timing fix, as I have >> now seen... >> >> Unfortunately, I also have no better idea than a DMI quirk table to >> record known platforms that include the controller but don't have a >> physical VGA connection exposed. 
>> >> Thus, I'm wondering what else we can do? Using WQ_UNBOUND might help >> somewhat? I have no idea if its safe to sleep instead of spin while >> reading the i2c connections... As far as I can tell the non-atomic >> version has nothing that *strictly* prevents sleep.. but maybe i2c >> access has tighter timing requirements than what usleep_range can >> fulfill? I am not sure... >> >> I'd just really like to not have to worry about going to every single >> user and asking them to unload and ban a driver for these big server >> platforms... >> -- -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: further issues with MGA G200 graphics chipset 2026-04-23 19:42 ` Jacob Keller 2026-04-23 21:02 ` David Airlie @ 2026-04-24 6:20 ` Thomas Zimmermann 2026-04-24 7:36 ` Jocelyn Falempe 1 sibling, 1 reply; 20+ messages in thread From: Thomas Zimmermann @ 2026-04-24 6:20 UTC (permalink / raw) To: Jacob Keller, Jocelyn Falempe, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen Hi Am 23.04.26 um 21:42 schrieb Jacob Keller: [...] > Unfortunately, I also have no better idea than a DMI quirk table to > record known platforms that include the controller but don't have a > physical VGA connection exposed. I'm in favor of this. If you send a meaningful DMI identifier for your system, I'd make you a patch for testing. I don't know of any way for detecting the presence of a physical VGA connector BTW. Best regards Thomas > > Thus, I'm wondering what else we can do? Using WQ_UNBOUND might help > somewhat? I have no idea if its safe to sleep instead of spin while > reading the i2c connections... As far as I can tell the non-atomic > version has nothing that *strictly* prevents sleep.. but maybe i2c > access has tighter timing requirements than what usleep_range can > fulfill? I am not sure... > > I'd just really like to not have to worry about going to every single > user and asking them to unload and ban a driver for these big server > platforms... -- -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: further issues with MGA G200 graphics chipset
  2026-04-24 6:20 ` Thomas Zimmermann
@ 2026-04-24 7:36 ` Jocelyn Falempe
  2026-04-24 7:47 ` Thomas Zimmermann
  0 siblings, 1 reply; 20+ messages in thread
From: Jocelyn Falempe @ 2026-04-24 7:36 UTC (permalink / raw)
  To: Thomas Zimmermann, Jacob Keller, airlied@redhat.com
  Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen

On 24/04/2026 08:20, Thomas Zimmermann wrote:
> Hi
>
> On 23.04.26 at 21:42, Jacob Keller wrote:
> [...]
>> Unfortunately, I also have no better idea than a DMI quirk table to
>> record known platforms that include the controller but don't have a
>> physical VGA connection exposed.
>
> I'm in favor of this. If you send a meaningful DMI identifier for your
> system, I'd make you a patch for testing.

I didn't find anything related to a VGA connector in dmidecode.
My suggestion would be to use the chassis-type [1], and disable polling
on Blade (0x1C and 0x1D) and Rack Mount (0x17), as they are less likely
to have a real VGA monitor connected.
My Dell T310, which is kind of a Tower, has a chassis-type of 0x11 "Main
server chassis", so it might not be very reliable.

Another option would be to disable polling if PREEMPT_RT is set, so that
a user who expects low latency can actually have it.

As a last resort, the driver did work for 2 decades without polling the
VGA connector, so maybe we can revert to that behavior.

--

Jocelyn

[1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.9.0.pdf

>
> I don't know of any way for detecting the presence of a physical VGA
> connector BTW.
>
> Best regards
> Thomas
>
>>
>> Thus, I'm wondering what else we can do? Using WQ_UNBOUND might help
>> somewhat? I have no idea if its safe to sleep instead of spin while
>> reading the i2c connections... As far as I can tell the non-atomic
>> version has nothing that *strictly* prevents sleep.. but maybe i2c
>> access has tighter timing requirements than what usleep_range can
>> fulfill? I am not sure...
>>
>> I'd just really like to not have to worry about going to every single
>> user and asking them to unload and ban a driver for these big server
>> platforms...
>
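The chassis-type heuristic proposed in this message can be written down as a small predicate. The sketch below is a userspace illustration under stated assumptions: the enum and function names are invented for the example, the type values are the ones quoted in the message, and masking off bit 7 assumes the raw SMBIOS byte is passed in (where that bit is the chassis-lock flag rather than part of the type).

```c
#include <stdbool.h>
#include <stdint.h>

/* SMBIOS System Enclosure type values mentioned in the message. */
enum chassis_type {
	CHASSIS_MAIN_SERVER	= 0x11,
	CHASSIS_RACK_MOUNT	= 0x17,
	CHASSIS_BLADE		= 0x1C,
	CHASSIS_BLADE_ENCLOSURE	= 0x1D,
};

/*
 * Heuristic only: returns true for chassis types that are unlikely to
 * have a physical VGA monitor attached, so polling could be skipped.
 */
static bool chassis_unlikely_has_monitor(uint8_t raw_type)
{
	/* Bit 7 of the raw byte is the chassis-lock flag, not part of the type. */
	switch (raw_type & 0x7f) {
	case CHASSIS_RACK_MOUNT:
	case CHASSIS_BLADE:
	case CHASSIS_BLADE_ENCLOSURE:
		return true;
	default:
		return false;
	}
}
```

As the message itself notes, a tower-like server reporting 0x11 falls outside the list, so this check on its own would miss some machines and could misclassify others.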
* Re: further issues with MGA G200 graphics chipset
  2026-04-24 7:36 ` Jocelyn Falempe
@ 2026-04-24 7:47 ` Thomas Zimmermann
  2026-04-24 23:29 ` Jacob Keller
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Zimmermann @ 2026-04-24 7:47 UTC (permalink / raw)
  To: Jocelyn Falempe, Jacob Keller, airlied@redhat.com
  Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen

Hi Jocelyn

On 24.04.26 at 09:36, Jocelyn Falempe wrote:
> On 24/04/2026 08:20, Thomas Zimmermann wrote:
>> Hi
>>
>> On 23.04.26 at 21:42, Jacob Keller wrote:
>> [...]
>>> Unfortunately, I also have no better idea than a DMI quirk table to
>>> record known platforms that include the controller but don't have a
>>> physical VGA connection exposed.
>>
>> I'm in favor of this. If you send a meaningful DMI identifier for
>> your system, I'd make you a patch for testing.
>
> I didn't find something related to VGA connector in dmidecode.
> My suggestion would be to use the chassis-type [1], and disable
> polling on Blade (0x1C and 0x1D) and Rack Mount (0x17) as they are
> less likely to have a real VGA monitor connected.
> My Dell T310, which is kind of a Tower, has a chassis-type of 0x11
> "Main server chassis" so it might not be very reliable.

This is the first time I've heard about this problem. I don't think it's
very common, and it appears to be related only to cheap hardware
manufacturing.

So I suggest picking Manufacturer, Product, Version as the key. I'd be
surprised if we find more than a handful of systems with the issue. If
we see a trend or common pattern, we can generalize later on.

>
> Another option would be to disable polling if PREEMPT_RT is set, so if
> the user expects low latency, he can actually have it.
>
> Last resort is that the driver did work for 2 decades without polling
> the VGA connector, maybe we can revert to that behavior.

It didn't actually work. Removing it will break a lot of systems.

Best regards
Thomas

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
* Re: further issues with MGA G200 graphics chipset 2026-04-24 7:47 ` Thomas Zimmermann @ 2026-04-24 23:29 ` Jacob Keller 2026-04-27 12:14 ` Thomas Zimmermann 0 siblings, 1 reply; 20+ messages in thread From: Jacob Keller @ 2026-04-24 23:29 UTC (permalink / raw) To: Thomas Zimmermann, Jocelyn Falempe, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 4/24/2026 12:47 AM, Thomas Zimmermann wrote: > Hi Jocelyn > > Am 24.04.26 um 09:36 schrieb Jocelyn Falempe: >> On 24/04/2026 08:20, Thomas Zimmermann wrote: >>> Hi >>> >>> Am 23.04.26 um 21:42 schrieb Jacob Keller: >>> [...] >>>> Unfortunately, I also have no better idea than a DMI quirk table to >>>> record known platforms that include the controller but don't have a >>>> physical VGA connection exposed. >>> >>> I'm in favor of this. If you send a meaningful DMI identifier for >>> your system, I'd make you a patch for testing. >> >> I didn't find something related to VGA connector in dmidecode. >> My suggestion would be to use the chassis-type [1], and disable >> polling on Blade (0x1C and 0x1D) and Rack Mount (0x17) as they are >> less likely to have a real VGA monitor connected. >> My Dell T310, which is kind of a Tower, has a chassis-type of 0x11 >> "Main server chassis" so it might not be very reliable. > > This is the first time, I hear about this problem. I don't think it's > very common. And it appears only to be related due to cheap hardware > manufacturing. > > So I suggest to pick Manufacturer, Product, Version as key. I'd be > surprised if we find more than a hand full of systems with the issue. If > we see a trend or common pattern, we can generalize later on. > I think this is the best solution. Keep it focused for now. I believe Intel has two major platforms that we care about with respect to this issue. I'll see if I can dig up the data. The systems install the MGA G200 for BMC use but don't seem to expose the VGA connection. 

For the specific system I have that was faulty, we have the following:

$ for t in system-manufacturer system-product-name system-version ; \
  do dmidecode -s ${t}; \
  done
Dell Inc.
PowerEdge XR8720t
Not Specified

I believe there was also some concern about HP systems which similarly
use this chipset, but I don't have the DMI data for that one offhand.
I've asked some colleagues to confirm the situation and obtain that
data. I'll get back early next week if we think there are any other
systems possibly affected.

In the meantime, I'm happy to have our team test any patch to confirm
that it behaves as expected and resolves the service interruptions.

Appreciate all the feedback on this thread.

Thanks,
Jake
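A quirk list keyed on the strings reported above could be modelled as follows. This is a userspace sketch, not the eventual kernel patch: the `struct dmi_id` type, the table name and the lookup function are hypothetical, and the comparison is an exact string match. The version string is left out of the key as an assumption, because "Not Specified" carries no information on this system. In the kernel, such a table would presumably be expressed with the existing DMI matching helpers (`dmi_check_system()` and `DMI_MATCH()`) instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DMI quirk entry. */
struct dmi_id {
	const char *sys_vendor;		/* dmidecode -s system-manufacturer */
	const char *product_name;	/* dmidecode -s system-product-name */
};

/* Systems known to carry the G200 for BMC use without a VGA connector. */
static const struct dmi_id bmc_only_systems[] = {
	{ "Dell Inc.", "PowerEdge XR8720t" },
};

/* Returns true if the given vendor/product pair is in the quirk list. */
static bool system_is_bmc_only(const char *vendor, const char *product)
{
	size_t i;

	for (i = 0; i < sizeof(bmc_only_systems) / sizeof(bmc_only_systems[0]); i++) {
		if (!strcmp(vendor, bmc_only_systems[i].sys_vendor) &&
		    !strcmp(product, bmc_only_systems[i].product_name))
			return true;
	}

	return false;
}
```

Additional platforms, such as the HP systems mentioned above, would become further table entries once their DMI strings are known.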
* Re: further issues with MGA G200 graphics chipset
  2026-04-24 23:29 ` Jacob Keller
@ 2026-04-27 12:14 ` Thomas Zimmermann
  2026-04-27 22:53 ` Jacob Keller
  2026-04-28 19:12 ` stuart hayes
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Zimmermann @ 2026-04-27 12:14 UTC (permalink / raw)
  To: Jacob Keller, Jocelyn Falempe, airlied@redhat.com
  Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen

[-- Attachment #1: Type: text/plain, Size: 2275 bytes --]

Hi

On 25.04.26 at 01:29, Jacob Keller wrote:
[...]
>>
>> So I suggest to pick Manufacturer, Product, Version as key. I'd be
>> surprised if we find more than a handful of systems with the issue. If
>> we see a trend or common pattern, we can generalize later on.
>>
> I think this is the best solution. Keep it focused for now. I believe
> Intel has two major platforms that we care about with respect to this
> issue. I'll see if I can dig up the data. The systems install the MGA
> G200 for BMC use but don't seem to expose the VGA connection.
>
> For the specific system I have that was faulty, we have the following:
>
> $ for t in system-manufacturer system-product-name system-version ; \
>   do dmidecode -s ${t}; \
>   done
> Dell Inc.
> PowerEdge XR8720t
> Not Specified
>
>
>
> I believe there was also some concern about HP systems which similarly
> use this chipset, but I don't have the DMI data for that one offhand.
> I've asked some colleagues to confirm the situation and obtain that
> data. I'll get back early next week if we think there are any other
> systems possibly affected.
>
> In the meantime, I'm happy to have our team test any patch to confirm
> that it behaves as expected and resolves the service interruptions.

For now, I've modified the two places that have BMC support in the
driver. Could you please also tell me your system's exact Matrox chipset
or its PCI ID?

The patch is attached for your testing. It should work against drm-tip
or v7.1-rc1.

I've also found the page at [1], which claims that there's a Mini-DP
port at the front. If so, I'd assume that there's also an extra encoder
chip to replace the VGA. If we ever get specs for that, we could
implement real support in the driver. In the meantime, the current fix
should work. In the worst case, that Mini-DP port would give a lower
default resolution.

[1] https://www.dell.com/en-us/shop/ipovw/poweredge-xr8720t?hve=shop+now#techspecs_section

Best regards
Thomas

>
> Appreciate all the feedback on this thread.
>
> Thanks,
> Jake

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)

[-- Attachment #2: 0001-drm-mgag200-Add-BMC-only-connector.patch --]
[-- Type: text/x-patch, Size: 7785 bytes --]

From 95d72c2e4abef9fc45433076d3b130336c734e75 Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann@suse.de>
Date: Fri, 24 Apr 2026 09:05:14 +0200
Subject: [PATCH] drm/mgag200: Add BMC-only connector

---
 drivers/gpu/drm/mgag200/mgag200_bmc.c     | 109 ++++++++++++++++++++++
 drivers/gpu/drm/mgag200/mgag200_drv.h     |   7 +-
 drivers/gpu/drm/mgag200/mgag200_g200ew3.c |  17 +++-
 drivers/gpu/drm/mgag200/mgag200_g200wb.c  |  17 +++-
 4 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/mgag200/mgag200_bmc.c b/drivers/gpu/drm/mgag200/mgag200_bmc.c
index bbdeb791c5b3..8d974e2c1810 100644
--- a/drivers/gpu/drm/mgag200/mgag200_bmc.c
+++ b/drivers/gpu/drm/mgag200/mgag200_bmc.c
@@ -6,6 +6,8 @@
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_edid.h>
 #include <drm/drm_managed.h>
+#include <drm/drm_modeset_helper_vtables.h>
+#include <drm/drm_print.h>
 #include <drm/drm_probe_helper.h>

 #include "mgag200_drv.h"
@@ -90,3 +92,110 @@ void mgag200_bmc_start_scanout(struct mga_device *mdev)
 	tmp &= ~0x10;
 	WREG_DAC(MGA1064_GEN_IO_DATA, tmp);
 }
+
+static void mgag200_bmc_encoder_atomic_disable(struct drm_encoder *encoder,
+					       struct drm_atomic_state *state)
+{
+	struct mga_device *mdev = to_mga_device(encoder->dev);
+
+	if (mdev->info->sync_bmc)
+		mgag200_bmc_stop_scanout(mdev);
+}
+
+static void mgag200_bmc_encoder_atomic_enable(struct drm_encoder *encoder,
+					      struct drm_atomic_state *state)
+{
+	struct mga_device *mdev = to_mga_device(encoder->dev);
+
+	if (mdev->info->sync_bmc)
+		mgag200_bmc_start_scanout(mdev);
+}
+
+static int mgag200_bmc_encoder_atomic_check(struct drm_encoder *encoder,
+					    struct drm_crtc_state *new_crtc_state,
+					    struct drm_connector_state *new_connector_state)
+{
+	struct mga_device *mdev = to_mga_device(encoder->dev);
+	struct mgag200_crtc_state *new_mgag200_crtc_state = to_mgag200_crtc_state(new_crtc_state);
+
+	new_mgag200_crtc_state->set_vidrst = mdev->info->sync_bmc;
+
+	return 0;
+}
+
+static const struct drm_encoder_helper_funcs mgag200_bmc_encoder_helper_funcs = {
+	.atomic_disable = mgag200_bmc_encoder_atomic_disable,
+	.atomic_enable = mgag200_bmc_encoder_atomic_enable,
+	.atomic_check = mgag200_bmc_encoder_atomic_check,
+};
+
+static const struct drm_encoder_funcs mgag200_bmc_encoder_funcs = {
+	.destroy = drm_encoder_cleanup
+};
+
+static int mgag200_bmc_connector_helper_get_modes(struct drm_connector *connector)
+{
+	struct mga_device *mdev = to_mga_device(connector->dev);
+	const struct mgag200_device_info *minfo = mdev->info;
+	int count;
+
+	/*
+	 * There's no EDID data without a connected monitor. Set BMC-
+	 * compatible modes in this case. The XGA default resolution
+	 * should work well for all BMCs.
+	 */
+	count = drm_add_modes_noedid(connector, minfo->max_hdisplay, minfo->max_vdisplay);
+	if (count)
+		drm_set_preferred_mode(connector, 1024, 768);
+
+	return count;
+}
+
+static const struct drm_connector_helper_funcs mgag200_bmc_connector_helper_funcs = {
+	.get_modes = mgag200_bmc_connector_helper_get_modes,
+};
+
+static const struct drm_connector_funcs mgag200_bmc_connector_funcs = {
+	.reset = drm_atomic_helper_connector_reset,
+	.fill_modes = drm_helper_probe_single_connector_modes,
+	.destroy = drm_connector_cleanup,
+	.atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
+	.atomic_destroy_state = drm_atomic_helper_connector_destroy_state
+};
+
+int mgag200_bmc_output_init(struct mga_device *mdev)
+{
+	struct drm_device *dev = &mdev->base;
+	struct drm_crtc *crtc = &mdev->crtc;
+	struct drm_encoder *encoder;
+	struct drm_connector *connector;
+	int ret;
+
+	encoder = &mdev->output.bmc.encoder;
+	ret = drm_encoder_init(dev, encoder, &mgag200_bmc_encoder_funcs,
+			       DRM_MODE_ENCODER_VIRTUAL, NULL);
+	if (ret) {
+		drm_err(dev, "drm_encoder_init() failed: %d\n", ret);
+		return ret;
+	}
+	drm_encoder_helper_add(encoder, &mgag200_bmc_encoder_helper_funcs);
+
+	encoder->possible_crtcs = drm_crtc_mask(crtc);
+
+	connector = &mdev->output.bmc.connector;
+	ret = drm_connector_init(dev, connector, &mgag200_bmc_connector_funcs,
+				 DRM_MODE_CONNECTOR_VGA);
+	if (ret) {
+		drm_err(dev, "drm_connector_init() failed: %d\n", ret);
+		return ret;
+	}
+	drm_connector_helper_add(connector, &mgag200_bmc_connector_helper_funcs);
+
+	ret = drm_connector_attach_encoder(connector, encoder);
+	if (ret) {
+		drm_err(dev, "drm_connector_attach_encoder() failed: %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.h b/drivers/gpu/drm/mgag200/mgag200_drv.h
index a875c4bf8cbe..f126f6d61ed0 100644
--- a/drivers/gpu/drm/mgag200/mgag200_drv.h
+++ b/drivers/gpu/drm/mgag200/mgag200_drv.h
@@ -279,7 +279,11 @@ struct mga_device {
 	struct
drm_plane primary_plane; struct drm_crtc crtc; - struct { + union { + struct { + struct drm_encoder encoder; + struct drm_connector connector; + } bmc; struct { struct drm_encoder encoder; struct drm_connector connector; @@ -435,5 +439,6 @@ int mgag200_vga_output_init(struct mga_device *mdev); /* mgag200_bmc.c */ void mgag200_bmc_stop_scanout(struct mga_device *mdev); void mgag200_bmc_start_scanout(struct mga_device *mdev); +int mgag200_bmc_output_init(struct mga_device *mdev); #endif /* __MGAG200_DRV_H__ */ diff --git a/drivers/gpu/drm/mgag200/mgag200_g200ew3.c b/drivers/gpu/drm/mgag200/mgag200_g200ew3.c index e387a455eae5..12047066b615 100644 --- a/drivers/gpu/drm/mgag200/mgag200_g200ew3.c +++ b/drivers/gpu/drm/mgag200/mgag200_g200ew3.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only +#include <linux/dmi.h> #include <linux/pci.h> #include <drm/drm_atomic.h> @@ -11,6 +12,17 @@ #include "mgag200_drv.h" +static const struct dmi_system_id mgag200_g200ew3_novga[] = { + { + .ident = "PowerEdge XR8720t", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge XR8720t"), + }, + }, + {}, +}; + static void mgag200_g200ew3_init_registers(struct mga_device *mdev) { mgag200_g200wb_init_registers(mdev); // same as G200WB @@ -128,7 +140,10 @@ static int mgag200_g200ew3_pipeline_init(struct mga_device *mdev) drm_mode_crtc_set_gamma_size(crtc, MGAG200_LUT_SIZE); drm_crtc_enable_color_mgmt(crtc, 0, false, MGAG200_LUT_SIZE); - ret = mgag200_vga_bmc_output_init(mdev); + if (dmi_check_system(mgag200_g200ew3_novga)) + ret = mgag200_bmc_output_init(mdev); + else + ret = mgag200_vga_bmc_output_init(mdev); if (ret) return ret; diff --git a/drivers/gpu/drm/mgag200/mgag200_g200wb.c b/drivers/gpu/drm/mgag200/mgag200_g200wb.c index d847fa8ded8c..e6ce1130d5eb 100644 --- a/drivers/gpu/drm/mgag200/mgag200_g200wb.c +++ b/drivers/gpu/drm/mgag200/mgag200_g200wb.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0-only #include <linux/delay.h> 
+#include <linux/dmi.h> #include <linux/pci.h> #include <drm/drm_atomic.h> @@ -12,6 +13,17 @@ #include "mgag200_drv.h" +static const struct dmi_system_id mgag200_g200wb_novga[] = { + { + .ident = "PowerEdge XR8720t", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge XR8720t"), + }, + }, + {}, +}; + void mgag200_g200wb_init_registers(struct mga_device *mdev) { static const u8 dacvalue[] = { @@ -262,7 +274,10 @@ static int mgag200_g200wb_pipeline_init(struct mga_device *mdev) drm_mode_crtc_set_gamma_size(crtc, MGAG200_LUT_SIZE); drm_crtc_enable_color_mgmt(crtc, 0, false, MGAG200_LUT_SIZE); - ret = mgag200_vga_bmc_output_init(mdev); + if (dmi_check_system(mgag200_g200wb_novga)) + ret = mgag200_bmc_output_init(mdev); + else + ret = mgag200_vga_bmc_output_init(mdev); if (ret) return ret; -- 2.54.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
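For readers who want to check whether a given machine would hit the quirk table in the patch above, the matching rule can be sketched in user space. This is only an illustration, not part of the patch: `struct novga_entry`, `novga_match()` and `read_dmi_attr()` are hypothetical names, and the sketch assumes the kernel's `DMI_MATCH()` substring semantics and the usual sysfs files `/sys/class/dmi/id/sys_vendor` and `/sys/class/dmi/id/product_name`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space mirror of one quirk entry; not the kernel's struct dmi_system_id. */
struct novga_entry {
	const char *sys_vendor;
	const char *product_name;
};

/* Same vendor/product pair as the table in the patch. */
static const struct novga_entry novga_table[] = {
	{ "Dell Inc.", "PowerEdge XR8720t" },
	{ NULL, NULL }, /* terminator, like the empty {} entry in the patch */
};

/*
 * Return 1 if some entry matches. An entry matches when each of its
 * strings occurs as a substring of the corresponding DMI field, which
 * is how DMI_MATCH() (as opposed to DMI_EXACT_MATCH()) compares.
 */
static int novga_match(const char *vendor, const char *product)
{
	const struct novga_entry *e;

	for (e = novga_table; e->sys_vendor; e++) {
		if (strstr(vendor, e->sys_vendor) &&
		    strstr(product, e->product_name))
			return 1;
	}
	return 0;
}

/* Read one sysfs attribute into buf and strip the trailing newline. Returns 0 on success. */
static int read_dmi_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would read the two sysfs files with `read_dmi_attr()` and pass the results to `novga_match()`. Because the comparison is a substring test, a product string with a suffix after "PowerEdge XR8720t" would also match.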
* Re: further issues with MGA G200 graphics chipset 2026-04-27 12:14 ` Thomas Zimmermann @ 2026-04-27 22:53 ` Jacob Keller 2026-04-27 23:32 ` Jacob Keller 2026-04-28 19:12 ` stuart hayes 1 sibling, 1 reply; 20+ messages in thread From: Jacob Keller @ 2026-04-27 22:53 UTC (permalink / raw) To: Thomas Zimmermann, Jocelyn Falempe, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 4/27/2026 5:14 AM, Thomas Zimmermann wrote: > Hi > > On 25.04.26 at 01:29, Jacob Keller wrote: > [...] >>> >>> So I suggest to pick Manufacturer, Product, Version as key. I'd be >>> surprised if we find more than a hand full of systems with the issue. If >>> we see a trend or common pattern, we can generalize later on. >>> >> I think this is the best solution. Keep it focused for now. I believe >> Intel has two major platforms that we care about with respect to this >> issue. I'll see if I can dig up the data. The systems install the MGA >> G200 for BMC use but don't seem to expose the VGA connection. >> >> For the specific system I have that was faulty, we have the following: >> >> $ for t in system-manufacturer system-product-name system-version ; \ >> do dmidecode -s ${t}; \ >> done >> Dell Inc. >> PowerEdge XR8720t >> Not Specified >> >> >> >> I believe there was also some concern about HP systems which similarly >> use this chipset, but I don't have the DMI data for that one off hand. >> I've asked some colleagues to confirm the situation and obtain that >> data. I'll get back early next week if we think there are any other >> systems possibly affected. >> >> In the mean time, I'm happy to have our team test any patch to confirm >> that it behaves as expected and resolves the service interruptions. > > For now, I've modified the two places that have BMC support in the > driver. Could you please also tell me your system's exact Matrox chipset > or its PCI id?
> Here's the lspci output: > > b5:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller [102b:0536] (rev 08) (prog-if 00 [VGA controller]) > DeviceName: Embedded Video > Subsystem: Dell Integrated Matrox G200eW3 Graphics Controller [1028:0d38] > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 16 > NUMA node: 0 > IOMMU group: 16 > Region 0: Memory at e5000000 (32-bit, prefetchable) [size=16M] > Region 1: Memory at e6810000 (32-bit, non-prefetchable) [size=16K] > Region 2: Memory at e6000000 (32-bit, non-prefetchable) [size=8M] > Capabilities: [dc] Power Management version 3 > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Kernel driver in use: mgag200 > Kernel modules: mgag200 The device ID looks to be 0x0536, and the subdevice ID is Dell 0x0D38. I don't see anything specifically related to mini display port. It is plausible there is an encoder between that output and the G200eW3. > The patch is attached for your testing. It would work against drm-tip or > v7.1-rc1. > I'll give it a shot. > I've also found the page at [1], which claims that there's a Mini-DP > port at the front. If so, I'd assume that there's also an extra encoder > chip to replace the VGA. If we ever get specs for that, we could > implement real support in the driver. > There does appear to be a mini display port cable. I am not certain if that is driven from the Matrox graphics or not, but looking at lspci there doesn't appear to be any other graphics chipset on the system, so you might be right, but I am not certain. > In the meantime, the current fix should work. In the worst case, that > Mini-DP port would give a lower default resolution. 
> I can check the behavior of the mini-DP output too and see if this changes anything for it. > [1] https://www.dell.com/en-us/shop/ipovw/poweredge-xr8720t? > hve=shop+now#techspecs_section > > Best regards > Thomas > > > > >> >> Appreciate all the feedback on this thread. >> >> Thanks, >> Jake > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: further issues with MGA G200 graphics chipset 2026-04-27 22:53 ` Jacob Keller @ 2026-04-27 23:32 ` Jacob Keller 0 siblings, 0 replies; 20+ messages in thread From: Jacob Keller @ 2026-04-27 23:32 UTC (permalink / raw) To: Thomas Zimmermann, Jocelyn Falempe, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 4/27/2026 3:53 PM, Jacob Keller wrote: > On 4/27/2026 5:14 AM, Thomas Zimmermann wrote: >> Hi >>> In the mean time, I'm happy to have our team test any patch to confirm >>> that it behaves as expected and resolves the service interruptions. >> >> For now, I've modified the two places that have BMC support in the >> driver. Could you please also tell me your system's exact Matrox chipset >> or its PCI id? >> > > Here's the lspci output: > >> >> b5:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller [102b:0536] (rev 08) (prog-if 00 [VGA controller]) >> DeviceName: Embedded Video >> Subsystem: Dell Integrated Matrox G200eW3 Graphics Controller [1028:0d38] >> Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- >> Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- >> Interrupt: pin A routed to IRQ 16 >> NUMA node: 0 >> IOMMU group: 16 >> Region 0: Memory at e5000000 (32-bit, prefetchable) [size=16M] >> Region 1: Memory at e6810000 (32-bit, non-prefetchable) [size=16K] >> Region 2: Memory at e6000000 (32-bit, non-prefetchable) [size=8M] >> Capabilities: [dc] Power Management version 3 >> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) >> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- >> Kernel driver in use: mgag200 >> Kernel modules: mgag200 > > The device ID looks to be 0x0536, and the subdevice ID is Dell 0x0D38. I > don't see anything specifically related to mini display port. 
It is > plausible there is an encoder between that output and the G200eW3. > >> The patch is attached for your testing. It would work against drm-tip or >> v7.1-rc1. >> > > I'll give it a shot. > The systems that were having trouble are currently being used by other folks on my team to check other issues. It might take a day or two before I can get access again to test this. I'll update you once I've gotten access and had a chance to test the changes. Thanks, Jake ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: further issues with MGA G200 graphics chipset 2026-04-27 12:14 ` Thomas Zimmermann 2026-04-27 22:53 ` Jacob Keller @ 2026-04-28 19:12 ` stuart hayes 2026-04-28 21:07 ` Jacob Keller 2026-04-29 6:40 ` Thomas Zimmermann 1 sibling, 2 replies; 20+ messages in thread From: stuart hayes @ 2026-04-28 19:12 UTC (permalink / raw) To: Thomas Zimmermann, Jacob Keller, Jocelyn Falempe, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 4/27/2026 7:14 AM, Thomas Zimmermann wrote: > Hi > > On 25.04.26 at 01:29, Jacob Keller wrote: > [...] >>> >>> So I suggest to pick Manufacturer, Product, Version as key. I'd be >>> surprised if we find more than a hand full of systems with the issue. If >>> we see a trend or common pattern, we can generalize later on. >>> >> I think this is the best solution. Keep it focused for now. I believe >> Intel has two major platforms that we care about with respect to this >> issue. I'll see if I can dig up the data. The systems install the MGA >> G200 for BMC use but don't seem to expose the VGA connection. >> >> For the specific system I have that was faulty, we have the following: >> >> $ for t in system-manufacturer system-product-name system-version ; \ >> do dmidecode -s ${t}; \ >> done >> Dell Inc. >> PowerEdge XR8720t >> Not Specified >> >> >> >> I believe there was also some concern about HP systems which similarly >> use this chipset, but I don't have the DMI data for that one off hand. >> I've asked some colleagues to confirm the situation and obtain that >> data. I'll get back early next week if we think there are any other >> systems possibly affected. >> >> In the mean time, I'm happy to have our team test any patch to confirm >> that it behaves as expected and resolves the service interruptions. > > For now, I've modified the two places that have BMC support in the > driver. Could you please also tell me your system's exact Matrox chipset > or its PCI id? > > The patch is attached for your testing.
It would work against drm-tip or > v7.1-rc1. > > I've also found the page at [1], which claims that there's a Mini-DP > port at the front. If so, I'd assume that there's also an extra encoder > chip to replace the VGA. If we ever get specs for that, we could > implement real support in the driver. > > In the meantime, the current fix should work. In the worst case, that > Mini-DP port would give a lower default resolution. > > [1] https://www.dell.com/en-us/shop/ipovw/poweredge-xr8720t? > hve=shop+now#techspecs_section > > Best regards > Thomas > > So this patch disables DDC polling if the dmi_check_system() matches. If this was to happen on systems that _do_ have a physical VGA connector, will that port still be active, just with a resolution that may not be compatible with the monitor that's plugged in? I don't see anyone say that the DDC polling doesn't cause too much latency for real time kernels on other systems that do have a VGA connector... did I miss that, or is there a chance that a lot of other systems that use this driver might also have issues with a real time kernel? > > >> >> Appreciate all the feedback on this thread. >> >> Thanks, >> Jake > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: further issues with MGA G200 graphics chipset 2026-04-28 19:12 ` stuart hayes @ 2026-04-28 21:07 ` Jacob Keller 2026-04-29 6:40 ` Thomas Zimmermann 1 sibling, 0 replies; 20+ messages in thread From: Jacob Keller @ 2026-04-28 21:07 UTC (permalink / raw) To: stuart hayes, Thomas Zimmermann, Jocelyn Falempe, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen On 4/28/2026 12:12 PM, stuart hayes wrote: > On 4/27/2026 7:14 AM, Thomas Zimmermann wrote: >> Hi >> >> On 25.04.26 at 01:29, Jacob Keller wrote: >> [...] >>>> >>>> So I suggest to pick Manufacturer, Product, Version as key. I'd be >>>> surprised if we find more than a hand full of systems with the >>>> issue. If >>>> we see a trend or common pattern, we can generalize later on. >>>> >>> I think this is the best solution. Keep it focused for now. I believe >>> Intel has two major platforms that we care about with respect to this >>> issue. I'll see if I can dig up the data. The systems install the MGA >>> G200 for BMC use but don't seem to expose the VGA connection. >>> >>> For the specific system I have that was faulty, we have the following: >>> >>> $ for t in system-manufacturer system-product-name system-version ; \ >>> do dmidecode -s ${t}; \ >>> done >>> Dell Inc. >>> PowerEdge XR8720t >>> Not Specified >>> >>> >>> >>> I believe there was also some concern about HP systems which similarly >>> use this chipset, but I don't have the DMI data for that one off hand. >>> I've asked some colleagues to confirm the situation and obtain that >>> data. I'll get back early next week if we think there are any other >>> systems possibly affected. >>> >>> In the mean time, I'm happy to have our team test any patch to confirm >>> that it behaves as expected and resolves the service interruptions. >> >> For now, I've modified the two places that have BMC support in the >> driver. Could you please also tell me your system's exact Matrox >> chipset or its PCI id?
>> >> The patch is attached for your testing. It would work against drm-tip >> or v7.1-rc1. >> >> I've also found the page at [1], which claims that there's a Mini-DP >> port at the front. If so, I'd assume that there's also an extra >> encoder chip to replace the VGA. If we ever get specs for that, we >> could implement real support in the driver. >> >> In the meantime, the current fix should work. In the worst case, that >> Mini-DP port would give a lower default resolution. >> >> [1] https://www.dell.com/en-us/shop/ipovw/poweredge-xr8720t? >> hve=shop+now#techspecs_section >> >> Best regards >> Thomas >> >> > > So this patch disables DDC polling if the dmi_check_system() matches. If > this was to happen on systems that _do_ have a physical VGA connector, > will that port still be active, just with a resolution that may not be > compatible with the monitor that's plugged in? > That is my understanding, yes. At least as far as I can tell none of the Dell PowerEdge systems with this chipset have a true VGA port, but it is still unconfirmed if the mini DisplayPort is connected to the MGA G200 through some sort of encoder, and how it interacts with the polling disabled. What I can confirm so far is that the mini Display Port output does seem to work despite the MGA G200 driver continuously complaining about bad/faulty EDID checksums. I haven't had time again on the system to check the patch or confirm what happens with mgag200 polling disabled yet. > I don't see anyone say that the DDC polling doesn't cause too much > latency for real time kernels on other systems that do have a VGA > connector... did I miss that, or is there a chance that a lot of other > systems that use this driver might also have issues with a real time > kernel? > There were 2 problems so far identified: 1) the DDC polling was causing issues due to spinning where it could sleep. I fixed that a while ago with 0e0c8f4d16de ("drm/mgag200: fix mgag200_bmc_stop_scanout()"). 
This was causing 300 millisecond delays that impacted PTP functionality. This was affecting both RT and non-RT systems (though both setups are timing sensitive, only one was actually using PREEMPT_RT). This issue, I believe, widely affects all systems which use the MGA G200. It has been fixed. 2) the issue I reported here. This issue appears to be possibly due to a fault in the hardware, and does not happen on *every* Dell PowerEdge we have access to, but it seems that the driver fails to read data over DDC when reading the EDID data for the connector. This results in it continuously retrying. Because the i2c bit algorithm uses udelays, this results in enough spinning that it impacts my PTP setup. I do not know how wide the impacts from (2) are. I also do not know if the issue causes problems on systems that have a VGA port and which also have the VGA port plugged in. Given my experience, it seems entirely plausible. Here's a more detailed summary of the issue I saw: On a regular (not PREEMPT_RT) kernel, the udelay on a CPU appears to block the interrupts from being fired on the CPU that is spinning. The polling doesn't use WQ_UNBOUND, so it schedules on the specific CPU that was executing the scheduling function. If this happens on the same CPU as the one that is assigned the IRQ for the ice driver, it won't fire. Since the polling thread ends up doing udelay for potentially a long time, it results in delaying ~20-30 milliseconds which is enough to impact the PTP functionality. I suspect this also causes issues with PREEMPT_RT but they might be different issues, due to the nature of PREEMPT_RT changing a lot of the way various critical sections work. I am fairly certain that other timing sensitive applications will have issues caused by this issue. It is plausible that such issues are below the radar for many deployments and it's not ultimately causing a "real" impact for everyone, but it's definitely causing problems and hiccups for Intel as well as some of our partners and customers.
The actual failure symptom we get is somewhat inconsistent. Even though it fails to read the EDID every second or two, timestamps are only impacted every few minutes. But that missed timestamp is considered catastrophic failure for us. It results in ptp4l going to fault and losing synchronization for several seconds. It also means such setups do not pass various industry tests. We have been recently recommending that users remove the mgag200 driver. However this is a poor workaround as it results in an inability to access the video over BMC, which some customers rely on. I also plan to confirm whether the mini DisplayPort is also affected by the driver removal. If it is, then removal of the driver could also result in the output being stopped. Some of our customers rely on the BMC to connect to the system, and I suspect it is useful to have local video access when debugging if something goes wrong with your remote access methods. Thus, I am trying to find another solution that resolves the issues we're having without needing to completely remove the driver. Thanks, Jake > >> >> >>> >>> Appreciate all the feedback on this thread. >>> >>> Thanks, >>> Jake >> > ^ permalink raw reply [flat|nested] 20+ messages in thread
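The ~20-30 millisecond figure in the message above can be sanity-checked with a back-of-the-envelope model of a bit-banged I2C transfer. This is a rough sketch under stated assumptions, not a measurement: the function name `edid_read_busy_wait_us()` is hypothetical, the model charges two delay periods per bit (clock low, clock high) and nine bits per byte (eight data bits plus ACK), and the 10 microsecond delay used in the example is an assumed value, not one confirmed anywhere in this thread.

```c
#include <stdint.h>

/* All constants below are assumptions for this estimate. */
enum {
	EDID_BLOCK_BYTES   = 128, /* one EDID block */
	I2C_OVERHEAD_BYTES = 3,   /* write address, offset, read address */
	I2C_BITS_PER_BYTE  = 9,   /* 8 data bits + 1 ACK/NAK bit */
};

/*
 * Estimate the time, in microseconds, that one bit-banged read of a
 * single EDID block spends in delay loops, given the per-half-clock
 * delay in microseconds. Start/stop conditions, clock stretching and
 * retries are ignored.
 */
static uint32_t edid_read_busy_wait_us(uint32_t udelay_us)
{
	uint32_t bytes = EDID_BLOCK_BYTES + I2C_OVERHEAD_BYTES;
	uint32_t bits = bytes * I2C_BITS_PER_BYTE;

	return bits * 2u * udelay_us;
}
```

With an assumed delay of 10 microseconds, `edid_read_busy_wait_us(10)` gives 23580 microseconds, roughly 23.6 ms for one block. That is the same order of magnitude as the delay reported above, and every retry after a failed checksum would add about the same amount again.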
* Re: further issues with MGA G200 graphics chipset 2026-04-28 19:12 ` stuart hayes 2026-04-28 21:07 ` Jacob Keller @ 2026-04-29 6:40 ` Thomas Zimmermann 1 sibling, 0 replies; 20+ messages in thread From: Thomas Zimmermann @ 2026-04-29 6:40 UTC (permalink / raw) To: stuart hayes, Jacob Keller, Jocelyn Falempe, airlied@redhat.com Cc: dri-devel, linux-kernel@vger.kernel.org, Pasi Vaananen Hi On 28.04.26 at 21:12, stuart hayes wrote: > On 4/27/2026 7:14 AM, Thomas Zimmermann wrote: >> Hi >> >> On 25.04.26 at 01:29, Jacob Keller wrote: >> [...] >>>> >>>> So I suggest to pick Manufacturer, Product, Version as key. I'd be >>>> surprised if we find more than a hand full of systems with the >>>> issue. If >>>> we see a trend or common pattern, we can generalize later on. >>>> >>> I think this is the best solution. Keep it focused for now. I believe >>> Intel has two major platforms that we care about with respect to this >>> issue. I'll see if I can dig up the data. The systems install the MGA >>> G200 for BMC use but don't seem to expose the VGA connection. >>> >>> For the specific system I have that was faulty, we have the following: >>> >>> $ for t in system-manufacturer system-product-name system-version ; \ >>> do dmidecode -s ${t}; \ >>> done >>> Dell Inc. >>> PowerEdge XR8720t >>> Not Specified >>> >>> >>> >>> I believe there was also some concern about HP systems which similarly >>> use this chipset, but I don't have the DMI data for that one off hand. >>> I've asked some colleagues to confirm the situation and obtain that >>> data. I'll get back early next week if we think there are any other >>> systems possibly affected. >>> >>> In the mean time, I'm happy to have our team test any patch to confirm >>> that it behaves as expected and resolves the service interruptions. >> >> For now, I've modified the two places that have BMC support in the >> driver. Could you please also tell me your system's exact Matrox >> chipset or its PCI id?
>> >> The patch is attached for your testing. It would work against drm-tip >> or v7.1-rc1. >> >> I've also found the page at [1], which claims that there's a Mini-DP >> port at the front. If so, I'd assume that there's also an extra >> encoder chip to replace the VGA. If we ever get specs for that, we >> could implement real support in the driver. >> >> In the meantime, the current fix should work. In the worst case, that >> Mini-DP port would give a lower default resolution. >> >> [1] https://www.dell.com/en-us/shop/ipovw/poweredge-xr8720t? >> hve=shop+now#techspecs_section >> >> Best regards >> Thomas >> >> > > So this patch disables DDC polling if the dmi_check_system() matches. > If this was to happen on systems that _do_ have a physical VGA > connector, will that port still be active, just with a resolution that > may not be compatible with the monitor that's plugged in? The DMI test is supposed to only match on affected systems. If there's a matched system with a VGA port, or possibly that Mini-DP port, users will get a number of bogus display modes at the worst. But the default mode is 1024x768, which should work on any display. > > I don't see anyone say that the DDC polling doesn't cause too much > latency for real time kernels on other systems that do have a VGA > connector... did I miss that, or is there a chance that a lot of other > systems that use this driver might also have issues with a real time > kernel? That missing VGA port is a problem on any workload. So we fix it as far as possible. Where the connector polling interferes with RT, it can also be disabled with drm_kms_helper.poll=0. Best regards Thomas > > >> >> >>> >>> Appreciate all the feedback on this thread. >>> >>> Thanks, >>> Jake >> > -- -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Frankenstr. 
146, 90461 Nürnberg, Germany, www.suse.com GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg) ^ permalink raw reply [flat|nested] 20+ messages in thread
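To confirm whether the `drm_kms_helper.poll=0` setting mentioned above is in effect on a running system, the current value can be read back from sysfs. The following is a minimal sketch, assuming the parameter is exposed at `/sys/module/drm_kms_helper/parameters/poll` and printed as `Y` or `N` like other boolean module parameters; `parse_bool_param()` and `read_poll_param()` are hypothetical helper names.

```c
#include <stdio.h>

/* Assumed sysfs location of the parameter on a running system. */
#define POLL_PARAM_PATH "/sys/module/drm_kms_helper/parameters/poll"

/*
 * Interpret the text of a boolean module parameter.
 * Returns 1 for enabled, 0 for disabled, -1 if the text is not recognized.
 */
static int parse_bool_param(const char *text)
{
	switch (text[0]) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;
	}
}

/*
 * Read the parameter from sysfs.
 * Returns 1 or 0 for the current value, -1 if it cannot be read or parsed.
 */
static int read_poll_param(void)
{
	char buf[8];
	FILE *f = fopen(POLL_PARAM_PATH, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_bool_param(buf);
}
```

A return value of 0 from `read_poll_param()` would indicate that connector polling is disabled, and -1 that the file is missing or unreadable, for example when the helper is not loaded.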
end of thread, other threads:[~2026-04-29 6:40 UTC | newest] Thread overview: 20+ messages 2026-04-22 23:55 further issues with MGA G200 graphics chipset Jacob Keller 2026-04-23 0:05 ` David Airlie 2026-04-23 21:39 ` Jacob Keller 2026-04-23 7:44 ` Thomas Zimmermann 2026-04-23 16:35 ` Jacob Keller 2026-04-23 19:22 ` Jocelyn Falempe 2026-04-23 19:42 ` Jacob Keller 2026-04-23 21:02 ` David Airlie 2026-04-23 21:18 ` Jacob Keller 2026-04-24 6:16 ` Thomas Zimmermann 2026-04-24 6:20 ` Thomas Zimmermann 2026-04-24 7:36 ` Jocelyn Falempe 2026-04-24 7:47 ` Thomas Zimmermann 2026-04-24 23:29 ` Jacob Keller 2026-04-27 12:14 ` Thomas Zimmermann 2026-04-27 22:53 ` Jacob Keller 2026-04-27 23:32 ` Jacob Keller 2026-04-28 19:12 ` stuart hayes 2026-04-28 21:07 ` Jacob Keller 2026-04-29 6:40 ` Thomas Zimmermann