From: Jacob Keller <jacob.e.keller@intel.com>
To: Thomas Zimmermann <tzimmermann@suse.de>,
Jocelyn Falempe <jfalempe@redhat.com>,
"airlied@redhat.com" <airlied@redhat.com>
Cc: <dri-devel@lists.freedesktop.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Pasi Vaananen <pvaanane@redhat.com>
Subject: Re: further issues with MGA G200 graphics chipset
Date: Thu, 23 Apr 2026 09:35:31 -0700
Message-ID: <6ec01703-31e0-4998-9508-a5a115ae7bc9@intel.com>
In-Reply-To: <a9d176ec-d19b-4f41-af16-cdc4e475a8cc@suse.de>
On 4/23/2026 12:44 AM, Thomas Zimmermann wrote:
> Hi
>
> Am 23.04.26 um 01:55 schrieb Jacob Keller:
>> Hello,
>>
>> You may recall the issues I recently reported in the mgag200 DRM driver,
>> and the fix I submitted for them [1].
>>
>> [1]:
>> https://lore.kernel.org/all/20260202-jk-mgag200-fix-bad-udelay-v2-1-ce1e9665987d@intel.com/
>>
>> I have recently been running into another issue with the mgag200
>> graphics driver on a similar platform. I noticed occasional spikes where
>> Tx timestamps from the ice driver were delayed, very similar to the
>> behavior in the original bug report. However, this was on a system
>> running v6.12.76, which contains my MGA G200 usleep fix.
>>
>> I analyzed the data with perf and found what looks like another case
>> where the mgag200 polling routine is causing problems.
>>
>> Here's a perf report of the cycles samples captured between the start
>> of a Tx timestamp request and the point where we report it to the stack:
>>
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] ret_from_fork_asm
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] ret_from_fork
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] kthread
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] worker_thread
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] process_one_work
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] output_poll_execute
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_client_dev_hotplug
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_fbdev_shmem_client_hotplug
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_fb_helper_hotplug_event
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_client_modeset_probe
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_helper_probe_single_connector_modes
>>> + 89.87% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_vga_bmc_connector_helper_get_modes
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_connector_helper_get_modes
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_edid_read
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_edid_read_custom
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] _drm_do_get_edid
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] edid_block_read
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] drm_do_probe_ddc_edid
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] i2c_transfer
>>> + 89.87% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __i2c_transfer
>>> + 89.87% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] bit_xfer
>>> - 59.65% 59.65% kworker/65:1-ev [kernel.kallsyms] [k] delay_halt_tpause
>>>      ret_from_fork_asm
>>>      ret_from_fork
>>>      kthread
>>>      worker_thread
>>>      process_one_work
>>>      output_poll_execute
>>>      drm_client_dev_hotplug
>>>      drm_fbdev_shmem_client_hotplug
>>>      drm_fb_helper_hotplug_event
>>>      drm_client_modeset_probe
>>>      drm_helper_probe_single_connector_modes
>>>      mgag200_vga_bmc_connector_helper_get_modes
>>>      drm_connector_helper_get_modes
>>>      drm_edid_read
>>>      drm_edid_read_custom
>>>      _drm_do_get_edid
>>>      edid_block_read
>>>      drm_do_probe_ddc_edid
>>>      i2c_transfer
>>>      __i2c_transfer
>>>    + bit_xfer
>>> + 59.65% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __udelay
>>> + 59.65% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] __const_udelay
>>> + 51.11% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] sclhi
>>> + 30.22% 30.22% kworker/65:1-ev [kernel.kallsyms] [k] ioread8
>>> + 7.30% 0.00% kworker/65:1-ev [kernel.kallsyms] [k] delay_halt
>>> + 7.30% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] acknak
>>> + 7.29% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_ddc_algo_bit_data_setscl
>>> + 5.02% 0.00% swapper [kernel.kallsyms] [k] secondary_startup_64
>>> + 5.02% 0.00% swapper [kernel.kallsyms] [k] start_secondary
>>> + 5.02% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
>>> + 5.02% 0.00% swapper [kernel.kallsyms] [k] do_idle
>>> + 3.60% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle
>>> + 3.60% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter
>>> + 3.53% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state
>>> + 2.57% 0.00% kworker/65:1-ev [mgag200] [k] mgag200_ddc_algo_bit_data_setsda
>>> + 2.14% 0.00% perf [unknown] [k] 0xffffffffffffffff
>>> + 2.14% 0.00% perf perf [.] __cmd_record.constprop.0
>>> + 2.14% 0.00% perf [kernel.kallsyms] [k] entry_SYSCALL_64
>>> + 2.14% 0.00% perf [kernel.kallsyms] [k] do_syscall_64
>>> + 2.14% 0.00% perf [kernel.kallsyms] [k] x64_sys_call
>>> + 2.06% 2.06% swapper [kernel.kallsyms] [k] intel_idle
>>> + 1.31% 0.42% perf [kernel.kallsyms] [k] do_sys_poll
>>> + 1.31% 0.00% perf perf [.] fdarray__poll
>>> + 1.31% 0.00% perf libc.so.6 [.] __poll
>>> + 1.31% 0.00% perf [kernel.kallsyms] [k] __x64_sys_poll
>>> + 1.06% 0.00% systemd-journal systemd-journald [.] 0x00005d6bb7cb3f64
>>> + 1.06% 0.00% systemd-journal libc.so.6 [.] __libc_start_main
>>> + 1.06% 0.00% systemd-journal libc.so.6 [.] 0x00007d6ce3a2a1c9
>>> + 1.06% 0.00% systemd-journal systemd-journald [.] 0x00005d6bb7cb389e
>>> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] sd_event_run
>>> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] sd_event_dispatch
>>> + 1.06% 0.00% systemd-journal libsystemd-shared-255.so [.] 0x00007d6ce409d413
>>> + 1.00% 0.00% kworker/65:1-ev [i2c_algo_bit] [k] i2c_stop
>>> + 0.83% 0.00% perf [kernel.kallsyms] [k] perf_poll
>>> + 0.83% 0.00% perf perf [.] record__mmap_read_evlist
>>>
>> As you can see, in this case we are spending roughly 60% of the cycles
>> in delay_halt_tpause, which is reached through the __udelay() calls that
>> bit_xfer (the i2c-algo-bit transfer routine) uses for bit timing.
>
> That's from the DDC's i2c channel, which we poll at regular intervals
> when we update the connector status. Dave's suggestion should at least
> mitigate the problem.
>
Right.
>
> Polling the DDC involves acquiring locks so that it does not interfere
> with display updates. These errors about drm_fb_helper_damage_work() are
> fallout. The function most likely waits for the DDC polling to finish.
Makes sense. I'm still wondering whether it would help to convert this
work to WQ_UNBOUND, so that the task isn't bound to a single CPU and
(hopefully) doesn't stall other critical work, such as IRQ handling, that
*happens* to be bound to the same CPU. I'm not entirely sure. It seems
crazy to me that this simple background polling work keeps my IRQ from
executing for 30 milliseconds, but that appears to be what is happening.
I'm guessing that refactoring i2c-algo-bit to allow usleep isn't really
possible either, so we can't make this part of the logic sleep instead of
busy-waiting. :(
>>
>> I do not understand exactly what is causing the driver to get stuck;
>> it's something in the i2c routine for reading the EDID block.
>>
>> I also see this being printed:
>>
>> EDID block 0 (tag 0x00) checksum is invalid, remainder is 125
>>
>> It is printed quite consistently, every few seconds. I guess this might
>> be related to a bad EDID block on the mgag200 device? What does this
>> even mean?
>
> The monitor's EDID is wrong. This is likely another fallout from the issue.
>
It turns out that the platform doesn't even seem to have a physical VGA
port. This makes me think Dave's point about a cheap resistor is quite
plausible.
>>
>> I am not sure how I'd go about verifying this, or root causing what is
>> going wrong.
>>
>> It looks like we print the message as part of _drm_do_get_edid(), and
>> this is definitely called from the mgag200 routines:
>>
>>> - 33.33% 33.33% kworker/64:1-ev [kernel.kallsyms] [k] _drm_do_get_edid
>>>      ret_from_fork_asm
>>>      ret_from_fork
>>>      kthread
>>>      worker_thread
>>>      process_one_work
>>>      output_poll_execute
>>>      drm_client_dev_hotplug
>>>      drm_fbdev_shmem_client_hotplug
>>>      drm_fb_helper_hotplug_event
>>>      drm_client_modeset_probe
>>>      drm_helper_probe_single_connector_modes
>>>      mgag200_vga_bmc_connector_helper_get_modes
>>>      drm_connector_helper_get_modes
>>>      drm_edid_read
>>>      drm_edid_read_custom
>>>      _drm_do_get_edid
>> This makes me think that we're reading a bad EDID. I enabled the
>> drm.debug setting to get more data:
>>
>>> Apr 22 23:47:11 1762811 kernel: EDID block 0 (tag 0x00) checksum is invalid, remainder is 125
>>> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:connector_bad_edid] [CONNECTOR:36:VGA-1] EDID is invalid:
>>> Apr 22 23:47:11 1762811 kernel: [00] BAD 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
>>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> Apr 22 23:47:11 1762811 kernel: [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> This EDID has a correct identifier in the first 8 bytes and the rest is
> garbage.
>
Yep.
>>> Apr 22 23:47:11 1762811 kernel: mgag200 0000:b5:00.0: [drm:drm_client_dev_hotplug] fbdev: ret=0
>> Does anyone have any idea what's going wrong here? A Google search
>> suggests this is reading the EDID data over the VGA cable...
>
> The HW is probably broken.
>
Right. I thought we had a KVM dongle plugged into the VGA port, but
further inspection shows that there doesn't even appear to be a physical
VGA port on the system, and the mgag200 is only used for its BMC
connection! (We do have a Mini DisplayPort to VGA adapter in use, and I've
asked the team to swap it out just to confirm it's not related.)
>>
>> I'm also curious whether it's possible to avoid busy-waiting for so long
>> with udelay in the i2c logic. I am not very familiar with i2c, but it is
>> frustrating that this driver is causing yet another stall that impacts
>> timing-sensitive data. Even if in this case it's due to a faulty cable,
>> it is frustrating that such a fault causes PTP failures. Would switching
>> to WQ_UNBOUND help here at all?
>
> Try Dave's suggestion to avoid polling. The driver won't be able to
> detect changes to the connector status, though.
>
That's fine. I don't think we're even using the device. It looks like it
might only be in use for BMC, and the VGA connection isn't actually
physically available, so there are no changes to detect.
Is this polling really only there to detect when a VGA display is
connected? Would it make sense to only poll on platforms which actually
*have* that VGA connection? I'd like a solution where we don't have to go
to each individual customer and have them blacklist the mgag200 driver or
set a kernel parameter like drm_kms_helper.poll=0 to prevent issues. If
the VGA connector isn't even available to *be* plugged in, then it doesn't
make sense to constantly poll to check whether it was.
Many system admins likely aren't even aware of the device's existence,
and it ends up causing stalls like this, which for timing-sensitive tasks
result in service disruption.
It is unpleasant that the mere *existence* of the device and driver causes
such problems.
> Best regards
> Thomas
>
>>
>> Thanks,
>> Jake
>