Hi Fengguang,

I am quoting both of your messages below, so read until the bottom. I've even found what looks to be the cause.

On 11/9/11 2:01 PM, Wu Fengguang wrote:
> Christopher,
>
> On Wed, Nov 09, 2011 at 05:30:18PM +0800, Christopher White wrote:
>> Couldn't resist connecting to my machine over VNC even though I'm not home.
> Thanks a lot!
>
>> So, I booted it with the new patch and see that while you HAVE found
>> the bug, the new register addresses still seem to be wrong. Instead
>> of the audio registers containing their default, pre-filled,
>> standard 2-speaker configuration,
> Yeah, AFAICS the BIOS may choose to pre-fill some simple ELD at boot time.

Indeed, I could tell that the 2ch standard configuration was a pre-filled default.

>
>> it's now being overwritten - which
>> is good. With garbage - which is bad. ;-)
> One step forward ;-)

Yes! ;-)

>
>> One possible cause is that
>> the SandyBridge addresses shouldn't use the same register addresses
>> as IvyBridge after all. This will have to be double checked.
> What puzzled me is that I've been testing DisplayPort on a Sandybridge
> notebook today and it is working fine.
>
> The other question is, why the intel_audio_dump tool goes wrong with
> your hardware? It's reading the right register addresses in all the
> boxes I tested...

I've got good news about that further down.

>
>> Instead of /proc/asound/card0/eld#3.0 containing the default 2-speaker configuration, it now contains:
>> monitor_present 1
>> eld_valid 1
> So at least it sets the ELD valid bit right.

Hehe. That might be a side effect of incorrect writing, though. We'll see.
>
>> monitor_name
>> connection_type HDMI
>> eld_version [0x0] reserved
>> edid_version [0x0] no CEA EDID Timing Extension block present
>> manufacture_id 0x0
>> product_id 0x0
>> port_id 0x0
>> support_hdcp 0
>> support_ai 0
>> audio_sync_delay 0
>> speakers [0x0]
>> sad_count 0
>>
>> You can see that the data is obviously zeroed out, and I bet it's due to a misaligned address.
> Did you view *every* ELD file?
>
> cat /proc/asound/card0/eld*

Yeah, the attached _eld.txt file shows that there is only *one* ELD file there. This is the file that used to contain the default 2ch initialization of the audio registers. At least we've come far enough that it's now being overwritten. It's being incorrectly overwritten or corrupted afterwards, but it's a start.

> I verified that it's writing to the right address in the spec. And I even
> found direct evidence in your dmesg that the ELD contents are correctly
> received and interpreted by the audio driver:
>
> [ 10.278612] HDMI hot plug event: Pin=7 Presence_Detect=1 ELD_Valid=1
> [ 10.278644] HDMI status: Pin=7 Presence_Detect=1 ELD_Valid=1
>
> Output by snd_hdmi_show_eld():
> [ 10.282143] HDMI: detected monitor TX-SR607 at connection type HDMI
> [ 10.282145] HDMI: available speakers: FL/FR LFE FC RL/RR RLC/RRC
> [ 10.282147] HDMI: supports coding type LPCM: channels = 2, rates = 44100 48000 88200 176400 192000 384000, bits = 16 20 24
> [ 10.282149] HDMI: supports coding type LPCM: channels = 8, rates = 44100 48000 88200 176400 192000 384000, bits = 16 20 24
> [ 10.282151] HDMI: supports coding type AC-3: channels = 8, rates = 44100 48000 88200, max bitrate = 640000
> [ 10.282152] HDMI: supports coding type DTS: channels = 8, rates = 48000 88200, max bitrate = 1536000
> [ 10.282154] HDMI: supports coding type DSD (One Bit Audio): channels = 6, rates = 48000
> [ 10.282155] HDMI: supports coding type E-AC-3/DD+ (Dolby Digital Plus): channels = 8, rates = 48000 88200
> [ 10.282157] HDMI: supports coding type DTS-HD: channels = 8, rates = 48000 88200 176400 192000 384000
> [ 10.282159] HDMI: supports coding type MLP (Dolby TrueHD): channels = 8, rates = 88200 192000
>

Hmm, wow, I compared this to the older dmesg.log I gave you a few weeks ago, and that's definitely news. It was only showing the default 2ch configuration before. Now it IS writing to the correct address. How VERY strange! So it DOES write to the correct register... It gets STRANGER, though, as you'll see at the bottom of this email.

>> Booting with drm.debug=6 produced the attached log file, which I've included for completeness. However, there's nothing strange in it.
> Thanks, the full dmesg helped a lot!
>
>> On 11/9/11 10:00 AM, Christopher White wrote:
>> Good day, Fengguang! Great work! This sounds very promising!
>>
>> I went through the ELD parsing code myself (drm_edid_to_eld), as my programmer mind's curiosity killed me even though I didn't really have time for it, and I could see that it grabs the CEA extension block, grabs the monitor name string, then goes through each data block collection, copying all short descriptor data for each of the block types we're interested in. Good and clean code.
>>
>> So, I came to the same conclusion - that the parsing code was completely correct. I'm therefore very happy to hear that you've found the real problem: trying to write the ELD structure to the wrong audio registers on SandyBridge. Yep, that HAS to be it!
>>
>> I've applied the patch and the kernel is currently being rebuilt, but I've got to leave home, so I won't report back until later today.
>>
>> However, I am confident that you've found the true cause of the problem. Superb work once again!
>>
>> You're going to make a lot of Home Theater PC owners very happy.
> ...I appreciate your help a lot!
>
> Thanks,
> Fengguang

On 11/9/11 2:12 PM, Wu Fengguang wrote:
> Hi Christopher,
>
>> Now, onto the intel-gpu-tools test.
>> I ran intel_audio_dump as requested
>> and it only comes back with "Couldn't map MMIO region: No such file or
>> directory". I spent 10 minutes looking around on Google to no avail. It
>> seems it tries to mmap() something that doesn't exist.
> What if you run it with
>
>         export HAS_PCH_SPLIT=1
>         intel_audio_dump
>
> I also queued a patch for the tool. Note that if the above trick
> failed to work, the applied patch won't help, too.
>
> Thanks,
> Fengguang

The dump tool did not work with that environment variable either. However, it occurred to me that the intel_audio_dump in my distro may be too outdated. It was built on 2010-04-01, v1.0.2+git20100324. Looking at http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/, I can see that the reason is that the latest stable release, 1.0.2, was tagged nearly two years ago.

I decided to build intel-gpu-tools from the latest Git source instead. That took a while to figure out, as it also needed the xutils-dev package for xorg-macros.m4 (required by the autoconf script) and libtool (needed by the resulting configure script). So the complete list of dependencies is "autotools-dev pkg-config libpciaccess-dev libdrm-dev libdrm-intel1 xutils-dev libtool". DebugFS must also be enabled in the kernel (Kernel hacking > Debug Filesystem) and mounted on /sys/kernel/debug. Finally, building it is the standard procedure: autogen.sh, configure, make and make install. (I am writing down these instructions in case someone else reads this down the line; Google is a wonderful thing.)

After building, I tried running intel_audio_dump and was at first dumbfounded, as it gave me the same error, "Couldn't map MMIO region". I verified with "which intel_audio_dump" that it DID point to the NEW /usr/local/bin/intel_audio_dump path, and not the OLD /usr/bin/intel_audio_dump path. However, I thought that maybe it WAS running the OLD version for some reason despite claiming it was pointing to the new one.
So, I tried calling it specifically with the full path to the new binary, and... SUCCESS!

You need to tag a new release of intel-gpu-tools soon so that distros get updated, since the old 1.0.2 release does NOT support SandyBridge.

I've attached the full dump here. Scroll down to the bottom and you can see that I was right in my theory that all the ELD data was zeroed out. But hey, at least we're getting SOMEWHERE! ;-)

So what we KNOW now:
ELD parsing code = 100% correct.
ELD writing to the correct audio register = 100% correct, verified by looking at snd_hdmi_show_eld()'s output in the dmesg log.
However, SOMETIME after boot, it seems to get corrupted/zeroed out.

I'll replicate the relevant dump portion here:

AUD_HDMIW_HDMIEDID_A HDMI ELD:
  10000d00 6882004f 00000000 00000000 3dcb6508
AUD_HDMIW_HDMIEDID_B HDMI ELD:
  00000000 00000000 00000000 00000000 00000000
AUD_HDMIW_HDMIEDID_C HDMI ELD:
  00000000 00000000 00000000 00000000 00000000
AUD_HDMIW_INFOFR_A HDMI audio Infoframe:
  84010a70 01000000 00000000 00000000 00000000 00000000 00000000 00000000
AUD_HDMIW_INFOFR_B HDMI audio Infoframe:
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
AUD_HDMIW_INFOFR_C HDMI audio Infoframe:
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

I decided to look in /sys/class/drm/card0-HDMI-A-2/edid and it's 0 bytes! This used to be 256 bytes! How freaking weird is that?!

That means: the system boots up, the Intel driver sees the 256-byte EDID, parses it into an ELD and writes that to the audio register, and the dmesg log shows that all supported audio modes were parsed correctly. Then, by the time the system has finished booting, edid has become 0 bytes and the ELD is zeroed out. What the heck is going on here? :-O

I tried "dmesg | grep 'HDMI: detected monitor'" and see NOTHING later than the initial boot event, so I have no freaking clue why it's zeroing out the EDID.
It almost looks like the act of writing the ELD to the audio register interferes with the graphics card's ability to read the EDID itself after that point. Erhmm... This is very odd.

Finally, I tried a complete power cycle of every component, turning off the outlet power on everything. I then started the receiver, then the projector, and finally the computer. Not that startup order matters much, but this is the optimal order.

However, it still did the same thing, with one difference: /sys/class/drm/card0-HDMI-A-2/edid now contains the correct contents. Everything else was as before:
/proc/asound/card0/eld#3.0 full of zeroes, as shown in the attached file.
intel_audio_dump showing the EXACT SAME zeroed-out content as I have quoted above.
dmesg showing the exact same, nice list of supported codecs and rates.

So, somewhere AFTER the correct ELD is written to the audio register, it all goes wrong and gets zeroed out. I'm thinking that POSSIBLY some routine that runs after snd_hdmi_show_eld() could be responsible for clearing all the data?

This is on an ASUS P8H67-I B3 mini-ITX Intel H67 motherboard, with an Intel Core i5 2500K CPU with Intel HD3000 graphics.

Err... wait a minute! I think I've figured it out!

My Intel H67 motherboard has ONE HDMI output. That port is connected to card0-HDMI-A-2 internally (port two, NOT port one).
Now note the audio register dump again:

AUD_HDMIW_HDMIEDID_A HDMI ELD:
  10000d00 6882004f 00000000 00000000 3dcb6508
AUD_HDMIW_HDMIEDID_B HDMI ELD:
  00000000 00000000 00000000 00000000 00000000
AUD_HDMIW_HDMIEDID_C HDMI ELD:
  00000000 00000000 00000000 00000000 00000000
AUD_HDMIW_INFOFR_A HDMI audio Infoframe:
  84010a70 01000000 00000000 00000000 00000000 00000000 00000000 00000000
AUD_HDMIW_INFOFR_B HDMI audio Infoframe:
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
AUD_HDMIW_INFOFR_C HDMI audio Infoframe:
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

It has written Port 2's ELD to Port 1, and zeroed out all other ports. So OF COURSE when the driver goes to query the ELD for port 2, it finds zeroed-out data.

Could this be it!? If so, this would be a bug related to the current TODO/FIXME of "needing per-port ELD parsing".