From: Gratian Crisan <gratian.crisan@ni.com>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>,
	Hans de Goede <hdegoede@redhat.com>,
	linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
Date: Tue, 06 Aug 2024 16:35:11 -0500
Message-ID: <87frrhqn8n.fsf@ni.com>
In-Reply-To: <dcde3b9f-ccc8-4e1e-8737-74768193f0af@intel.com>


Adrian Hunter <adrian.hunter@intel.com> writes:
> On 6/08/24 00:33, Gratian Crisan wrote:
>> 
>> We are getting the following splat on latest 6.11.0-rc2-00002-gc813111d19e6 (and
>> older) kernel(s):
>
> Do you know a kernel version that does not get an error?
>

Sorry for not being clearer in my original email: this is not a new issue. I
believe this Bay Trail hardware has always had an issue with receiving "Tuning
Error" interrupts with certain SD cards, at least as far back as 4.9.47.

Up until commit b3855668d98c ("mmc: sdhci: Add support for "Tuning Error"
interrupts") these resulted in a "mmc0: Unexpected interrupt 0x04000000" splat,
which b3855668d98c fixed.

However, now that "Tuning Error" interrupts are treated as data interrupts and
handled in sdhci_data_irq(), we are hitting a corner case: the tuning error
interrupt comes in after an MMC_SEND_STATUS command, which has no 'host->data'
associated with it, and that results in the new splat.
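
As a cross-check of the MMC_SEND_STATUS claim, the command index can be read
out of the "Cmd:" field of the two register dumps quoted in this mail. This is
only a small sketch in Python (not kernel code); the helper name
sdhci_cmd_index() is made up for illustration, and it mirrors what the
SDHCI_GET_CMD() macro in sdhci.h does, i.e. it takes bits 13:8 of the Command
register.

```python
def sdhci_cmd_index(cmd_reg: int) -> int:
    """Return the command index (bits 13:8) of an SDHCI Command register value."""
    return (cmd_reg >> 8) & 0x3F


# "Cmd:" values copied from the two register dumps in this mail.
CMD_REG_NEW_SPLAT = 0x00000D1A   # dump following "Got data interrupt ..."
CMD_REG_OLD_SPLAT = 0x0000123A   # dump following "Unexpected interrupt ..."

print(sdhci_cmd_index(CMD_REG_NEW_SPLAT))  # 13, i.e. CMD13 (MMC_SEND_STATUS)
print(sdhci_cmd_index(CMD_REG_OLD_SPLAT))  # 18, i.e. CMD18 (MMC_READ_MULTIPLE_BLOCK)
```

So the new splat follows a status poll with no data phase, while the old one
followed a multi-block read.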

Hence the question in my previous email: Should the tuning error interrupts be
handled in common code in sdhci_irq()?
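
To make the ordering question concrete, here is a toy Python model of the two
options. It is a sketch under assumptions, not the driver's code: ModelHost,
data_irq_current(), data_irq_reordered() and the retune_requested flag are all
hypothetical names, and what the real handler should do once it sees a tuning
error is deliberately left as a placeholder.

```python
from dataclasses import dataclass

TUNING_ERROR = 0x04000000  # interrupt bit reported in the splat


@dataclass
class ModelHost:
    data_in_progress: bool          # stands in for host->data != NULL
    retune_requested: bool = False  # hypothetical placeholder for the recovery action


def data_irq_current(host: ModelHost, intmask: int) -> str:
    """Ordering described in this mail: the no-data check runs first."""
    if not host.data_in_progress:
        return "splat"              # "Got data interrupt ... no data operation"
    if intmask & TUNING_ERROR:
        host.retune_requested = True
        return "tuning-error"
    return "data"


def data_irq_reordered(host: ModelHost, intmask: int) -> str:
    """Proposed ordering: look at the tuning error before the no-data check."""
    if intmask & TUNING_ERROR:
        host.retune_requested = True
        return "tuning-error"
    if not host.data_in_progress:
        return "splat"
    return "data"


# Tuning error arriving after a command with no data phase (e.g. CMD13).
print(data_irq_current(ModelHost(data_in_progress=False), TUNING_ERROR))    # splat
print(data_irq_reordered(ModelHost(data_in_progress=False), TUNING_ERROR))  # tuning-error
```

In the model both orderings behave the same while a data transfer is in
progress; they only differ in the no-data case described above.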

>> 
>> [    4.792991] mmc0: new ultra high speed DDR50 SDHC card at address 0001
>> [    4.793550]   with environment:
>> [    4.793786]     HOME=/
>> [    4.793985]     TERM=linux
>> [    4.794201]     BOOT_IMAGE=/runmode/bzImage
>> [    4.794485]     sys_reset=false
>> [    4.795791] mmcblk0: mmc0:0001 0016G 15.2 GiB
>> [    5.333153] mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
>> [    5.333676] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>> [    5.334069] mmc0: sdhci: Sys addr:  0x12454200 | Version:  0x0000b502
>> [    5.334464] mmc0: sdhci: Blk size:  0x00007040 | Blk cnt:  0x00000001
>> [    5.334860] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000010
>> [    5.335253] mmc0: sdhci: Present:   0x01ff0000 | Host ctl: 0x00000016
>> [    5.335648] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
>> [    5.336040] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000107
>> [    5.336432] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000000
>> [    5.336824] mmc0: sdhci: Int enab:  0x03ff008b | Sig enab: 0x03ff008b
>> [    5.337214] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
>> [    5.337605] mmc0: sdhci: Caps:      0x076864b2 | Caps_1:   0x00000004
>> [    5.337997] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
>> [    5.338389] mmc0: sdhci: Resp[0]:   0x00400900 | Resp[1]:  0x00000000
>> [    5.338780] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>> [    5.339170] mmc0: sdhci: Host ctl2: 0x0000000c
>> [    5.339468] mmc0: sdhci: ADMA Err:  0x00000003 | ADMA Ptr: 0x12454200
>> [    5.339859] mmc0: sdhci: ============================================
>> [    5.340293] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>> [    5.344663] Buffer I/O error on dev mmcblk0, logical block 0, async page read
>> [    5.346127]  mmcblk0: p1 p2
>> 
>> This is on an Intel Bay Trail based system: NI cRIO-9053 using an Atom E3805.
>> 
>> The issue appears related to the one fixed by commit b3855668d98c ("mmc: sdhci:
>> Add support for "Tuning Error" interrupts") and discussed here[1].
>
> Does reverting that commit help?
>

Reverting the commit brings back the original splat that commit fixed (albeit
without the I/O error):

[    4.893032] mmc0: new ultra high speed DDR50 SDHC card at address 0001
[    4.896238] mmcblk0: mmc0:0001 0016G 15.2 GiB
[    4.905944] mmc0: Unexpected interrupt 0x04000000.
[    4.906272] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[    4.906664] mmc0: sdhci: Sys addr:  0x126e6200 | Version:  0x0000b502
[    4.907059] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
[    4.907451] mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x0000003b
[    4.907842] mmc0: sdhci: Present:   0x01ff0206 | Host ctl: 0x00000016
[    4.908234] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[    4.908625] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000107
[    4.909015] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000002
[    4.909408] mmc0: sdhci: Int enab:  0x03ff008b | Sig enab: 0x03ff008b
[    4.909800] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
[    4.910193] mmc0: sdhci: Caps:      0x076864b2 | Caps_1:   0x00000004
[    4.910581] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[    4.910976] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00400900
[    4.911371] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00400900
[    4.911765] mmc0: sdhci: Host ctl2: 0x0000000c
[    4.912064] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x126e6200
[    4.912456] mmc0: sdhci: ============================================
[    4.913301]  mmcblk0: p1 p2
[    6.401855] EXT4-fs (mmcblk1p2): mounted filesystem d57a3d3c-a1f9-4f8e-8cbc-19dc5bb4fc4c r/w with ordered data mode. Quota mode: disabled.

>> I am new to this area of the kernel so I would appreciate any suggestions on the
>> direction to take here:
>> 
>>   - Should the tuning error interrupts be handled in common code in sdhci_irq()
>>     (or at least before the !host->data check in sdhci_data_irq())?
>> 
>>   - Is this more of an issue with tuning not happening when is expected or
>>     taking too long, since at first we do get the error during data transfer
>>     commands? Suggestions on what I should debug/trace next appreciated.
>
> SDHCI driver does not enable the "Tuning Error" interrupt, refer
> the kernel messages above:
>
> 	Int enab:  0x03ff008b | Sig enab: 0x03ff008b
>
> but it happens anyway, so the "fix" was to handle it anyway.
>
> But it begs the question, wasn't the error happening already?
>
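
The point about the interrupt not being enabled can be checked directly from
the dump values. A minimal Python sketch, using only the numbers that appear in
this mail (the constant and function names are illustrative, not kernel
identifiers):

```python
TUNING_ERROR = 0x04000000   # interrupt value reported in both splats
INT_ENAB = 0x03FF008B       # "Int enab:" from the register dumps
SIG_ENAB = 0x03FF008B       # "Sig enab:" from the register dumps


def is_enabled(enable_mask: int, irq_bit: int) -> bool:
    """True if irq_bit is set in the given enable mask."""
    return bool(enable_mask & irq_bit)


print(TUNING_ERROR.bit_length() - 1)        # 26, the bit position of the interrupt
print(is_enabled(INT_ENAB, TUNING_ERROR))   # False
print(is_enabled(SIG_ENAB, TUNING_ERROR))   # False
```

Bit 26 is clear in both the status-enable and signal-enable masks, yet the
interrupt is still delivered.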

Kind of: before, we were getting "mmc0: Unexpected interrupt 0x04000000", but
somehow it didn't result in an I/O error. That may have been just lucky timing.

Now we're getting "mmc0: Got data interrupt 0x04000000 even though no data
operation was in progress." followed by an I/O error on READ.

I appreciate your reply. I'm happy to work on a patch or test things if I'm
pointed in the right direction.

Thanks,
    Gratian

