mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.

* mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
From: Gratian Crisan @ 2024-08-05 21:33 UTC
  To: Adrian Hunter; +Cc: Ulf Hansson, Hans de Goede, linux-mmc, linux-kernel

Hi all,

We are getting the following splat on latest 6.11.0-rc2-00002-gc813111d19e6 (and
older) kernel(s):

[    4.792991] mmc0: new ultra high speed DDR50 SDHC card at address 0001
[    4.793550]   with environment:
[    4.793786]     HOME=/
[    4.793985]     TERM=linux
[    4.794201]     BOOT_IMAGE=/runmode/bzImage
[    4.794485]     sys_reset=false
[    4.795791] mmcblk0: mmc0:0001 0016G 15.2 GiB
[    5.333153] mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
[    5.333676] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[    5.334069] mmc0: sdhci: Sys addr:  0x12454200 | Version:  0x0000b502
[    5.334464] mmc0: sdhci: Blk size:  0x00007040 | Blk cnt:  0x00000001
[    5.334860] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000010
[    5.335253] mmc0: sdhci: Present:   0x01ff0000 | Host ctl: 0x00000016
[    5.335648] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[    5.336040] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000107
[    5.336432] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000000
[    5.336824] mmc0: sdhci: Int enab:  0x03ff008b | Sig enab: 0x03ff008b
[    5.337214] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[    5.337605] mmc0: sdhci: Caps:      0x076864b2 | Caps_1:   0x00000004
[    5.337997] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
[    5.338389] mmc0: sdhci: Resp[0]:   0x00400900 | Resp[1]:  0x00000000
[    5.338780] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[    5.339170] mmc0: sdhci: Host ctl2: 0x0000000c
[    5.339468] mmc0: sdhci: ADMA Err:  0x00000003 | ADMA Ptr: 0x12454200
[    5.339859] mmc0: sdhci: ============================================
[    5.340293] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    5.344663] Buffer I/O error on dev mmcblk0, logical block 0, async page read
[    5.346127]  mmcblk0: p1 p2

This is on an Intel Bay Trail based system: NI cRIO-9053 using an Atom E3805.

The issue appears related to the one fixed by commit b3855668d98c ("mmc: sdhci:
Add support for "Tuning Error" interrupts") and discussed here[1].

After adding some debug prints, it appears that in our case we get a tuning error
interrupt during an MMC_SEND_STATUS (13) sdhci cmd, which has no 'host->data'
associated with it (leading to the splat):

[    4.893298] mmc0: new ultra high speed DDR50 SDHC card at address 0001
[    4.896489] mmcblk0: mmc0:0001 0016G 15.2 GiB
[    4.906048] mmc0: tuning err irq, sdhci cmd: 18, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
[    4.963027] mmc0: tuning err irq, sdhci cmd: 18, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
[    5.384960] mmc0: tuning err irq, sdhci cmd: 17, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
[    5.442877] mmc0: tuning err irq, sdhci cmd: 13, host->cmd: 00000000e1669bad, host->data: 0000000000000000
[    5.443463] mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
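
The command index can be cross-checked against the "Cmd" value in the register
dump. A minimal Python sketch, assuming the standard SDHCI Command register
layout (command index in bits 13:8); the helper name is made up for
illustration and is not a kernel function:

```python
def sdhci_cmd_index(cmd_reg):
    """Return the command index held in bits 13:8 of an SDHCI Command register value."""
    return (cmd_reg >> 8) & 0x3F


# "Cmd: 0x00000d1a" is taken from the register dump above.
print(sdhci_cmd_index(0x00000D1A))  # 13, i.e. MMC_SEND_STATUS
```

This agrees with the last debug print, where the tuning error arrives while
cmd 13 is the command in flight and host->data is NULL.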

I am new to this area of the kernel so I would appreciate any suggestions on the
direction to take here:

  - Should the tuning error interrupts be handled in common code in sdhci_irq()
    (or at least before the !host->data check in sdhci_data_irq())?

  - Is this more of an issue with tuning not happening when it is expected or
    taking too long, since at first we do get the error during data transfer
    commands? Suggestions on what I should debug/trace next appreciated.

Thanks,
    Gratian

[1] https://lore.kernel.org/r/20240410191639.526324-3-hdegoede@redhat.com

* Re: mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
From: Adrian Hunter @ 2024-08-06  7:31 UTC
  To: Gratian Crisan; +Cc: Ulf Hansson, Hans de Goede, linux-mmc, linux-kernel

On 6/08/24 00:33, Gratian Crisan wrote:
> Hi all,
> 
> We are getting the following splat on latest 6.11.0-rc2-00002-gc813111d19e6 (and
> older) kernel(s):

Do you know a kernel version that does not get an error?

> 
> [    4.792991] mmc0: new ultra high speed DDR50 SDHC card at address 0001
> [    4.793550]   with environment:
> [    4.793786]     HOME=/
> [    4.793985]     TERM=linux
> [    4.794201]     BOOT_IMAGE=/runmode/bzImage
> [    4.794485]     sys_reset=false
> [    4.795791] mmcblk0: mmc0:0001 0016G 15.2 GiB
> [    5.333153] mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
> [    5.333676] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [    5.334069] mmc0: sdhci: Sys addr:  0x12454200 | Version:  0x0000b502
> [    5.334464] mmc0: sdhci: Blk size:  0x00007040 | Blk cnt:  0x00000001
> [    5.334860] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000010
> [    5.335253] mmc0: sdhci: Present:   0x01ff0000 | Host ctl: 0x00000016
> [    5.335648] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
> [    5.336040] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000107
> [    5.336432] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000000
> [    5.336824] mmc0: sdhci: Int enab:  0x03ff008b | Sig enab: 0x03ff008b
> [    5.337214] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
> [    5.337605] mmc0: sdhci: Caps:      0x076864b2 | Caps_1:   0x00000004
> [    5.337997] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
> [    5.338389] mmc0: sdhci: Resp[0]:   0x00400900 | Resp[1]:  0x00000000
> [    5.338780] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
> [    5.339170] mmc0: sdhci: Host ctl2: 0x0000000c
> [    5.339468] mmc0: sdhci: ADMA Err:  0x00000003 | ADMA Ptr: 0x12454200
> [    5.339859] mmc0: sdhci: ============================================
> [    5.340293] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [    5.344663] Buffer I/O error on dev mmcblk0, logical block 0, async page read
> [    5.346127]  mmcblk0: p1 p2
> 
> This is on an Intel Bay Trail based system: NI cRIO-9053 using an Atom E3805.
> 
> The issue appears related to the one fixed by commit b3855668d98c ("mmc: sdhci:
> Add support for "Tuning Error" interrupts") and discussed here[1].

Does reverting that commit help?

> 
> After adding some debug prints it appears that in our case we get a tuning error
> interrupt during a MMC_SEND_STATUS (13) sdhci cmd which has no 'host->data'
> associated with it (leading to the splat):
> 
> [    4.893298] mmc0: new ultra high speed DDR50 SDHC card at address 0001
> [    4.896489] mmcblk0: mmc0:0001 0016G 15.2 GiB
> [    4.906048] mmc0: tuning err irq, sdhci cmd: 18, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
> [    4.963027] mmc0: tuning err irq, sdhci cmd: 18, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
> [    5.384960] mmc0: tuning err irq, sdhci cmd: 17, host->cmd: 0000000003b39249, host->data: 00000000c0b4ad8a
> [    5.442877] mmc0: tuning err irq, sdhci cmd: 13, host->cmd: 00000000e1669bad, host->data: 0000000000000000
> [    5.443463] mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
> 
> I am new to this area of the kernel so I would appreciate any suggestions on the
> direction to take here:
> 
>   - Should the tuning error interrupts be handled in common code in sdhci_irq()
>     (or at least before the !host->data check in sdhci_data_irq())?
> 
>   - Is this more of an issue with tuning not happening when it is expected or
>     taking too long, since at first we do get the error during data transfer
>     commands? Suggestions on what I should debug/trace next appreciated.

The SDHCI driver does not enable the "Tuning Error" interrupt; refer to
the kernel messages above:

	Int enab:  0x03ff008b | Sig enab: 0x03ff008b

but it happens anyway, so the "fix" was to handle it anyway.

But that raises the question: wasn't the error happening already?
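
The mask arithmetic behind that observation can be checked by hand. A small
Python sketch, using only the two values quoted in this thread; the helper
name is hypothetical:

```python
TUNING_ERROR = 0x04000000  # interrupt status value from the splat
INT_ENAB = 0x03FF008B      # "Int enab" from the register dump


def set_bits(mask):
    """List the bit positions that are set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


print(set_bits(TUNING_ERROR))         # [26]
print(bool(INT_ENAB & TUNING_ERROR))  # False: bit 26 is not in the enable mask
```

The enable mask covers bits 0, 1, 3, 7 and 16 to 25, so bit 26 is outside it.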

> 
> Thanks,
>     Gratian
> 
> [1] https://lore.kernel.org/r/20240410191639.526324-3-hdegoede@redhat.com


* Re: mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
From: Gratian Crisan @ 2024-08-06 21:35 UTC
  To: Adrian Hunter; +Cc: Ulf Hansson, Hans de Goede, linux-mmc, linux-kernel


Adrian Hunter <adrian.hunter@intel.com> writes:
> On 6/08/24 00:33, Gratian Crisan wrote:
>> 
>> We are getting the following splat on latest 6.11.0-rc2-00002-gc813111d19e6 (and
>> older) kernel(s):
>
> Do you know a kernel version that does not get an error?
>

Sorry for not being clearer in my original email: this is not a new issue. I
believe this Bay Trail hardware has always had an issue with receiving "Tuning
Error" interrupts with certain SD cards, at least as far back as 4.9.47.

Up until commit b3855668d98c ("mmc: sdhci: Add support for "Tuning Error"
interrupts") these resulted in a "mmc0: Unexpected interrupt 0x04000000" splat,
which b3855668d98c fixed.

However, now that "Tuning Error" interrupts are treated as data interrupts and
handled in sdhci_data_irq(), we are hitting a corner case where that tuning error
interrupt comes in after an MMC_SEND_STATUS command with no 'host->data'
associated, resulting in the new splat.

Hence the question in my previous email: Should the tuning error interrupts be
handled in common code in sdhci_irq()?

>> 
>> This is on an Intel Bay Trail based system: NI cRIO-9053 using an Atom E3805.
>> 
>> The issue appears related to the one fixed by commit b3855668d98c ("mmc: sdhci:
>> Add support for "Tuning Error" interrupts") and discussed here[1].
>
> Does reverting that commit help?
>

Reverting the commit brings back the original splat that commit fixed (albeit
without the I/O error):

[    4.893032] mmc0: new ultra high speed DDR50 SDHC card at address 0001
[    4.896238] mmcblk0: mmc0:0001 0016G 15.2 GiB
[    4.905944] mmc0: Unexpected interrupt 0x04000000.
[    4.906272] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[    4.906664] mmc0: sdhci: Sys addr:  0x126e6200 | Version:  0x0000b502
[    4.907059] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
[    4.907451] mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x0000003b
[    4.907842] mmc0: sdhci: Present:   0x01ff0206 | Host ctl: 0x00000016
[    4.908234] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[    4.908625] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000107
[    4.909015] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000002
[    4.909408] mmc0: sdhci: Int enab:  0x03ff008b | Sig enab: 0x03ff008b
[    4.909800] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
[    4.910193] mmc0: sdhci: Caps:      0x076864b2 | Caps_1:   0x00000004
[    4.910581] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[    4.910976] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00400900
[    4.911371] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00400900
[    4.911765] mmc0: sdhci: Host ctl2: 0x0000000c
[    4.912064] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x126e6200
[    4.912456] mmc0: sdhci: ============================================
[    4.913301]  mmcblk0: p1 p2
[    6.401855] EXT4-fs (mmcblk1p2): mounted filesystem d57a3d3c-a1f9-4f8e-8cbc-19dc5bb4fc4c r/w with ordered data mode. Quota mode: disabled.

>> I am new to this area of the kernel so I would appreciate any suggestions on the
>> direction to take here:
>> 
>>   - Should the tuning error interrupts be handled in common code in sdhci_irq()
>>     (or at least before the !host->data check in sdhci_data_irq())?
>> 
>>   - Is this more of an issue with tuning not happening when it is expected or
>>     taking too long, since at first we do get the error during data transfer
>>     commands? Suggestions on what I should debug/trace next appreciated.
>
> The SDHCI driver does not enable the "Tuning Error" interrupt; refer to
> the kernel messages above:
>
> 	Int enab:  0x03ff008b | Sig enab: 0x03ff008b
>
> but it happens anyway, so the "fix" was to handle it anyway.
>
> But that raises the question: wasn't the error happening already?
>

Kind of: before, we were getting "mmc0: Unexpected interrupt 0x04000000", but
somehow it didn't result in an I/O error. That may be just lucky timing.

Now we're getting "mmc0: Got data interrupt 0x04000000 even though no data operation was in
progress." followed by an I/O error on READ.

I appreciate your reply. I'm happy to work on a patch or test things if I'm
pointed in the right direction.

Thanks,
    Gratian

* Re: mmc0: Got data interrupt 0x04000000 even though no data operation was in progress.
From: Adrian Hunter @ 2024-08-08  9:47 UTC
  To: Gratian Crisan; +Cc: Ulf Hansson, Hans de Goede, linux-mmc, linux-kernel

On 7/08/24 00:35, Gratian Crisan wrote:
> 
> Adrian Hunter <adrian.hunter@intel.com> writes:
>> On 6/08/24 00:33, Gratian Crisan wrote:
>>>
>>> We are getting the following splat on latest 6.11.0-rc2-00002-gc813111d19e6 (and
>>> older) kernel(s):
>>
>> Do you know a kernel version that does not get an error?
>>
> 
> Sorry for not being clearer in my original email: this is not a new issue. I
> believe this Bay Trail hardware has always had an issue with receiving "Tuning
> Error" interrupts with certain SD cards, at least as far back as 4.9.47.
> 
> Up until commit b3855668d98c ("mmc: sdhci: Add support for "Tuning Error"
> interrupts") these resulted in a "mmc0: Unexpected interrupt 0x04000000" splat,
> which b3855668d98c fixed.
> 
> However, now that "Tuning Error" interrupts are treated as data interrupts and
> handled in sdhci_data_irq(), we are hitting a corner case where that tuning error
> interrupt comes in after an MMC_SEND_STATUS command with no 'host->data'
> associated, resulting in the new splat.

Ok, thanks for clarifying.

> Hence the question in my previous email: Should the tuning error interrupts be
> handled in common code in sdhci_irq()?
> 
>>>
>>> This is on an Intel Bay Trail based system: NI cRIO-9053 using an Atom E3805.
>>>
>>> The issue appears related to the one fixed by commit b3855668d98c ("mmc: sdhci:
>>> Add support for "Tuning Error" interrupts") and discussed here[1].
>>
>> Does reverting that commit help?
>>
> 
> Reverting the commit brings back the original splat that commit fixed (albeit
> without the I/O error):
> 
> [    4.893032] mmc0: new ultra high speed DDR50 SDHC card at address 0001
> [    4.896238] mmcblk0: mmc0:0001 0016G 15.2 GiB
> [    4.905944] mmc0: Unexpected interrupt 0x04000000.
> [    4.906272] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [    4.906664] mmc0: sdhci: Sys addr:  0x126e6200 | Version:  0x0000b502
> [    4.907059] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
> [    4.907451] mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x0000003b
> [    4.907842] mmc0: sdhci: Present:   0x01ff0206 | Host ctl: 0x00000016
> [    4.908234] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
> [    4.908625] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000107
> [    4.909015] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000002
> [    4.909408] mmc0: sdhci: Int enab:  0x03ff008b | Sig enab: 0x03ff008b
> [    4.909800] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> [    4.910193] mmc0: sdhci: Caps:      0x076864b2 | Caps_1:   0x00000004
> [    4.910581] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> [    4.910976] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00400900
> [    4.911371] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00400900
> [    4.911765] mmc0: sdhci: Host ctl2: 0x0000000c
> [    4.912064] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x126e6200
> [    4.912456] mmc0: sdhci: ============================================
> [    4.913301]  mmcblk0: p1 p2
> [    6.401855] EXT4-fs (mmcblk1p2): mounted filesystem d57a3d3c-a1f9-4f8e-8cbc-19dc5bb4fc4c r/w with ordered data mode. Quota mode: disabled.
> 
>>> I am new to this area of the kernel so I would appreciate any suggestions on the
>>> direction to take here:
>>>
>>>   - Should the tuning error interrupts be handled in common code in sdhci_irq()
>>>     (or at least before the !host->data check in sdhci_data_irq())?
>>>
>>>   - Is this more of an issue with tuning not happening when it is expected or
>>>     taking too long, since at first we do get the error during data transfer
>>>     commands? Suggestions on what I should debug/trace next appreciated.
>>
>> The SDHCI driver does not enable the "Tuning Error" interrupt; refer to
>> the kernel messages above:
>>
>> 	Int enab:  0x03ff008b | Sig enab: 0x03ff008b
>>
>> but it happens anyway, so the "fix" was to handle it anyway.
>>
>> But that raises the question: wasn't the error happening already?
>>
> 
> Kind of: before, we were getting "mmc0: Unexpected interrupt 0x04000000", but
> somehow it didn't result in an I/O error. That may be just lucky timing.
> 
> Now we're getting "mmc0: Got data interrupt 0x04000000 even though no data operation was in
> progress." followed by an I/O error on READ.
> 
> I appreciate your reply. I'm happy to work on a patch or test things if I'm
> pointed in the right direction.

Note that neither "Got data interrupt ... even though no data
operation was in progress" nor "Unexpected interrupt ..." messages
result in I/O errors directly, but indicate that the host controller
is not behaving as expected.

However, according to the spec., the "Tuning Error" interrupt status is
an unrecoverable error, so commit b3855668d98c ("mmc: sdhci: Add
support for "Tuning Error" interrupts") began treating it that way,
causing a re-tune and retry, as Hans reported that this made things
work for his devices.
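
The corner case discussed in this thread can be pictured with a toy model.
The Python sketch below is only an illustration of the behaviour described
above, not the kernel code: the function name and return strings are made up,
and all other checks the real handler performs are left out.

```python
TUNING_ERROR = 0x04000000  # interrupt status bit reported in the splat


def data_irq(intmask, data_in_flight):
    """Toy model of the data-interrupt path; data_in_flight stands in for host->data."""
    if not data_in_flight:
        # There is no data request to fail, so all that is left is the complaint.
        return "splat: data interrupt with no data operation in progress"
    if intmask & TUNING_ERROR:
        # With a data request in flight, the error fails it: re-tune and retry.
        return "fail request, re-tune and retry"
    return "normal data handling"


print(data_irq(TUNING_ERROR, data_in_flight=True))   # e.g. during cmd 17/18
print(data_irq(TUNING_ERROR, data_in_flight=False))  # e.g. during cmd 13
```

In this picture, the interrupt arriving during MMC_SEND_STATUS takes the first
branch, because that command carries no data.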

You should first get a better idea of the circumstances in which errors are
occurring.  Enabling debug messages should help, but it will cause
a very large number of messages, so you may also want to increase
the kernel message buffer size (CONFIG_LOG_BUF_SHIFT).
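
CONFIG_LOG_BUF_SHIFT is a power-of-two exponent, so each increment doubles the
buffer. A quick Python sketch of the arithmetic; the shift values are arbitrary
examples rather than recommendations, and the helper name is hypothetical:

```python
def log_buf_bytes(shift):
    """Kernel log buffer size in bytes for a given CONFIG_LOG_BUF_SHIFT value."""
    return 1 << shift


for shift in (17, 18, 20):
    print(shift, log_buf_bytes(shift) // 1024, "KiB")
```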

Dynamic debug for mmc:

    Kernel must be configured:

        CONFIG_DYNAMIC_DEBUG=y

    To enable mmc debug via sysfs:

        echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control
        echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control

    To enable mmc debug via kernel command line:

        dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"

    To disable mmc debug:

        echo 'file drivers/mmc/core/* -p' > /sys/kernel/debug/dynamic_debug/control
        echo 'file drivers/mmc/host/* -p' > /sys/kernel/debug/dynamic_debug/control

    More general information in kernel documentation in kernel tree:

        Documentation/admin-guide/dynamic-debug-howto.rst




