linux-wireless.vger.kernel.org archive mirror
* Linux v6.6 sporadic reboot failures with ath9k on i.MX6Q
@ 2023-11-29  9:22 Luca Ceresoli
  2023-11-30  5:10 ` Florian Fainelli
  0 siblings, 1 reply; 3+ messages in thread
From: Luca Ceresoli @ 2023-11-29  9:22 UTC
  To: linux-wireless; +Cc: imx

Hello,

For several weeks I have been investigating a sporadic reboot failure on a
custom board based on i.MX6Q. There is an ATH9K Wi-Fi card connected
over PCIe, and the main suspect is the ath9k driver.

Is anybody aware of this kind of bug with ath9k?

Some details about my tests follow.

This is on mainline Linux v6.6, with only the board dts and a defconfig
added. The board dts itself is based on imx6q.dtsi and, among other
things, adds:

&pcie {
        pinctrl-names = "default";
        pinctrl-0 = <&pinctrl_pcie>;
        reset-gpio = <&gpio2 20 GPIO_ACTIVE_LOW>;
        status = "okay";
};

and:

&iomuxc {
/* ... */
        imx6qdl-sabresd {
/* ... */
                pinctrl_pcie: pciegrp {
                        fsl,pins = <
                                MX6QDL_PAD_EIM_A18__GPIO2_IO20  0x1b0b0
                        >;
                };
/* ... */
        };
};
Reboot usually works fine, but fails randomly in 1-5% of
cases. The symptom is that the console stops producing any messages
at some random point in the shutdown sequence, even in the middle of a
line.

After about 7000 reboot attempts with different configurations, it is
clear that enabling or disabling CONFIG_ATH9K is what makes the
difference:

 1. kernels with CONFIG_ATH9K=n never fail
 2. kernels with CONFIG_ATH9K=y fail sporadically

Kernels built with CONFIG_ATH9K=y still fail even with all optional
CONFIG_ATH9K* options disabled (rfkill, pcoem, btcoex and no_eeprom).

Similarly:

 1. removing pcie from the device tree makes reboot work
 2. if pcie is kept in the device tree and all peripherals not
    required for booting are removed, reboot still fails

On top of v6.6 I have applied all potentially related commits currently
in master (8 in total):

  git log --reverse --format=%H v6.6..origin/master -- \
      drivers/net/wireless/ath/*.[ch] drivers/net/wireless/ath/ath9k/ \
    | xargs git cherry-pick

and reboot still fails.

I have also tested these mainline kernel versions, with no improvement:
v6.1.60, v5.15.137, v5.10.199, v5.10.

A first look at the ath9k driver code did not show anything obviously
wrong.

Any clues about how to further investigate would be very welcome.

I am of course available to provide more information.

Best regards,
Luca

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: Linux v6.6 sporadic reboot failures with ath9k on i.MX6Q
  2023-11-29  9:22 Linux v6.6 sporadic reboot failures with ath9k on i.MX6Q Luca Ceresoli
@ 2023-11-30  5:10 ` Florian Fainelli
  2023-12-01 15:26   ` Luca Ceresoli
  0 siblings, 1 reply; 3+ messages in thread
From: Florian Fainelli @ 2023-11-30  5:10 UTC
  To: Luca Ceresoli, linux-wireless; +Cc: imx

On 11/29/2023 1:22 AM, Luca Ceresoli wrote:
> [...]
> The symptom is that the console stops producing any messages
> at some random point in the shutdown sequence, even in the middle of a
> line.
> [...]
> Any clues about how to further investigate would be very welcome.

Do you have a reboot log with "initcall_debug debug" set on the kernel
command line? If so, does it always point to the PCI bus shutting
down the device drivers, PCIe ports and ultimately the root complex?

We have seen something similar before with ath10k_pci and our
pcie-brcmstb driver, which eventually turned out to be the result of
incorrect assumptions made while implementing the platform_driver::shutdown
routine. There was a hard hang in ath10k_remove(); I do not recall the
details, but we were definitely doing something improper there.

imx6_pcie_shutdown() seems much simpler, but that is where I would look
first.

Hope this helps.
-- 
Florian


* Re: Linux v6.6 sporadic reboot failures with ath9k on i.MX6Q
  2023-11-30  5:10 ` Florian Fainelli
@ 2023-12-01 15:26   ` Luca Ceresoli
  0 siblings, 0 replies; 3+ messages in thread
From: Luca Ceresoli @ 2023-12-01 15:26 UTC
  To: Florian Fainelli; +Cc: linux-wireless, imx

Hello Florian,

On Wed, 29 Nov 2023 21:10:44 -0800
Florian Fainelli <f.fainelli@gmail.com> wrote:

> Do you have a reboot log with "initcall_debug debug" set on the kernel
> command line? If so, does it always point to the PCI bus shutting
> down the device drivers, PCIe ports and ultimately the root complex?
> 
> We have seen something similar before with ath10k_pci and our
> pcie-brcmstb driver, which eventually turned out to be the result of
> incorrect assumptions made while implementing the platform_driver::shutdown
> routine. There was a hard hang in ath10k_remove(); I do not recall the
> details, but we were definitely doing something improper there.
> 
> imx6_pcie_shutdown() seems much simpler, but that is where I would look
> first.

I had tried initcall_debug, but the hang happened at a different line
of output in each test, so it did not reliably point to a specific
place. Perhaps the serial port just stopped working before it could
flush the last few lines.

I will have a look at the shutdown code, even though it did not seem to
be problematic.

Thank you for your hints.

Best regards,
Luca

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
