* ath9k: hwrng blocks for several minutes when phy is un-associated
@ 2022-06-23 5:08 Gregory Erwin
2022-06-23 12:14 ` Jason A. Donenfeld
0 siblings, 1 reply; 7+ messages in thread
From: Gregory Erwin @ 2022-06-23 5:08 UTC (permalink / raw)
To: Jason A. Donenfeld, Miaoqing Pan, Rui Salvaterra
Cc: ath9k-devel, linux-crypto, linux-wireless
Hello,
I bisected down to commit [fcd09c90c3c5] "ath9k: use hw_random API instead of
directly dumping into random.c'' while investigating a long delay when entering
suspend on kernels v5.18 onward. There are other reports of hangs or
unresponsiveness at https://bugs.archlinux.org/task/75138 with some more info.
AFAIKT, the issue is triggered by the ath9k hwrng when the interface is up,
but not associated with any AP. In this state, 'dd if=/dev/hwrng' will block
for up to 231 seconds before finally returning an input/output error. Similarly,
I get a kernel log message "hwrng: no data available" every 231 seconds.
The hwrng will unblock when attempting to connect to an SSID that doesn't exist,
but not when performing a scan, so I'm guessing AR_PHY_TST_ADC only produces new
data when the phy is transmitting.
Admittedly, I don't actually know if this blocking behavior is
expected or not, but it certainly seems undesirable.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
2022-06-23 5:08 ath9k: hwrng blocks for several minutes when phy is un-associated Gregory Erwin
@ 2022-06-23 12:14 ` Jason A. Donenfeld
2022-06-23 12:16 ` Jason A. Donenfeld
0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 12:14 UTC (permalink / raw)
To: Gregory Erwin
Cc: Miaoqing Pan, Rui Salvaterra, ath9k-devel, linux-crypto,
linux-wireless
Hi Gregory,
On Wed, Jun 22, 2022 at 10:08:15PM -0700, Gregory Erwin wrote:
> Hello,
>
> I bisected down to commit [fcd09c90c3c5] "ath9k: use hw_random API instead of
> directly dumping into random.c'' while investigating a long delay when entering
> suspend on kernels v5.18 onward. There are other reports of hangs or
> unresponsiveness at https://bugs.archlinux.org/task/75138 with some more info.
>
> AFAIKT, the issue is triggered by the ath9k hwrng when the interface is up,
> but not associated with any AP. In this state, 'dd if=/dev/hwrng' will block
> for up to 231 seconds before finally returning an input/output error. Similarly,
> I get a kernel log message "hwrng: no data available" every 231 seconds.
>
> The hwrng will unblock when attempting to connect to an SSID that doesn't exist,
> but not when performing a scan, so I'm guessing AR_PHY_TST_ADC only produces new
> data when the phy is transmitting.
>
> Admittedly, I don't actually know if this blocking behavior is
> expected or not, but it certainly seems undesirable.
Thanks for the report. I wish somebody from one of those bug reports
would have emailed earlier.
I don't have hardware to test this, but could you let me know if the
below patch does something? I'm sort of guessing, but maybe this is
right?
Jason
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..a6291f5f0d47 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -80,7 +80,7 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
bytes_read += max & 3UL;
memzero_explicit(&word, sizeof(word));
}
- if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+ if (!wait || !max || likely(bytes_read) || fail_stats > 110 || kthread_should_stop())
break;
msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
2022-06-23 12:14 ` Jason A. Donenfeld
@ 2022-06-23 12:16 ` Jason A. Donenfeld
2022-06-23 23:36 ` Gregory Erwin
0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 12:16 UTC (permalink / raw)
To: Gregory Erwin
Cc: Miaoqing Pan, Rui Salvaterra, ath9k-devel, linux-crypto,
linux-wireless
Or perhaps more simply:
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..5b44cd918c2b 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -83,7 +83,8 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
if (!wait || !max || likely(bytes_read) || fail_stats > 110)
break;
- msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
+ if (msleep_interruptible(ath9k_rng_delay_get(++fail_stats)))
+ break;
}
if (wait && !bytes_read && max)
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
2022-06-23 12:16 ` Jason A. Donenfeld
@ 2022-06-23 23:36 ` Gregory Erwin
2022-06-23 23:47 ` Jason A. Donenfeld
0 siblings, 1 reply; 7+ messages in thread
From: Gregory Erwin @ 2022-06-23 23:36 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: Rui Salvaterra, linux-crypto, linux-wireless
No luck.
The first patch caused a warning and oops with ath9k_rng_read() at the
top of the call stack when reading from /dev/hwrng:
WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
BUG: kernel NULL pointer dereference, address: 0000000000000000
The second didn't have a noticeable effect, for better or worse.
On Thu, Jun 23, 2022 at 5:16 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Or perhaps more simply:
>
> diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
> index cb5414265a9b..5b44cd918c2b 100644
> --- a/drivers/net/wireless/ath/ath9k/rng.c
> +++ b/drivers/net/wireless/ath/ath9k/rng.c
> @@ -83,7 +83,8 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
> if (!wait || !max || likely(bytes_read) || fail_stats > 110)
> break;
>
> - msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
> + if (msleep_interruptible(ath9k_rng_delay_get(++fail_stats)))
> + break;
> }
>
> if (wait && !bytes_read && max)
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
2022-06-23 23:36 ` Gregory Erwin
@ 2022-06-23 23:47 ` Jason A. Donenfeld
2022-06-24 0:01 ` Jason A. Donenfeld
0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 23:47 UTC (permalink / raw)
To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless
Hey Gregory,
On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
>
> No luck.
>
> The first patch caused a warning and oops with ath9k_rng_read() at the
> top of the call stack when reading from /dev/hwrng:
> WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> BUG: kernel NULL pointer dereference, address: 0000000000000000
>
> The second didn't have a noticeable effect, for better or worse.
Alright. That's actually getting us somewhere. So the path in question
here is from reading /dev/hwrng, not from the kthread that's doing the
same read.
Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
you do that causes everything to hang, and then while things are hung
in the bad way, look at the contents of /proc/[the pid of the cat you
just ran]/stack?
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
2022-06-23 23:47 ` Jason A. Donenfeld
@ 2022-06-24 0:01 ` Jason A. Donenfeld
2022-06-24 0:50 ` Jason A. Donenfeld
0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24 0:01 UTC (permalink / raw)
To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless
Hey again,
On Fri, Jun 24, 2022 at 1:47 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hey Gregory,
>
> On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> >
> > No luck.
> >
> > The first patch caused a warning and oops with ath9k_rng_read() at the
> > top of the call stack when reading from /dev/hwrng:
> > WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> >
> > The second didn't have a noticeable effect, for better or worse.
>
> Alright. That's actually getting us somewhere. So the path in question
> here is from reading /dev/hwrng, not from the kthread that's doing the
> same read.
>
> Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
> you do that causes everything to hang, and then while things are hung
> in the bad way, look at the contents of /proc/[the pid of the cat you
> just ran]/stack?
There's another flow I'm interested in. You said it prevents the
system from sleeping. Does it also make a `ip link set wlan0 down`
hang too? If so, could you send the `/proc/[pid of ip link set
down]/stack ` of a hung ip process? That seems like the more relevant
deadlock to look into.
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
2022-06-24 0:01 ` Jason A. Donenfeld
@ 2022-06-24 0:50 ` Jason A. Donenfeld
0 siblings, 0 replies; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24 0:50 UTC (permalink / raw)
To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless
Hey again again,
On Fri, Jun 24, 2022 at 02:01:22AM +0200, Jason A. Donenfeld wrote:
> Hey again,
>
> On Fri, Jun 24, 2022 at 1:47 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Hey Gregory,
> >
> > On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> > >
> > > No luck.
> > >
> > > The first patch caused a warning and oops with ath9k_rng_read() at the
> > > top of the call stack when reading from /dev/hwrng:
> > > WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> > > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >
> > > The second didn't have a noticeable effect, for better or worse.
> >
> > Alright. That's actually getting us somewhere. So the path in question
> > here is from reading /dev/hwrng, not from the kthread that's doing the
> > same read.
> >
> > Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
> > you do that causes everything to hang, and then while things are hung
> > in the bad way, look at the contents of /proc/[the pid of the cat you
> > just ran]/stack?
>
> There's another flow I'm interested in. You said it prevents the
> system from sleeping. Does it also make a `ip link set wlan0 down`
> hang too? If so, could you send the `/proc/[pid of ip link set
> down]/stack ` of a hung ip process? That seems like the more relevant
> deadlock to look into.
I think I have a plausible theory. I'll send a real patch, and you can
test it. Incoming shortly...
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-06-24 0:50 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-23 5:08 ath9k: hwrng blocks for several minutes when phy is un-associated Gregory Erwin
2022-06-23 12:14 ` Jason A. Donenfeld
2022-06-23 12:16 ` Jason A. Donenfeld
2022-06-23 23:36 ` Gregory Erwin
2022-06-23 23:47 ` Jason A. Donenfeld
2022-06-24 0:01 ` Jason A. Donenfeld
2022-06-24 0:50 ` Jason A. Donenfeld
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).