linux-crypto.vger.kernel.org archive mirror
* ath9k: hwrng blocks for several minutes when phy is un-associated
@ 2022-06-23  5:08 Gregory Erwin
  2022-06-23 12:14 ` Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Gregory Erwin @ 2022-06-23  5:08 UTC (permalink / raw)
  To: Jason A. Donenfeld, Miaoqing Pan, Rui Salvaterra
  Cc: ath9k-devel, linux-crypto, linux-wireless

Hello,

I bisected down to commit [fcd09c90c3c5] "ath9k: use hw_random API instead of
directly dumping into random.c" while investigating a long delay when entering
suspend on kernels v5.18 onward. There are other reports of hangs or
unresponsiveness at https://bugs.archlinux.org/task/75138 with some more info.

AFAICT, the issue is triggered by the ath9k hwrng when the interface is up,
but not associated with any AP. In this state, 'dd if=/dev/hwrng' will block
for up to 231 seconds before finally returning an input/output error. Similarly,
I get a kernel log message "hwrng: no data available" every 231 seconds.

The hwrng will unblock when attempting to connect to an SSID that doesn't exist,
but not when performing a scan, so I'm guessing AR_PHY_TST_ADC only produces new
data when the phy is transmitting.

Admittedly, I don't actually know if this blocking behavior is
expected or not, but it certainly seems undesirable.


* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23  5:08 ath9k: hwrng blocks for several minutes when phy is un-associated Gregory Erwin
@ 2022-06-23 12:14 ` Jason A. Donenfeld
  2022-06-23 12:16   ` Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 12:14 UTC (permalink / raw)
  To: Gregory Erwin
  Cc: Miaoqing Pan, Rui Salvaterra, ath9k-devel, linux-crypto,
	linux-wireless

Hi Gregory,

On Wed, Jun 22, 2022 at 10:08:15PM -0700, Gregory Erwin wrote:
> Hello,
> 
> I bisected down to commit [fcd09c90c3c5] "ath9k: use hw_random API instead of
> directly dumping into random.c" while investigating a long delay when entering
> suspend on kernels v5.18 onward. There are other reports of hangs or
> unresponsiveness at https://bugs.archlinux.org/task/75138 with some more info.
> 
> AFAICT, the issue is triggered by the ath9k hwrng when the interface is up,
> but not associated with any AP. In this state, 'dd if=/dev/hwrng' will block
> for up to 231 seconds before finally returning an input/output error. Similarly,
> I get a kernel log message "hwrng: no data available" every 231 seconds.
> 
> The hwrng will unblock when attempting to connect to an SSID that doesn't exist,
> but not when performing a scan, so I'm guessing AR_PHY_TST_ADC only produces new
> data when the phy is transmitting.
> 
> Admittedly, I don't actually know if this blocking behavior is
> expected or not, but it certainly seems undesirable.

Thanks for the report. I wish somebody from one of those bug reports
had emailed earlier.

I don't have hardware to test this, but could you let me know if the
below patch does something? I'm sort of guessing, but maybe this is
right?

Jason

diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..a6291f5f0d47 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -80,7 +80,7 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || fail_stats > 110 || kthread_should_stop())
 			break;

 		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));



* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23 12:14 ` Jason A. Donenfeld
@ 2022-06-23 12:16   ` Jason A. Donenfeld
  2022-06-23 23:36     ` Gregory Erwin
  0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 12:16 UTC (permalink / raw)
  To: Gregory Erwin
  Cc: Miaoqing Pan, Rui Salvaterra, ath9k-devel, linux-crypto,
	linux-wireless

Or perhaps more simply:

diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..5b44cd918c2b 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -83,7 +83,8 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
 			break;

-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
+		if (msleep_interruptible(ath9k_rng_delay_get(++fail_stats)))
+			break;
 	}

 	if (wait && !bytes_read && max)



* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23 12:16   ` Jason A. Donenfeld
@ 2022-06-23 23:36     ` Gregory Erwin
  2022-06-23 23:47       ` Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Gregory Erwin @ 2022-06-23 23:36 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Rui Salvaterra, linux-crypto, linux-wireless

No luck.

The first patch caused a warning and oops with ath9k_rng_read() at the
top of the call stack when reading from /dev/hwrng:
  WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
  BUG: kernel NULL pointer dereference, address: 0000000000000000

The second didn't have a noticeable effect, for better or worse.
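
For anyone following along, here is a rough userspace model of the control
flow the second patch is aiming for. It is only a sketch: the callbacks, the
struct, and every model_* name are made up for illustration, and the fixed
10 ms delay is an assumption rather than what the driver computes. The one
thing it borrows from the kernel is the convention that
msleep_interruptible() returns the milliseconds left, which is nonzero only
when the sleep was cut short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hardware read and for msleep_interruptible(). */
typedef size_t (*model_read_fn)(void *ctx);
/* Returns the milliseconds left to sleep; nonzero means the sleep was cut short. */
typedef unsigned long (*model_sleep_fn)(void *ctx, unsigned int ms);

/* Test double: data appears after `empty_reads` failed reads; the
 * `interrupt_at`-th sleep (1-based, 0 = never) reports an interruption. */
struct model_ctx {
	unsigned int empty_reads;
	unsigned int interrupt_at;
	unsigned int reads;
	unsigned int sleeps;
};

static size_t model_read(void *p)
{
	struct model_ctx *c = p;

	return c->reads++ < c->empty_reads ? 0 : 4;
}

static unsigned long model_sleep(void *p, unsigned int ms)
{
	struct model_ctx *c = p;

	return ++c->sleeps == c->interrupt_at ? ms : 0;
}

/* Bounded retry loop shaped like the one in the second patch. */
static long model_rng_read(model_read_fn read_once, model_sleep_fn sleep_ms,
			   void *ctx, unsigned int max_fails, int wait)
{
	unsigned int fails = 0;
	size_t got;

	assert(read_once && sleep_ms);
	for (;;) {
		got = read_once(ctx);
		if (!wait || got || fails > max_fails)
			break;
		/* Fixed 10 ms delay is an assumption; the driver computes its own. */
		++fails;
		if (sleep_ms(ctx, 10))
			break;	/* interrupted: stop waiting for data */
	}
	return (wait && !got) ? -EIO : (long)got;
}
```

In this model the loop leaves early only when the sleep itself reports an
interruption. If nothing interrupts the sleeping reader, it still runs
through the whole retry budget before giving up with -EIO.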


On Thu, Jun 23, 2022 at 5:16 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Or perhaps more simply:
>
> diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
> index cb5414265a9b..5b44cd918c2b 100644
> --- a/drivers/net/wireless/ath/ath9k/rng.c
> +++ b/drivers/net/wireless/ath/ath9k/rng.c
> @@ -83,7 +83,8 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>                 if (!wait || !max || likely(bytes_read) || fail_stats > 110)
>                         break;
>
> -               msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
> +               if (msleep_interruptible(ath9k_rng_delay_get(++fail_stats)))
> +                       break;
>         }
>
>         if (wait && !bytes_read && max)
>


* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23 23:36     ` Gregory Erwin
@ 2022-06-23 23:47       ` Jason A. Donenfeld
  2022-06-24  0:01         ` Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 23:47 UTC (permalink / raw)
  To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless

Hey Gregory,

On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
>
> No luck.
>
> The first patch caused a warning and oops with ath9k_rng_read() at the
> top of the call stack when reading from /dev/hwrng:
>   WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>
> The second didn't have a noticeable effect, for better or worse.

Alright. That's actually getting us somewhere. So the path in question
here is from reading /dev/hwrng, not from the kthread that's doing the
same read.

Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
you do that causes everything to hang, and then while things are hung
in the bad way, look at the contents of /proc/[the pid of the cat you
just ran]/stack?
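
If a small helper is handier than doing this by hand, something along these
lines should work. It is a minimal sketch: stack_path() and
dump_kernel_stack() are names invented here, and reading /proc/<pid>/stack
typically requires root and a kernel built with stack trace support.

```c
#include <stdio.h>
#include <sys/types.h>

/* Build "/proc/<pid>/stack" into buf; returns 0 on success, -1 if it does not fit. */
static int stack_path(pid_t pid, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/%ld/stack", (long)pid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Copy the kernel stack of `pid` to `out`; returns 0 on success, -1 on error. */
static int dump_kernel_stack(pid_t pid, FILE *out)
{
	char path[64], line[256];
	FILE *f;

	if (stack_path(pid, path, sizeof(path)) < 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;	/* no such pid, or insufficient privileges */
	while (fgets(line, sizeof(line), f))
		fputs(line, out);
	fclose(f);
	return 0;
}
```

Calling dump_kernel_stack(pid, stdout) with the pid of the blocked cat (or
of a hung ip process) prints the same text you would get from reading the
file directly.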

Jason


* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23 23:47       ` Jason A. Donenfeld
@ 2022-06-24  0:01         ` Jason A. Donenfeld
  2022-06-24  0:50           ` Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24  0:01 UTC (permalink / raw)
  To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless

Hey again,

On Fri, Jun 24, 2022 at 1:47 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hey Gregory,
>
> On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> >
> > No luck.
> >
> > The first patch caused a warning and oops with ath9k_rng_read() at the
> > top of the call stack when reading from /dev/hwrng:
> >   WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> >   BUG: kernel NULL pointer dereference, address: 0000000000000000
> >
> > The second didn't have a noticeable effect, for better or worse.
>
> Alright. That's actually getting us somewhere. So the path in question
> here is from reading /dev/hwrng, not from the kthread that's doing the
> same read.
>
> Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
> you do that causes everything to hang, and then while things are hung
> in the bad way, look at the contents of /proc/[the pid of the cat you
> just ran]/stack?

There's another flow I'm interested in. You said it prevents the
system from sleeping. Does it also make an `ip link set wlan0 down`
hang? If so, could you send the `/proc/[pid of ip link set
down]/stack` of a hung ip process? That seems like the more relevant
deadlock to look into.

Jason


* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-24  0:01         ` Jason A. Donenfeld
@ 2022-06-24  0:50           ` Jason A. Donenfeld
  0 siblings, 0 replies; 7+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24  0:50 UTC (permalink / raw)
  To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless

Hey again again,

On Fri, Jun 24, 2022 at 02:01:22AM +0200, Jason A. Donenfeld wrote:
> Hey again,
> 
> On Fri, Jun 24, 2022 at 1:47 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Hey Gregory,
> >
> > On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> > >
> > > No luck.
> > >
> > > The first patch caused a warning and oops with ath9k_rng_read() at the
> > > top of the call stack when reading from /dev/hwrng:
> > >   WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> > >   BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >
> > > The second didn't have a noticeable effect, for better or worse.
> >
> > Alright. That's actually getting us somewhere. So the path in question
> > here is from reading /dev/hwrng, not from the kthread that's doing the
> > same read.
> >
> > Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
> > you do that causes everything to hang, and then while things are hung
> > in the bad way, look at the contents of /proc/[the pid of the cat you
> > just ran]/stack?
> 
> There's another flow I'm interested in. You said it prevents the
> system from sleeping. Does it also make an `ip link set wlan0 down`
> hang? If so, could you send the `/proc/[pid of ip link set
> down]/stack` of a hung ip process? That seems like the more relevant
> deadlock to look into.

I think I have a plausible theory. I'll send a real patch, and you can
test it. Incoming shortly...

Jason

