* Fedora 20 RT hangs during suspend
From: Don Estabrook @ 2014-01-23 6:27 UTC
To: linux-rt-users
Hello -
On my HP Elitebook 8540w laptop (Intel i7 @ 2.67 GHz, 8 GiB RAM,
nouveau nVidia driver),
I've found that the Planet CCRMA kernels
3.12.5-302.rt7.1.fc20.ccrma.x86_64
3.12.6-300.rt9.1.fc20.ccrma.x86_64
seem to run fine - until I try to suspend, which causes the computer
to freeze, requiring a hard power-off to reboot. I'm able to suspend
and resume fine with the stock F20 kernel - currently
3.12.7-300.fc20.x86_64. Fernando Lopez-Lezcano suggested I post my
result here.
Visible behaviour: Using the GUI method in KDE (f > Leave > Sleep),
the screen lock comes on, then it switches to console mode, I see a
small number of kernel messages, and very quickly the screen shuts off
-- all good -- but then nothing more happens. The keyboard LEDs and
fan continue to run for as long as I had patience to wait (~30 min in
one case). I also tried "pm-suspend" from console mode (run level 3),
to see if it might either behave differently or give some more clues,
but it ends up the same.
This is an excerpt from the end of the journalctl output (after
rebooting, looking at the previous boot):
> Jan 17 14:25:43 systemd[1]: Starting Network Manager Script Dispatcher Service...
> Jan 17 14:25:43 dbus-daemon[963]: dbus[963]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
> Jan 17 14:25:43 dbus[963]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
> Jan 17 14:25:43 systemd[1]: Started Network Manager Script Dispatcher Service.
> Jan 17 14:25:43 chronyd[975]: Can't synchronise: no reachable sources
> Jan 17 14:25:47 systemd-logind[962]: Delay lock is active but inhibitor timeout is reached.
> Jan 17 14:25:47 systemd[1]: Starting Sleep.
> Jan 17 14:25:47 systemd[1]: Reached target Sleep.
> Jan 17 14:25:47 systemd[1]: Starting Suspend...
> Jan 17 14:25:47 systemd-sleep[9388]: Suspending system...
I'd be happy to provide more information if needed - please advise if
full journalctl output and/or other logs would be useful.
It may not be relevant, but these RT kernels also throw warnings with
backtraces during boot-up, like these (again from journalctl):
> kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
> kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
> kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
I don't see those warnings when I look at previous boots using the
stock kernel.
Thanks,
Don
* Re: Fedora 20 RT hangs during suspend
From: Thomas Gleixner @ 2014-02-13 22:54 UTC
To: Don Estabrook; +Cc: linux-rt-users
On Thu, 23 Jan 2014, Don Estabrook wrote:
> > kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
>
> > kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
>
> > kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
>
These are definitely interesting. Please provide the plain
non-lennartized output from
# dmesg
Thanks,
tglx
* Re: Fedora 20 RT hangs during suspend
From: Don Estabrook @ 2014-02-20 16:15 UTC
To: Thomas Gleixner; +Cc: linux-rt-users
On 2014-02-13 16:54 -0600, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 23 Jan 2014, Don Estabrook wrote:
>> > kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
>>
>> > kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
>>
>> > kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
>>
>
> These are definitely interesting. Please provide the plain
> non-lennartized output from
>
> # dmesg
Apologies for the delay - finally had an opportunity to reboot.
Output is attached. This is actually from the latest CCRMA RT kernel
3.12.11-300.rt17.1.fc20.ccrma.x86_64+rt
rather than the ones that I originally posted about, but the behaviour
seems to be the same/similar with all three - including the hang when
I try to suspend.
> Thanks,
>
> tglx
Thank you for your interest!
- Don
[-- Attachment #2: 3.12.11-300.rt17.1.fc20.ccrma.x86_64+rt.startup.dmesg.gz --]
[-- Type: application/x-gzip, Size: 24685 bytes --]
* [PATCH RT] crypto: Reduce preempt disabled regions, more algos
From: Sebastian Andrzej Siewior @ 2014-02-21 16:36 UTC
To: Don Estabrook; +Cc: Thomas Gleixner, linux-rt-users, linux-kernel
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
        blkcipher_walk_virt();          /* normal migrate_disable() */

        glue_fpu_begin();               /* get atomic */

        while (nbytes) {
                __glue_xts_crypt_128bit();
                blkcipher_walk_done();  /* with nbytes = 0, migrate_enable()
                                         * while we are atomic */
        };
        glue_fpu_end()                  /* no longer atomic */
}
and this is why the counter gets out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end(), so the latency grows. To fix this,
I shorten the preempt-disabled FPU region and ensure blkcipher_walk_done()
is called with preemption enabled. This might hurt performance because we
now enable/disable the FPU state more often, but we gain lower latency and
the bug is gone.
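For reference, a rough sketch of the reordered XTS path after this change
(simplified from the diff below; argument lists elided):

glue_xts_crypt_128bit()
{
        blkcipher_walk_virt();

        /* FPU held only long enough to compute the first tweak */
        fpu_enabled = glue_fpu_begin(..., false, ...);
        tweak_fn(tweak_ctx, walk.iv, walk.iv);
        glue_fpu_end(fpu_enabled);

        while (nbytes) {
                fpu_enabled = glue_fpu_begin(..., false, nbytes);
                __glue_xts_crypt_128bit();
                glue_fpu_end(fpu_enabled);      /* preemptible again */
                blkcipher_walk_done();          /* migrate_enable() now runs
                                                 * outside the atomic region */
                nbytes = walk.nbytes;
        }
}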
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
arch/x86/crypto/cast5_avx_glue.c | 21 +++++++++------------
arch/x86/crypto/glue_helper.c | 31 +++++++++++++++----------------
2 files changed, 24 insertions(+), 28 deletions(-)
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index c663181..2d48e83 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -60,7 +60,7 @@ static inline void cast5_fpu_end(bool fpu_enabled)
static int ecb_crypt(struct blkcipher_desc *desc, struct blkcipher_walk *walk,
bool enc)
{
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct cast5_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
const unsigned int bsize = CAST5_BLOCK_SIZE;
unsigned int nbytes;
@@ -76,7 +76,7 @@ static int ecb_crypt(struct blkcipher_desc *desc, struct blkcipher_walk *walk,
u8 *wsrc = walk->src.virt.addr;
u8 *wdst = walk->dst.virt.addr;
- fpu_enabled = cast5_fpu_begin(fpu_enabled, nbytes);
+ fpu_enabled = cast5_fpu_begin(false, nbytes);
/* Process multi-block batch */
if (nbytes >= bsize * CAST5_PARALLEL_BLOCKS) {
@@ -104,10 +104,9 @@ static int ecb_crypt(struct blkcipher_desc *desc, struct blkcipher_walk *walk,
} while (nbytes >= bsize);
done:
+ cast5_fpu_end(fpu_enabled);
err = blkcipher_walk_done(desc, walk, nbytes);
}
-
- cast5_fpu_end(fpu_enabled);
return err;
}
@@ -231,7 +230,7 @@ static unsigned int __cbc_decrypt(struct blkcipher_desc *desc,
static int cbc_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct blkcipher_walk walk;
int err;
@@ -240,12 +239,11 @@ static int cbc_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
while ((nbytes = walk.nbytes)) {
- fpu_enabled = cast5_fpu_begin(fpu_enabled, nbytes);
+ fpu_enabled = cast5_fpu_begin(false, nbytes);
nbytes = __cbc_decrypt(desc, &walk);
+ cast5_fpu_end(fpu_enabled);
err = blkcipher_walk_done(desc, &walk, nbytes);
}
-
- cast5_fpu_end(fpu_enabled);
return err;
}
@@ -315,7 +313,7 @@ static unsigned int __ctr_crypt(struct blkcipher_desc *desc,
static int ctr_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct blkcipher_walk walk;
int err;
@@ -324,13 +322,12 @@ static int ctr_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
while ((nbytes = walk.nbytes) >= CAST5_BLOCK_SIZE) {
- fpu_enabled = cast5_fpu_begin(fpu_enabled, nbytes);
+ fpu_enabled = cast5_fpu_begin(false, nbytes);
nbytes = __ctr_crypt(desc, &walk);
+ cast5_fpu_end(fpu_enabled);
err = blkcipher_walk_done(desc, &walk, nbytes);
}
- cast5_fpu_end(fpu_enabled);
-
if (walk.nbytes) {
ctr_crypt_final(desc, &walk);
err = blkcipher_walk_done(desc, &walk, 0);
diff --git a/arch/x86/crypto/glue_helper.c b/arch/x86/crypto/glue_helper.c
index 432f1d76..4a2bd21 100644
--- a/arch/x86/crypto/glue_helper.c
+++ b/arch/x86/crypto/glue_helper.c
@@ -39,7 +39,7 @@ static int __glue_ecb_crypt_128bit(const struct common_glue_ctx *gctx,
void *ctx = crypto_blkcipher_ctx(desc->tfm);
const unsigned int bsize = 128 / 8;
unsigned int nbytes, i, func_bytes;
- bool fpu_enabled = false;
+ bool fpu_enabled;
int err;
err = blkcipher_walk_virt(desc, walk);
@@ -49,7 +49,7 @@ static int __glue_ecb_crypt_128bit(const struct common_glue_ctx *gctx,
u8 *wdst = walk->dst.virt.addr;
fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
- desc, fpu_enabled, nbytes);
+ desc, false, nbytes);
for (i = 0; i < gctx->num_funcs; i++) {
func_bytes = bsize * gctx->funcs[i].num_blocks;
@@ -71,10 +71,10 @@ static int __glue_ecb_crypt_128bit(const struct common_glue_ctx *gctx,
}
done:
+ glue_fpu_end(fpu_enabled);
err = blkcipher_walk_done(desc, walk, nbytes);
}
- glue_fpu_end(fpu_enabled);
return err;
}
@@ -194,7 +194,7 @@ int glue_cbc_decrypt_128bit(const struct common_glue_ctx *gctx,
struct scatterlist *src, unsigned int nbytes)
{
const unsigned int bsize = 128 / 8;
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct blkcipher_walk walk;
int err;
@@ -203,12 +203,12 @@ int glue_cbc_decrypt_128bit(const struct common_glue_ctx *gctx,
while ((nbytes = walk.nbytes)) {
fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
- desc, fpu_enabled, nbytes);
+ desc, false, nbytes);
nbytes = __glue_cbc_decrypt_128bit(gctx, desc, &walk);
+ glue_fpu_end(fpu_enabled);
err = blkcipher_walk_done(desc, &walk, nbytes);
}
- glue_fpu_end(fpu_enabled);
return err;
}
EXPORT_SYMBOL_GPL(glue_cbc_decrypt_128bit);
@@ -278,7 +278,7 @@ int glue_ctr_crypt_128bit(const struct common_glue_ctx *gctx,
struct scatterlist *src, unsigned int nbytes)
{
const unsigned int bsize = 128 / 8;
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct blkcipher_walk walk;
int err;
@@ -287,13 +287,12 @@ int glue_ctr_crypt_128bit(const struct common_glue_ctx *gctx,
while ((nbytes = walk.nbytes) >= bsize) {
fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
- desc, fpu_enabled, nbytes);
+ desc, false, nbytes);
nbytes = __glue_ctr_crypt_128bit(gctx, desc, &walk);
+ glue_fpu_end(fpu_enabled);
err = blkcipher_walk_done(desc, &walk, nbytes);
}
- glue_fpu_end(fpu_enabled);
-
if (walk.nbytes) {
glue_ctr_crypt_final_128bit(
gctx->funcs[gctx->num_funcs - 1].fn_u.ctr, desc, &walk);
@@ -348,7 +347,7 @@ int glue_xts_crypt_128bit(const struct common_glue_ctx *gctx,
void *tweak_ctx, void *crypt_ctx)
{
const unsigned int bsize = 128 / 8;
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct blkcipher_walk walk;
int err;
@@ -361,21 +360,21 @@ int glue_xts_crypt_128bit(const struct common_glue_ctx *gctx,
/* set minimum length to bsize, for tweak_fn */
fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
- desc, fpu_enabled,
+ desc, false,
nbytes < bsize ? bsize : nbytes);
-
/* calculate first value of T */
tweak_fn(tweak_ctx, walk.iv, walk.iv);
+ glue_fpu_end(fpu_enabled);
while (nbytes) {
+ fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
+ desc, false, nbytes);
nbytes = __glue_xts_crypt_128bit(gctx, crypt_ctx, desc, &walk);
+ glue_fpu_end(fpu_enabled);
err = blkcipher_walk_done(desc, &walk, nbytes);
nbytes = walk.nbytes;
}
-
- glue_fpu_end(fpu_enabled);
-
return err;
}
EXPORT_SYMBOL_GPL(glue_xts_crypt_128bit);
--
1.9.0.rc3