From: Waiman Long <waiman.long@hpe.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Waiman Long <Waiman.Long@hpe.com>, Arnd Bergmann <arnd@arndb.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	<linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH] random: Fix kernel panic due to system_wq use before init
Date: Wed, 14 Sep 2016 15:14:49 -0400
Message-ID: <57D9A1A9.8050506@hpe.com>
In-Reply-To: <1473879781-23819-1-git-send-email-Waiman.Long@hpe.com>

On 09/14/2016 03:03 PM, Waiman Long wrote:
> While booting a 4.8-rc6 kernel on a 16-socket 768-thread Broadwell-EX
> system, the kernel panicked with the following log:
>
> [   51.837010] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
> [   51.845635] IP: [<ffffffff810a49d2>] __queue_work+0x32/0x420
> [   52.004366] Call Trace:
> [   52.007053]  <IRQ>
> [   52.009171]  [<ffffffff810a4de7>] queue_work_on+0x27/0x40
> [   52.015306]  [<ffffffff8146ebd7>] credit_entropy_bits+0x1d7/0x2a0
> [   52.022002]  [<ffffffff8146f339>] ?  add_interrupt_randomness+0x1b9/0x210
> [   52.029366]  [<ffffffff8146f339>] add_interrupt_randomness+0x1b9/0x210
> [   52.036544]  [<ffffffff810ea670>] handle_irq_event_percpu+0x40/0x80
> [   52.043430]  [<ffffffff810ea6eb>] handle_irq_event+0x3b/0x60
> [   52.049655]  [<ffffffff810ede48>] handle_level_irq+0x88/0x100
> [   52.055968]  [<ffffffff8103004b>] handle_irq+0xab/0x130
> [   52.061708]  [<ffffffff81092861>] ? _local_bh_enable+0x21/0x50
> [   52.068125]  [<ffffffff81037bf5>] ? __exit_idle+0x5/0x30
> [   52.073965]  [<ffffffff816fc9ed>] do_IRQ+0x4d/0xd0
> [   52.079229]  [<ffffffff816fa88c>] common_interrupt+0x8c/0x8c
> [   52.085444]  <EOI>
> [   52.087568]  [<ffffffff8106bea9>] ? try_to_free_pmd_page+0x9/0x40
> [   52.094462]  [<ffffffff8106bea5>] ? try_to_free_pmd_page+0x5/0x40
> [   52.101157]  [<ffffffff8106bf2a>] ?  __unmap_pmd_range.part.5+0x4a/0x70
> [   52.108330]  [<ffffffff8106c330>] unmap_pmd_range+0x130/0x250
> [   52.114644]  [<ffffffff8106cb7b>] __cpa_process_fault+0x47b/0x5a0
> [   52.121339]  [<ffffffff8106d7cb>] __change_page_attr+0x78b/0x9e0
> [   52.127946]  [<ffffffff81065e45>] ?  __raw_callee_save___native_queued_spin_unlock+0x15/0x30
> [   52.137124]  [<ffffffff8106da98>] __change_page_attr_set_clr+0x78/0x300
> [   52.144404]  [<ffffffff81229180>] ? __slab_alloc+0x4d/0x5c
> [   52.150436]  [<ffffffff8106f15f>] kernel_map_pages_in_pgd+0x8f/0xd0
> [   52.157333]  [<ffffffff81dc0fb0>] efi_setup_page_tables+0xcc/0x1d9
> [   52.164124]  [<ffffffff81dc0a8d>] efi_enter_virtual_mode+0x35e/0x4af
> [   52.171117]  [<ffffffff81d9e109>] start_kernel+0x41f/0x4c8
> [   52.177142]  [<ffffffff81d9dad8>] ? set_init_arg+0x55/0x55
> [   52.183168]  [<ffffffff81d9d120>] ?  early_idt_handler_array+0x120/0x120
> [   52.190440]  [<ffffffff81d9d5d6>] x86_64_start_reservations+0x2a/0x2c
> [   52.197516]  [<ffffffff81d9d724>] x86_64_start_kernel+0x14c/0x16f
> [   52.204214] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 a0 a7 c2 81 f6 c4 02 0f 85 0d 03 00 00<41>  f6 86 02 01 00 00 01 0f 85 ae 02 00 00 49 c7 c7 18 41 01 00
> [   52.225516] RIP  [<ffffffff810a49d2>] __queue_work+0x32/0x420
> [   52.231838]  RSP <ffff88bd7f403de8>
> [   52.235667] CR2: 0000000000000102
> [   52.239667] ---[ end trace 2ee7ea9d2908eb72 ]---
> [   52.244743] Kernel panic - not syncing: Fatal exception in interrupt
>
> The faulting instruction indicated that system_wq was used by
> credit_entropy_bits() before it was initialized in an
> early_initcall.
>
> This patch prevents the schedule_work() call from being made before
> system_wq is initialized.
>
> Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
> ---
>   drivers/char/random.c |    8 ++++++--
>   1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 3efb3bf..3afc519 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -730,8 +730,12 @@ retry:
>   		    r->entropy_total >= 2*random_read_wakeup_bits) {
>   			struct entropy_store *other = &blocking_pool;
>
> -			if (other->entropy_count <=
> -			    3 * other->poolinfo->poolfracbits / 4) {
> +			/*
> +			 * We cannot call schedule_work() before system_wq
> +			 * is initialized.
> +			 */
> +			if (system_wq && (other->entropy_count <=
> +			    3 * other->poolinfo->poolfracbits / 4)) {
>   				schedule_work(&other->push_work);
>   				r->entropy_total = 0;
>   			}

This patch fixed the kernel panic, but the test system still seemed to 
hang after the following log messages:

[    0.276735] random: fast init done
[    6.230775] random: crng init done

The timestamps in the stack backtrace above show that the kernel still
hadn't reached SMP boot after about 50s, which is extremely slow. I
tried the 4.7.3 kernel and it booted up fine. So I suspect that there
are too many interrupts going on and that they consume most of the CPU
cycles. I think the random driver is the prime suspect.

I would like to hear your thoughts on that.

Cheers,
Longman
