From: Waiman Long <waiman.long@hpe.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Waiman Long <Waiman.Long@hpe.com>, Arnd Bergmann <arnd@arndb.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
<linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Scott J Norton <scott.norton@hpe.com>,
Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH] random: Fix kernel panic due to system_wq use before init
Date: Wed, 14 Sep 2016 15:14:49 -0400 [thread overview]
Message-ID: <57D9A1A9.8050506@hpe.com> (raw)
In-Reply-To: <1473879781-23819-1-git-send-email-Waiman.Long@hpe.com>
On 09/14/2016 03:03 PM, Waiman Long wrote:
> While booting a 4.8-rc6 kernel on a 16-socket 768-thread Broadwell-EX
> system, the kernel panic'ed with the following log:
>
> [ 51.837010] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
> [ 51.845635] IP: [<ffffffff810a49d2>] __queue_work+0x32/0x420
> [ 52.004366] Call Trace:
> [ 52.007053]<IRQ>
> [ 52.009171] [<ffffffff810a4de7>] queue_work_on+0x27/0x40
> [ 52.015306] [<ffffffff8146ebd7>] credit_entropy_bits+0x1d7/0x2a0
> [ 52.022002] [<ffffffff8146f339>] ? add_interrupt_randomness+0x1b9/0x210
> [ 52.029366] [<ffffffff8146f339>] add_interrupt_randomness+0x1b9/0x210
> [ 52.036544] [<ffffffff810ea670>] handle_irq_event_percpu+0x40/0x80
> [ 52.043430] [<ffffffff810ea6eb>] handle_irq_event+0x3b/0x60
> [ 52.049655] [<ffffffff810ede48>] handle_level_irq+0x88/0x100
> [ 52.055968] [<ffffffff8103004b>] handle_irq+0xab/0x130
> [ 52.061708] [<ffffffff81092861>] ? _local_bh_enable+0x21/0x50
> [ 52.068125] [<ffffffff81037bf5>] ? __exit_idle+0x5/0x30
> [ 52.073965] [<ffffffff816fc9ed>] do_IRQ+0x4d/0xd0
> [ 52.079229] [<ffffffff816fa88c>] common_interrupt+0x8c/0x8c
> [ 52.085444]<EOI>
> [ 52.087568] [<ffffffff8106bea9>] ? try_to_free_pmd_page+0x9/0x40
> [ 52.094462] [<ffffffff8106bea5>] ? try_to_free_pmd_page+0x5/0x40
> [ 52.101157] [<ffffffff8106bf2a>] ? __unmap_pmd_range.part.5+0x4a/0x70
> [ 52.108330] [<ffffffff8106c330>] unmap_pmd_range+0x130/0x250
> [ 52.114644] [<ffffffff8106cb7b>] __cpa_process_fault+0x47b/0x5a0
> [ 52.121339] [<ffffffff8106d7cb>] __change_page_attr+0x78b/0x9e0
> [ 52.127946] [<ffffffff81065e45>] ? __raw_callee_save___native_queued_spin_unlock+0x15/0x30
> [ 52.137124] [<ffffffff8106da98>] __change_page_attr_set_clr+0x78/0x300
> [ 52.144404] [<ffffffff81229180>] ? __slab_alloc+0x4d/0x5c
> [ 52.150436] [<ffffffff8106f15f>] kernel_map_pages_in_pgd+0x8f/0xd0
> [ 52.157333] [<ffffffff81dc0fb0>] efi_setup_page_tables+0xcc/0x1d9
> [ 52.164124] [<ffffffff81dc0a8d>] efi_enter_virtual_mode+0x35e/0x4af
> [ 52.171117] [<ffffffff81d9e109>] start_kernel+0x41f/0x4c8
> [ 52.177142] [<ffffffff81d9dad8>] ? set_init_arg+0x55/0x55
> [ 52.183168] [<ffffffff81d9d120>] ? early_idt_handler_array+0x120/0x120
> [ 52.190440] [<ffffffff81d9d5d6>] x86_64_start_reservations+0x2a/0x2c
> [ 52.197516] [<ffffffff81d9d724>] x86_64_start_kernel+0x14c/0x16f
> [ 52.204214] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 a0 a7 c2 81 f6 c4 02 0f 85 0d 03 00 00<41> f6 86 02 01 00 00 01 0f 85 ae 02 00 00 49 c7 c7 18 41 01 00
> [ 52.225516] RIP [<ffffffff810a49d2>] __queue_work+0x32/0x420
> [ 52.231838] RSP<ffff88bd7f403de8>
> [ 52.235667] CR2: 0000000000000102
> [ 52.239667] ---[ end trace 2ee7ea9d2908eb72 ]---
> [ 52.244743] Kernel panic - not syncing: Fatal exception in interrupt
>
> Looking at the panic'ed instruction indicated that system_wq was
> used by credit_entropy_bits() before it it was initialized in an
> early_initcall.
>
> This patch prevents the schedule_work() call from being made before
> system_wq is initialized.
>
> Signed-off-by: Waiman Long<Waiman.Long@hpe.com>
> ---
> drivers/char/random.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 3efb3bf..3afc519 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -730,8 +730,12 @@ retry:
> r->entropy_total>= 2*random_read_wakeup_bits) {
> struct entropy_store *other =&blocking_pool;
>
> - if (other->entropy_count<=
> - 3 * other->poolinfo->poolfracbits / 4) {
> + /*
> + * We cannot call schedule_work() before system_wq
> + * is initialized.
> + */
> + if (system_wq&& (other->entropy_count<=
> + 3 * other->poolinfo->poolfracbits / 4)) {
> schedule_work(&other->push_work);
> r->entropy_total = 0;
> }
This patch fixed the kernel panic, but the test system still seemed to
hang after the following log messages:
[ 0.276735] random: fast init done
[ 6.230775] random: crng init done
In the stack backtrace above, the kernel hadn't even reached SMP boot
after about 50s. That was extremely slow. I tried the 4.7.3 kernel and
it booted up fine. So I suspect that there may be too many interrupts
going on and it consumes most of the CPU cycles. The prime suspect is
the random driver, I think.
I would like to hear your thought on that.
Cheers,
Longman
next prev parent reply other threads:[~2016-09-14 19:14 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-14 19:03 [PATCH] random: Fix kernel panic due to system_wq use before init Waiman Long
2016-09-14 19:14 ` Linus Torvalds
2016-09-14 19:24 ` Waiman Long
2016-09-14 19:55 ` Tejun Heo
2016-09-14 22:26 ` Tejun Heo
2016-09-14 19:14 ` Waiman Long [this message]
2016-09-14 19:19 ` Linus Torvalds
2016-09-14 19:34 ` Waiman Long
2016-09-14 21:06 ` Linus Torvalds
2016-09-14 22:15 ` Waiman Long
2016-09-19 3:09 ` Waiman Long
2016-09-19 9:25 ` Matt Fleming
2016-09-19 12:43 ` Matt Fleming
2016-09-19 14:48 ` Waiman Long
2016-09-19 14:51 ` Matt Fleming
2016-09-19 17:09 ` Waiman Long
2016-09-20 14:04 ` Matt Fleming
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57D9A1A9.8050506@hpe.com \
--to=waiman.long@hpe.com \
--cc=arnd@arndb.de \
--cc=doug.hatch@hpe.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=scott.norton@hpe.com \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.