All of lore.kernel.org
 help / color / mirror / Atom feed
From: baiyaowei <baiyaowei@cmss.chinamobile.com>
To: tj@kernel.org, jiangshanlai@gmail.com
Cc: akpm@linux-foundation.org, baiyaowei@cmss.chinamobile.com,
	linux-kernel@vger.kernel.org
Subject: NULL pointer dereference in process_one_work
Date: Fri, 24 Nov 2017 02:23:44 -0500	[thread overview]
Message-ID: <20171124072344.GA313@byw> (raw)

Hi,tj and jiangshan,

I build a ceph storage pool to run some benchmarks with 3.10 kernel.
Occasionally, when the cpus' load is very high, some nodes crash with
message below.

[292273.612014] BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
[292273.612057] IP: [<ffffffff8109d4b1>] process_one_work+0x31/0x470
[292273.612087] PGD 0 
[292273.612099] Oops: 0000 [#1] SMP 
[292273.612117] Modules linked in: rbd(OE) bcache(OE) ip_vs xfs
xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack ipt_REJECT tun bridge stp llc ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter bonding
intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul
ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper
cryptd mxm_wmi iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf pcspkr
ipmi_ssif mei_me sg lpc_ich mei sb_edac ipmi_si mfd_core edac_core
ipmi_msghandler shpchp wmi acpi_power_meter nfsd auth_rpcgss nfs_acl
lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit
drm_kms_helper
[292273.612495]  crct10dif_pclmul crct10dif_common ttm crc32c_intel drm
ahci nvme bnx2x libahci i2c_core libata mdio libcrc32c megaraid_sas ptp
pps_core dm_mirror dm_region_hash dm_log dm_mod
[292273.612580] CPU: 16 PID: 353223 Comm: kworker/16:2 Tainted: G
OE  ------------   3.10.0-327.el7.x86_64 #1
[292273.612620] Hardware name: Dell Inc. PowerEdge R730xd/0WCJNT, BIOS
2.4.3 01/17/2017
[292273.612655] task: ffff8801f55e6780 ti: ffff882a199b0000 task.ti:
ffff882a199b0000
[292273.612685] RIP: 0010:[<ffffffff8109d4b1>]  [<ffffffff8109d4b1>]
process_one_work+0x31/0x470
[292273.612721] RSP: 0018:ffff882a199b3e28  EFLAGS: 00010046
[292273.612743] RAX: 0000000000000000 RBX: ffff88088b273028 RCX:
ffff882a199b3fd8
[292273.612771] RDX: 0000000000000000 RSI: ffff88088b273028 RDI:
ffff88088b273000
[292273.612799] RBP: ffff882a199b3e60 R08: 0000000000000000 R09:
0000000000000770
[292273.612827] R10: ffff8822a3bb1f80 R11: ffff8822a3bb1f80 R12:
ffff88088b273000
[292273.612855] R13: ffff881fff313fc0 R14: 0000000000000000 R15:
ffff881fff313fc0
[292273.612883] FS:  0000000000000000(0000) GS:ffff881fff300000(0000)
knlGS:0000000000000000
[292273.612914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[292273.612937] CR2: 00000000000000b8 CR3: 000000000194a000 CR4:
00000000003407e0
[292273.612965] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[292273.612994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[292273.613021] Stack:
[292273.613031]  00000000ff313fd8 0000000000000000 ffff881fff313fd8
000188088b273030
[292273.613069]  ffff8801f55e6780 ffff88088b273000 ffff881fff313fc0
ffff882a199b3ec0
[292273.613108]  ffffffff8109e4cc ffff882a199b3fd8 ffff882a199b3fd8
ffff8801f55e6780
[292273.613146] Call Trace:
[292273.613160]  [<ffffffff8109e4cc>] worker_thread+0x21c/0x400
[292273.613185]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[292273.613212]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[292273.613234]  [<ffffffff810a5a20>] ?
kthread_create_on_node+0x140/0x140
[292273.613263]  [<ffffffff81645858>] ret_from_fork+0x58/0x90
[292273.613287]  [<ffffffff810a5a20>] ?
kthread_create_on_node+0x140/0x140
[292273.614303] Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc
53 48 89 f3 48 83 ec 10 48 8b 06 4c 8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f
45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01 
[292273.617971] RIP  [<ffffffff8109d4b1>] process_one_work+0x31/0x470
[292273.620011]  RSP <ffff882a199b3e28>
[292273.621940] CR2: 0000000000000008

Some crash messsages:

crash> sys
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Wed Oct 18 05:21:14 2017
      UPTIME: 3 days, 09:07:25
LOAD AVERAGE: 221.70, 222.22, 224.96
       TASKS: 3115
    NODENAME: node121
     RELEASE: 3.10.0-327.el7.x86_64
     VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015
     MACHINE: x86_64  (2099 Mhz)
      MEMORY: 255.9 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at
0000000000000008"
crash> bt
PID: 353223  TASK: ffff8801f55e6780  CPU: 16  COMMAND: "kworker/16:2"
 #0 [ffff882a199b3af0] machine_kexec at ffffffff81051beb
 #1 [ffff882a199b3b50] crash_kexec at ffffffff810f2542
 #2 [ffff882a199b3c20] oops_end at ffffffff8163e1a8
 #3 [ffff882a199b3c48] no_context at ffffffff8162e2b8
 #4 [ffff882a199b3c98] __bad_area_nosemaphore at ffffffff8162e34e
 #5 [ffff882a199b3ce0] bad_area_nosemaphore at ffffffff8162e4b8
 #6 [ffff882a199b3cf0] __do_page_fault at ffffffff81640fce
 #7 [ffff882a199b3d48] do_page_fault at ffffffff81641113
 #8 [ffff882a199b3d70] page_fault at ffffffff8163d408
    [exception RIP: process_one_work+49]
    RIP: ffffffff8109d4b1  RSP: ffff882a199b3e28  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffff88088b273028  RCX: ffff882a199b3fd8
    RDX: 0000000000000000  RSI: ffff88088b273028  RDI: ffff88088b273000
    RBP: ffff882a199b3e60   R8: 0000000000000000   R9: 0000000000000770
    R10: ffff8822a3bb1f80  R11: ffff8822a3bb1f80  R12: ffff88088b273000
    R13: ffff881fff313fc0  R14: 0000000000000000  R15: ffff881fff313fc0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff882a199b3e68] worker_thread at ffffffff8109e4cc
#10 [ffff882a199b3ec8] kthread at ffffffff810a5aef
#11 [ffff882a199b3f50] ret_from_fork at ffffffff81645858
crash> dis process_one_work
0xffffffff8109d480 <process_one_work>:  nopl   0x0(%rax,%rax,1) [FTRACE
NOP]
0xffffffff8109d485 <process_one_work+5>:        push   %rbp
0xffffffff8109d486 <process_one_work+6>:        mov    %rsp,%rbp
0xffffffff8109d489 <process_one_work+9>:        push   %r15
0xffffffff8109d48b <process_one_work+11>:       push   %r14
0xffffffff8109d48d <process_one_work+13>:       xor    %r14d,%r14d
0xffffffff8109d490 <process_one_work+16>:       push   %r13
0xffffffff8109d492 <process_one_work+18>:       push   %r12
0xffffffff8109d494 <process_one_work+20>:       mov    %rdi,%r12
0xffffffff8109d497 <process_one_work+23>:       push   %rbx
0xffffffff8109d498 <process_one_work+24>:       mov    %rsi,%rbx
0xffffffff8109d49b <process_one_work+27>:       sub    $0x10,%rsp
0xffffffff8109d49f <process_one_work+31>:       mov    (%rsi),%rax
0xffffffff8109d4a2 <process_one_work+34>:       mov    0x48(%rdi),%r13
0xffffffff8109d4a6 <process_one_work+38>:       mov    %rax,%rdx
0xffffffff8109d4a9 <process_one_work+41>:       xor    %dl,%dl
0xffffffff8109d4ab <process_one_work+43>:       test   $0x4,%al
0xffffffff8109d4ad <process_one_work+45>:       cmovne %rdx,%r14
0xffffffff8109d4b1 <process_one_work+49>:       mov    0x8(%r14),%rax
0xffffffff8109d4b5 <process_one_work+53>:       mov    0x100(%rax),%r15d
0xffffffff8109d4bc <process_one_work+60>:       shr    $0x5,%r15d
0xffffffff8109d4c0 <process_one_work+64>:       mov    %r15d,%eax
0xffffffff8109d4c3 <process_one_work+67>:       and    $0x1,%eax
0xffffffff8109d4c6 <process_one_work+70>:       testb  $0x80,0x58(%rdi)
0xffffffff8109d4ca <process_one_work+74>:       mov    %al,-0x30(%rbp)
0xffffffff8109d4cd <process_one_work+77>:       jne
0xffffffff8109d4da <process_one_work+90>
crash> *work_struct ffff88088b273028
struct work_struct {
  data = {
    counter = 0
  }, 
  entry = {
    next = 0xffff88088b273030, 
    prev = 0xffff88088b273030
  }, 
  func = 0xffff8801f55e6780
}
crash> *worker ffff88088b273000
struct worker {
  {
    entry = {
      next = 0x0, 
      prev = 0x0
    }, 
    hentry = {
      next = 0x0, 
      pprev = 0x0
    }
  }, 
  current_work = 0x0, 
  current_func = 0x0, 
  current_pwq = 0x0, 
  desc_valid = false, 
  scheduled = {
    next = 0xffff88088b273030, 
    prev = 0xffff88088b273030
  }, 
  task = 0xffff8801f55e6780, 
  pool = 0xffff881fff313fc0, 
  last_active = 4586712874, 
  flags = 1, 
  id = 2, 
  desc =
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
  rescue_wq = 0x0
}
crash> 

It looks like the work parameter is mistakenly pointing into the worker
parameter. I noticed there was one similar case like this before, but not
exactly same.

https://lists.gt.net/linux/kernel/2022989

I wonder how this situation could happen. When process_one_work is called,
the work being processed shall be on worker_pool->worklist, but not
here.

crash> *worker_pool 0xffff881fff313fc0
struct worker_pool {
  lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 4142331618, 
            tickets = {
              head = 63202, 
              tail = 63206
            }
          }
        }
      }
    }
  }, 
  cpu = 16, 
  node = 0, 
  id = 32, 
  flags = 0, 
  worklist = {
    next = 0xffff881fff313fd8, 
    prev = 0xffff881fff313fd8
  }, 
  nr_workers = 3, 
  nr_idle = 2, 
  idle_list = {
    next = 0xffff880245a18280, 
    prev = 0xffff88078d3c5780
  }, 
...

Maybe we'd add if(get_work_pwq(work)) in process_one_work or add
BUG_ON like this:

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dde6298..82a92e0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2012,11 +2012,15 @@ static void process_one_work(struct worker
*worker, struct work_struct *work)
 __releases(&pool->lock)
 __acquires(&pool->lock)
 {
-       struct pool_workqueue *pwq = get_work_pwq(work);
+       struct pool_workqueue *pwq;
        struct worker_pool *pool = worker->pool;
-       bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
+       bool cpu_intensive;
        int work_color;
        struct worker *collision;
+
+       BUG_ON(!(pwq = get_work_pwq(work)));
+       bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
+
 #ifdef CONFIG_LOCKDEP

I really appreciate your any ideas and please let me know if you want 
any more information from the crashed system. 

Thanks,
Yaowei,Bai

                 reply	other threads:[~2017-11-24  7:33 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171124072344.GA313@byw \
    --to=baiyaowei@cmss.chinamobile.com \
    --cc=akpm@linux-foundation.org \
    --cc=jiangshanlai@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.