From: "Nadolski, Edmund" <edmund.nadolski@intel.com>
To: Stefan Majer <stefan.majer@gmail.com>, sagi grimberg <sagi@grimberg.me>
Cc: Keith Busch <kbusch@kernel.org>,
linux-nvme <linux-nvme@lists.infradead.org>
Subject: Re: null pointer dereference in nvme_tcp_io_work
Date: Tue, 7 Jan 2020 09:48:56 -0700 [thread overview]
Message-ID: <cb5caef9-097a-9e1d-f42d-7e6f6612bbb2@intel.com> (raw)
In-Reply-To: <CADdPHGumJGQcuHtzFZKmezgec1Jx1fBjAJeQDf5_n-YWvowdeQ@mail.gmail.com>
On 1/7/2020 8:41 AM, Stefan Majer wrote:
> Hi,
>
> is there anything i can help with to further nail down the problem ?
>
> please let me know.
> Stefan
Is there any kind of reset in progress when this hits?
Thanks,
Ed
> On Sat, Dec 28, 2019 at 6:53 PM Stefan Majer <stefan.majer@gmail.com> wrote:
>>
>> I have to add:
>>
>> ./faddr2line /var/lib/debug/lib/modules/5.3.0-24-generic/kernel/drivers/nvme/host/nvme-tcp.ko
>> nvme_tcp_io_work+0x341/0x7f0
>> nvme_tcp_io_work+0x341/0x7f0:
>> nvme_tcp_req_cur_length at
>> /build/linux-4AS01l/linux-5.3.0/drivers/nvme/host/tcp.c:189
>> (inlined by) nvme_tcp_try_send_data at
>> /build/linux-4AS01l/linux-5.3.0/drivers/nvme/host/tcp.c:854
>> (inlined by) nvme_tcp_try_send at
>> /build/linux-4AS01l/linux-5.3.0/drivers/nvme/host/tcp.c:1011
>> (inlined by) nvme_tcp_io_work at
>> /build/linux-4AS01l/linux-5.3.0/drivers/nvme/host/tcp.c:1048
>>
>> On Sat, Dec 28, 2019 at 6:49 PM Stefan Majer <stefan.majer@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > took a while, but now reproduced with ubuntu-19.10 kernel 5.3.x i
>> > installed the debug symbols and ran decodestacktrace.sh from kernel
>> > sources which gives me:
>> >
>> > [ 29.266954] nvme nvme0: new ctrl: NQN
>> > "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.22.1:4420
>> > [ 29.267477] nvme nvme0: Removing ctrl: NQN
>> > "nqn.2014-08.org.nvmexpress.discovery"
>> > [ 29.285732] nvme nvme0: creating 1 I/O queues.
>> > [ 29.286632] nvme nvme0: mapped 1/0 default/read queues.
>> > [ 29.288565] nvme nvme0: new ctrl: NQN "nvmet-test", addr
>> > 192.168.22.1:4420
>> > [ 29.293146] nvme0n1: detected capacity change from 0 to 1084227584
>> > [ 39.196846] BUG: kernel NULL pointer dereference, address:
>> > 0000000000000008
>> > [ 39.198524] #PF: supervisor read access in kernel mode
>> > [ 39.199786] #PF: error_code(0x0000) - not-present page
>> > [ 39.201198] PGD 0 P4D 0
>> > [ 39.201849] Oops: 0000 [#1] SMP PTI
>> > [ 39.202679] CPU: 0 PID: 223 Comm: kworker/0:1H Kdump: loaded Not
>> > tainted 5.3.0-24-generic #26-Ubuntu
>> > [ 39.204830] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> > BIOS 0.0.0 02/06/2015
>> > [ 39.207205] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
>> > [ 39.209005] RIP: 0010:nvme_tcp_io_work+0x341/0x7f0 nvme_tcp
>> > [ 39.210686] Code: 8b 87 98 00 00 00 83 f8 02 0f 85 34 fd ff ff 49 8b
>> > 47 28 4d 89 fe 48 89 45 a8 49 8b 46 78 49 8b 56 68 45 8b 66 34 45 2b
>> > 66 38 <8b> 58 08 8b 48 0c 4c 8b 28 48 29 d3 48 8d 34 11 4c 39 e3 48 89
>> > 75
>> > All code
>> > ========
>> > 0: 8b 87 98 00 00 00 mov 0x98(%rdi),%eax
>> > 6: 83 f8 02 cmp $0x2,%eax
>> > 9: 0f 85 34 fd ff ff jne 0xfffffffffffffd43
>> > f: 49 8b 47 28 mov 0x28(%r15),%rax
>> > 13: 4d 89 fe mov %r15,%r14
>> > 16: 48 89 45 a8 mov %rax,-0x58(%rbp)
>> > 1a: 49 8b 46 78 mov 0x78(%r14),%rax
>> > 1e: 49 8b 56 68 mov 0x68(%r14),%rdx
>> > 22: 45 8b 66 34 mov 0x34(%r14),%r12d
>> > 26: 45 2b 66 38 sub 0x38(%r14),%r12d
>> > 2a:* 8b 58 08 mov 0x8(%rax),%ebx <--
>> > trapping instruction
>> > 2d: 8b 48 0c mov 0xc(%rax),%ecx
>> > 30: 4c 8b 28 mov (%rax),%r13
>> > 33: 48 29 d3 sub %rdx,%rbx
>> > 36: 48 8d 34 11 lea (%rcx,%rdx,1),%rsi
>> > 3a: 4c 39 e3 cmp %r12,%rbx
>> > 3d: 48 rex.W
>> > 3e: 89 .byte 0x89
>> > 3f: 75 .byte 0x75
>> >
>> > Code starting with the faulting instruction
>> > ===========================================
>> > 0: 8b 58 08 mov 0x8(%rax),%ebx
>> > 3: 8b 48 0c mov 0xc(%rax),%ecx
>> > 6: 4c 8b 28 mov (%rax),%r13
>> > 9: 48 29 d3 sub %rdx,%rbx
>> > c: 48 8d 34 11 lea (%rcx,%rdx,1),%rsi
>> > 10: 4c 39 e3 cmp %r12,%rbx
>> > 13: 48 rex.W
>> > 14: 89 .byte 0x89
>> > 15: 75 .byte 0x75
>> > [ 39.216464] RSP: 0018:ffffb0f8c0453dd8 EFLAGS: 00010206
>> > [ 39.218053] RAX: 0000000000000000 RBX: 00000000b4e42801 RCX: 0000000000000000
>> > [ 39.219803] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffff9dd8e6e49478
>> > [ 39.221766] RBP: ffffb0f8c0453e60 R08: 0000000000001000 R09: 0000000002800809
>> > [ 39.223635] R10: 0000000000000009 R11: 0000000000000000 R12: 0000000000001000
>> > [ 39.226010] R13: 0000000000000048 R14: ffff9dd8e6e49418 R15: ffff9dd8e6e49418
>> > [ 39.228992] FS: 0000000000000000(0000) GS:ffff9dd8ff600000(0000)
>> > knlGS:0000000000000000
>> > [ 39.233660] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [ 39.237863] CR2: 0000000000000008 CR3: 0000000067c6a005 CR4: 0000000000360ef0
>> > [ 39.241807] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > [ 39.244496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > [ 39.246569] Call Trace:
>> > [ 39.247272] process_one_work
>> > (/build/linux-4AS01l/linux-5.3.0/arch/x86/include/asm/jump_label.h:25
>> > /build/linux-4AS01l/linux-5.3.0/include/linux/jump_label.h:200
>> > /build/linux-4AS01l/linux-5.3.0/include/trace/events/workqueu
>> > e.h:114 /build/linux-4AS01l/linux-5.3.0/kernel/workqueue.c:2274)
>> > [ 39.248361] worker_thread
>> > (/build/linux-4AS01l/linux-5.3.0/include/linux/compiler.h:199
>> > /build/linux-4AS01l/linux-5.3.0/include/linux/list.h:268
>> > /build/linux-4AS01l/linux-5.3.0/kernel/workqueue.c:2416)
>> > [ 39.249364] kthread (/build/linux-4AS01l/linux-5.3.0/kernel/kthread.c:255)
>> > [ 39.250243] ? process_one_work
>> > (/build/linux-4AS01l/linux-5.3.0/kernel/workqueue.c:2358)
>> > [ 39.251485] ? kthread_park
>> > (/build/linux-4AS01l/linux-5.3.0/kernel/kthread.c:215)
>> > [ 39.252474] ret_from_fork
>> > (/build/linux-4AS01l/linux-5.3.0/arch/x86/entry/entry_64.S:358)
>> > [ 39.253476] Modules linked in: nvme_tcp nvme_fabrics nvme nvme_core
>> > xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user
>> > xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack
>> > nf_defrag_ipv6 nf_
>> > defrag_ipv4 libcrc32c bpfilter br_netfilter bridge stp llc aufs
>> > overlay intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass
>> > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
>> > aes_x86_64 crypto_simd cirrus nls_i
>> > so8859_1 cryptd glue_helper drm_kms_helper drm input_leds joydev
>> > fb_sys_fops serio_raw syscopyarea sysfillrect sysimgblt mac_hid
>> > qemu_fw_cfg bonding sch_fq_codel ipmi_watchdog ipmi_devintf
>> > ipmi_msghandler virtio_rng ip_tables
>> > x_tables autofs4 psmouse virtio_net net_failover failover ahci libahci
>> > i2c_piix4 pata_acpi floppy
>> > [ 39.269809] CR2: 0000000000000008
>> >
>> > greetings
>> > Stefan
>> >
>> > On Fri, Dec 27, 2019 at 8:54 AM Stefan Majer <stefan.majer@gmail.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > no problem, i am also on vacation.
>> > >
>> > > the issue is not reproducible in a pure bare metal environment, target
>> > > and host are physical machines.
>> > > The environment where it happens both machines are kvm based.
>> > >
>> > > I first have to figure out howto gdb on the kernel crash, thats not my
>> > > daily jobs, so please be patient.
>> > >
>> > > Greetings
>> > > Stefan
>> > >
>> > > On Fri, Dec 27, 2019 at 8:49 AM sagi grimberg <sagi@grimberg.me> wrote:
>> > > >
>> > > > Hey,
>> > > >
>> > > > On vacation so not able to take a look right now, but can you provide a line info from gdb on the RIP line?
>> > > >
>> > > > Also, did you say that the issue is not reproducible when the host is on bare metal but only on kvm? ( You said the target, but I'm asking about the host).
>> > > >
>> > > > On Thu, Dec 26, 2019, 23:18 Stefan Majer <stefan.majer@gmail.com> wrote:
>> > > >>
>> > > >> Hi,
>> > > >>
>> > > >> i have to add that doing the same on bare metal does work without any problems.
>> > > >> I suspect that this is probably caused by the fact that in the above
>> > > >> example my target is a qemu-kvm machine with a emulated nvme device.
>> > > >> Greetings
>> > > >> Stefan
>> > > >>
>> > > >> On Thu, Dec 26, 2019 at 6:47 PM Keith Busch <kbusch@kernel.org> wrote:
>> > > >> >
>> > > >> > Adding Sagi.
>> > > >> >
>> > > >> > On Wed, Dec 25, 2019 at 11:06:17AM +0100, Stefan Majer wrote:
>> > > >> > > Hi,
>> > > >> > >
>> > > >> > > im trying to setup a nvme-over-tcp test environment with a qemu-kvm
>> > > >> > > based nvmet-tcp target based on ubuntu-19.10 and a ubuntu-19.10 host
>> > > >> > > with kernel 5.4.6 installed. Kernel was taken from
>> > > >> > > https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.4.6/ . Same Panic
>> > > >> > > occurs with ubuntu 19.10 kernel 5.3.x
>> > > >> > >
>> > > >> > > After setup the target i can discover and connect the exported nvme
>> > > >> > > device on the host with:
>> > > >> > > modprobe nvme
>> > > >> > > modprobe nvme-tcp
>> > > >> > > nvme discover -t tcp -a 192.168.22.1 -s 4420
>> > > >> > > nvme connect -t tcp -n nvmet-test -a 192.168.22.1 -s 4420
>> > > >> > >
>> > > >> > > No errors so far, but when i try to format the device with:
>> > > >> > >
>> > > >> > > mkfs.ext4 /dev/nvme0n1
>> > > >> > >
>> > > >> > > The kernel panics with:
>> > > >> > > Writing inode tables:
>> > > >> > > [ 692.651243] BUG: kernel NULL pointer dereference, address: 0000000000000008
>> > > >> > > [ 692.653158] #PF: supervisor read access in kernel mode
>> > > >> > > [ 692.653922] #PF: error_code(0x0000) - not-present page
>> > > >> > > [ 692.653922] PGD 0 P4D 0
>> > > >> > > [ 692.653922] Oops: 0000 [#1] SMP PTI
>> > > >> > > [ 692.653922] CPU: 0 PID: 224 Comm: kworker/0:1H Not tainted
>> > > >> > > 5.4.6-050406-generic #201912211140
>> > > >> > > [ 692.653922] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> > > >> > > BIOS 0.0.0 02/06/2015
>> > > >> > > [ 692.653922] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
>> > > >> > > [ 692.653922] RIP: 0010:nvme_tcp_io_work+0x308/0x790 [nvme_tcp]
>> > > >> > > [ 692.653922] Code: 8b 86 98 00 00 00 83 f8 02 0f 85 6d fd ff ff 49
>> > > >> > > 8b 46 28 4d 89 f7 48 89 45 a8 49 8b 47 78 49 8b 57 68 45 8b 67 34 45
>> > > >> > > 2b 67 38 <8b> 58 08 8b 48 0c 4c 8b 28 48 29 d3 48 8d 34 11 4c 39 e3 48
>> > > >> > > 89 75
>> > > >> > > [ 692.653922] RSP: 0018:ffffa49a00447dd8 EFLAGS: 00010206
>> > > >> > > [ 692.653922] RAX: 0000000000000000 RBX: 0000000077bd3601 RCX: 0000000000000000
>> > > >> > > [ 692.653922] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffff9376781c0500
>> > > >> > > [ 692.653922] RBP: ffffa49a00447e60 R08: 0000000000001000 R09: 0000000005000809
>> > > >> > > [ 692.653922] R10: 0000000000000009 R11: 0000000000000000 R12: 0000000000001000
>> > > >> > > [ 692.653922] R13: 0000000000000048 R14: ffff9376781c04a0 R15: ffff9376781c04a0
>> > > >> > > [ 692.653922] FS: 0000000000000000(0000) GS:ffff93767f600000(0000)
>> > > >> > > knlGS:0000000000000000
>> > > >> > > [ 692.653922] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > >> > > [ 692.653922] CR2: 0000000000000008 CR3: 000000007b488003 CR4: 0000000000360ef0
>> > > >> > > [ 692.653922] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > > >> > > [ 692.653922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > > >> > > [ 692.653922] Call Trace:
>> > > >> > > [ 692.653922] process_one_work+0x1ec/0x3a0
>> > > >> > > [ 692.653922] worker_thread+0x4d/0x400
>> > > >> > > [ 692.653922] kthread+0x104/0x140
>> > > >> > > [ 692.653922] ? process_one_work+0x3a0/0x3a0
>> > > >> > > [ 692.653922] ? kthread_park+0x90/0x90
>> > > >> > > [ 692.653922] ret_from_fork+0x35/0x40
>> > > >> > > [ 692.653922] Modules linked in: binfmt_misc nvme_tcp nvme_fabrics
>> > > >> > > nvme nvme_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink
>> > > >> > > nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat
>> > > >> > > nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter
>> > > >> > > br_netfilter bridge stp llc overlay intel_rapl_msr intel_rapl_common
>> > > >> > > kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
>> > > >> > > ghash_clmulni_intel aesni_intel nls_iso8859_1 crypto_simd cryptd
>> > > >> > > cirrus glue_helper drm_kms_helper drm input_leds fb_sys_fops joydev
>> > > >> > > serio_raw syscopyarea sysfillrect sysimgblt mac_hid qemu_fw_cfg
>> > > >> > > bonding sch_fq_codel ipmi_watchdog ipmi_devintf ipmi_msghandler
>> > > >> > > virtio_rng ip_tables x_tables autofs4 ahci psmouse virtio_net
>> > > >> > > net_failover failover libahci i2c_piix4 pata_acpi floppy
>> > > >> > > [ 692.653922] CR2: 0000000000000008
>> > > >> > > [ 692.653922] ---[ end trace d688c2c182feef87 ]---
>> > > >> > > [ 692.653922] RIP: 0010:nvme_tcp_io_work+0x308/0x790 [nvme_tcp]
>> > > >> > > [ 692.653922] Code: 8b 86 98 00 00 00 83 f8 02 0f 85 6d fd ff ff 49
>> > > >> > > 8b 46 28 4d 89 f7 48 89 45 a8 49 8b 47 78 49 8b 57 68 45 8b 67 34 45
>> > > >> > > 2b 67 38 <8b> 58 08 8b 48 0c 4c 8b 28 48 29 d3 48 8d 34 11 4c 39 e3 48
>> > > >> > > 89 75
>> > > >> > > [ 692.653922] RSP: 0018:ffffa49a00447dd8 EFLAGS: 00010206
>> > > >> > > [ 692.653922] RAX: 0000000000000000 RBX: 0000000077bd3601 RCX: 0000000000000000
>> > > >> > > [ 692.653922] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffff9376781c0500
>> > > >> > > [ 692.653922] RBP: ffffa49a00447e60 R08: 0000000000001000 R09: 0000000005000809
>> > > >> > > [ 692.653922] R10: 0000000000000009 R11: 0000000000000000 R12: 0000000000001000
>> > > >> > > [ 692.653922] R13: 0000000000000048 R14: ffff9376781c04a0 R15: ffff9376781c04a0
>> > > >> > > [ 692.653922] FS: 0000000000000000(0000) GS:ffff93767f600000(0000)
>> > > >> > > knlGS:0000000000000000
>> > > >> > > [ 692.653922] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > >> > > [ 692.653922] CR2: 0000000000000008 CR3: 000000007b488003 CR4: 0000000000360ef0
>> > > >> > > [ 692.653922] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > > >> > > [ 692.653922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > > >> > >
>> > > >> > >
>> > > >> > > Any help appreciated.
>> > > >> > >
>> > > >> > > Greetings
>> > > >> > >
>> > > >> > > --
>> > > >> > > Stefan Majer
>> > > >>
>> > > >>
>> > > >>
>> > > >> --
>> > > >> Stefan Majer
>> > >
>> > >
>> > >
>> > > --
>> > > Stefan Majer
>> >
>> >
>> >
>> > --
>> > Stefan Majer
>>
>>
>>
>> --
>> Stefan Majer
>
>
>
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
next prev parent reply other threads:[~2020-01-07 16:49 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-25 10:06 null pointer dereference in nvme_tcp_io_work Stefan Majer
2019-12-26 17:47 ` Keith Busch
2019-12-27 7:18 ` Stefan Majer
[not found] ` <CAB5Wxwco3KD1e_nRGQ_mWAMa_2d-wP2-1Aao4ZXtDeVgFQQM_w@mail.gmail.com>
2019-12-27 7:54 ` Stefan Majer
2019-12-28 17:49 ` Stefan Majer
2019-12-28 17:53 ` Stefan Majer
2020-01-07 15:41 ` Stefan Majer
2020-01-07 16:48 ` Nadolski, Edmund [this message]
2020-01-15 20:03 ` Sagi Grimberg
2020-01-16 6:28 ` Stefan Majer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cb5caef9-097a-9e1d-f42d-7e6f6612bbb2@intel.com \
--to=edmund.nadolski@intel.com \
--cc=kbusch@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
--cc=stefan.majer@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox