linux-fsdevel.vger.kernel.org archive mirror
* [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0
@ 2013-12-30 15:55 Jack Wang
  2014-01-02  9:36 ` Jack Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Jack Wang @ 2013-12-30 15:55 UTC
  To: Alexander Viro, linux-fsdevel, linux-kernel@vger.kernel.org

Hi,

We saw the NULL pointer dereference below:

Dec 28 16:24:26 server kernel: [979193.076399] BUG: unable to handle
kernel NULL pointer dereference at 0000000000000008
Dec 28 16:24:26 server kernel: [979193.076401] IP: [<ffffffff8116952f>]
__blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076408] PGD 4bdcaa067 PUD
4bdc43067 PMD 0
Dec 28 16:24:26 server kernel: [979193.076410] Oops: 0000 [#1] SMP
Dec 28 16:24:26 server kernel: [979193.076412] CPU 6
Dec 28 16:24:26 server kernel: [979193.076413] Modules linked in: bridge
stp llc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
raid1 md_mod dm_round_robin sd_mod crc_t10dif ib_srp scsi_transport_srp
scsi_tgt xt_ETHOIP6(O) x_tables vhost_net(O) macvtap macvlan tun(O)
nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 rdma_ucm rdma_cm iw_cm
ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad ib_qib mlx4_ib ib_mthca
ib_mad ib_core dm_multipath scsi_dh scsi_mod kvm_amd kvm powernow_k8
mperf psmouse crc32c_intel microcode tpm_tis tpm tpm_bios serio_raw
evdev amd64_edac_mod edac_core edac_mce_amd i2c_piix4 button processor
thermal_sys mlx4_core
Dec 28 16:24:26 server kernel: [979193.076440]
Dec 28 16:24:26 server kernel: [979193.076442] Pid: 56544, comm:
multipath Tainted: G           O 3.4.71-3-pserver #1 Supermicro BHQGE/BHQGE
Dec 28 16:24:26 server kernel: [979193.076445] RIP:
0010:[<ffffffff8116952f>]  [<ffffffff8116952f>] __blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076448] RSP:
0018:ffff882802f4beb8  EFLAGS: 00010246
Dec 28 16:24:26 server kernel: [979193.076449] RAX: 0000000000000000
RBX: ffff881ff78b0d00 RCX: 0000000000000001
Dec 28 16:24:26 server kernel: [979193.076451] RDX: 0000000000000000
RSI: 000000000000001d RDI: ffff881ff78b0d18
Dec 28 16:24:26 server kernel: [979193.076452] RBP: 0000000000000000
R08: 0000000000000000 R09: 0000000000000000
Dec 28 16:24:26 server kernel: [979193.076453] R10: 0000000000000000
R11: 0000000000000246 R12: 000000000000001d
Dec 28 16:24:26 server kernel: [979193.076455] R13: ffff881ff78b0d18
R14: ffff8807f9e7f400 R15: ffff8804a8d77710
Dec 28 16:24:26 server kernel: [979193.076457] FS:
00007ff8c80fe7a0(0000) GS:ffff880807d80000(0000) knlGS:0000000000000000
Dec 28 16:24:26 server kernel: [979193.076458] CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Dec 28 16:24:26 server kernel: [979193.076460] CR2: 0000000000000008
CR3: 000000064765f000 CR4: 00000000000407e0
Dec 28 16:24:26 server kernel: [979193.076461] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Dec 28 16:24:26 server kernel: [979193.076463] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 28 16:24:26 server kernel: [979193.076464] Process multipath (pid:
56544, threadinfo ffff882802f4a000, task ffff8828020106d0)
Dec 28 16:24:26 server kernel: [979193.076466] Stack:
Dec 28 16:24:26 server kernel: [979193.076466]  0000000000000000
0000000000000000 ffff880803cf2580 ffff8804a8d77700
Dec 28 16:24:26 server kernel: [979193.076468]  0000000000000010
ffff88100363eff0 ffff881004609b00 ffff882003c20020
Dec 28 16:24:26 server kernel: [979193.076470]  ffff8804a8d77710
ffffffff81136bad 00007fffbdc8f420 ffff8804a8d77700
Dec 28 16:24:26 server kernel: [979193.076472] Call Trace:
Dec 28 16:24:26 server kernel: [979193.076477]  [<ffffffff81136bad>] ?
fput+0xdd/0x270
Dec 28 16:24:26 server kernel: [979193.076479]  [<ffffffff81132f0c>] ?
filp_close+0x5c/0x90
Dec 28 16:24:26 server kernel: [979193.076481]  [<ffffffff81132fb1>] ?
sys_close+0x71/0xc0
Dec 28 16:24:26 server kernel: [979193.076484]  [<ffffffff816801b9>] ?
system_call_fastpath+0x16/0x1b
Dec 28 16:24:26 server kernel: [979193.076486] Code: 8b 5c 24 18 48 8b
6c 24 20 4c 8b 64 24 28 4c 8b 6c 24 30 4c 8b 74 24 38 4c 8b 7c 24 40 48
83 c4 48 c3 66 90 49 8b 86 48 03 00 00 <48> 8b 40 08 48 85 c0 0f 84 fc
fe ff ff 44 89 e6 4c 89 f7 ff d0
Dec 28 16:24:26 server kernel: [979193.076500] RIP  [<ffffffff8116952f>]
__blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076503]  RSP <ffff882802f4beb8>
Dec 28 16:24:26 server kernel: [979193.076504] CR2: 0000000000000008
Dec 28 16:24:26 server kernel: [979193.077599] ---[ end trace
23f39da823d257f9 ]---

Disassembly results show:
1465	static int __blkdev_put(struct block_device *bdev, fmode_t mode,
int for_part)
1466	{
   0xffffffff81162d10 <+0>:	sub    $0x48,%rsp
   0xffffffff81162d14 <+4>:	mov    %r13,0x30(%rsp)
   0xffffffff81162d1d <+13>:	mov    %rbx,0x18(%rsp)
   0xffffffff81162d22 <+18>:	mov    %rbp,0x20(%rsp)
   0xffffffff81162d27 <+23>:	mov    %r12,0x28(%rsp)
   0xffffffff81162d2c <+28>:	mov    %edx,%ebp
   0xffffffff81162d2e <+30>:	mov    %r14,0x38(%rsp)
   0xffffffff81162d33 <+35>:	mov    %r15,0x40(%rsp)
   0xffffffff81162d38 <+40>:	mov    %rdi,%rbx
   0xffffffff81162d45 <+53>:	mov    %esi,%r12d

1467		int ret = 0;
   0xffffffff81162d8e <+126>:	xor    %ebp,%ebp

1468		struct gendisk *disk = bdev->bd_disk;
   0xffffffff81162d3b <+43>:	mov    0x90(%rdi),%r14

1469		struct block_device *victim = NULL;
1470	
1471		mutex_lock_nested(&bdev->bd_mutex, for_part);
   0xffffffff81162d19 <+9>:	lea    0x18(%rdi),%r13
   0xffffffff81162d42 <+50>:	mov    %r13,%rdi
   0xffffffff81162d48 <+56>:	callq  0xffffffff8166ece0 <mutex_lock>

1472		if (for_part)
   0xffffffff81162d4d <+61>:	test   %ebp,%ebp
   0xffffffff81162d4f <+63>:	je     0xffffffff81162d57 <__blkdev_put+71>

1473			bdev->bd_part_count--;
   0xffffffff81162d51 <+65>:	decl   0x88(%rbx)

1474	
1475		if (!--bdev->bd_openers) {
   0xffffffff81162d57 <+71>:	mov    0x4(%rbx),%eax
   0xffffffff81162d5a <+74>:	dec    %eax
   0xffffffff81162d5c <+76>:	test   %eax,%eax
   0xffffffff81162d5e <+78>:	mov    %eax,0x4(%rbx)
   0xffffffff81162d61 <+81>:	jne    0xffffffff81162d8e <__blkdev_put+126>

1476			WARN_ON_ONCE(bdev->bd_holders);
   0xffffffff81162d63 <+83>:	mov    0x58(%rbx),%edx
   0xffffffff81162d66 <+86>:	test   %edx,%edx
   0xffffffff81162d68 <+88>:	jne    0xffffffff81162e9b <__blkdev_put+395>
   0xffffffff81162e9b <+395>:	cmpb   $0x1,0x936b1e(%rip)        #
0xffffffff81a999c0 <__warned.29603>
   0xffffffff81162ea2 <+402>:	je     0xffffffff81162d6e <__blkdev_put+94>
   0xffffffff81162ea8 <+408>:	mov    $0x5c4,%esi
   0xffffffff81162ead <+413>:	mov    $0xffffffff8193f5a7,%rdi
   0xffffffff81162eb4 <+420>:	callq  0xffffffff81036ee0 <warn_slowpath_null>
   0xffffffff81162eb9 <+425>:	movb   $0x1,0x936b00(%rip)        #
0xffffffff81a999c0 <__warned.29603>
   0xffffffff81162ec0 <+432>:	jmpq   0xffffffff81162d6e <__blkdev_put+94>
   0xffffffff81162ec5:	data32 nopw %cs:0x0(%rax,%rax,1)

1484		}
1485		if (bdev->bd_contains == bdev) {
   0xffffffff81162d90 <+128>:	cmp    %rbx,0x70(%rbx)
   0xffffffff81162d94 <+132>:	je     0xffffffff81162e78 <__blkdev_put+360>

1486			if (disk->fops->release)
   0xffffffff81162e78 <+360>:	mov    0x348(%r14),%rax
   0xffffffff81162e7f <+367>:	mov    0x8(%rax),%rax
   0xffffffff81162e83 <+371>:	test   %rax,%rax
   0xffffffff81162e86 <+374>:	je     0xffffffff81162d9a <__blkdev_put+138>

1487				ret = disk->fops->release(disk, mode);
   0xffffffff81162e8c <+380>:	mov    %r12d,%esi
   0xffffffff81162e8f <+383>:	mov    %r14,%rdi
   0xffffffff81162e92 <+386>:	callq  *%rax
   0xffffffff81162e94 <+388>:	mov    %eax,%ebp
   0xffffffff81162e96 <+390>:	jmpq   0xffffffff81162d9a <__blkdev_put+138>
snip

The bug happened at line 1486; it looks like disk->fops is NULL here for
some reason. Would it be reasonable to add a check like:

if (disk->fops)
	if (disk->fops->release)
		ret = disk->fops->release(disk, mode);


Happy New Year and Best regards:)
Jack


* Re: [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0
  2013-12-30 15:55 [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0 Jack Wang
@ 2014-01-02  9:36 ` Jack Wang
  2014-01-04  6:09   ` Al Viro
  0 siblings, 1 reply; 4+ messages in thread
From: Jack Wang @ 2014-01-02  9:36 UTC
  To: Alexander Viro, linux-fsdevel, linux-kernel@vger.kernel.org,
	Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 8107 bytes --]

On 12/30/2013 04:55 PM, Jack Wang wrote:
> 
> The bug happened at line 1486; it looks like disk->fops is NULL here for
> some reason. Would it be reasonable to add a check like:
> 
> if (disk->fops)
> 	if (disk->fops->release)
> 		ret = disk->fops->release(disk, mode);
> 
> 
> Happy New Year and Best regards:)
> Jack
> 

Ping. Could you share your opinions on this? I have attached the patch I propose.

Jack

[-- Attachment #2: 0001-fix-null-pointer-dereference-in-__blkdev_put.patch --]
[-- Type: text/x-patch, Size: 2575 bytes --]

>From 153918f99e45c685700e919a92384395dc18fd5d Mon Sep 17 00:00:00 2001
From: Jack Wang <jinpu.wang@profitbricks.com>
Date: Thu, 2 Jan 2014 10:24:29 +0100
Subject: [PATCH] fix null pointer dereference in __blkdev_put

We were hit by the bug below:

Dec 28 16:24:26 pserver1812 kernel: [979193.076399] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Dec 28 16:24:26 pserver1812 kernel: [979193.076401] IP: [<ffffffff8116952f>] __blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 pserver1812 kernel: [979193.076442] Pid: 56544, comm: multipath Tainted: G           O 3.4.71-3-pserver #1 Supermicro BHQGE/BHQGE
Dec 28 16:24:26 pserver1812 kernel: [979193.076445] RIP: 0010:[<ffffffff8116952f>]  [<ffffffff8116952f>] __blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 pserver1812 kernel: [979193.076472] Call Trace:
Dec 28 16:24:26 pserver1812 kernel: [979193.076477]  [<ffffffff81136bad>] ? fput+0xdd/0x270
Dec 28 16:24:26 pserver1812 kernel: [979193.076479]  [<ffffffff81132f0c>] ? filp_close+0x5c/0x90
Dec 28 16:24:26 pserver1812 kernel: [979193.076481]  [<ffffffff81132fb1>] ? sys_close+0x71/0xc0
Dec 28 16:24:26 pserver1812 kernel: [979193.076484]  [<ffffffff816801b9>] ? system_call_fastpath+0x16/0x1b
Dec 28 16:24:26 pserver1812 kernel: [979193.076486] Code: 8b 5c 24 18 48 8b 6c 24 20 4c 8b 64 24 28 4c 8b 6c 24 30 4c 8b 74 24 38 4c 8b 7c 24 40 48 83 c4 48 c3 66 90 49 8b 86 48 03 00 00 <48> 8b 40 08 48 85 c0 0f 84 fc fe ff ff 44 89 e6 4c 89 f7 ff d0
Dec 28 16:24:26 pserver1812 kernel: [979193.076500] RIP  [<ffffffff8116952f>] __blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 pserver1812 kernel: [979193.076503]  RSP <ffff882802f4beb8>
Dec 28 16:24:26 pserver1812 kernel: [979193.076504] CR2: 0000000000000008
Dec 28 16:24:26 pserver1812 kernel: [979193.077599] ---[ end trace 23f39da823d257f9 ]---

The disassembled code shows that the NULL pointer is disk->fops; fix it by
checking the pointer before use.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 fs/block_dev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 319d9c7..d3c45b4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1483,8 +1483,9 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 					&default_backing_dev_info);
 	}
 	if (bdev->bd_contains == bdev) {
-		if (disk->fops->release)
-			ret = disk->fops->release(disk, mode);
+		if (disk->fops)
+			if (disk->fops->release)
+				ret = disk->fops->release(disk, mode);
 	}
 	if (!bdev->bd_openers) {
 		struct module *owner = disk->fops->owner;
-- 
1.8.4



* Re: [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0
  2014-01-02  9:36 ` Jack Wang
@ 2014-01-04  6:09   ` Al Viro
  2014-01-06  8:45     ` Jack Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2014-01-04  6:09 UTC
  To: Jack Wang; +Cc: linux-fsdevel, linux-kernel@vger.kernel.org, Jens Axboe

On Thu, Jan 02, 2014 at 10:36:30AM +0100, Jack Wang wrote:

> > The bug happened at line 1486; it looks like disk->fops is NULL here for
> > some reason. Would it be reasonable to add a check like:
> > 
> > if (disk->fops)
> > 	if (disk->fops->release)
> > 		ret = disk->fops->release(disk, mode);
> > 
> > 
> > Happy New Year and Best regards:)
> > Jack
> > 
> 
> Ping. Could you share your opinions on this? I have attached the patch I propose.

Sorry, had been sick since mid-December ;-/  The patch is not a good idea -
in the best case it's papering over a bug (and insufficiently so, at that,
since there are other places where disk->fops->some_method is checked).

gendisk->fops should never be assigned NULL; it starts life with NULL
->fops, but that should be assigned a non-NULL value (and never modified
afterwards) before anyone can see it.  Moreover, even if some driver has
fscked up and forgot to initialize the damn thing, get_gendisk() would've
refused to return such a thing to any callers (including __blkdev_get()).
Note that __blkdev_get() would oops on such a thing if get_gendisk()
somehow returned it.

Looks like something is shitting over bdev->bd_disk or bdev->bd_disk->fops.
The offsets in the disassembled code are all wrong (including that from
beginning of function to oopsing instruction), but the code match is good,
so I agree that we are hitting bdev->bd_disk->fops == NULL here.  The
question is how it has happened - that's where the real bug is...

How reproducible is it?  And which kernel, while we are at it?  This area
didn't get a lot of changes lately, but still...


* Re: [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0
  2014-01-04  6:09   ` Al Viro
@ 2014-01-06  8:45     ` Jack Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Jack Wang @ 2014-01-06  8:45 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel@vger.kernel.org, Jens Axboe

On 01/04/2014 07:09 AM, Al Viro wrote:
> On Thu, Jan 02, 2014 at 10:36:30AM +0100, Jack Wang wrote:
> 
>>> The bug happened at line 1486; it looks like disk->fops is NULL here for
>>> some reason. Would it be reasonable to add a check like:
>>>
>>> if (disk->fops)
>>> 	if (disk->fops->release)
>>> 		ret = disk->fops->release(disk, mode);
>>>
>>>
>>> Happy New Year and Best regards:)
>>> Jack
>>>
>>
>> Ping. Could you share your opinions on this? I have attached the patch I propose.
> 
> Sorry, had been sick since mid-December ;-/  The patch is not a good idea -
> in the best case it's papering over a bug (and insufficiently so, at that,
> since there are other places where disk->fops->some_method is checked).
> 
> gendisk->fops should never be assigned NULL; it starts life with NULL
> ->fops, but that should be assigned a non-NULL value (and never modified
> afterwards) before anyone can see it.  Moreover, even if some driver has
> fscked up and forgot to initialize the damn thing, get_gendisk() would've
> refused to return such a thing to any callers (including __blkdev_get()).
> Note that __blkdev_get() would oops on such a thing if get_gendisk()
> somehow returned it.
> 
> Looks like something is shitting over bdev->bd_disk or bdev->bd_disk->fops.
> The offsets in the disassembled code are all wrong (including that from
> beginning of function to oopsing instruction), but the code match is good,
> so I agree that we are hitting bdev->bd_disk->fops == NULL here.  The
> question is how it has happened - that's where the real bug is...
> 
> How reproducible is it?  And which kernel, while we are at it?  This area
> didn't get a lot of changes lately, but still...
> 
Thanks, Al, for the reply and for looking into this.
We're using 3.4.71. This happened in production, and we cannot reproduce
it yet. What I could see is that, before it happened, SCSI devices went
offline, multipath failed a path, and raid1 failed a member device.

Could the bug lie in one of the drivers: md-raid1, dm-multipath or sd?
How could I narrow it down? Could you teach me?

Thanks, wish you happy and healthy!

Jack
