Distributed Replicated Block Device (DRBD) development
* [Drbd-dev] WARNING: CPU: 5 PID: 8321 at block/blk-core.c:172 blk_status_to_errno+0x1a/0x30
From: Wolfgang Walter @ 2018-06-11 11:02 UTC (permalink / raw)
  To: drbd-dev

After switching from 4.9.102 to 4.14.48 I got the following warning:


[204738.619214] ------------[ cut here ]------------
[204738.619225] WARNING: CPU: 5 PID: 8321 at block/blk-core.c:172 blk_status_to_errno+0x1a/0x30
[204738.619226] Modules linked in: cbc cts rpcsec_gss_krb5 8021q garp mrp stp llc dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio binfmt_misc iTCO_wdt iTCO_vendor_support intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore snd_pcm snd_timer intel_rapl_perf snd soundcore pcspkr ast ttm joydev drm_kms_helper drm i2c_algo_bit lpc_ich mfd_core mei_me mei ioatdma sg shpchp evdev wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button poly1305_x86_64 poly1305_generic sha256_ssse3 sha1_ssse3 chacha20_x86_64 chacha20_generic pcbc aesni_intel aes_x86_64 crypto_simd glue_helper nfsd cryptd nfs_acl lockd grace drbd auth_rpcgss lru_cache sunrpc ip_tables x_tables autofs4
[204738.619289]  ext4 crc16 mbcache jbd2 fscrypto raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 hid_generic usbhid hid md_mod ses enclosure sd_mod ahci libahci ixgbe mpt3sas raid_class xhci_pci dca crc32c_intel ehci_pci libata scsi_transport_sas xhci_hcd ehci_hcd ptp pps_core i2c_i801 usbcore mdio scsi_mod
[204738.619322] CPU: 5 PID: 8321 Comm: kworker/5:6 Not tainted 4.14.48-d64.all+1.1 #1
[204738.619324] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.1a 10/16/2015
[204738.619338] Workqueue: md md_submit_flush_data [md_mod]
[204738.619339] task: ffff92eb15a1d040 task.stack: ffffa0eb8dba4000
[204738.619342] RIP: 0010:blk_status_to_errno+0x1a/0x30
[204738.619343] RSP: 0018:ffffa0eb8dba7e30 EFLAGS: 00010296
[204738.619345] RAX: 00000000000000d1 RBX: ffff92eb0f7c8000 RCX: ffffd511dbc9b6df
[204738.619346] RDX: 0000000000000023 RSI: ffffffffa4a3bd25 RDI: 00000000000000d1
[204738.619347] RBP: ffff92d64a8a78c0 R08: ffff930aef83e0c8 R09: ffffa0eb8dba7dc0
[204738.619348] R10: 0000000000000034 R11: 0000000000000075 R12: ffff92e9f00c9b00
[204738.619349] R13: 0000000000000015 R14: ffff92d4c15a8c00 R15: ffff92eb10790c20
[204738.619351] FS:  0000000000000000(0000) GS:ffff92eb1fb40000(0000) knlGS:0000000000000000
[204738.619352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[204738.619353] CR2: 00007fecbec70000 CR3: 00000021ef00a006 CR4: 00000000001606e0
[204738.619354] Call Trace:
[204738.619366]  drbd_request_endio+0x5d/0x280 [drbd]
[204738.619376]  clone_endio+0x88/0x130 [dm_mod]
[204738.619382]  process_one_work+0x172/0x3b0
[204738.619385]  worker_thread+0x2e/0x380
[204738.619387]  ? process_one_work+0x3b0/0x3b0
[204738.619389]  kthread+0x11a/0x130
[204738.619391]  ? kthread_create_on_node+0x70/0x70
[204738.619396]  ret_from_fork+0x35/0x40
[204738.619398] Code: 74 da e9 0a ff ff ff e8 25 79 d2 ff 0f 1f 44 00 00 0f 1f 44 00 00 40 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 40 db 86 a4 c3 <0f> 0b b8 fb ff ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 
[204738.619429] ---[ end trace 8bb05fa93f1d6add ]---

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts



* Re: [Drbd-dev] WARNING: CPU: 5 PID: 8321 at block/blk-core.c:172 blk_status_to_errno+0x1a/0x30
From: Lars Ellenberg @ 2018-06-11 14:22 UTC (permalink / raw)
  To: drbd-dev; +Cc: lars.ellenberg, philipp.reisner

On Mon, Jun 11, 2018 at 01:02:26PM +0200, Wolfgang Walter wrote:
> After switching from 4.9.102 to 4.14.48 I got the following warning:
> 
> 
> [204738.619214] ------------[ cut here ]------------
> [204738.619225] WARNING: CPU: 5 PID: 8321 at block/blk-core.c:172 blk_status_to_errno+0x1a/0x30
> [204738.619354] Call Trace:
> [204738.619366]  drbd_request_endio+0x5d/0x280 [drbd]


This is the same issue that Sarah Newman found and reported on 30 April 2018
in "[PATCH] drbd: avoid use-after-free in drbd_request_endio".
I was under the impression that she would push for upstream inclusion of the fix.
Apparently not, so we'll have to follow up with upstream ourselves.

It has been broken since commit 4246a0b (2015-07, during the v4.3 release cycle),
which changed:
        bio_put(req->private_bio);
-       req->private_bio = ERR_PTR(error);
+       req->private_bio = ERR_PTR(bio->bi_error);

which is an access after (potential) free,
because req->private_bio == bio (before the assignment).
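
To make the ordering problem concrete, here is a minimal userspace sketch,
not kernel code. Every name in it (sketch_bio, sketch_bio_put, SKETCH_POISON
and so on) is hypothetical and exists only for illustration. The "free" is
simulated by overwriting the status field on the last put, so the sketch
never touches freed memory, whereas in the kernel the read really does land
in memory that may already have been reused.

```c
#include <assert.h>

/* Arbitrary garbage value standing in for whatever a reused object holds. */
#define SKETCH_POISON 0xd1

/* Hypothetical, heavily reduced stand-in for struct bio. */
struct sketch_bio {
    int refcount;
    unsigned char status;
};

/*
 * Drop one reference. On the last put, scribble over the object the way a
 * reused allocation might. The memory is not actually released here, so
 * reading it afterwards is well defined in this sketch.
 */
static void sketch_bio_put(struct sketch_bio *bio)
{
    if (--bio->refcount == 0)
        bio->status = SKETCH_POISON;
}

/* Buggy order: put first, then read the status from the same object. */
static unsigned char sketch_endio_buggy(struct sketch_bio *bio)
{
    sketch_bio_put(bio);
    return bio->status;
}

/* Fixed order: copy the status out, then drop the reference. */
static unsigned char sketch_endio_fixed(struct sketch_bio *bio)
{
    unsigned char status = bio->status;

    sketch_bio_put(bio);
    return status;
}
```

With a single reference the buggy variant returns the poison value while the
fixed variant returns the real status. With a second reference still held,
both return the real status, which is why the access after free is only a
"potential" one.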

That later changed to
 req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status));

blk_status_to_errno() now "sometimes" catches the access-after-free
with its WARN_ON_ONCE(idx >= ARRAY_SIZE(blk_errors));
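
The following userspace sketch shows roughly how such a bounds check turns a
garbage status into a warning plus a generic error. It is an approximation
under stated assumptions: the table contents, the sketch_* names and the
sketch_warned flag are all hypothetical, and the real blk_errors[] table in
block/blk-core.c has different entries. In the register dump of the original
report, RDI and RAX both read 0xd1, which would be consistent with a status
byte far beyond the end of the table, though that reading is an inference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's blk_status_t. */
typedef unsigned char sketch_status_t;

/* Illustrative table only; not the kernel's real blk_errors[] contents. */
static const int sketch_errors[] = {
    0,              /* index 0: success */
    -EOPNOTSUPP,
    -ETIMEDOUT,
    -ENOSPC,
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Set where the kernel would emit its one-time warning. */
static int sketch_warned;

/* Map a status code to an errno, rejecting out-of-range values. */
static int sketch_status_to_errno(sketch_status_t status)
{
    size_t idx = status;

    if (idx >= SKETCH_ARRAY_SIZE(sketch_errors)) {
        sketch_warned = 1;  /* kernel: WARN_ON_ONCE(...) fires here */
        return -EIO;        /* fall back to a generic I/O error */
    }
    return sketch_errors[idx];
}
```

A valid status maps through the table silently. A stale byte such as 0xd1
trips the check and comes back as -EIO, but only when the garbage happens to
be out of range, which fits the "sometimes" above.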

In the DRBD driver upstream (our development happens out-of-tree),
we don't have this code; we still use an on-stack "status" variable.
The effect of this (potential) access-after-free is invisible,
unless you run your kernel with "CONFIG_DEBUG_PAGEALLOC".
This is why it was never caught.

I think the correct fix would be:

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 1476cb3439f4..5e793dd7adfb 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -282,8 +282,8 @@ void drbd_request_endio(struct bio *bio)
 		what = COMPLETED_OK;
 	}
 
-	bio_put(req->private_bio);
 	req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status));
+	bio_put(bio);
 
 	/* not req_mod(), we need irqsave here! */
 	spin_lock_irqsave(&device->resource->req_lock, flags);


The behaviour without that patch is effectively identical to the
behaviour with this patch, though sometimes in multiple failure
scenarios (both local disk failure AND replication / remote IO errors)
we might give back an "EIO" instead of a more specific error, if such
more specific error had been handed to us in the first place.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT
