public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com,
	Luis Chamberlain <mcgrof@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	Bart Van Assche <bvanassche@acm.org>,
	Omar Sandoval <osandov@fb.com>, Hannes Reinecke <hare@suse.com>,
	Nicolai Stange <nstange@suse.de>,
	Michal Hocko <mhocko@kernel.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	yu kuai <yukuai3@huawei.com>, Jens Axboe <axboe@kernel.dk>,
	Ben Hutchings <ben.hutchings@codethink.co.uk>
Subject: [PATCH 4.14 10/48] blktrace: fix debugfs use after free
Date: Mon,  9 Nov 2020 13:55:19 +0100	[thread overview]
Message-ID: <20201109125017.244391571@linuxfoundation.org> (raw)
In-Reply-To: <20201109125016.734107741@linuxfoundation.org>

From: Luis Chamberlain <mcgrof@kernel.org>

commit bad8e64fb19d3a0de5e564d9a7271c31bd684369 upstream.

On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
merged on v4.12 Omar fixed the original blktrace code for request-based
drivers (multiqueue). This however left in place a possible crash, if you
happen to abuse blktrace while racing to remove / add a device.

We used to use asynchronous removal of the request_queue, and with that
the issue was easier to reproduce. Now that we have reverted to
synchronous removal of the request_queue, the issue is still possible to
reproduce, its however just a bit more difficult.

We essentially run two instances of break-blktrace which add/remove
a loop device, and setup a blktrace and just never tear the blktrace
down. We do this twice in parallel. This is easily reproduced with the
script run_0004.sh from break-blktrace [0].

We can end up with two types of panics each reflecting where we
race, one a failed blktrace setup:

[  252.426751] debugfs: Directory 'loop0' with parent 'block' already present!
[  252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[  252.436592] #PF: supervisor write access in kernel mode
[  252.439822] #PF: error_code(0x0002) - not-present page
[  252.442967] PGD 0 P4D 0
[  252.444656] Oops: 0002 [#1] SMP NOPTI
[  252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G            E     5.7.0-rc2-next-20200420+ #164
[  252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[  252.456343] RIP: 0010:down_write+0x15/0x40
[  252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc
               cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00
               00 00 <f0> 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89
               45 08 5d
[  252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246
[  252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000
[  252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[  252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001
[  252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000
[  252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0
[  252.473346] FS:  00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000
[  252.475225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0
[  252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  252.479866] Call Trace:
[  252.480322]  simple_recursive_removal+0x4e/0x2e0
[  252.481078]  ? debugfs_remove+0x60/0x60
[  252.481725]  ? relay_destroy_buf+0x77/0xb0
[  252.482662]  debugfs_remove+0x40/0x60
[  252.483518]  blk_remove_buf_file_callback+0x5/0x10
[  252.484328]  relay_close_buf+0x2e/0x60
[  252.484930]  relay_open+0x1ce/0x2c0
[  252.485520]  do_blk_trace_setup+0x14f/0x2b0
[  252.486187]  __blk_trace_setup+0x54/0xb0
[  252.486803]  blk_trace_ioctl+0x90/0x140
[  252.487423]  ? do_sys_openat2+0x1ab/0x2d0
[  252.488053]  blkdev_ioctl+0x4d/0x260
[  252.488636]  block_ioctl+0x39/0x40
[  252.489139]  ksys_ioctl+0x87/0xc0
[  252.489675]  __x64_sys_ioctl+0x16/0x20
[  252.490380]  do_syscall_64+0x52/0x180
[  252.491032]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

And the other on the device removal:

[  128.528940] debugfs: Directory 'loop0' with parent 'block' already present!
[  128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[  128.619537] #PF: supervisor write access in kernel mode
[  128.622700] #PF: error_code(0x0002) - not-present page
[  128.625842] PGD 0 P4D 0
[  128.627585] Oops: 0002 [#1] SMP NOPTI
[  128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G            E     5.7.0-rc2-next-20200420+ #164
[  128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[  128.640471] RIP: 0010:down_write+0x15/0x40
[  128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc
               cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00
               00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89
               45 08 5d
[  128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246
[  128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000
[  128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[  128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0
[  128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000
[  128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0
[  128.660500] FS:  00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000
[  128.662204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0
[  128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  128.667282] Call Trace:
[  128.667801]  simple_recursive_removal+0x4e/0x2e0
[  128.668663]  ? debugfs_remove+0x60/0x60
[  128.669368]  debugfs_remove+0x40/0x60
[  128.669985]  blk_trace_free+0xd/0x50
[  128.670593]  __blk_trace_remove+0x27/0x40
[  128.671274]  blk_trace_shutdown+0x30/0x40
[  128.671935]  blk_release_queue+0x95/0xf0
[  128.672589]  kobject_put+0xa5/0x1b0
[  128.673188]  disk_release+0xa2/0xc0
[  128.673786]  device_release+0x28/0x80
[  128.674376]  kobject_put+0xa5/0x1b0
[  128.674915]  loop_remove+0x39/0x50 [loop]
[  128.675511]  loop_control_ioctl+0x113/0x130 [loop]
[  128.676199]  ksys_ioctl+0x87/0xc0
[  128.676708]  __x64_sys_ioctl+0x16/0x20
[  128.677274]  do_syscall_64+0x52/0x180
[  128.677823]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The common theme here is:

debugfs: Directory 'loop0' with parent 'block' already present

This crash happens because of how blktrace uses the debugfs directory
where it places its files. Upon init we always create the same directory
which would be needed by blktrace but we only do this for make_request
drivers (multiqueue) block drivers. When you race a removal of these
devices with a blktrace setup you end up in a situation where the
make_request recursive debugfs removal will sweep away the blktrace
files and then later blktrace will also try to remove individual
dentries which are already NULL. The inverse is also possible and hence
the two types of use after frees.

We don't create the block debugfs directory on init for these types of
block devices:

  * request-based block driver block devices
  * every possible partition
  * scsi-generic

And so, this race should in theory only be possible with make_request
drivers.

We can fix the UAF by simply re-using the debugfs directory for
make_request drivers (multiqueue) and only creating the ephemeral
directory for the other type of block devices. The new clarifications
on relying on the q->blk_trace_mutex *and* also checking for q->blk_trace
*prior* to processing a blktrace ensures the debugfs directories are
only created if no possible directory name clashes are possible.

This goes tested with:

  o nvme partitions
  o ISCSI with tgt, and blktracing against scsi-generic with:
    o block
    o tape
    o cdrom
    o media changer
  o blktests

This patch is part of the work which disputes the severity of
CVE-2019-19770 which shows this issue is not a core debugfs issue, but
a misuse of debugfs within blktace.

Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 4.14: open-code queue_is_mq()]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/trace/blktrace.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -533,10 +533,18 @@ static int do_blk_trace_setup(struct req
 	if (!bt->msg_data)
 		goto err;
 
-	ret = -ENOENT;
-
-	dir = debugfs_lookup(buts->name, blk_debugfs_root);
-	if (!dir)
+#ifdef CONFIG_BLK_DEBUG_FS
+	/*
+	 * When tracing whole make_request drivers (multiqueue) block devices,
+	 * reuse the existing debugfs directory created by the block layer on
+	 * init. For request-based block devices, all partitions block devices,
+	 * and scsi-generic block devices we create a temporary new debugfs
+	 * directory that will be removed once the trace ends.
+	 */
+	if (q->mq_ops && bdev && bdev == bdev->bd_contains)
+		dir = q->debugfs_dir;
+	else
+#endif
 		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
 	if (!dir)
 		goto err;
@@ -595,8 +603,6 @@ static int do_blk_trace_setup(struct req
 
 	ret = 0;
 err:
-	if (dir && !bt->dir)
-		dput(dir);
 	if (ret)
 		blk_trace_free(bt);
 	return ret;



  parent reply	other threads:[~2020-11-09 13:34 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-09 12:55 [PATCH 4.14 00/48] 4.14.205-rc1 review Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 01/48] drm/i915: Break up error capture compression loops with cond_resched() Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 02/48] xen/events: dont use chip_data for legacy IRQs Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 03/48] tipc: fix use-after-free in tipc_bcast_get_mode Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 04/48] gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 05/48] gianfar: Account for Tx PTP timestamp in the skb headroom Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 06/48] net: usb: qmi_wwan: add Telit LE910Cx 0x1230 composition Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 07/48] sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 08/48] sfp: Fix error handing in sfp_probe() Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 09/48] Blktrace: bail out early if block debugfs is not configured Greg Kroah-Hartman
2020-11-09 12:55 ` Greg Kroah-Hartman [this message]
2020-11-09 12:55 ` [PATCH 4.14 11/48] i40e: Fix a potential NULL pointer dereference Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 12/48] i40e: add num_vectors checker in iwarp handler Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 13/48] i40e: Wrong truncation from u16 to u8 Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 14/48] i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 15/48] i40e: Memory leak in i40e_config_iwarp_qvlist Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 16/48] Fonts: Replace discarded const qualifier Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 17/48] ALSA: usb-audio: Add implicit feedback quirk for Qu-16 Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 18/48] lib/crc32test: remove extra local_irq_disable/enable Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 19/48] kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 20/48] mm: always have io_remap_pfn_range() set pgprot_decrypted() Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 21/48] gfs2: Wake up when sd_glock_disposal becomes zero Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 22/48] ftrace: Fix recursion check for NMI test Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 23/48] ftrace: Handle tracing when switching between context Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 24/48] tracing: Fix out of bounds write in get_trace_buf Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 25/48] futex: Handle transient "ownerless" rtmutex state correctly Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 26/48] ARM: dts: sun4i-a10: fix cpu_alert temperature Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 27/48] x86/kexec: Use up-to-dated screen_info copy to fill boot params Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 28/48] of: Fix reserved-memory overlap detection Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 29/48] blk-cgroup: Fix memleak on error path Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 30/48] blk-cgroup: Pre-allocate tree node on blkg_conf_prep Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 31/48] scsi: core: Dont start concurrent async scan on same host Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 32/48] vsock: use ns_capable_noaudit() on socket create Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 33/48] drm/vc4: drv: Add error handding for bind Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 34/48] ACPI: NFIT: Fix comparison to -ENXIO Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 35/48] vt: Disable KD_FONT_OP_COPY Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 36/48] fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 37/48] serial: 8250_mtk: Fix uart_get_baud_rate warning Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 38/48] serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 39/48] USB: serial: cyberjack: fix write-URB completion race Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 40/48] USB: serial: option: add Quectel EC200T module support Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 41/48] USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231 Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 42/48] USB: serial: option: add Telit FN980 composition 0x1055 Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 43/48] USB: Add NO_LPM quirk for Kingston flash drive Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 44/48] usb: mtu3: fix panic in mtu3_gadget_stop() Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 45/48] ARC: stack unwinding: avoid indefinite looping Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 46/48] Revert "ARC: entry: fix potential EFA clobber when TIF_SYSCALL_TRACE" Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 47/48] PM: runtime: Resume the device earlier in __device_release_driver() Greg Kroah-Hartman
2020-11-09 12:55 ` [PATCH 4.14 48/48] arm64: dts: marvell: espressobin: add ethernet alias Greg Kroah-Hartman
2020-11-09 23:04 ` [PATCH 4.14 00/48] 4.14.205-rc1 review Guenter Roeck
2020-11-09 23:24 ` Shuah Khan
2020-11-09 23:26 ` Shuah Khan
2020-11-10 10:04 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201109125017.244391571@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=ben.hutchings@codethink.co.uk \
    --cc=bvanassche@acm.org \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=jejb@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mcgrof@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=nstange@suse.de \
    --cc=osandov@fb.com \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox