* BUG: iomap_dio_rw() accesses freed memory
@ 2019-01-10 10:17 Chandan Rajendra
  2019-01-10 14:25 ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread

From: Chandan Rajendra @ 2019-01-10 10:17 UTC (permalink / raw)
  To: linux-xfs; +Cc: hch, darrick.wong

The following call trace is observed on next-20190110 when the generic/323
test is executed on a 4k block sized XFS filesystem on a ppc64le machine,

[ T7710] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6be3
[ T7710] Faulting instruction address: 0xc00000000090372c
[ T7710] Oops: Kernel access of bad area, sig: 11 [#12]
[ T7710] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
[ T7710] Modules linked in:
[ T7710] CPU: 7 PID: 7710 Comm: aio-last-ref-he Tainted: G D 5.0.0-rc1-next-20190110 #36
[ T7710] NIP: c00000000090372c LR: c0000000004bf75c CTR: c00000000091cfe0
[ T7710] REGS: c00000062fc67730 TRAP: 0380 Tainted: G D (5.0.0-rc1-next-20190110)
[ T7710] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44428224 XER: 20000000
[ T7710] CFAR: c0000000004bf758 IRQMASK: 0
[ T7710] GPR00: c0000000004bf75c c00000062fc679c0 c0000000017f4b00 6b6b6b6b6b6b6b6b
[ T7710] GPR04: 000000006b6b6b6b 0000000000000001 0000000000000000 0000000000000001
[ T7710] GPR08: 0000000000000000 6b6b6b6b6b6b6b6b 0000000000000000 0000000000000000
[ T7710] GPR12: 0000000044428224 c00000003ff93700 00000000000f0000 0000000105611a30
[ T7710] GPR16: 00007fff52fde088 00000001056119e8 00007fff52fde288 00000001056119c8
[ T7710] GPR20: 0000000000000010 c000000634258bb8 00000000000effff c000000627e94918
[ T7710] GPR24: c000000000fb3290 c000000627e94700 fffffffffffffdef 00000000000f0000
[ T7710] GPR28: 0000000000000000 0000000000000002 c0000006247ae180 c000000634258b98
[ T7710] NIP [c00000000090372c] blk_poll+0x2c/0x3d0
[ T7710] LR [c0000000004bf75c] iomap_dio_rw+0x45c/0x4f0
[ T7710] Call Trace:
[ T7710] [c00000062fc67a60] [c0000000004bf7b8] iomap_dio_rw+0x4b8/0x4f0
[ T7710] [c00000062fc67b20] [c0000000006ed3a4] xfs_file_dio_aio_read+0xb4/0x250
[ T7710] [c00000062fc67b80] [c0000000006edbc4] xfs_file_read_iter+0x114/0x160
[ T7710] [c00000062fc67bc0] [c00000000049476c] aio_read+0x12c/0x1d0
[ T7710] [c00000062fc67cc0] [c000000000495094] io_submit_one+0x634/0xbd0
[ T7710] [c00000062fc67d90] [c0000000004956f0] sys_io_submit+0xc0/0x380
[ T7710] [c00000062fc67e20] [c00000000000bae4] system_call+0x5c/0x70
[ T7710] Instruction dump:
[ T7710] 60000000 3c4c00ef 38421400 7c0802a6 60000000 7d800026 2f84ffff fb81ffe0
[ T7710] 3b800000 91810008 f821ff61 419e01a4 <e9230078> 793c6fe3 41820198 7c0802a6
[ T7710] ---[ end trace b14f7219ccf85a08 ]---
[ T7710]

This is happening due to the following sequence of events,

1. iomap_dio_rw() allocates a 'struct iomap_dio'. The ref count is set to 1.
2. iomap_dio_bio_actor() increments the ref count to 2 and submits a bio.
3. iomap_dio_rw() decrements the ref count. Since dio->ref is still non-zero,
   !atomic_dec_and_test(&dio->ref) evaluates to true.
4. Meanwhile, the bio submitted in step 2 completes its I/O.
   iomap_dio_bio_end_io() decrements dio->ref to the value of 0. This in turn
   calls iomap_dio_complete(), which frees the iomap_dio structure.
5. iomap_dio_rw() can now dereference a freed structure via either
   "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
   dio->submit.cookie, true)".

-- 
chandan

^ permalink raw reply	[flat|nested] 10+ messages in thread
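To make the sequence above concrete, here is a minimal sketch of the refcount
handoff. It is not the actual fs/iomap.c code; 'struct dio_model' and the two
functions are hypothetical stand-ins, and the real endio path goes through
iomap_dio_complete() rather than a bare kfree(), but the reference counting
follows steps 1-5 of the report:

#include <linux/atomic.h>
#include <linux/blkdev.h>
#include <linux/errno.h>
#include <linux/slab.h>

struct dio_model {
	atomic_t		ref;			/* dio->ref */
	bool			wait_for_completion;	/* set from is_sync_kiocb() */
	struct request_queue	*last_queue;		/* dio->submit.last_queue */
};

/* bio completion, i.e. iomap_dio_bio_end_io() in the report */
static void endio_side(struct dio_model *dio)
{
	/* step 4: drop the bio's reference taken in step 2 ... */
	if (atomic_dec_and_test(&dio->ref)) {
		/* ... and, for AIO, complete and free the dio right here */
		kfree(dio);
	}
}

/* submission side, i.e. the tail of iomap_dio_rw() in the report */
static int submit_side(struct dio_model *dio)
{
	/* step 1: dio was allocated with ref == 1			*/
	/* step 2: the bio actor bumped ref to 2 and submitted the bio	*/

	/* step 3: drop the submission reference */
	if (!atomic_dec_and_test(&dio->ref)) {
		/*
		 * step 5: endio_side() may already have dropped the final
		 * reference and freed the dio by the time we get here, so
		 * every read of *dio below is a potential use-after-free;
		 * the 0x6b poison in the oops above is exactly such a read.
		 */
		if (!dio->wait_for_completion)
			return -EIOCBQUEUED;
		/* ... otherwise poll/wait on dio->last_queue ... */
	}
	return 0;
}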
* Re: BUG: iomap_dio_rw() accesses freed memory 2019-01-10 10:17 BUG: iomap_dio_rw() accesses freed memory Chandan Rajendra @ 2019-01-10 14:25 ` Christoph Hellwig 2019-01-10 16:49 ` Chandan Rajendra 0 siblings, 1 reply; 10+ messages in thread From: Christoph Hellwig @ 2019-01-10 14:25 UTC (permalink / raw) To: Chandan Rajendra; +Cc: linux-xfs, hch, darrick.wong On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote: > The following call trace is observed on next-20190110, when generic/323 test > is executed on a 4k block sized XFS filesystem on ppc64le machine, > > [ T7710] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6be3 > [ T7710] Faulting instruction address: 0xc00000000090372c > [ T7710] Oops: Kernel access of bad area, sig: 11 [#12] > [ T7710] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries > [ T7710] Modules linked in: > [ T7710] CPU: 7 PID: 7710 Comm: aio-last-ref-he Tainted: G D 5.0.0-rc1-next-20190110 #36 > [ T7710] NIP: c00000000090372c LR: c0000000004bf75c CTR: c00000000091cfe0 > [ T7710] REGS: c00000062fc67730 TRAP: 0380 Tainted: G D (5.0.0-rc1-next-20190110) > [ T7710] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44428224 XER: 20000000 > [ T7710] CFAR: c0000000004bf758 IRQMASK: 0 > [ T7710] GPR00: c0000000004bf75c c00000062fc679c0 c0000000017f4b00 6b6b6b6b6b6b6b6b > [ T7710] GPR04: 000000006b6b6b6b 0000000000000001 0000000000000000 0000000000000001 > [ T7710] GPR08: 0000000000000000 6b6b6b6b6b6b6b6b 0000000000000000 0000000000000000 > [ T7710] GPR12: 0000000044428224 c00000003ff93700 00000000000f0000 0000000105611a30 > [ T7710] GPR16: 00007fff52fde088 00000001056119e8 00007fff52fde288 00000001056119c8 > [ T7710] GPR20: 0000000000000010 c000000634258bb8 00000000000effff c000000627e94918 > [ T7710] GPR24: c000000000fb3290 c000000627e94700 fffffffffffffdef 00000000000f0000 > [ T7710] GPR28: 0000000000000000 0000000000000002 c0000006247ae180 c000000634258b98 > [ T7710] NIP [c00000000090372c] blk_poll+0x2c/0x3d0 > [ T7710] LR [c0000000004bf75c] iomap_dio_rw+0x45c/0x4f0 > [ T7710] Call Trace: > [ T7710] [c00000062fc67a60] [c0000000004bf7b8] iomap_dio_rw+0x4b8/0x4f0 > [ T7710] [c00000062fc67b20] [c0000000006ed3a4] xfs_file_dio_aio_read+0xb4/0x250 > [ T7710] [c00000062fc67b80] [c0000000006edbc4] xfs_file_read_iter+0x114/0x160 > [ T7710] [c00000062fc67bc0] [c00000000049476c] aio_read+0x12c/0x1d0 > [ T7710] [c00000062fc67cc0] [c000000000495094] io_submit_one+0x634/0xbd0 > [ T7710] [c00000062fc67d90] [c0000000004956f0] sys_io_submit+0xc0/0x380 > [ T7710] [c00000062fc67e20] [c00000000000bae4] system_call+0x5c/0x70 > [ T7710] Instruction dump: > [ T7710] 60000000 3c4c00ef 38421400 7c0802a6 60000000 7d800026 2f84ffff fb81ffe0 > [ T7710] 3b800000 91810008 f821ff61 419e01a4 <e9230078> 793c6fe3 41820198 7c0802a6 > [ T7710] ---[ end trace b14f7219ccf85a08 ]--- > [ T7710] > > > The is happening due to the following sequence of events, > > 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Ref count is set to 1. > 2. iomap_dio_bio_actor increments ref count to 2 and submits a bio. > 3. iomap_dio_rw() decrements ref count. Since dio->ref is non-zero, > !atomic_dec_and_test(&dio->ref) would evaluate to true. > 4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io() > decrements dio->ref to the value of 0. This would inturn call > iomap_dio_complete() which frees the iomap_dio structure. If iomap_dio_bio_end_io completes the dio this should be an AIO request. > 5. 
iomap_dio_rw() can now deference a freed structure via either > "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue, > dio->submit.cookie, true)" But then again this code is only executed for the wait_for_completion case. So I wonder how you manage to hit this. ^ permalink raw reply [flat|nested] 10+ messages in thread
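For reference, "this code" is the polling/wait tail of iomap_dio_rw() as it
looked in that kernel. The sketch below is abbreviated and reconstructed from
the oops and from the context lines of the patch later in this thread, so the
loop details may differ slightly, but the key point is visible: the blk_poll()
loop sits behind the !dio->wait_for_completion check, so an AIO request is
expected to return -EIOCBQUEUED before ever reaching it.

	if (!atomic_dec_and_test(&dio->ref)) {
		if (!dio->wait_for_completion)
			return -EIOCBQUEUED;	/* expected path for AIO */

		for (;;) {
			set_current_state(TASK_UNINTERRUPTIBLE);
			if (!READ_ONCE(dio->submit.waiter))
				break;

			/* (polling-eligibility checks simplified) */
			if (!dio->submit.last_queue ||
			    !blk_poll(dio->submit.last_queue,
				      dio->submit.cookie, true))
				io_schedule();
		}
		__set_current_state(TASK_RUNNING);
	}

	ret = iomap_dio_complete(dio);

	return ret;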
* Re: BUG: iomap_dio_rw() accesses freed memory 2019-01-10 14:25 ` Christoph Hellwig @ 2019-01-10 16:49 ` Chandan Rajendra 2019-01-10 17:09 ` Darrick J. Wong 0 siblings, 1 reply; 10+ messages in thread From: Chandan Rajendra @ 2019-01-10 16:49 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-xfs, darrick.wong On Thursday, January 10, 2019 7:55:52 PM IST Christoph Hellwig wrote: > On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote: > > The following call trace is observed on next-20190110, when generic/323 test > > is executed on a 4k block sized XFS filesystem on ppc64le machine, > > > > [ T7710] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6be3 > > [ T7710] Faulting instruction address: 0xc00000000090372c > > [ T7710] Oops: Kernel access of bad area, sig: 11 [#12] > > [ T7710] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries > > [ T7710] Modules linked in: > > [ T7710] CPU: 7 PID: 7710 Comm: aio-last-ref-he Tainted: G D 5.0.0-rc1-next-20190110 #36 > > [ T7710] NIP: c00000000090372c LR: c0000000004bf75c CTR: c00000000091cfe0 > > [ T7710] REGS: c00000062fc67730 TRAP: 0380 Tainted: G D (5.0.0-rc1-next-20190110) > > [ T7710] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44428224 XER: 20000000 > > [ T7710] CFAR: c0000000004bf758 IRQMASK: 0 > > [ T7710] GPR00: c0000000004bf75c c00000062fc679c0 c0000000017f4b00 6b6b6b6b6b6b6b6b > > [ T7710] GPR04: 000000006b6b6b6b 0000000000000001 0000000000000000 0000000000000001 > > [ T7710] GPR08: 0000000000000000 6b6b6b6b6b6b6b6b 0000000000000000 0000000000000000 > > [ T7710] GPR12: 0000000044428224 c00000003ff93700 00000000000f0000 0000000105611a30 > > [ T7710] GPR16: 00007fff52fde088 00000001056119e8 00007fff52fde288 00000001056119c8 > > [ T7710] GPR20: 0000000000000010 c000000634258bb8 00000000000effff c000000627e94918 > > [ T7710] GPR24: c000000000fb3290 c000000627e94700 fffffffffffffdef 00000000000f0000 > > [ T7710] GPR28: 0000000000000000 0000000000000002 c0000006247ae180 c000000634258b98 > > [ T7710] NIP [c00000000090372c] blk_poll+0x2c/0x3d0 > > [ T7710] LR [c0000000004bf75c] iomap_dio_rw+0x45c/0x4f0 > > [ T7710] Call Trace: > > [ T7710] [c00000062fc67a60] [c0000000004bf7b8] iomap_dio_rw+0x4b8/0x4f0 > > [ T7710] [c00000062fc67b20] [c0000000006ed3a4] xfs_file_dio_aio_read+0xb4/0x250 > > [ T7710] [c00000062fc67b80] [c0000000006edbc4] xfs_file_read_iter+0x114/0x160 > > [ T7710] [c00000062fc67bc0] [c00000000049476c] aio_read+0x12c/0x1d0 > > [ T7710] [c00000062fc67cc0] [c000000000495094] io_submit_one+0x634/0xbd0 > > [ T7710] [c00000062fc67d90] [c0000000004956f0] sys_io_submit+0xc0/0x380 > > [ T7710] [c00000062fc67e20] [c00000000000bae4] system_call+0x5c/0x70 > > [ T7710] Instruction dump: > > [ T7710] 60000000 3c4c00ef 38421400 7c0802a6 60000000 7d800026 2f84ffff fb81ffe0 > > [ T7710] 3b800000 91810008 f821ff61 419e01a4 <e9230078> 793c6fe3 41820198 7c0802a6 > > [ T7710] ---[ end trace b14f7219ccf85a08 ]--- > > [ T7710] > > > > > > The is happening due to the following sequence of events, > > > > 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Ref count is set to 1. > > 2. iomap_dio_bio_actor increments ref count to 2 and submits a bio. > > 3. iomap_dio_rw() decrements ref count. Since dio->ref is non-zero, > > !atomic_dec_and_test(&dio->ref) would evaluate to true. > > 4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io() > > decrements dio->ref to the value of 0. This would inturn call > > iomap_dio_complete() which frees the iomap_dio structure. 
> If iomap_dio_bio_end_io completes the dio this should be an AIO request.
>
> > 5. iomap_dio_rw() can now deference a freed structure via either
> > "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
> > dio->submit.cookie, true)"
>
> But then again this code is only executed for the wait_for_completion
> case. So I wonder how you manage to hit this.

A call to printk() right after "if (!atomic_dec_and_test(&dio->ref)) {" in
iomap_dio_rw() gave,

iomap_dio_rw: dio = 00000000540abd3d; dio->submit.last_queue = 000000000788c132; dio->wait_for_completion = 107

This is most probably because the freed dio's memory is already being written
to by whatever code now owns it.

-- 
chandan

^ permalink raw reply	[flat|nested] 10+ messages in thread
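The report does not show the instrumentation itself; the snippet below is a
hypothetical reconstruction of where such a debug printk would sit (the format
string, log level and use of %p pointer hashing are assumptions). When the dio
has already been freed, the reads in the printk are themselves use-after-free
reads, which is why wait_for_completion prints a garbage value: 107 is 0x6b,
the same byte pattern as the 0x6b6b6b6b6b6b6be3 faulting address in the
original oops, i.e. the value came from freed memory.

	/* hypothetical reconstruction of the debug printk described above */
	if (!atomic_dec_and_test(&dio->ref)) {
		printk(KERN_INFO "iomap_dio_rw: dio = %p; dio->submit.last_queue = %p; dio->wait_for_completion = %d\n",
		       dio, dio->submit.last_queue, dio->wait_for_completion);

		if (!dio->wait_for_completion)
			return -EIOCBQUEUED;
		/* ... */
	}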
* Re: BUG: iomap_dio_rw() accesses freed memory 2019-01-10 16:49 ` Chandan Rajendra @ 2019-01-10 17:09 ` Darrick J. Wong 2019-01-15 18:27 ` Christoph Hellwig 0 siblings, 1 reply; 10+ messages in thread From: Darrick J. Wong @ 2019-01-10 17:09 UTC (permalink / raw) To: Chandan Rajendra; +Cc: Christoph Hellwig, linux-xfs On Thu, Jan 10, 2019 at 10:19:26PM +0530, Chandan Rajendra wrote: > On Thursday, January 10, 2019 7:55:52 PM IST Christoph Hellwig wrote: > > On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote: > > > The following call trace is observed on next-20190110, when generic/323 test > > > is executed on a 4k block sized XFS filesystem on ppc64le machine, > > > > > > [ T7710] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6be3 > > > [ T7710] Faulting instruction address: 0xc00000000090372c > > > [ T7710] Oops: Kernel access of bad area, sig: 11 [#12] > > > [ T7710] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries > > > [ T7710] Modules linked in: > > > [ T7710] CPU: 7 PID: 7710 Comm: aio-last-ref-he Tainted: G D 5.0.0-rc1-next-20190110 #36 > > > [ T7710] NIP: c00000000090372c LR: c0000000004bf75c CTR: c00000000091cfe0 > > > [ T7710] REGS: c00000062fc67730 TRAP: 0380 Tainted: G D (5.0.0-rc1-next-20190110) > > > [ T7710] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44428224 XER: 20000000 > > > [ T7710] CFAR: c0000000004bf758 IRQMASK: 0 > > > [ T7710] GPR00: c0000000004bf75c c00000062fc679c0 c0000000017f4b00 6b6b6b6b6b6b6b6b > > > [ T7710] GPR04: 000000006b6b6b6b 0000000000000001 0000000000000000 0000000000000001 > > > [ T7710] GPR08: 0000000000000000 6b6b6b6b6b6b6b6b 0000000000000000 0000000000000000 > > > [ T7710] GPR12: 0000000044428224 c00000003ff93700 00000000000f0000 0000000105611a30 > > > [ T7710] GPR16: 00007fff52fde088 00000001056119e8 00007fff52fde288 00000001056119c8 > > > [ T7710] GPR20: 0000000000000010 c000000634258bb8 00000000000effff c000000627e94918 > > > [ T7710] GPR24: c000000000fb3290 c000000627e94700 fffffffffffffdef 00000000000f0000 > > > [ T7710] GPR28: 0000000000000000 0000000000000002 c0000006247ae180 c000000634258b98 > > > [ T7710] NIP [c00000000090372c] blk_poll+0x2c/0x3d0 > > > [ T7710] LR [c0000000004bf75c] iomap_dio_rw+0x45c/0x4f0 > > > [ T7710] Call Trace: > > > [ T7710] [c00000062fc67a60] [c0000000004bf7b8] iomap_dio_rw+0x4b8/0x4f0 > > > [ T7710] [c00000062fc67b20] [c0000000006ed3a4] xfs_file_dio_aio_read+0xb4/0x250 > > > [ T7710] [c00000062fc67b80] [c0000000006edbc4] xfs_file_read_iter+0x114/0x160 > > > [ T7710] [c00000062fc67bc0] [c00000000049476c] aio_read+0x12c/0x1d0 > > > [ T7710] [c00000062fc67cc0] [c000000000495094] io_submit_one+0x634/0xbd0 > > > [ T7710] [c00000062fc67d90] [c0000000004956f0] sys_io_submit+0xc0/0x380 > > > [ T7710] [c00000062fc67e20] [c00000000000bae4] system_call+0x5c/0x70 > > > [ T7710] Instruction dump: > > > [ T7710] 60000000 3c4c00ef 38421400 7c0802a6 60000000 7d800026 2f84ffff fb81ffe0 > > > [ T7710] 3b800000 91810008 f821ff61 419e01a4 <e9230078> 793c6fe3 41820198 7c0802a6 > > > [ T7710] ---[ end trace b14f7219ccf85a08 ]--- > > > [ T7710] > > > > > > > > > The is happening due to the following sequence of events, > > > > > > 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Ref count is set to 1. > > > 2. iomap_dio_bio_actor increments ref count to 2 and submits a bio. > > > 3. iomap_dio_rw() decrements ref count. Since dio->ref is non-zero, > > > !atomic_dec_and_test(&dio->ref) would evaluate to true. > > > 4. Meanwhile the bio submitted in step 2 completes I/O. 
iomap_dio_bio_end_io() > > > decrements dio->ref to the value of 0. This would inturn call > > > iomap_dio_complete() which frees the iomap_dio structure. > > > > If iomap_dio_bio_end_io completes the dio this should be an AIO request. > > > > > 5. iomap_dio_rw() can now deference a freed structure via either > > > "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue, > > > dio->submit.cookie, true)" > > > > But then again this code is only executed for the wait_for_completion > > case. So I wonder how you manage to hit this. > > A call to printk() right after " if (!atomic_dec_and_test(&dio->ref)) { " in > iomap_dio_rw() gave, > > iomap_dio_rw: dio = 00000000540abd3d; dio->submit.last_queue = 000000000788c132; dio->wait_for_completion = 107 > > This is most probably because the freed dio's memory is already being written > to by its owning code. generic/323 on a very fast scsi device reproduces (and hangs) easily. The UAF happens (on my machine, anyway) because the bio endio function gets called before the submitting process even reaches blk_finish_plug. This is the exact same issue being discussed in "Broken dio refcounting leads to livelock?" I think Dave Chinner was working on a less gross fix than the one we'd come up with in that other thread. --D > -- > chandan > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
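A rough sketch of the submission-side window being described, in case it
helps. The toy_* names are hypothetical stand-ins, not the real iomap_dio_rw()
code, but the plug/submit/unplug structure matches the function and shows why
a fast device can drop the bio's reference long before the submitter drops its
own:

#include <linux/atomic.h>
#include <linux/blkdev.h>

struct toy_dio {
	atomic_t ref;	/* 1 for the submitter plus 1 per in-flight bio */
};

/* stand-in for the iomap actor: takes a reference and submits one bio */
static void toy_submit_one_bio(struct toy_dio *dio)
{
	atomic_inc(&dio->ref);
	/*
	 * submit_bio(...) would go here: from this point on the endio can
	 * run at any time, on a fast device possibly before we even return.
	 */
}

static void toy_submission_window(struct toy_dio *dio)
{
	struct blk_plug plug;

	blk_start_plug(&plug);
	toy_submit_one_bio(dio);
	blk_finish_plug(&plug);

	/*
	 * As described above, on fast hardware the completion handler can
	 * have dropped the bio's reference (and, for AIO, freed the dio)
	 * before blk_finish_plug() even returns, so nothing here may touch
	 * *dio once we drop our own reference.
	 */
	if (!atomic_dec_and_test(&dio->ref)) {
		/* AIO: *dio must not be dereferenced after this point */
	}
}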
* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-10 17:09 ` Darrick J. Wong
@ 2019-01-15 18:27   ` Christoph Hellwig
  2019-01-15 20:51     ` Dave Chinner
  2019-01-16 5:59      ` Chandan Rajendra
  0 siblings, 2 replies; 10+ messages in thread

From: Christoph Hellwig @ 2019-01-15 18:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Chandan Rajendra, Dave Chinner, linux-xfs

So after reviewing that thread and this thread I still don't really
see a refcounting problem, just a use after free of the
wait_for_completion member for fast aio completions.

Chandan, can you give this patch a spin?

diff --git a/fs/iomap.c b/fs/iomap.c
index cb184ff68680..47362397cb82 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1813,6 +1813,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	loff_t pos = iocb->ki_pos, start = pos;
 	loff_t end = iocb->ki_pos + count - 1, ret = 0;
 	unsigned int flags = IOMAP_DIRECT;
+	bool wait_for_completion = is_sync_kiocb(iocb);
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
@@ -1832,7 +1833,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
-	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
 	dio->submit.waiter = current;
@@ -1887,7 +1887,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
+	if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1903,7 +1903,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
 			if (ret == -ENOTBLK) {
-				dio->wait_for_completion = true;
+				wait_for_completion = true;
 				ret = 0;
 			}
 			break;
@@ -1925,8 +1925,24 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
+	/*
+	 * We are about to drop our additional submission reference, which
+	 * might be the last reference to the dio.  There are three
+	 * different ways we can progress here:
+	 *
+	 *  (a) If this is the last reference we will always complete and free
+	 *	the dio ourselves right here.
+	 *  (b) If this is not the last reference, and we serve an asynchronous
+	 *	iocb, we must never touch the dio after the decrement, the
+	 *	I/O completion handler will complete and free it.
+	 *  (c) If this is not the last reference, but we serve a synchronous
+	 *	iocb, the I/O completion handler will wake us up on the drop
+	 *	of the final reference, and we will complete and free it here
+	 *	after we got woken by the I/O completion handler.
+	 */
+	dio->wait_for_completion = wait_for_completion;
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!dio->wait_for_completion)
+		if (!wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {
@@ -1943,9 +1959,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		__set_current_state(TASK_RUNNING);
 	}
 
-	ret = iomap_dio_complete(dio);
-
-	return ret;
+	return iomap_dio_complete(dio);
 
 out_free_dio:
 	kfree(dio);

^ permalink raw reply related	[flat|nested] 10+ messages in thread
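For context on cases (b) and (c) in the comment above, the completion side
looks roughly like the sketch below. This is abbreviated and paraphrased, not
the verbatim fs/iomap.c code (the per-bio page cleanup and the workqueue
handling for writes are elided or simplified), but it matches the paraphrase
Dave gives in the next message:

static void iomap_dio_bio_end_io(struct bio *bio)	/* abbreviated */
{
	struct iomap_dio *dio = bio->bi_private;

	if (bio->bi_status)
		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));

	if (atomic_dec_and_test(&dio->ref)) {
		if (dio->wait_for_completion) {
			/*
			 * case (c): wake the synchronous submitter, which
			 * then completes and frees the dio itself.
			 */
			struct task_struct *waiter = dio->submit.waiter;

			WRITE_ONCE(dio->submit.waiter, NULL);
			wake_up_process(waiter);
		} else {
			/*
			 * case (b): AIO - complete and free the dio here
			 * (possibly via the s_dio_done_wq workqueue).
			 */
			iomap_dio_complete_work(&dio->aio.work);
		}
	}

	/* ... per-bio page cleanup and bio_put() elided ... */
}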
* Re: BUG: iomap_dio_rw() accesses freed memory 2019-01-15 18:27 ` Christoph Hellwig @ 2019-01-15 20:51 ` Dave Chinner 2019-01-15 21:09 ` Christoph Hellwig 2019-01-16 5:59 ` Chandan Rajendra 1 sibling, 1 reply; 10+ messages in thread From: Dave Chinner @ 2019-01-15 20:51 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Darrick J. Wong, Chandan Rajendra, linux-xfs On Tue, Jan 15, 2019 at 07:27:45PM +0100, Christoph Hellwig wrote: > So after review that thread and this thread I still don't really > see a refcounting problem, just a use after free of the > wait_for_completion member for fast aio completions. > > Chandan, can you give this patch a spin? > > diff --git a/fs/iomap.c b/fs/iomap.c > index cb184ff68680..47362397cb82 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -1813,6 +1813,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > loff_t pos = iocb->ki_pos, start = pos; > loff_t end = iocb->ki_pos + count - 1, ret = 0; > unsigned int flags = IOMAP_DIRECT; > + bool wait_for_completion = is_sync_kiocb(iocb); > struct blk_plug plug; > struct iomap_dio *dio; > > @@ -1832,7 +1833,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > dio->end_io = end_io; > dio->error = 0; > dio->flags = 0; > - dio->wait_for_completion = is_sync_kiocb(iocb); > > dio->submit.iter = iter; > dio->submit.waiter = current; > @@ -1887,7 +1887,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > dio_warn_stale_pagecache(iocb->ki_filp); > ret = 0; > > - if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion && > + if (iov_iter_rw(iter) == WRITE && !wait_for_completion && > !inode->i_sb->s_dio_done_wq) { > ret = sb_init_dio_done_wq(inode->i_sb); > if (ret < 0) > @@ -1903,7 +1903,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > if (ret <= 0) { > /* magic error code to fall back to buffered I/O */ > if (ret == -ENOTBLK) { > - dio->wait_for_completion = true; > + wait_for_completion = true; > ret = 0; > } > break; > @@ -1925,8 +1925,24 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > if (dio->flags & IOMAP_DIO_WRITE_FUA) > dio->flags &= ~IOMAP_DIO_NEED_SYNC; > > + /* > + * We are about to drop our additional submission reference, which > + * might be the last reference to the dio. There are three three > + * different ways we can progress here: > + * > + * (a) If this is the last reference we will always complete and free > + * the dio ourselves. right here. > + * (b) If this is not the last reference, and we serve an asynchronous > + * iocb, we must never touch the dio after the decrement, the > + * I/O completion handler will complete and free it. > + * (c) If this is not the last reference, but we serve a synchronous > + * iocb, the I/O completion handler will wake us up on the drop > + * of the final reference, and we will complete and free it here > + * after we got woken by the I/O completion handler. > + */ > + dio->wait_for_completion = wait_for_completion; > if (!atomic_dec_and_test(&dio->ref)) { Atomic operations don't imply a memory barrier for dependent data, right? So in completion we do: if (atomic_dec_and_test(&dio->ref)) { if (dio->wait_for_completion) { /* do wakeup */ } else .... So if we get: CPU 0 (submission) CPU 1 (completion) set dio->wfc dec dio->ref, val = 1 dec dio->ref, val = 0 enter wait loop check dio->wfc Where's the memory barrier to ensure that CPU 1 sees the value that CPU 0 set in dio->wait_for_completion? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: BUG: iomap_dio_rw() accesses freed memory 2019-01-15 20:51 ` Dave Chinner @ 2019-01-15 21:09 ` Christoph Hellwig 2019-01-15 23:18 ` Dave Chinner 0 siblings, 1 reply; 10+ messages in thread From: Christoph Hellwig @ 2019-01-15 21:09 UTC (permalink / raw) To: Dave Chinner Cc: Christoph Hellwig, Darrick J. Wong, Chandan Rajendra, linux-xfs On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote: > Atomic operations don't imply a memory barrier for dependent data, > right? Documentation/atomic_t.txt says: -------------------------- snip -------------------------- The rule of thumb: - non-RMW operations are unordered; - RMW operations that have no return value are unordered; - RMW operations that have a return value are fully ordered; [...] Fully ordered primitives are ordered against everything prior and everything subsequent. Therefore a fully ordered primitive is like having an smp_mb() before and an smp_mb() after the primitive. -------------------------- snip -------------------------- I think atomic_dec_and_test clearly falls into the third category, and I can't see how much of the kernel could work if that wasn't the case. ^ permalink raw reply [flat|nested] 10+ messages in thread
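Spelled out for the interleaving Dave described, with the patch applied; this
is an illustrative model only, where dio_model, cpu0_submit() and
cpu1_complete() are stand-ins for the real iomap_dio and the iomap_dio_rw() /
iomap_dio_bio_end_io() pair:

#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/slab.h>

struct dio_model {
	atomic_t		ref;
	bool			wait_for_completion;
	struct task_struct	*waiter;	/* set to current at submission */
};

/* CPU 0: submission side */
static int cpu0_submit(struct dio_model *dio, bool wait_for_completion)
{
	dio->wait_for_completion = wait_for_completion;	/* plain store */

	/*
	 * atomic_dec_and_test() is a value-returning RMW and hence fully
	 * ordered: it behaves as if smp_mb() were issued before and after
	 * it, so the store above is visible to anyone who observes this
	 * decrement.
	 */
	if (!atomic_dec_and_test(&dio->ref)) {
		if (!wait_for_completion)	/* local variable, no *dio read */
			return -EIOCBQUEUED;
		/* sync case: wait until CPU 1 clears dio->waiter; the real
		 * code sleeps in a TASK_UNINTERRUPTIBLE/io_schedule() loop */
		while (READ_ONCE(dio->waiter))
			schedule();
	}

	kfree(dio);	/* stands in for iomap_dio_complete() */
	return 0;
}

/* CPU 1: completion side */
static void cpu1_complete(struct dio_model *dio)
{
	/*
	 * Also a fully ordered RMW on the same variable.  If this decrement
	 * takes ref to zero, CPU 0's decrement is ordered before it, and so
	 * is CPU 0's earlier store to dio->wait_for_completion; the check
	 * below therefore cannot observe a stale value.
	 */
	if (atomic_dec_and_test(&dio->ref)) {
		if (dio->wait_for_completion) {
			struct task_struct *waiter = dio->waiter;

			WRITE_ONCE(dio->waiter, NULL);	/* lets CPU 0 proceed */
			wake_up_process(waiter);
		} else {
			kfree(dio);	/* AIO: CPU 0 has already returned */
		}
	}
}

In other words, no extra smp_mb() is needed: the ordering comes from the two
fully ordered RMWs on dio->ref, which is exactly what the quoted atomic_t.txt
rules guarantee.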
* Re: BUG: iomap_dio_rw() accesses freed memory 2019-01-15 21:09 ` Christoph Hellwig @ 2019-01-15 23:18 ` Dave Chinner 2019-01-15 23:24 ` Darrick J. Wong 0 siblings, 1 reply; 10+ messages in thread From: Dave Chinner @ 2019-01-15 23:18 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Darrick J. Wong, Chandan Rajendra, linux-xfs On Tue, Jan 15, 2019 at 10:09:03PM +0100, Christoph Hellwig wrote: > On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote: > > Atomic operations don't imply a memory barrier for dependent data, > > right? > > Documentation/atomic_t.txt says: > > -------------------------- snip -------------------------- > The rule of thumb: > > - non-RMW operations are unordered; > > - RMW operations that have no return value are unordered; > > - RMW operations that have a return value are fully ordered; > > [...] > > Fully ordered primitives are ordered against everything prior and everything > subsequent. Therefore a fully ordered primitive is like having an smp_mb() > before and an smp_mb() after the primitive. I guess I haven't looked at the documentation for a while. Or the implementation for that matter. /me goes off and looks. Oh, they are now implemented with built in, explicit smp_mb__before_atomic() and smp_mb__after_atomic() barriers. Ok, so the necessary barriers are there, my brain was telling me they still needed to be added manually and needed updating..... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: BUG: iomap_dio_rw() accesses freed memory 2019-01-15 23:18 ` Dave Chinner @ 2019-01-15 23:24 ` Darrick J. Wong 0 siblings, 0 replies; 10+ messages in thread From: Darrick J. Wong @ 2019-01-15 23:24 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, Chandan Rajendra, linux-xfs On Wed, Jan 16, 2019 at 10:18:51AM +1100, Dave Chinner wrote: > On Tue, Jan 15, 2019 at 10:09:03PM +0100, Christoph Hellwig wrote: > > On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote: > > > Atomic operations don't imply a memory barrier for dependent data, > > > right? > > > > Documentation/atomic_t.txt says: > > > > -------------------------- snip -------------------------- > > The rule of thumb: > > > > - non-RMW operations are unordered; > > > > - RMW operations that have no return value are unordered; > > > > - RMW operations that have a return value are fully ordered; > > > > [...] > > > > Fully ordered primitives are ordered against everything prior and everything > > subsequent. Therefore a fully ordered primitive is like having an smp_mb() > > before and an smp_mb() after the primitive. > > I guess I haven't looked at the documentation for a while. Or the > implementation for that matter. > > /me goes off and looks. > > Oh, they are now implemented with built in, explicit > smp_mb__before_atomic() and smp_mb__after_atomic() barriers. Ok, > so the necessary barriers are there, my brain was telling me they > still needed to be added manually and needed updating..... FWIW this fixes the machine that was cranking out generic/323 hangs, having run it repeatedly in a loop for the ~4 hours in between falling down the stairs this morning and finally getting back from all the errands I had to do today. Ready for this crap year to be over with, which means this ought to have someone other than me look over it: Tested-by: Darrick J. Wong <darrick.wong@oracle.com> --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: BUG: iomap_dio_rw() accesses freed memory 2019-01-15 18:27 ` Christoph Hellwig 2019-01-15 20:51 ` Dave Chinner @ 2019-01-16 5:59 ` Chandan Rajendra 1 sibling, 0 replies; 10+ messages in thread From: Chandan Rajendra @ 2019-01-16 5:59 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs On Tuesday, January 15, 2019 11:57:45 PM IST Christoph Hellwig wrote: > So after review that thread and this thread I still don't really > see a refcounting problem, just a use after free of the > wait_for_completion member for fast aio completions. > > Chandan, can you give this patch a spin? > This fixes the bug, Tested-by: Chandan Rajendra <chandan@linux.ibm.com> > diff --git a/fs/iomap.c b/fs/iomap.c > index cb184ff68680..47362397cb82 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -1813,6 +1813,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > loff_t pos = iocb->ki_pos, start = pos; > loff_t end = iocb->ki_pos + count - 1, ret = 0; > unsigned int flags = IOMAP_DIRECT; > + bool wait_for_completion = is_sync_kiocb(iocb); > struct blk_plug plug; > struct iomap_dio *dio; > > @@ -1832,7 +1833,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > dio->end_io = end_io; > dio->error = 0; > dio->flags = 0; > - dio->wait_for_completion = is_sync_kiocb(iocb); > > dio->submit.iter = iter; > dio->submit.waiter = current; > @@ -1887,7 +1887,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > dio_warn_stale_pagecache(iocb->ki_filp); > ret = 0; > > - if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion && > + if (iov_iter_rw(iter) == WRITE && !wait_for_completion && > !inode->i_sb->s_dio_done_wq) { > ret = sb_init_dio_done_wq(inode->i_sb); > if (ret < 0) > @@ -1903,7 +1903,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > if (ret <= 0) { > /* magic error code to fall back to buffered I/O */ > if (ret == -ENOTBLK) { > - dio->wait_for_completion = true; > + wait_for_completion = true; > ret = 0; > } > break; > @@ -1925,8 +1925,24 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > if (dio->flags & IOMAP_DIO_WRITE_FUA) > dio->flags &= ~IOMAP_DIO_NEED_SYNC; > > + /* > + * We are about to drop our additional submission reference, which > + * might be the last reference to the dio. There are three three > + * different ways we can progress here: > + * > + * (a) If this is the last reference we will always complete and free > + * the dio ourselves. right here. > + * (b) If this is not the last reference, and we serve an asynchronous > + * iocb, we must never touch the dio after the decrement, the > + * I/O completion handler will complete and free it. > + * (c) If this is not the last reference, but we serve a synchronous > + * iocb, the I/O completion handler will wake us up on the drop > + * of the final reference, and we will complete and free it here > + * after we got woken by the I/O completion handler. > + */ > + dio->wait_for_completion = wait_for_completion; > if (!atomic_dec_and_test(&dio->ref)) { > - if (!dio->wait_for_completion) > + if (!wait_for_completion) > return -EIOCBQUEUED; > > for (;;) { > @@ -1943,9 +1959,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > __set_current_state(TASK_RUNNING); > } > > - ret = iomap_dio_complete(dio); > - > - return ret; > + return iomap_dio_complete(dio); > > out_free_dio: > kfree(dio); > > -- chandan ^ permalink raw reply [flat|nested] 10+ messages in thread