* [syzbot] BUG: sleeping function called from invalid context in __fdget_pos @ 2021-06-28 19:22 syzbot 2021-06-29 14:46 ` Dave Hansen 0 siblings, 1 reply; 6+ messages in thread From: syzbot @ 2021-06-28 19:22 UTC (permalink / raw) To: bp, dave.hansen, hpa, jpa, kan.liang, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86 Hello, syzbot found the following issue on: HEAD commit: 7426cedc Merge tag 'spi-fix-v5.13-rc7' of git://git.kernel.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=170e6c94300000 kernel config: https://syzkaller.appspot.com/x/.config?x=42ecca11b759d96c dashboard link: https://syzkaller.appspot.com/bug?extid=5d1bad8042a8f0e8117a Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+5d1bad8042a8f0e8117a@syzkaller.appspotmail.com BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 29652, name: syz-executor.0 no locks held by syz-executor.0/29652. Preemption disabled at: [<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126 CPU: 0 PID: 29652 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:8337 __mutex_lock_common kernel/locking/mutex.c:938 [inline] __mutex_lock+0xa9/0x10c0 kernel/locking/mutex.c:1104 __fdget_pos+0xe9/0x100 fs/file.c:974 fdget_pos include/linux/file.h:75 [inline] ksys_read+0x6e/0x250 fs/read_write.c:625 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x41935c Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 fc ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 2f fd ff ff 48 RSP: 002b:00007f4701c5d170 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 000000000041935c RDX: 000000000000000f RSI: 00007f4701c5d1e0 RDI: 0000000000000005 RBP: 00007f4701c5d1d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00007ffc539a90af R14: 00007f4701c5d300 R15: 0000000000022000 BUG: scheduling while atomic: syz-executor.0/29652/0x00000002 no locks held by syz-executor.0/29652. Modules linked in: Preemption disabled at: [<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126 --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos 2021-06-28 19:22 [syzbot] BUG: sleeping function called from invalid context in __fdget_pos syzbot @ 2021-06-29 14:46 ` Dave Hansen 2021-06-30 7:42 ` Ard Biesheuvel 0 siblings, 1 reply; 6+ messages in thread From: Dave Hansen @ 2021-06-29 14:46 UTC (permalink / raw) To: syzbot, bp, hpa, jpa, kan.liang, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86, Ard Biesheuvel, Herbert Xu ... adding Ard who was recently modifying some of the kernel_fpu_begin/end() sites in the AESNI crypto code. On 6/28/21 12:22 PM, syzbot wrote: > console output: https://syzkaller.appspot.com/x/log.txt?x=170e6c94300000 > kernel config: https://syzkaller.appspot.com/x/.config?x=42ecca11b759d96c > dashboard link: https://syzkaller.appspot.com/bug?extid=5d1bad8042a8f0e8117a > > Unfortunately, I don't have any reproducer for this issue yet. ... > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938 > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 29652, name: syz-executor.0 > no locks held by syz-executor.0/29652. > Preemption disabled at: > [<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126 > CPU: 0 PID: 29652 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0 There's a better backtrace in the log before the rather useless backtrace from lockdep: > [ 1341.360547][T29635] FAULT_INJECTION: forcing a failure. > [ 1341.360547][T29635] name failslab, interval 1, probability 0, space 0, times 0 > [ 1341.374439][T29635] CPU: 1 PID: 29635 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0 > [ 1341.374712][T29630] FAT-fs (loop2): bogus number of reserved sectors > [ 1341.383571][T29635] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > [ 1341.383591][T29635] Call Trace: > [ 1341.383603][T29635] dump_stack+0x141/0x1d7 > [ 1341.383630][T29635] should_fail.cold+0x5/0xa > [ 1341.383651][T29635] ? skcipher_walk_next+0x6e2/0x1680 > [ 1341.383673][T29635] should_failslab+0x5/0x10 > [ 1341.383691][T29635] __kmalloc+0x72/0x330 > [ 1341.383720][T29635] skcipher_walk_next+0x6e2/0x1680 > [ 1341.383744][T29635] ? kfree+0xe5/0x7f0 > [ 1341.383776][T29635] skcipher_walk_first+0xf8/0x3c0 > [ 1341.383805][T29635] skcipher_walk_virt+0x523/0x760 > [ 1341.445438][T29635] xts_crypt+0x137/0x7f0 > [ 1341.449689][T29635] ? aesni_encrypt+0x80/0x80 There's one suspect-looking site in xts_crypt(): > kernel_fpu_begin(); > > /* calculate first value of T */ > aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv); > > while (walk.nbytes > 0) { > int nbytes = walk.nbytes; > > ... > > err = skcipher_walk_done(&walk, walk.nbytes - nbytes); > > kernel_fpu_end(); > > if (walk.nbytes > 0) > kernel_fpu_begin(); > } I wonder if a slab allocation failure could leave us with walk.nbytes==0. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos 2021-06-29 14:46 ` Dave Hansen @ 2021-06-30 7:42 ` Ard Biesheuvel 2021-06-30 8:09 ` Herbert Xu 0 siblings, 1 reply; 6+ messages in thread From: Ard Biesheuvel @ 2021-06-30 7:42 UTC (permalink / raw) To: Dave Hansen Cc: syzbot, Borislav Petkov, H. Peter Anvin, jpa, kan.liang, Linux Kernel Mailing List, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, X86 ML, Herbert Xu On Tue, 29 Jun 2021 at 16:46, Dave Hansen <dave.hansen@intel.com> wrote: > > ... adding Ard who was recently modifying some of the > kernel_fpu_begin/end() sites in the AESNI crypto code. > > On 6/28/21 12:22 PM, syzbot wrote: > > console output: https://syzkaller.appspot.com/x/log.txt?x=170e6c94300000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=42ecca11b759d96c > > dashboard link: https://syzkaller.appspot.com/bug?extid=5d1bad8042a8f0e8117a > > > > Unfortunately, I don't have any reproducer for this issue yet. > ... > > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938 > > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 29652, name: syz-executor.0 > > no locks held by syz-executor.0/29652. > > Preemption disabled at: > > [<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126 > > CPU: 0 PID: 29652 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0 > > There's a better backtrace in the log before the rather useless > backtrace from lockdep: > > > [ 1341.360547][T29635] FAULT_INJECTION: forcing a failure. > > [ 1341.360547][T29635] name failslab, interval 1, probability 0, space 0, times 0 > > [ 1341.374439][T29635] CPU: 1 PID: 29635 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0 > > [ 1341.374712][T29630] FAT-fs (loop2): bogus number of reserved sectors > > [ 1341.383571][T29635] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > [ 1341.383591][T29635] Call Trace: > > [ 1341.383603][T29635] dump_stack+0x141/0x1d7 > > [ 1341.383630][T29635] should_fail.cold+0x5/0xa > > [ 1341.383651][T29635] ? skcipher_walk_next+0x6e2/0x1680 > > [ 1341.383673][T29635] should_failslab+0x5/0x10 > > [ 1341.383691][T29635] __kmalloc+0x72/0x330 > > [ 1341.383720][T29635] skcipher_walk_next+0x6e2/0x1680 > > [ 1341.383744][T29635] ? kfree+0xe5/0x7f0 > > [ 1341.383776][T29635] skcipher_walk_first+0xf8/0x3c0 > > [ 1341.383805][T29635] skcipher_walk_virt+0x523/0x760 > > [ 1341.445438][T29635] xts_crypt+0x137/0x7f0 > > [ 1341.449689][T29635] ? aesni_encrypt+0x80/0x80 > > There's one suspect-looking site in xts_crypt(): > > > kernel_fpu_begin(); > > > > /* calculate first value of T */ > > aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv); > > > > while (walk.nbytes > 0) { > > int nbytes = walk.nbytes; > > > > ... > > > > err = skcipher_walk_done(&walk, walk.nbytes - nbytes); > > > > kernel_fpu_end(); > > > > if (walk.nbytes > 0) > > kernel_fpu_begin(); > > } > > I wonder if a slab allocation failure could leave us with walk.nbytes==0. The code is actually the other way around: kernel_fpu_end() comes before the call to skcipher_walk_done(). So IIUC, this code forces an allocation failure, and checks whether the code deals with this gracefully, right? The skcipher walk API guarantees that walk.nbytes == 0 if an error is returned, so the pairing of FPU begin/end looks correct to me. And skcipher_walk_next() should not invoke anything that might sleep from this particular context. Herbert, any ideas? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos 2021-06-30 7:42 ` Ard Biesheuvel @ 2021-06-30 8:09 ` Herbert Xu 2021-06-30 9:13 ` Ard Biesheuvel 0 siblings, 1 reply; 6+ messages in thread From: Herbert Xu @ 2021-06-30 8:09 UTC (permalink / raw) To: Ard Biesheuvel Cc: Dave Hansen, syzbot, Borislav Petkov, H. Peter Anvin, jpa, kan.liang, Linux Kernel Mailing List, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, X86 ML Hi Ard: On Wed, Jun 30, 2021 at 09:42:14AM +0200, Ard Biesheuvel wrote: > > > There's one suspect-looking site in xts_crypt(): > > > > > kernel_fpu_begin(); > > > > > > /* calculate first value of T */ > > > aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv); > > > > > > while (walk.nbytes > 0) { > > > int nbytes = walk.nbytes; > > > > > > ... > > > > > > err = skcipher_walk_done(&walk, walk.nbytes - nbytes); > > > > > > kernel_fpu_end(); > > > > > > if (walk.nbytes > 0) > > > kernel_fpu_begin(); > > > } > > > > I wonder if a slab allocation failure could leave us with walk.nbytes==0. > > The code is actually the other way around: kernel_fpu_end() comes > before the call to skcipher_walk_done(). > > So IIUC, this code forces an allocation failure, and checks whether > the code deals with this gracefully, right? > > The skcipher walk API guarantees that walk.nbytes == 0 if an error is > returned, so the pairing of FPU begin/end looks correct to me. And > skcipher_walk_next() should not invoke anything that might sleep from > this particular context. > > Herbert, any ideas? xts_crypt looks buggy to me. In particular, if the second skcipher_walk_virt call (the one in the if clause) fails, then we will return without calling kernel_fpu_end. Another issue, we are not checking for errors on the first skcipher_walk_virt call, this may cause a double-free with the subsequent skcipher_walk_abort inside the if clause. With skcikpher_walk_virt, you must check for errors explicitly *unless* you use it in a loop construct which exits on !walk->nbytes. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos 2021-06-30 8:09 ` Herbert Xu @ 2021-06-30 9:13 ` Ard Biesheuvel 2021-06-30 12:04 ` Herbert Xu 0 siblings, 1 reply; 6+ messages in thread From: Ard Biesheuvel @ 2021-06-30 9:13 UTC (permalink / raw) To: Herbert Xu Cc: Dave Hansen, syzbot, Borislav Petkov, H. Peter Anvin, jpa, kan.liang, Linux Kernel Mailing List, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, X86 ML On Wed, 30 Jun 2021 at 10:10, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > Hi Ard: > > On Wed, Jun 30, 2021 at 09:42:14AM +0200, Ard Biesheuvel wrote: > > > > > There's one suspect-looking site in xts_crypt(): > > > > > > > kernel_fpu_begin(); > > > > > > > > /* calculate first value of T */ > > > > aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv); > > > > > > > > while (walk.nbytes > 0) { > > > > int nbytes = walk.nbytes; > > > > > > > > ... > > > > > > > > err = skcipher_walk_done(&walk, walk.nbytes - nbytes); > > > > > > > > kernel_fpu_end(); > > > > > > > > if (walk.nbytes > 0) > > > > kernel_fpu_begin(); > > > > } > > > > > > I wonder if a slab allocation failure could leave us with walk.nbytes==0. > > > > The code is actually the other way around: kernel_fpu_end() comes > > before the call to skcipher_walk_done(). > > > > So IIUC, this code forces an allocation failure, and checks whether > > the code deals with this gracefully, right? > > > > The skcipher walk API guarantees that walk.nbytes == 0 if an error is > > returned, so the pairing of FPU begin/end looks correct to me. And > > skcipher_walk_next() should not invoke anything that might sleep from > > this particular context. > > > > Herbert, any ideas? > > xts_crypt looks buggy to me. In particular, if the second > skcipher_walk_virt call (the one in the if clause) fails, then > we will return without calling kernel_fpu_end. > > Another issue, we are not checking for errors on the first > skcipher_walk_virt call, this may cause a double-free with > the subsequent skcipher_walk_abort inside the if clause. > > With skcikpher_walk_virt, you must check for errors explicitly > *unless* you use it in a loop construct which exits on !walk->nbytes. > So something like this, I suppose? --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -849,6 +849,8 @@ return -EINVAL; err = skcipher_walk_virt(&walk, req, false); + if (err) + return err; if (unlikely(tail > 0 && walk.nbytes < walk.total)) { int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2; @@ -862,7 +864,10 @@ skcipher_request_set_crypt(&subreq, req->src, req->dst, blocks * AES_BLOCK_SIZE, req->iv); req = &subreq; + err = skcipher_walk_virt(&walk, req, false); + if (err) + return err; } else { tail = 0; } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos 2021-06-30 9:13 ` Ard Biesheuvel @ 2021-06-30 12:04 ` Herbert Xu 0 siblings, 0 replies; 6+ messages in thread From: Herbert Xu @ 2021-06-30 12:04 UTC (permalink / raw) To: Ard Biesheuvel Cc: Dave Hansen, syzbot, Borislav Petkov, H. Peter Anvin, jpa, kan.liang, Linux Kernel Mailing List, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, X86 ML On Wed, Jun 30, 2021 at 11:13:00AM +0200, Ard Biesheuvel wrote: > > So something like this, I suppose? Yes this should work. Ideally I think it should only call the walker once instead of potentially three times but I can live with that :) Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-06-30 12:04 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-06-28 19:22 [syzbot] BUG: sleeping function called from invalid context in __fdget_pos syzbot 2021-06-29 14:46 ` Dave Hansen 2021-06-30 7:42 ` Ard Biesheuvel 2021-06-30 8:09 ` Herbert Xu 2021-06-30 9:13 ` Ard Biesheuvel 2021-06-30 12:04 ` Herbert Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox