[syzbot] BUG: sleeping function called from invalid context in __fdget

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [syzbot] BUG: sleeping function called from invalid context in __fdget_pos
@ 2021-06-28 19:22 syzbot
  2021-06-29 14:46 ` Dave Hansen
  0 siblings, 1 reply; 6+ messages in thread
From: syzbot @ 2021-06-28 19:22 UTC (permalink / raw)
  To: bp, dave.hansen, hpa, jpa, kan.liang, linux-kernel, luto, mingo,
	syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    7426cedc Merge tag 'spi-fix-v5.13-rc7' of git://git.kernel..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=170e6c94300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=42ecca11b759d96c
dashboard link: https://syzkaller.appspot.com/bug?extid=5d1bad8042a8f0e8117a

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+5d1bad8042a8f0e8117a@syzkaller.appspotmail.com

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 29652, name: syz-executor.0
no locks held by syz-executor.0/29652.
Preemption disabled at:
[<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126
CPU: 0 PID: 29652 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:8337
 __mutex_lock_common kernel/locking/mutex.c:938 [inline]
 __mutex_lock+0xa9/0x10c0 kernel/locking/mutex.c:1104
 __fdget_pos+0xe9/0x100 fs/file.c:974
 fdget_pos include/linux/file.h:75 [inline]
 ksys_read+0x6e/0x250 fs/read_write.c:625
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x41935c
Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 fc ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 2f fd ff ff 48
RSP: 002b:00007f4701c5d170 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 000000000041935c
RDX: 000000000000000f RSI: 00007f4701c5d1e0 RDI: 0000000000000005
RBP: 00007f4701c5d1d0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffc539a90af R14: 00007f4701c5d300 R15: 0000000000022000
BUG: scheduling while atomic: syz-executor.0/29652/0x00000002
no locks held by syz-executor.0/29652.
Modules linked in:
Preemption disabled at:
[<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos
  2021-06-28 19:22 [syzbot] BUG: sleeping function called from invalid context in __fdget_pos syzbot
@ 2021-06-29 14:46 ` Dave Hansen
  2021-06-30  7:42   ` Ard Biesheuvel
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2021-06-29 14:46 UTC (permalink / raw)
  To: syzbot, bp, hpa, jpa, kan.liang, linux-kernel, luto, mingo,
	syzkaller-bugs, tglx, x86, Ard Biesheuvel, Herbert Xu

... adding Ard who was recently modifying some of the
kernel_fpu_begin/end() sites in the AESNI crypto code.

On 6/28/21 12:22 PM, syzbot wrote:
> console output: https://syzkaller.appspot.com/x/log.txt?x=170e6c94300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=42ecca11b759d96c
> dashboard link: https://syzkaller.appspot.com/bug?extid=5d1bad8042a8f0e8117a
> 
> Unfortunately, I don't have any reproducer for this issue yet.
...
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 29652, name: syz-executor.0
> no locks held by syz-executor.0/29652.
> Preemption disabled at:
> [<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126
> CPU: 0 PID: 29652 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0

There's a better backtrace in the log before the rather useless
backtrace from lockdep:

> [ 1341.360547][T29635] FAULT_INJECTION: forcing a failure.
> [ 1341.360547][T29635] name failslab, interval 1, probability 0, space 0, times 0
> [ 1341.374439][T29635] CPU: 1 PID: 29635 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0
> [ 1341.374712][T29630] FAT-fs (loop2): bogus number of reserved sectors
> [ 1341.383571][T29635] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [ 1341.383591][T29635] Call Trace:
> [ 1341.383603][T29635]  dump_stack+0x141/0x1d7
> [ 1341.383630][T29635]  should_fail.cold+0x5/0xa
> [ 1341.383651][T29635]  ? skcipher_walk_next+0x6e2/0x1680
> [ 1341.383673][T29635]  should_failslab+0x5/0x10
> [ 1341.383691][T29635]  __kmalloc+0x72/0x330
> [ 1341.383720][T29635]  skcipher_walk_next+0x6e2/0x1680
> [ 1341.383744][T29635]  ? kfree+0xe5/0x7f0
> [ 1341.383776][T29635]  skcipher_walk_first+0xf8/0x3c0
> [ 1341.383805][T29635]  skcipher_walk_virt+0x523/0x760
> [ 1341.445438][T29635]  xts_crypt+0x137/0x7f0
> [ 1341.449689][T29635]  ? aesni_encrypt+0x80/0x80

There's one suspect-looking site in xts_crypt():

>	kernel_fpu_begin();
> 
>	/* calculate first value of T */
>	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
> 
>	while (walk.nbytes > 0) {
>		int nbytes = walk.nbytes;
> 	
> 		...
> 
>		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> 
>		kernel_fpu_end();
> 
>               if (walk.nbytes > 0)
>			kernel_fpu_begin();
>	}

I wonder if a slab allocation failure could leave us with walk.nbytes==0.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos
  2021-06-29 14:46 ` Dave Hansen
@ 2021-06-30  7:42   ` Ard Biesheuvel
  2021-06-30  8:09     ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2021-06-30  7:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: syzbot, Borislav Petkov, H. Peter Anvin, jpa, kan.liang,
	Linux Kernel Mailing List, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, Thomas Gleixner, X86 ML, Herbert Xu

On Tue, 29 Jun 2021 at 16:46, Dave Hansen <dave.hansen@intel.com> wrote:
>
> ... adding Ard who was recently modifying some of the
> kernel_fpu_begin/end() sites in the AESNI crypto code.
>
> On 6/28/21 12:22 PM, syzbot wrote:
> > console output: https://syzkaller.appspot.com/x/log.txt?x=170e6c94300000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=42ecca11b759d96c
> > dashboard link: https://syzkaller.appspot.com/bug?extid=5d1bad8042a8f0e8117a
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> ...
> > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938
> > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 29652, name: syz-executor.0
> > no locks held by syz-executor.0/29652.
> > Preemption disabled at:
> > [<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126
> > CPU: 0 PID: 29652 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0
>
> There's a better backtrace in the log before the rather useless
> backtrace from lockdep:
>
> > [ 1341.360547][T29635] FAULT_INJECTION: forcing a failure.
> > [ 1341.360547][T29635] name failslab, interval 1, probability 0, space 0, times 0
> > [ 1341.374439][T29635] CPU: 1 PID: 29635 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0
> > [ 1341.374712][T29630] FAT-fs (loop2): bogus number of reserved sectors
> > [ 1341.383571][T29635] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > [ 1341.383591][T29635] Call Trace:
> > [ 1341.383603][T29635]  dump_stack+0x141/0x1d7
> > [ 1341.383630][T29635]  should_fail.cold+0x5/0xa
> > [ 1341.383651][T29635]  ? skcipher_walk_next+0x6e2/0x1680
> > [ 1341.383673][T29635]  should_failslab+0x5/0x10
> > [ 1341.383691][T29635]  __kmalloc+0x72/0x330
> > [ 1341.383720][T29635]  skcipher_walk_next+0x6e2/0x1680
> > [ 1341.383744][T29635]  ? kfree+0xe5/0x7f0
> > [ 1341.383776][T29635]  skcipher_walk_first+0xf8/0x3c0
> > [ 1341.383805][T29635]  skcipher_walk_virt+0x523/0x760
> > [ 1341.445438][T29635]  xts_crypt+0x137/0x7f0
> > [ 1341.449689][T29635]  ? aesni_encrypt+0x80/0x80
>
> There's one suspect-looking site in xts_crypt():
>
> >       kernel_fpu_begin();
> >
> >       /* calculate first value of T */
> >       aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
> >
> >       while (walk.nbytes > 0) {
> >               int nbytes = walk.nbytes;
> >
> >               ...
> >
> >               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> >
> >               kernel_fpu_end();
> >
> >               if (walk.nbytes > 0)
> >                       kernel_fpu_begin();
> >       }
>
> I wonder if a slab allocation failure could leave us with walk.nbytes==0.

The code is actually the other way around: kernel_fpu_end() comes
before the call to skcipher_walk_done().

So IIUC, this code forces an allocation failure, and checks whether
the code deals with this gracefully, right?

The skcipher walk API guarantees that walk.nbytes == 0 if an error is
returned, so the pairing of FPU begin/end looks correct to me. And
skcipher_walk_next() should not invoke anything that might sleep from
this particular context.

Herbert, any ideas?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos
  2021-06-30  7:42   ` Ard Biesheuvel
@ 2021-06-30  8:09     ` Herbert Xu
  2021-06-30  9:13       ` Ard Biesheuvel
  0 siblings, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2021-06-30  8:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Dave Hansen, syzbot, Borislav Petkov, H. Peter Anvin, jpa,
	kan.liang, Linux Kernel Mailing List, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner, X86 ML

Hi Ard:

On Wed, Jun 30, 2021 at 09:42:14AM +0200, Ard Biesheuvel wrote:
>
> > There's one suspect-looking site in xts_crypt():
> >
> > >       kernel_fpu_begin();
> > >
> > >       /* calculate first value of T */
> > >       aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
> > >
> > >       while (walk.nbytes > 0) {
> > >               int nbytes = walk.nbytes;
> > >
> > >               ...
> > >
> > >               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> > >
> > >               kernel_fpu_end();
> > >
> > >               if (walk.nbytes > 0)
> > >                       kernel_fpu_begin();
> > >       }
> >
> > I wonder if a slab allocation failure could leave us with walk.nbytes==0.
> 
> The code is actually the other way around: kernel_fpu_end() comes
> before the call to skcipher_walk_done().
> 
> So IIUC, this code forces an allocation failure, and checks whether
> the code deals with this gracefully, right?
> 
> The skcipher walk API guarantees that walk.nbytes == 0 if an error is
> returned, so the pairing of FPU begin/end looks correct to me. And
> skcipher_walk_next() should not invoke anything that might sleep from
> this particular context.
> 
> Herbert, any ideas?

xts_crypt looks buggy to me.  In particular, if the second
skcipher_walk_virt call (the one in the if clause) fails, then
we will return without calling kernel_fpu_end.

Another issue, we are not checking for errors on the first
skcipher_walk_virt call, this may cause a double-free with
the subsequent skcipher_walk_abort inside the if clause.

With skcikpher_walk_virt, you must check for errors explicitly
*unless* you use it in a loop construct which exits on !walk->nbytes.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos
  2021-06-30  8:09     ` Herbert Xu
@ 2021-06-30  9:13       ` Ard Biesheuvel
  2021-06-30 12:04         ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2021-06-30  9:13 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Dave Hansen, syzbot, Borislav Petkov, H. Peter Anvin, jpa,
	kan.liang, Linux Kernel Mailing List, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner, X86 ML

On Wed, 30 Jun 2021 at 10:10, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Hi Ard:
>
> On Wed, Jun 30, 2021 at 09:42:14AM +0200, Ard Biesheuvel wrote:
> >
> > > There's one suspect-looking site in xts_crypt():
> > >
> > > >       kernel_fpu_begin();
> > > >
> > > >       /* calculate first value of T */
> > > >       aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
> > > >
> > > >       while (walk.nbytes > 0) {
> > > >               int nbytes = walk.nbytes;
> > > >
> > > >               ...
> > > >
> > > >               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> > > >
> > > >               kernel_fpu_end();
> > > >
> > > >               if (walk.nbytes > 0)
> > > >                       kernel_fpu_begin();
> > > >       }
> > >
> > > I wonder if a slab allocation failure could leave us with walk.nbytes==0.
> >
> > The code is actually the other way around: kernel_fpu_end() comes
> > before the call to skcipher_walk_done().
> >
> > So IIUC, this code forces an allocation failure, and checks whether
> > the code deals with this gracefully, right?
> >
> > The skcipher walk API guarantees that walk.nbytes == 0 if an error is
> > returned, so the pairing of FPU begin/end looks correct to me. And
> > skcipher_walk_next() should not invoke anything that might sleep from
> > this particular context.
> >
> > Herbert, any ideas?
>
> xts_crypt looks buggy to me.  In particular, if the second
> skcipher_walk_virt call (the one in the if clause) fails, then
> we will return without calling kernel_fpu_end.
>
> Another issue, we are not checking for errors on the first
> skcipher_walk_virt call, this may cause a double-free with
> the subsequent skcipher_walk_abort inside the if clause.
>
> With skcikpher_walk_virt, you must check for errors explicitly
> *unless* you use it in a loop construct which exits on !walk->nbytes.
>

So something like this, I suppose?

--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -849,6 +849,8 @@
                return -EINVAL;

        err = skcipher_walk_virt(&walk, req, false);
+       if (err)
+               return err;

        if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
                int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
@@ -862,7 +864,10 @@
                skcipher_request_set_crypt(&subreq, req->src, req->dst,
                                           blocks * AES_BLOCK_SIZE, req->iv);
                req = &subreq;
+
                err = skcipher_walk_virt(&walk, req, false);
+               if (err)
+                       return err;
        } else {
                tail = 0;
        }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos
  2021-06-30  9:13       ` Ard Biesheuvel
@ 2021-06-30 12:04         ` Herbert Xu
  0 siblings, 0 replies; 6+ messages in thread
From: Herbert Xu @ 2021-06-30 12:04 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Dave Hansen, syzbot, Borislav Petkov, H. Peter Anvin, jpa,
	kan.liang, Linux Kernel Mailing List, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner, X86 ML

On Wed, Jun 30, 2021 at 11:13:00AM +0200, Ard Biesheuvel wrote:
>
> So something like this, I suppose?

Yes this should work.  Ideally I think it should only call the
walker once instead of potentially three times but I can live
with that :)

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-06-30 12:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-06-28 19:22 [syzbot] BUG: sleeping function called from invalid context in __fdget_pos syzbot
2021-06-29 14:46 ` Dave Hansen
2021-06-30  7:42   ` Ard Biesheuvel
2021-06-30  8:09     ` Herbert Xu
2021-06-30  9:13       ` Ard Biesheuvel
2021-06-30 12:04         ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox