* Re: [PROBLEM] WARNING: at kernel/exit.c:910 do_exit
  [not found] <AANLkTi=ej5guZ2R72=fqe5mciwjVeY5LkDL3Qx1W0GvA@mail.gmail.com>
@ 2010-11-21 17:42 ` Linus Torvalds
  2010-11-21 18:51   ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2010-11-21 17:42 UTC (permalink / raw)
To: Pekka Enberg; +Cc: oleg, LKML, Peter Zijlstra, Andrew Morton, Linux Netdev List

On Sun, Nov 21, 2010 at 7:35 AM, Pekka Enberg <penberg@kernel.org> wrote:
>
> The following warning triggered on me while I was browsing the web:
>
>   http://twitpic.com/38vxxg
>
> [ Click on the "Rotate photo" button for landscape version. ]
>
> It's
>
>         WARN_ON(atomic_read(&tsk->fs_excl));
>
> in do_exit(). There was a prior oops in __pipe_free_info() called in
> sys_recvmsg() paths that unfortunately scrolled away.

That WARN_ON() is almost certainly due to the previous oops.

The previous oops may have scrolled away, but you can see the
call-chain, since it's part of the later oops. Except the photo is
hard to read ;)

In fact, you can see that there has been _two_ oopses before that. The
"free_pipe_info()" oops comes from the "do_exit()" path of the _first_
oops.

So the original oops seems to be around here:

   (*probably* oopsed in __scm_destroy)
   (the fd_install on the stack is likely from scm_detach_fds calling
it before calling __scm_destroy - just a stale pointer remaining on
the stack)
   scm_detach_fds
   unix_stream_recvmsg
   sock_recvmsg
   __sys_recvmsg
   sys_recvmsg

which means that this is almost certainly in networking.

Then, when that oops caused us to die, do_exit() tried to clean up the
state, and _that_ caused us to oops again (now in free_pipe_info).
That second oops is the partial one you see. And then the _third_ oops
is the one you actually caught.

The free_pipe_info() oops in turn must be because we passed in an
invalid "inode" pointer. It's almost certainly the "inode->i_pipe"
dereference, so inode was NULL or something.

I don't see why that would happen, but with a previous oops it's not
necessarily clear that there _is_ a reason.

And who knows? It may be that the networking oops was due to some
other earlier problem that isn't part of this particular callchain and
that has long since scrolled away.

I don't see any unix domain changes since -rc1.

                        Linus

* Re: [PROBLEM] WARNING: at kernel/exit.c:910 do_exit
  2010-11-21 17:42 ` [PROBLEM] WARNING: at kernel/exit.c:910 do_exit Linus Torvalds
@ 2010-11-21 18:51 ` Oleg Nesterov
  2010-11-21 19:11   ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2010-11-21 18:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Pekka Enberg, LKML, Peter Zijlstra, Andrew Morton, Linux Netdev List

On 11/21, Linus Torvalds wrote:
>
> On Sun, Nov 21, 2010 at 7:35 AM, Pekka Enberg <penberg@kernel.org> wrote:
> >
> >         WARN_ON(atomic_read(&tsk->fs_excl));
> >
> > in do_exit(). There was a prior oops in __pipe_free_info() called in
> > sys_recvmsg() paths that unfortunately scrolled away.
>
> That WARN_ON() is almost certainly due to the previous oops.
>
> The previous oops may have scrolled away, but you can see the
> call-chain, since it's part of the later oops. Except the photo is
> hard to read ;)
>
> In fact, you can see that there has been _two_ oopses before that. The
> "free_pipe_info()" oops comes from the "do_exit()" path of the _first_
> oops.
>
> So the original oops seems to be around here:
>
>    (*probably* oopsed in __scm_destroy)
>    (the fd_install on the stack is likely from scm_detach_fds calling
> it before calling __scm_destroy - just a stale pointer remaining on
> the stack)
>    scm_detach_fds
>    unix_stream_recvmsg
>    sock_recvmsg
>    __sys_recvmsg
>    sys_recvmsg

Yes, but still I am puzzled a bit. Where ->fs_excl != 0 comes from?
Not that I really understand what it means, but nothing in this path
can do lock_super(), I think. This means it was already nonzero or
the bug caused the memory corruption.

Btw, why it is atomic_t ?

> And who knows? It may be that the networking oops was due to some
> other earlier problem that isn't part of this particular callchain and
> that has long since scrolled away.

Agreed, probably this is false alarm. The oopsing task can trigger
a lot of "wrong" warnings.

Oleg.

* Re: [PROBLEM] WARNING: at kernel/exit.c:910 do_exit
  2010-11-21 18:51 ` Oleg Nesterov
@ 2010-11-21 19:11 ` Linus Torvalds
  2010-11-21 19:29   ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2010-11-21 19:11 UTC (permalink / raw)
To: Oleg Nesterov, Jens Axboe
Cc: Pekka Enberg, LKML, Peter Zijlstra, Andrew Morton, Linux Netdev List

On Sun, Nov 21, 2010 at 10:51 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Yes, but still I am puzzled a bit. Where ->fs_excl != 0 comes from?
> Not that I really understand what it means, but nothing in this path
> can do lock_super(), I think. This means it was already nonzero or
> the bug caused the memory corruption.

I would guess that by the time you do three recursive oopses, you've
probably used up all the kernel stack and you've stomped on the
thread_info itself. At that point, thread->tsk might be totally
random.

So it's possible that "current->fs_excl" is nonzero simply because
"current" is a random pointer at this point.

Or it might be memory corruption, and the same thing that caused the
original oops. I dunno.

I do wonder if we should just flag a thread as "busy oopsing" before
we call "do_exit(), so that _if_ we do a recursive oops we

 (a) don't print it out (except just a one-liner to say "recursively
oopsed in %pS" or something)
 (b) don't try to clean up with do_exit (because that's likely just
going to oops again or run out of stack etc)

That might have left us with a more visible original oops. Maybe the
register contents at that point could have given us any ideas (ie
things like the slab poisoning memory patterns or whatever).

> Btw, why it is atomic_t ?

That whole thing is insane. Afaik, there is one single user (apart
from the WARN_ON), and that's some stupid block scheduler crap for IO
priority boosting. The block layer people have been way too eager to
add random ugly crud.

And no, I don't see why the atomic_t would make any sense. It's
thread-local.

                        Linus

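The "busy oopsing" flag proposed above can be sketched in user space
roughly as follows. This is only an illustration under stated
assumptions, not kernel code: struct task, handle_oops() and cleanup()
are hypothetical names, cleanup() merely stands in for do_exit(), and
the message proposes the idea without giving an implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-task state; NOT the kernel's task_struct. */
struct task {
    bool oopsing;       /* set before the first cleanup attempt */
    int full_reports;   /* full oops reports printed */
    int short_reports;  /* one-line "recursively oopsed" notices */
    int cleanups;       /* times the do_exit() stand-in ran */
    int faults_left;    /* faults the cleanup path will still hit */
};

void handle_oops(struct task *t, const char *where);

/* Stand-in for do_exit(): cleanup that may itself fault. */
void cleanup(struct task *t)
{
    t->cleanups++;
    if (t->faults_left > 0) {
        t->faults_left--;
        handle_oops(t, "cleanup");
    }
}

void handle_oops(struct task *t, const char *where)
{
    assert(t != NULL);

    if (t->oopsing) {
        /* (a) one-liner only, (b) no second cleanup attempt. */
        printf("recursively oopsed in %s\n", where);
        t->short_reports++;
        return;
    }

    /* First oops: flag the task, print in full, then clean up once. */
    t->oopsing = true;
    printf("Oops in %s (full report)\n", where);
    t->full_reports++;
    cleanup(t);
}
```

Calling handle_oops() on a task whose cleanup path faults again prints
one full report and one short notice, and cleanup runs only once, so the
first report is not pushed off the screen by later ones. What the kernel
should do with the task in place of the plain return (for example,
parking it) is left open here, as it is in the message.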
* Re: [PROBLEM] WARNING: at kernel/exit.c:910 do_exit
  2010-11-21 19:11 ` Linus Torvalds
@ 2010-11-21 19:29 ` Oleg Nesterov
  0 siblings, 0 replies; 5+ messages in thread
From: Oleg Nesterov @ 2010-11-21 19:29 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jens Axboe, Pekka Enberg, LKML, Peter Zijlstra, Andrew Morton, Linux Netdev List

On 11/21, Linus Torvalds wrote:
>
> I do wonder if we should just flag a thread as "busy oopsing" before
> we call "do_exit(), so that _if_ we do a recursive oops we
>
>  (a) don't print it out (except just a one-liner to say "recursively
> oopsed in %pS" or something)
>  (b) don't try to clean up with do_exit (because that's likely just
> going to oops again or run out of stack etc)
>
> That might have left us with a more visible original oops. Maybe the
> register contents at that point could have given us any ideas (ie
> things like the slab poisoning memory patterns or whatever).

+inf ;)

I thought about this many times. To me, the major offender is
__schedule_bug(). It is quite useful by itself, but every bug
with spinlock held triggers it.

Oleg.

* [PROBLEM] WARNING: at kernel/exit.c:910 do_exit
@ 2010-11-21 15:35 Pekka Enberg
0 siblings, 0 replies; 5+ messages in thread
From: Pekka Enberg @ 2010-11-21 15:35 UTC (permalink / raw)
To: oleg, Linus Torvalds, LKML, Peter Zijlstra, Andrew Morton

Hi all,

The following warning triggered on me while I was browsing the web:

  http://twitpic.com/38vxxg

[ Click on the "Rotate photo" button for landscape version. ]

It's

        WARN_ON(atomic_read(&tsk->fs_excl));

in do_exit(). There was a prior oops in __pipe_free_info() called in
sys_recvmsg() paths that unfortunately scrolled away. Does it ring a
bell to anyone? This is latest Linus' master from few hours ago. I've
been running 2.6.37-rc1 until today without any problems.

I'll let you know if it triggers again.

                        Pekka