public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Shawn Starr <Shawn.Starr@Home.net>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: PS hanging in 2.4.1 - HAPPENING NOW!!!
Date: Sat, 03 Feb 2001 23:00:09 -0500	[thread overview]
Message-ID: <3A7CD3C9.21CC5505@Home.net> (raw)
In-Reply-To: <Pine.LNX.4.10.10102031849210.9705-100000@penguin.transmeta.com>

Adding now.. I just applied 2.4.1-ac2 and now i'll add this code snipet to the source. It might take 4 days or so for the bug to show itself. At least, unless it triggers eariler (?)

Restarting...

Shawn.


Linus Torvalds wrote:

> On Sat, 3 Feb 2001, Shawn.Starr@Home.net wrote:
> >
> >Feb  3 17:57:08 coredump kernel: gnomeicu  S 0000CD17     0  9338      1        (NOTLB)    9340  9332
> >Feb  3 17:57:08 coredump kernel: Call Trace: [search_by_key+203/3232] [search_for_position_by_key+170/916] [make_cpu_key+57/64] [_get_block_create_0+136/1072] [_get_block_create_0+162/1072] [remove_wait_queue+40/48] [__wait_on_buffer+128/140]
> >Feb  3 17:57:08 coredump kernel:        [<f0000000>] [reiserfs_get_block+158/3408] [search_for_position_by_key+170/916] [search_for_position_by_key+570/916] [make_cpu_key+57/64] [kmem_cache_alloc+75/116] [get_unused_buffer_head+52/144] [create_buffers+96/444]
> >Feb  3 17:57:08 coredump kernel:        [block_read_full_page+246/552] [add_to_page_cache_unique+202/212] [reiserfs_readpage+15/20] [reiserfs_get_block+0/3408] [read_cluster_nonblocking+258/324] [filemap_nopage+332/1032] [do_no_page+77/192] [handle_mm_fault+232/340]
> >Feb  3 17:57:08 coredump kernel:        [do_page_fault+312/1020] [do_page_fault+0/1020] [start_request+388/508] [intlat_local_irq_disable+16/20] [ide_do_request+685/752] [schedule+639/964] [remove_wait_queue+40/48] [error_code+52/64]
> >Feb  3 17:57:09 coredump kernel:        [__generic_copy_from_user+52/60] [opost_block+67/384] [handle_mm_fault+232/340] [add_wait_queue+59/68] [write_chan+365/516] [tty_write+341/448] [write_chan+0/516] [sys_write+143/196]
> >Feb  3 17:57:09 coredump kernel:        [system_call+62/80]
>
> Ok, the above seems to be the culprit here.
>
> Note how the thing is in TASK_INTERRUPTIBLE (S) sleep somewhere in the
> reiserfs code..
>
> Debugging this is slightly harder than I'd like, because the "call trace"
> is really not a trace, but actually just a dump of the stack of everything
> that looks like a kernel address. And a lot of it is crap - stuff left
> over by previous calls that hasn't gotten overwritten and is visible
> because some function has a large stack footprint (lots of local variables
> that end up not being very used and let things show through).
>
> Anyway, what I _think_ is the cleaned-up stacktrace is
>
>         [reiserfs_get_block+158/3408]
>         [reiserfs_readpage+15/20]
>         [read_cluster_nonblocking+258/324]
>         [filemap_nopage+332/1032]
>         [do_no_page+77/192]
>         [handle_mm_fault+232/340]
>         [do_page_fault+312/1020]
>         [error_code+52/64]
>         [__generic_copy_from_user+52/60]
>         [opost_block+67/384]
>         [handle_mm_fault+232/340]
>         [add_wait_queue+59/68]
>         [write_chan+365/516]
>         [tty_write+341/448]
>         [write_chan+0/516]
>         [sys_write+143/196]
>         [system_call+62/80]
>
> and what is interesting is that you got a page fault while you were
> copying stuff in to the tty layer. Which happens with TASK_INTERRUPTIBLE
> sleep. Now, the page fault code never clears that, so we enter reiserfs
> still "sleeping", and reiserfs will do a
>
>         if (need_resched(current))
>                 schedule();
>
> which won't do what reiserfs _wants_ it to do at all. Because if
> task->state is TASK_INTERRUPTIBLE, the above will go to sleep, not just
> cause a nice reschedule. And we'll be sleeping with the task MM semaphore
> held - only to wake up if somebody were to signal us or something.
>
> If you can re-create this hang, could you please try to add this single
> line to the top of "handle_mm_fault()" in mm/memory.c (after the variable
> declarations, of course):
>
>         current->state = TASK_RUNNING;
>
> which just means that if we get a page fault while we're half asleep, it
> will be safe to do a schedule() without explicitly setting the process
> running again.
>
>                 Linus
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> Please read the FAQ at http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

  reply	other threads:[~2001-02-04  4:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200102040216.SAA09819@penguin.transmeta.com>
2001-02-04  3:02 ` PS hanging in 2.4.1 - HAPPENING NOW!!! Linus Torvalds
2001-02-04  4:00   ` Shawn Starr [this message]
2001-02-04  4:12     ` Shawn Starr
2001-02-03 22:48 Shawn Starr
2001-02-03 23:00 ` Shawn Starr
2001-02-04  1:11 ` Marko Kreen
2001-02-04  1:06   ` Shawn Starr
2001-02-04  1:19     ` Marko Kreen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3A7CD3C9.21CC5505@Home.net \
    --to=shawn.starr@home.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox