linux-fsdevel.vger.kernel.org archive mirror
* wait_even_interruptible_timeout(), signal, spin_lock() = system hang
@ 2010-05-28 16:44 Shirish Pargaonkar
  2010-06-04 11:51 ` Jeff Layton
  2010-06-06  8:00 ` Maciej Rutecki
  0 siblings, 2 replies; 6+ messages in thread
From: Shirish Pargaonkar @ 2010-05-28 16:44 UTC
  To: linux-kernel; +Cc: linux-fsdevel

After the following sequence of calls the system hangs (SMP x86 box
running a 2.6.34 kernel); it still answers ping but nothing else.
I have not yet been able to break in with Alt-SysRq-t; I am working on that.

        rc = wait_event_interruptible_timeout(ses->server->response_q,
                        (midQ->midState != MID_REQUEST_SUBMITTED), timeout);
        if (rc < 0) {
                cFYI(1, ("command 0x%x interrupted", midQ->command));
                return -1;
        }

The hang happens once the wait comes out with -ERESTARTSYS (I kill the
command with Ctrl-C) and the invoking function then calls

 spin_lock(&GlobalMid_Lock);

If I sleep before the return -1 (e.g. msleep(1)), there is no hang.

I do not have to use wait_event_interruptible_timeout(), and there are no
such problems with wait_event_timeout(); I run into this problem only when
a signal/interrupt is involved.
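
For readers following the snippet above: as far as I understand the
documented behaviour of wait_event_interruptible_timeout(), it returns
-ERESTARTSYS when a signal interrupts the wait, 0 when the timeout
elapses, and a positive count of remaining jiffies when the condition
became true. The "rc < 0" branch therefore covers only the interrupted
case. Below is a minimal userspace sketch of that convention. The names
classify_wait_rc, enum wait_outcome and KERNEL_ERESTARTSYS are made up
for illustration and are not CIFS or kernel identifiers; the value 512 is
assumed from the kernel's internal errno definitions.

```c
/* Assumed kernel-internal value of ERESTARTSYS; it is not exported to
 * userspace, so it is redefined here purely for illustration. */
#define KERNEL_ERESTARTSYS 512

/* Hypothetical names for the possible outcomes of the wait. */
enum wait_outcome {
    WAIT_INTERRUPTED,    /* a signal ended the wait early            */
    WAIT_TIMED_OUT,      /* timeout elapsed, condition still false   */
    WAIT_CONDITION_MET,  /* condition became true before the timeout */
    WAIT_UNEXPECTED      /* any other negative value                 */
};

/* Map the return value of the wait onto one of the outcomes above. */
enum wait_outcome classify_wait_rc(long rc)
{
    if (rc == -KERNEL_ERESTARTSYS)
        return WAIT_INTERRUPTED;
    if (rc == 0)
        return WAIT_TIMED_OUT;
    if (rc > 0)
        return WAIT_CONDITION_MET;
    return WAIT_UNEXPECTED;
}
```

With this split, the Ctrl-C case in the report corresponds to
WAIT_INTERRUPTED, while the wait_event_timeout() variant can only ever
produce the timed-out or condition-met outcomes.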

Any pointers or ideas about what could be happening would be really appreciated.

Regards,

Shirish

* Re: wait_even_interruptible_timeout(), signal, spin_lock() = system hang
  2010-05-28 16:44 wait_even_interruptible_timeout(), signal, spin_lock() = system hang Shirish Pargaonkar
@ 2010-06-04 11:51 ` Jeff Layton
  2010-06-04 12:13   ` Shirish Pargaonkar
  2010-06-06  8:00 ` Maciej Rutecki
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2010-06-04 11:51 UTC
  To: Shirish Pargaonkar; +Cc: linux-kernel, linux-fsdevel

On Fri, 28 May 2010 11:44:46 -0500
Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote:

> After this sequence of calls, system hangs (smp, x86 box based with
> .34 kernel), can ping only.
> I have not been able to break in with Alt Sysrq t, working on that
> 
>         rc = wait_event_interruptible_timeout(ses->server->response_q,
>                         (midQ->midState != MID_REQUEST_SUBMITTED), timeout);
>         if (rc < 0) {
>                 cFYI(1, ("command 0x%x interrupted", midQ->command));
>                 return -1;
>         }
> 
> and when function that invoking function after coming out with ERESTARTSYS
> (I kill the command with Ctrl C) calls
>  spin_lock(&GlobalMid_Lock);
> 
> system hangs.  If I sleep before return -1 (e.g. msleep(1), no hang)
> 

Sounds like a race of some sort, but could also be that msleep() is
doing something (perhaps relating to the pending signal) that prevents
the hang. Without some sort of clue as to what the box is hung on at
the time there is no way to know.

> I do not have to use wait_event_interruptible_timeout and no such problems with
> wait_event_timeout, it is only when signal/interrupt is involved, I
> run into this problem
> 
> Any pointers/ideas what could be happening, would be really really appreciated.
> 

No idea right offhand. I'd suggest getting a core dump or sysrq data and
seeing what it's doing.

-- 
Jeff Layton <jlayton@redhat.com>

* Re: wait_even_interruptible_timeout(), signal, spin_lock() = system hang
  2010-06-04 11:51 ` Jeff Layton
@ 2010-06-04 12:13   ` Shirish Pargaonkar
  2010-06-05 13:57     ` Shirish Pargaonkar
  0 siblings, 1 reply; 6+ messages in thread
From: Shirish Pargaonkar @ 2010-06-04 12:13 UTC
  To: Jeff Layton; +Cc: linux-kernel, linux-fsdevel

Jeff, thanks.  The system hangs really hard. It does not respond to the
Alt-ScrLk / Ctrl-ScrLk key sequences at the text-mode console, i.e.
nothing gets logged in /var/log/messages.

Regards,

Shirish

* Re: wait_even_interruptible_timeout(), signal, spin_lock() = system hang
  2010-06-04 12:13   ` Shirish Pargaonkar
@ 2010-06-05 13:57     ` Shirish Pargaonkar
  0 siblings, 0 replies; 6+ messages in thread
From: Shirish Pargaonkar @ 2010-06-05 13:57 UTC
  To: Jeff Layton; +Cc: linux-kernel, linux-fsdevel

I think this is what is happening:

When an fsstress command gets killed, one of the numerous processes being
killed holds the mid lock and dies before releasing it. The others, when
interrupted in the wait, return with ERESTARTSYS and then, while trying to
take the mid lock, spin forever, causing the system hang.
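
The failure mode suspected above can be sketched in userspace. This is
only a model of the hypothesis, not a confirmed diagnosis and not the
CIFS code: struct model_spinlock stands in for GlobalMid_Lock, and every
function name here is hypothetical. The waiter uses a bounded spin so
the sketch terminates; a real spin_lock() has no such bound, which is
why a lock that is never released would hang the CPU spinning on it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for GlobalMid_Lock: a test-and-set spinlock. */
struct model_spinlock {
    atomic_flag locked;
};

/* One acquisition attempt; returns true if the lock was obtained. */
static bool model_trylock(struct model_spinlock *l)
{
    return !atomic_flag_test_and_set(&l->locked);
}

static void model_unlock(struct model_spinlock *l)
{
    atomic_flag_clear(&l->locked);
}

/* Bounded stand-in for spin_lock(): gives up after max_spins attempts. */
static bool spin_lock_bounded(struct model_spinlock *l, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (model_trylock(l))
            return true;
    }
    return false;
}

/* Holder takes the lock and goes away without releasing it; returns
 * true if a later waiter then fails to get the lock. */
bool leaked_lock_starves_waiter(void)
{
    struct model_spinlock mid_lock = { ATOMIC_FLAG_INIT };

    (void)model_trylock(&mid_lock);   /* holder acquires ...        */
                                      /* ... and never unlocks      */
    return !spin_lock_bounded(&mid_lock, 100000);
}

/* Control case: the holder releases, so the waiter succeeds. */
bool released_lock_is_acquirable(void)
{
    struct model_spinlock mid_lock = { ATOMIC_FLAG_INIT };

    (void)model_trylock(&mid_lock);
    model_unlock(&mid_lock);
    return spin_lock_bounded(&mid_lock, 100000);
}
```

Both functions return true: the waiter is starved only in the case where
the holder never reaches its unlock.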
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: wait_even_interruptible_timeout(), signal, spin_lock() = system hang
  2010-05-28 16:44 wait_even_interruptible_timeout(), signal, spin_lock() = system hang Shirish Pargaonkar
  2010-06-04 11:51 ` Jeff Layton
@ 2010-06-06  8:00 ` Maciej Rutecki
  2010-06-06 14:25   ` Shirish Pargaonkar
  1 sibling, 1 reply; 6+ messages in thread
From: Maciej Rutecki @ 2010-06-06  8:00 UTC
  To: Shirish Pargaonkar; +Cc: linux-kernel, linux-fsdevel

I created a Bugzilla entry at 
https://bugzilla.kernel.org/show_bug.cgi?id=16139
for your bug report, please add your address to the CC list in there, thanks!

-- 
Maciej Rutecki
http://www.maciek.unixy.pl

* Re: wait_even_interruptible_timeout(), signal, spin_lock() = system hang
  2010-06-06  8:00 ` Maciej Rutecki
@ 2010-06-06 14:25   ` Shirish Pargaonkar
  0 siblings, 0 replies; 6+ messages in thread
From: Shirish Pargaonkar @ 2010-06-06 14:25 UTC
  To: maciej.rutecki; +Cc: linux-kernel, linux-fsdevel

On Sun, Jun 6, 2010 at 3:00 AM, Maciej Rutecki <maciej.rutecki@gmail.com> wrote:
> I created a Bugzilla entry at
> https://bugzilla.kernel.org/show_bug.cgi?id=16139
> for your bug report, please add your address to the CC list in there, thanks!

Thanks. Shaggy told me that what I think is happening should not be
possible, for the following reason:

"The process doesn't receive the signal in the
kernel unless it specifically checks for it, or it exits a system call.
There shouldn't be a spinlock held in either situation."
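
The point quoted above is that a signal does not tear a task out of
arbitrary kernel code; it is only acted on at well-defined points. A
loose userspace analogy, not the kernel's own mechanism, is a blocked
signal: it stays pending through a critical section and is delivered
only once it is unblocked. The sketch below uses standard POSIX calls;
the function name signal_deferred_until_unblocked is hypothetical, and
it assumes a single-threaded process in which SIGINT was not already
blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static volatile sig_atomic_t handler_ran;

static void on_sigint(int signo)
{
    (void)signo;
    handler_ran = 1;
}

/* Returns true if SIGINT stayed pending, with the handler not run, for
 * the whole "critical section" and was delivered only after unblocking. */
bool signal_deferred_until_unblocked(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, pending;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, &old_sa) != 0)
        return false;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    if (sigprocmask(SIG_BLOCK, &block, NULL) != 0)
        return false;

    handler_ran = 0;
    raise(SIGINT);                    /* stands in for Ctrl-C */

    /* "Critical section": the signal is pending but not delivered. */
    sigemptyset(&pending);
    sigpending(&pending);
    bool pending_inside = sigismember(&pending, SIGINT) == 1;
    bool ran_inside = handler_ran != 0;

    /* Delivery point: unblocking lets the pending signal be handled. */
    sigprocmask(SIG_UNBLOCK, &block, NULL);
    bool ran_after = handler_ran != 0;

    sigaction(SIGINT, &old_sa, NULL); /* restore previous disposition */
    return pending_inside && !ran_inside && ran_after;
}
```

In this analogy the holder of a lock always reaches its unlock before the
signal is acted on, which is the situation the quote describes.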

I will update the bug as well.

Regards,

Shirish
