[PATCH] coredump: wait on the core pattern umh at least once

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] coredump: wait on the core pattern umh at least once
@ 2011-10-28 20:11 Scott James Remnant
  2011-10-28 20:22 ` Neil Horman
  2011-10-29 14:13 ` Oleg Nesterov
  0 siblings, 2 replies; 8+ messages in thread
From: Scott James Remnant @ 2011-10-28 20:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Oleg Nesterov, Neil Horman,
	Scott James Remnant

If a thread crashes as a result of a signal on the thread group leader
that signal can still be pending, which means the loop in
wait_for_dump_helpers() falls straight though.

This can mean that the crashing process is reaped before any core
pattern user-mode helper is run, leaving it without entries in /proc
to look through.

While the helper obviously has to deal with that, tweaking this loop
so it runs at least one iteration even in that case helps a lot.

Signed-off-by: Scott James Remnant <scott@netsplit.com>
---
 fs/exec.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 25dcbe5..8959d304 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2030,11 +2030,11 @@ static void wait_for_dump_helpers(struct file *file)
 	pipe->readers++;
 	pipe->writers--;
 
-	while ((pipe->readers > 1) && (!signal_pending(current))) {
+	do {
 		wake_up_interruptible_sync(&pipe->wait);
 		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 		pipe_wait(pipe);
-	}
+	} while ((pipe->readers > 1) && (!signal_pending(current)));
 
 	pipe->readers--;
 	pipe->writers++;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: wait on the core pattern umh at least once
  2011-10-28 20:11 [PATCH] coredump: wait on the core pattern umh at least once Scott James Remnant
@ 2011-10-28 20:22 ` Neil Horman
  2011-10-28 21:16   ` Scott James Remnant
  2011-10-29 14:13 ` Oleg Nesterov
  1 sibling, 1 reply; 8+ messages in thread
From: Neil Horman @ 2011-10-28 20:22 UTC (permalink / raw)
  To: Scott James Remnant; +Cc: linux-kernel, Mandeep Singh Baines, Oleg Nesterov

On Fri, Oct 28, 2011 at 01:11:28PM -0700, Scott James Remnant wrote:
> If a thread crashes as a result of a signal on the thread group leader
> that signal can still be pending, which means the loop in
> wait_for_dump_helpers() falls straight though.
> 
> This can mean that the crashing process is reaped before any core
> pattern user-mode helper is run, leaving it without entries in /proc
> to look through.
> 
> While the helper obviously has to deal with that, tweaking this loop
> so it runs at least one iteration even in that case helps a lot.
> 
> Signed-off-by: Scott James Remnant <scott@netsplit.com>
> ---
>  fs/exec.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 25dcbe5..8959d304 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -2030,11 +2030,11 @@ static void wait_for_dump_helpers(struct file *file)
>  	pipe->readers++;
>  	pipe->writers--;
>  
> -	while ((pipe->readers > 1) && (!signal_pending(current))) {
> +	do {
>  		wake_up_interruptible_sync(&pipe->wait);
>  		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>  		pipe_wait(pipe);
> -	}
> +	} while ((pipe->readers > 1) && (!signal_pending(current)));
>  
>  	pipe->readers--;
>  	pipe->writers++;
> -- 
> 1.7.3.1
> 
> 
Seems reasonable to me.
Acked-by: Neil Horman <nhorman@tuxdriver.com>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: wait on the core pattern umh at least once
  2011-10-28 20:22 ` Neil Horman
@ 2011-10-28 21:16   ` Scott James Remnant
  0 siblings, 0 replies; 8+ messages in thread
From: Scott James Remnant @ 2011-10-28 21:16 UTC (permalink / raw)
  To: Neil Horman; +Cc: linux-kernel, Mandeep Singh Baines, Oleg Nesterov

On Fri, Oct 28, 2011 at 1:22 PM, Neil Horman <nhorman@tuxdriver.com> wrote:

> Seems reasonable to me.
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
>

One thing I'm not sure about is whether this will work if there are
actually no readers really reading, just the incremented one. It
seemed to work just fine in testing, but I'd be more comfortable if
someone else could confirm.

Scott

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: wait on the core pattern umh at least once
  2011-10-28 20:11 [PATCH] coredump: wait on the core pattern umh at least once Scott James Remnant
  2011-10-28 20:22 ` Neil Horman
@ 2011-10-29 14:13 ` Oleg Nesterov
  2011-10-29 18:01   ` Scott James Remnant
  1 sibling, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2011-10-29 14:13 UTC (permalink / raw)
  To: Scott James Remnant; +Cc: linux-kernel, Mandeep Singh Baines, Neil Horman

On 10/28, Scott James Remnant wrote:
>
> If a thread crashes as a result of a signal on the thread group leader
> that signal can still be pending,

No. do_coredump() clears TIF_SIGPENDING.

The problem is, this is obviously not enough and should be fixed.

> While the helper obviously has to deal with that, tweaking this loop
> so it runs at least one iteration even in that case helps a lot.

I don't understand this patch. It doesn't look right at all.

> @@ -2030,11 +2030,11 @@ static void wait_for_dump_helpers(struct file *file)
>  	pipe->readers++;
>  	pipe->writers--;
>
> -	while ((pipe->readers > 1) && (!signal_pending(current))) {
> +	do {
>  		wake_up_interruptible_sync(&pipe->wait);
>  		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>  		pipe_wait(pipe);

This can hang forever. We didn't check pipe->readers, it it is zero
nobody can wakeup us.

> +	} while ((pipe->readers > 1) && (!signal_pending(current)));

And, it doesn't make any sense to call pipe_wait() with signal_pending(),
it won't block. Note that pipe_wait() schedules in TASK_INTERRUPTIBLE.

I already tried to explain why this signal_pending() was added, but
apparently I was not clear. I'll try again in the previous thread.

Oleg.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: wait on the core pattern umh at least once
  2011-10-29 14:13 ` Oleg Nesterov
@ 2011-10-29 18:01   ` Scott James Remnant
  2011-10-29 19:30     ` Oleg Nesterov
  2011-10-31 21:18     ` Neil Horman
  0 siblings, 2 replies; 8+ messages in thread
From: Scott James Remnant @ 2011-10-29 18:01 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Mandeep Singh Baines, Neil Horman

On Sat, Oct 29, 2011 at 7:13 AM, Oleg Nesterov <oleg@redhat.com> wrote:

> On 10/28, Scott James Remnant wrote:
> >
> > If a thread crashes as a result of a signal on the thread group leader
> > that signal can still be pending,
>
> No. do_coredump() clears TIF_SIGPENDING.
>
I'm definitely seeing cases where SIGTERM sent to the process group
that chrome is in results in one of chrome's thread's crashing (not
your concern, obviously), but at the point it enters this function
TIF_SIGPENDING is definitely set and the signal is SIGTERM.

The SIGTERM is in the shared pending set.

> I already tried to explain why this signal_pending() was added, but
> apparently I was not clear. I'll try again in the previous thread.
>
Could you add me to the Cc: of that thread?

Scott

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: wait on the core pattern umh at least once
  2011-10-29 18:01   ` Scott James Remnant
@ 2011-10-29 19:30     ` Oleg Nesterov
  2011-10-29 19:38       ` Scott James Remnant
  2011-10-31 21:18     ` Neil Horman
  1 sibling, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2011-10-29 19:30 UTC (permalink / raw)
  To: Scott James Remnant; +Cc: linux-kernel, Mandeep Singh Baines, Neil Horman

On 10/29, Scott James Remnant wrote:
>
> On Sat, Oct 29, 2011 at 7:13 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 10/28, Scott James Remnant wrote:
> > >
> > > If a thread crashes as a result of a signal on the thread group leader
> > > that signal can still be pending,
> >
> > No. do_coredump() clears TIF_SIGPENDING.
> >
> I'm definitely seeing cases where SIGTERM sent to the process group
> that chrome is in results in one of chrome's thread's crashing (not
> your concern, obviously), but at the point it enters this function

which function? wait_for_dump_helpers?

> TIF_SIGPENDING is definitely set and the signal is SIGTERM.

Yes, this is possible. But not as result of a signal which triggers
the coredumping. And once again, this clear_thread_flag(TIF_SIGPENDING)
is simply wrong (I mean, not enough).

> > I already tried to explain why this signal_pending() was added, but
> > apparently I was not clear. I'll try again in the previous thread.
> >
> Could you add me to the Cc: of that thread?

I thought you were cc'ed ;) Sorry, I didn't realiaze that these 2
threads are totally separate. Please look at

	http://marc.info/?t=131959137800005

and at
	http://marc.info/?l=linux-kernel&m=131989970411759

in particular.

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: wait on the core pattern umh at least once
  2011-10-29 19:30     ` Oleg Nesterov
@ 2011-10-29 19:38       ` Scott James Remnant
  0 siblings, 0 replies; 8+ messages in thread
From: Scott James Remnant @ 2011-10-29 19:38 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Mandeep Singh Baines, Neil Horman

On Sat, Oct 29, 2011 at 12:30 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/29, Scott James Remnant wrote:
>>
>> On Sat, Oct 29, 2011 at 7:13 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>
>> > On 10/28, Scott James Remnant wrote:
>> > >
>> > > If a thread crashes as a result of a signal on the thread group leader
>> > > that signal can still be pending,
>> >
>> > No. do_coredump() clears TIF_SIGPENDING.
>> >
>> I'm definitely seeing cases where SIGTERM sent to the process group
>> that chrome is in results in one of chrome's thread's crashing (not
>> your concern, obviously), but at the point it enters this function
>
> which function? wait_for_dump_helpers?
>
>> TIF_SIGPENDING is definitely set and the signal is SIGTERM.
>
> Yes, this is possible. But not as result of a signal which triggers
> the coredumping. And once again, this clear_thread_flag(TIF_SIGPENDING)
> is simply wrong (I mean, not enough).
>
Makes sense.

>> > I already tried to explain why this signal_pending() was added, but
>> > apparently I was not clear. I'll try again in the previous thread.
>> >
>> Could you add me to the Cc: of that thread?
>
> I thought you were cc'ed ;) Sorry, I didn't realiaze that these 2
> threads are totally separate. Please look at
>
>        http://marc.info/?t=131959137800005
>
> and at
>        http://marc.info/?l=linux-kernel&m=131989970411759
>
> in particular.
>
Yeah I couldn't really work out why the signal_pending() was there
either, I was trying to rework the loop to keep it on the assumption
it was there for a reason - I'm quite happy that it get removed, that
also fixes my problem :)

Scott

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: wait on the core pattern umh at least once
  2011-10-29 18:01   ` Scott James Remnant
  2011-10-29 19:30     ` Oleg Nesterov
@ 2011-10-31 21:18     ` Neil Horman
  1 sibling, 0 replies; 8+ messages in thread
From: Neil Horman @ 2011-10-31 21:18 UTC (permalink / raw)
  To: Scott James Remnant; +Cc: Oleg Nesterov, linux-kernel, Mandeep Singh Baines

On Sat, Oct 29, 2011 at 11:01:44AM -0700, Scott James Remnant wrote:
> On Sat, Oct 29, 2011 at 7:13 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > On 10/28, Scott James Remnant wrote:
> > >
> > > If a thread crashes as a result of a signal on the thread group leader
> > > that signal can still be pending,
> >
> > No. do_coredump() clears TIF_SIGPENDING.
> >
> I'm definitely seeing cases where SIGTERM sent to the process group
> that chrome is in results in one of chrome's thread's crashing (not
> your concern, obviously), but at the point it enters this function
> TIF_SIGPENDING is definitely set and the signal is SIGTERM.
> 
> The SIGTERM is in the shared pending set.
> 
> > I already tried to explain why this signal_pending() was added, but
> > apparently I was not clear. I'll try again in the previous thread.
> >
> Could you add me to the Cc: of that thread?
> 
> Scott
> 
FWIW, this is the (huge) thread, and specific post that originated the change
we're looking at:
https://lkml.org/lkml/2009/7/2/98
Neil


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-10-31 21:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-28 20:11 [PATCH] coredump: wait on the core pattern umh at least once Scott James Remnant
2011-10-28 20:22 ` Neil Horman
2011-10-28 21:16   ` Scott James Remnant
2011-10-29 14:13 ` Oleg Nesterov
2011-10-29 18:01   ` Scott James Remnant
2011-10-29 19:30     ` Oleg Nesterov
2011-10-29 19:38       ` Scott James Remnant
2011-10-31 21:18     ` Neil Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox