Issue with core dump

* Issue with core dump
@ 2011-11-01 12:17 ` trisha yad
  0 siblings, 0 replies; 14+ messages in thread
From: trisha yad @ 2011-11-01 12:17 UTC (permalink / raw)
  To: linux-mm, Russell King - ARM Linux, linux-kernel, linux-mips,
	kamezawa.hiroyu, oleg, mhocko, rientjes, Andrew Morton,
	Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
	Rusty Russell, Tejun Heo

Dear all,

I am running a multithreaded application. Consider a global
variable x that is used by threads a, b and c.

Thread 'a' performs an abnormal operation (invalid memory access) and
the kernel sends a kill signal to it. In the meantime, threads 'b' and
'c' get scheduled and update the variable x. When I get the core file,
x has already been updated, so I do not see the value it had at the
time thread 'a' crashed. I want the core file to show exactly the
memory state as it was at the time of the abnormal operation (invalid
memory access).

Is there any solution for this problem?

I want the core dump to contain the same memory state as at the time of
the abnormal operation (invalid memory access).

Thanks

* Re: Issue with core dump
  2011-11-01 12:17 ` trisha yad
@ 2011-11-01 15:23   ` Oleg Nesterov
  -1 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2011-11-01 15:23 UTC (permalink / raw)
  To: trisha yad
  Cc: linux-mm, Russell King - ARM Linux, linux-kernel, linux-mips,
	kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
	Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
	Rusty Russell, Tejun Heo

On 11/01, trisha yad wrote:
>
> Dear all,
>
> I am running a multithreaded application. Consider a global
> variable x that is used by threads a, b and c.
>
> Thread 'a' performs an abnormal operation (invalid memory access) and
> the kernel sends a kill signal to it. In the meantime, threads 'b' and
> 'c' get scheduled and update the variable x. When I get the core file,
> x has already been updated, so I do not see the value it had at the
> time thread 'a' crashed. I want the core file to show exactly the
> memory state as it was at the time of the abnormal operation (invalid
> memory access).

Yes, this is possible.

> Is there any solution for this problem?
>
> I want the core dump to contain the same memory state as at the time of
> the abnormal operation (invalid memory access).

I don't think we can "fix" this.

We can probably change complete_signal() to notify other threads
"immediately", but this is not simple and obviously can not close
the window completely.

Whatever we do, we can't "stop" other threads at the time when
thread 'a' traps. All we can do is to try to shrink the window.

Oleg.

* Re: Issue with core dump
  2011-11-01 15:23   ` Oleg Nesterov
@ 2011-11-01 15:59     ` Tejun Heo
  -1 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2011-11-01 15:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: trisha yad, linux-mm, Russell King - ARM Linux, linux-kernel,
	linux-mips, kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
	Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
	Rusty Russell

Hello,

On Tue, Nov 01, 2011 at 04:23:20PM +0100, Oleg Nesterov wrote:
> Whatever we do, we can't "stop" other threads at the time when
> thread 'a' traps. All we can do is to try to shrink the window.

Yeah, "at the time" can't even be defined precisely.  Order of
operation isn't clearly defined in the absence of synchronization
constructs.  In the described example, there's an unspecified amount of
time (or cycles rather) between the load of the global variable and
the thread faulting.  Anything could have happened in between no matter
how immediate the core dump was.

As long as we're reasonably immediate, which I think we already are, I
don't think there's much which needs to be changed.

Thanks.

-- 
tejun
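
The ordering point above can be sketched in user space. This is a minimal illustration, not code from the thread: the global x, the two semaphores and run_interleaving() are hypothetical names, and the semaphores exist only to force one interleaving that can also occur on its own without them. Thread a loads x, thread b then updates it, and only afterwards does thread a reach the point where it would fault, so a dump taken at the fault already holds the new value.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical shared global, standing in for "x" from the report. */
static volatile int x = 1;

static sem_t loaded;   /* posted by thread a once it has loaded x */
static sem_t updated;  /* posted by thread b once it has changed x */

struct observation {
    int loaded_value;    /* what thread a read from x */
    int value_at_fault;  /* what x holds when thread a would fault */
};

static void *thread_b(void *arg)
{
    (void)arg;
    sem_wait(&loaded);   /* run only after thread a has loaded x */
    x = 2;               /* the update that ends up in the core file */
    sem_post(&updated);
    return NULL;
}

static void *thread_a(void *arg)
{
    struct observation *obs = arg;

    obs->loaded_value = x;    /* the load of the global */
    sem_post(&loaded);
    sem_wait(&updated);       /* stands in for the unspecified delay */
    obs->value_at_fault = x;  /* where the invalid access would occur */
    return NULL;
}

/* Runs the forced interleaving once and reports both values. */
struct observation run_interleaving(void)
{
    struct observation obs = { 0, 0 };
    pthread_t a, b;
    int rc;

    x = 1;
    rc = sem_init(&loaded, 0, 0);
    assert(rc == 0);
    rc = sem_init(&updated, 0, 0);
    assert(rc == 0);

    rc = pthread_create(&b, NULL, thread_b, NULL);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, thread_a, &obs);
    assert(rc == 0);

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&loaded);
    sem_destroy(&updated);
    (void)rc;
    return obs;
}
```

Calling run_interleaving() yields a loaded value that differs from the value present at the would-be fault, even though nothing between the two points is slow.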

* Re: Issue with core dump
  2011-11-01 15:23   ` Oleg Nesterov
@ 2011-11-02  6:33     ` trisha yad
  -1 siblings, 0 replies; 14+ messages in thread
From: trisha yad @ 2011-11-02  6:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, Russell King - ARM Linux, linux-kernel, linux-mips,
	kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
	Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
	Rusty Russell, Tejun Heo

Thanks all for your answer.

On a loaded embedded system, the time between the code hitting
do_user_fault() and core_dump_wait() is rather high; I checked on my
system and it took 2.7 sec. So it is quite possible that the core dump
is not correct: it contains the updated global value.

So is it possible, at the time of send_signal(), to stop modification
of the faulting thread's memory?


Thanks



On Tue, Nov 1, 2011 at 8:53 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 11/01, trisha yad wrote:
>>
>> Dear all,
>>
>> I am running a multithreaded application. Consider a global
>> variable x that is used by threads a, b and c.
>>
>> Thread 'a' performs an abnormal operation (invalid memory access) and
>> the kernel sends a kill signal to it. In the meantime, threads 'b' and
>> 'c' get scheduled and update the variable x. When I get the core file,
>> x has already been updated, so I do not see the value it had at the
>> time thread 'a' crashed. I want the core file to show exactly the
>> memory state as it was at the time of the abnormal operation (invalid
>> memory access).
>
> Yes, this is possible.
>
>> Is there any solution for this problem?
>>
>> I want the core dump to contain the same memory state as at the time of
>> the abnormal operation (invalid memory access).
>
> I don't think we can "fix" this.
>
> We can probably change complete_signal() to notify other threads
> "immediately", but this is not simple and obviously can not close
> the window completely.
>
> Whatever we do, we can't "stop" other threads at the time when
> thread 'a' traps. All we can do is to try to shrink the window.
>
> Oleg.
>
>

* Re: Issue with core dump
  2011-11-02  6:33     ` trisha yad
@ 2011-11-02 11:30       ` Ralf Baechle
  -1 siblings, 0 replies; 14+ messages in thread
From: Ralf Baechle @ 2011-11-02 11:30 UTC (permalink / raw)
  To: trisha yad
  Cc: Oleg Nesterov, linux-mm, Russell King - ARM Linux, linux-kernel,
	linux-mips, kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
	Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
	Rusty Russell, Tejun Heo

On Wed, Nov 02, 2011 at 12:03:39PM +0530, trisha yad wrote:

> Thanks all for your answer.
> 
> On a loaded embedded system, the time between the code hitting
> do_user_fault() and core_dump_wait() is rather high; I checked on my
> system and it took 2.7 sec. So it is quite possible that the core dump
> is not correct: it contains the updated global value.
>
> So is it possible, at the time of send_signal(), to stop modification
> of the faulting thread's memory?

On existing hardware it is impossible to take a consistent snapshot of a
multi-threaded application at the time of one thread faulting.

A software simulator can handle this sort of race condition but of course
this approach has other disadvantages.

  Ralf
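
One user-space way to narrow the window, without contradicting the point above, is to copy the interesting globals from a SIGSEGV handler before the default action dumps core. This is a sketch under stated assumptions, not something proposed in the thread: x, x_at_fault, snapshot_state() and install_fault_snapshot() are hypothetical names. Other threads still run between the fault and the handler, so the copy is only closer to the faulting state, not a consistent snapshot.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical shared global, standing in for "x" from the report. */
volatile int x;

/* Copy of x taken in the signal handler; it shows up in the core file. */
volatile sig_atomic_t x_at_fault;
volatile sig_atomic_t snapshot_taken;

/* Only plain stores here, so this is safe to call from a signal handler. */
void snapshot_state(void)
{
    x_at_fault = x;
    snapshot_taken = 1;
}

static void on_fault(int sig)
{
    (void)sig;
    snapshot_state();
    /*
     * Just return.  SA_RESETHAND has already restored the default
     * action, so the faulting instruction runs again, faults again,
     * and the kernel dumps core as usual.
     */
}

/* Installs the handler for SIGSEGV and SIGBUS; returns 0 on success. */
int install_fault_snapshot(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sa.sa_flags = SA_RESETHAND;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}
```

A program would call install_fault_snapshot() once at startup. In the resulting core file, x_at_fault holds the value seen by the handler, next to whatever later value x has by the time the dump is written.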

* Re: Issue with core dump
  2011-11-02  6:33     ` trisha yad
@ 2011-11-02 15:31       ` Tejun Heo
  -1 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2011-11-02 15:31 UTC (permalink / raw)
  To: trisha yad
  Cc: Oleg Nesterov, linux-mm, Russell King - ARM Linux, linux-kernel,
	linux-mips, kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
	Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
	Rusty Russell

Hello,

On Wed, Nov 02, 2011 at 12:03:39PM +0530, trisha yad wrote:
> On a loaded embedded system, the time between the code hitting
> do_user_fault() and core_dump_wait() is rather high; I checked on my
> system and it took 2.7 sec. So it is quite possible that the core dump
> is not correct.

This may sound like arguing over semantics but it doesn't matter how
long it takes, it's still correct.  You're arguing that it's not
immediate enough.  IOW, no matter how fast you make it, you cannot
guarantee that results from slow operation wouldn't appear.

Also, the time between do_user_fault() and actual core dumping isn't
the important factor here.  do_user_fault() directly triggers delivery
of SIGSEGV (or BUS) and signal delivery will immediately deliver
SIGKILL to all other threads in the process, so it should be immediate
enough, or, rather, we don't have any way to make it any more
immediate.  It's basically direct call + IPI (if some threads are
running on other cpus).

Are you actually seeing artifacts from delayed core dump?  Given the
code path, I'm highly skeptical that would be the actual case.  If
you're using shared memory between different processes, then that
delay would matter but for such cases there's nothing much to do.

Thanks.

-- 
tejun

* Re: Issue with core dump
  2011-11-02 15:31       ` Tejun Heo
@ 2011-11-02 15:55         ` Oleg Nesterov
  -1 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2011-11-02 15:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: trisha yad, linux-mm, Russell King - ARM Linux, linux-kernel,
	linux-mips, kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
	Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
	Rusty Russell

On 11/02, Tejun Heo wrote:
>
> Also, the time between do_user_fault() and actual core dumping isn't
> the important factor here.  do_user_fault() directly triggers delivery
> of SIGSEGV (or BUS) and signal delivery will immediately deliver
> SIGKILL to all other threads in the process,

Not really, note the "if (!sig_kernel_coredump(sig))" check. And this
is what we can improve. But this is not simple, and personally I think
it isn't worth the trouble.

Oleg.
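
The check mentioned above can be modelled in user space roughly as follows. This is a simplified sketch, not kernel code: dumps_core() is a hypothetical stand-in for sig_kernel_coredump(), its signal list is the set whose default action on Linux is to dump core, and group_killed_at_send_time() is a made-up name for the decision complete_signal() is described as making.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for sig_kernel_coredump(): true for signals
 * whose default action on Linux is to terminate and dump core.
 */
static bool dumps_core(int sig)
{
    switch (sig) {
    case SIGQUIT:
    case SIGILL:
    case SIGTRAP:
    case SIGABRT:
    case SIGFPE:
    case SIGSEGV:
    case SIGBUS:
    case SIGSYS:
    case SIGXCPU:
    case SIGXFSZ:
        return true;
    default:
        return false;
    }
}

/*
 * Model of the decision at signal-send time: a fatal signal kills the
 * whole thread group right away only if it does not dump core.  For a
 * core-dumping signal the other threads are stopped later, from the
 * dump path, which is the window discussed in this thread.
 */
bool group_killed_at_send_time(int sig, bool fatal)
{
    return fatal && !dumps_core(sig);
}
```

In this model a fatal SIGTERM takes the immediate path, while SIGSEGV and SIGBUS do not, so the other threads keep running until the faulting thread starts the dump.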

end of thread, other threads:[~2011-11-02 15:59 UTC | newest]

Thread overview: 14+ messages
2011-11-01 12:17 Issue with core dump trisha yad
2011-11-01 12:17 ` trisha yad
2011-11-01 15:23 ` Oleg Nesterov
2011-11-01 15:23   ` Oleg Nesterov
2011-11-01 15:59   ` Tejun Heo
2011-11-01 15:59     ` Tejun Heo
2011-11-02  6:33   ` trisha yad
2011-11-02  6:33     ` trisha yad
2011-11-02 11:30     ` Ralf Baechle
2011-11-02 11:30       ` Ralf Baechle
2011-11-02 15:31     ` Tejun Heo
2011-11-02 15:31       ` Tejun Heo
2011-11-02 15:55       ` Oleg Nesterov
2011-11-02 15:55         ` Oleg Nesterov
