* Issue with core dump
@ 2011-11-01 12:17 ` trisha yad
From: trisha yad @ 2011-11-01 12:17 UTC
To: linux-mm, Russell King - ARM Linux, linux-kernel, linux-mips,
    kamezawa.hiroyu, oleg, mhocko, rientjes, Andrew Morton,
    Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
    Rusty Russell, Tejun Heo

Dear all,

I am running a multithreaded application. Consider a global
variable x that is used by threads a, b and c.

Thread 'a' performs an abnormal operation (an invalid memory access)
and the kernel sends a kill signal to it. In the meantime, threads 'b'
and 'c' get scheduled and update the variable x. When I get the core
file, x has already been updated, so I do not see the value it had at
the time thread 'a' crashed. I want the core file to show exactly the
memory state at the time of the abnormal operation.

Is there any solution for this problem?

I want the core dump to show the same memory state as at the time of
the abnormal operation (invalid memory access).

Thanks

* Re: Issue with core dump
  2011-11-01 12:17 ` trisha yad
@ 2011-11-01 15:23 ` Oleg Nesterov
From: Oleg Nesterov @ 2011-11-01 15:23 UTC
To: trisha yad
Cc: linux-mm, Russell King - ARM Linux, linux-kernel, linux-mips,
    kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
    Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
    Rusty Russell, Tejun Heo

On 11/01, trisha yad wrote:
>
> Dear all,
>
> I am running a multithreaded application. Consider a global
> variable x that is used by threads a, b and c.
>
> Thread 'a' performs an abnormal operation (an invalid memory access)
> and the kernel sends a kill signal to it. In the meantime, threads 'b'
> and 'c' get scheduled and update the variable x. When I get the core
> file, x has already been updated, so I do not see the value it had at
> the time thread 'a' crashed. I want the core file to show exactly the
> memory state at the time of the abnormal operation.

Yes, this is possible.

> Is there any solution for this problem?
>
> I want the core dump to show the same memory state as at the time of
> the abnormal operation (invalid memory access).

I don't think we can "fix" this.

We can probably change complete_signal() to notify other threads
"immediately", but this is not simple and obviously cannot close
the window completely.

Whatever we do, we can't "stop" other threads at the time when
thread 'a' traps. All we can do is to try to shrink the window.

Oleg.

* Re: Issue with core dump
  2011-11-01 15:23 ` Oleg Nesterov
@ 2011-11-01 15:59 ` Tejun Heo
From: Tejun Heo @ 2011-11-01 15:59 UTC
To: Oleg Nesterov
Cc: trisha yad, linux-mm, Russell King - ARM Linux, linux-kernel,
    linux-mips, kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
    Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
    Rusty Russell

Hello,

On Tue, Nov 01, 2011 at 04:23:20PM +0100, Oleg Nesterov wrote:
> Whatever we do, we can't "stop" other threads at the time when
> thread 'a' traps. All we can do is to try to shrink the window.

Yeah, "at the time" can't even be defined precisely. The order of
operations isn't clearly defined in the absence of synchronization
constructs. In the described example, there's an unspecified amount of
time (or cycles, rather) between the load of the global variable and
the thread faulting. Anything could have happened in between, no
matter how immediate the core dump was. As long as we're reasonably
immediate, which I think we already are, I don't think there's much
which needs to be changed.

Thanks.

--
tejun

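The gap Tejun describes, between the load of the global and the fault, can be pictured with a small user-space model. This is only an analogy in Python, not kernel code: `simulate_fault_window`, `state` and the two events are names made up for illustration, and the events force one particular interleaving that a real scheduler merely permits.

```python
import threading


def simulate_fault_window():
    """Force the interleaving: a loads x, b updates x, then a 'faults'."""
    state = {"x": 1}              # stands in for the shared global x
    loaded = threading.Event()    # set once thread a has loaded x
    updated = threading.Event()   # set once thread b has updated x
    result = {}

    def thread_a():
        seen = state["x"]         # the load that later leads to the fault
        loaded.set()
        updated.wait()            # the unspecified gap between load and fault
        result["seen_by_a"] = seen
        result["in_dump"] = state["x"]   # what a dump taken now would contain

    def thread_b():
        loaded.wait()
        state["x"] = 2            # unsynchronized update inside the gap
        updated.set()

    workers = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result
```

In this model the "dump" records the updated value even though thread a acted on the old one, and taking the dump faster would not change that, because the update landed before the fault.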
* Re: Issue with core dump
  2011-11-01 15:23 ` Oleg Nesterov
@ 2011-11-02 6:33 ` trisha yad
From: trisha yad @ 2011-11-02 6:33 UTC
To: Oleg Nesterov
Cc: linux-mm, Russell King - ARM Linux, linux-kernel, linux-mips,
    kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
    Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
    Rusty Russell, Tejun Heo

Thanks all for your answers.

On a loaded embedded system, the time between the code hitting
do_user_fault() and core_dump_wait() is a bit high. I checked on my
system and it took 2.7 sec, so it is quite possible that the core dump
is not correct: it contains the updated global value.

So is it possible, at the time of send_signal(), to stop modification
of the faulting thread's memory?

Thanks

On Tue, Nov 1, 2011 at 8:53 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 11/01, trisha yad wrote:
>>
>> Dear all,
>>
>> I am running a multithreaded application. Consider a global
>> variable x that is used by threads a, b and c.
>>
>> Thread 'a' performs an abnormal operation (an invalid memory access)
>> and the kernel sends a kill signal to it. In the meantime, threads 'b'
>> and 'c' get scheduled and update the variable x. When I get the core
>> file, x has already been updated, so I do not see the value it had at
>> the time thread 'a' crashed. I want the core file to show exactly the
>> memory state at the time of the abnormal operation.
>
> Yes, this is possible.
>
>> Is there any solution for this problem?
>>
>> I want the core dump to show the same memory state as at the time of
>> the abnormal operation (invalid memory access).
>
> I don't think we can "fix" this.
>
> We can probably change complete_signal() to notify other threads
> "immediately", but this is not simple and obviously cannot close
> the window completely.
>
> Whatever we do, we can't "stop" other threads at the time when
> thread 'a' traps. All we can do is to try to shrink the window.
>
> Oleg.

* Re: Issue with core dump
  2011-11-02 6:33 ` trisha yad
@ 2011-11-02 11:30 ` Ralf Baechle
From: Ralf Baechle @ 2011-11-02 11:30 UTC
To: trisha yad
Cc: Oleg Nesterov, linux-mm, Russell King - ARM Linux, linux-kernel,
    linux-mips, kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
    Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
    Rusty Russell, Tejun Heo

On Wed, Nov 02, 2011 at 12:03:39PM +0530, trisha yad wrote:

> Thanks all for your answers.
>
> On a loaded embedded system, the time between the code hitting
> do_user_fault() and core_dump_wait() is a bit high. I checked on my
> system and it took 2.7 sec, so it is quite possible that the core dump
> is not correct: it contains the updated global value.
>
> So is it possible, at the time of send_signal(), to stop modification
> of the faulting thread's memory?

On existing hardware it is impossible to take a consistent snapshot of
a multi-threaded application at the time of one thread faulting.

A software simulator can handle this sort of race condition, but of
course this approach has other disadvantages.

  Ralf

* Re: Issue with core dump
  2011-11-02 6:33 ` trisha yad
@ 2011-11-02 15:31 ` Tejun Heo
From: Tejun Heo @ 2011-11-02 15:31 UTC
To: trisha yad
Cc: Oleg Nesterov, linux-mm, Russell King - ARM Linux, linux-kernel,
    linux-mips, kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
    Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
    Rusty Russell

Hello,

On Wed, Nov 02, 2011 at 12:03:39PM +0530, trisha yad wrote:
> On a loaded embedded system, the time between the code hitting
> do_user_fault() and core_dump_wait() is a bit high. I checked on my
> system and it took 2.7 sec, so it is quite possible that the core dump
> is not correct.

This may sound like arguing over semantics, but it doesn't matter how
long it takes: it's still correct. You're arguing that it's not
immediate enough. IOW, no matter how fast you make it, you cannot
guarantee that results from a slow operation wouldn't appear.

Also, the time between do_user_fault() and the actual core dumping
isn't the important factor here. do_user_fault() directly triggers
delivery of SIGSEGV (or BUS) and signal delivery will immediately
deliver SIGKILL to all other threads in the process, so it should be
immediate enough, or, rather, we don't have any way to make it any
more immediate. It's basically a direct call + IPI (if some threads
are running on other cpus).

Are you actually seeing artifacts from a delayed core dump? Given the
code path, I'm highly skeptical that would be the actual case. If
you're using shared memory between different processes, then that
delay would matter, but for such cases there's nothing much to do.

Thanks.

--
tejun

* Re: Issue with core dump
  2011-11-02 15:31 ` Tejun Heo
@ 2011-11-02 15:55 ` Oleg Nesterov
From: Oleg Nesterov @ 2011-11-02 15:55 UTC
To: Tejun Heo
Cc: trisha yad, linux-mm, Russell King - ARM Linux, linux-kernel,
    linux-mips, kamezawa.hiroyu, mhocko, rientjes, Andrew Morton,
    Konstantin Khlebnikov, KOSAKI Motohiro, Rafael J. Wysocki,
    Rusty Russell

On 11/02, Tejun Heo wrote:
>
> Also, the time between do_user_fault() and the actual core dumping
> isn't the important factor here. do_user_fault() directly triggers
> delivery of SIGSEGV (or BUS) and signal delivery will immediately
> deliver SIGKILL to all other threads in the process,

Not really, note the "if (!sig_kernel_coredump(sig))" check. And this
is what we can improve. But this is not simple, and personally I
don't think it is worth the trouble.

Oleg.

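The distinction Oleg points at can be pictured with a toy model. This is a simplified sketch in Python rather than the kernel's C, and the names `COREDUMP_SIGNALS` and `killed_at_delivery` are hypothetical. It assumes the signal is already known to be fatal to the whole thread group and models only the branch guarded by the `sig_kernel_coredump()` check, as that check is described in this thread.

```python
# Signals whose default action dumps core (a subset, for illustration only).
COREDUMP_SIGNALS = {"SIGQUIT", "SIGILL", "SIGABRT", "SIGFPE", "SIGSEGV", "SIGBUS"}


def killed_at_delivery(sig, threads):
    """Return the threads that get SIGKILL as soon as `sig` is delivered.

    Toy model: `sig` is assumed to be fatal to the whole thread group.
    """
    if sig not in COREDUMP_SIGNALS:
        # Plain fatal signal: the group exit starts at once and every
        # thread in the group is woken up to die.
        return set(threads)
    # Core-dumping signal: nobody is killed at this point. The other
    # threads keep running until the receiving thread dequeues the
    # signal and starts the dump, which is the window discussed above.
    return set()
```

Under these assumptions, `killed_at_delivery("SIGSEGV", ["a", "b", "c"])` is empty while `killed_at_delivery("SIGTERM", ["a", "b", "c"])` covers all three threads, which is why a fault that dumps core leaves the other threads running for a while.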