* Linux do_coredump() and SMP systems
@ 2015-02-17 13:41 Sudharsan Vijayaraghavan
  2015-02-17 15:12 ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread

From: Sudharsan Vijayaraghavan @ 2015-02-17 13:41 UTC (permalink / raw)
To: kernelnewbies

Hi All,

We are running the 3.8 kernel. I have a scenario where we hit several
issues in do_coredump(). We have an SMP system with thousands of cores,
with one pthread pinned to each core. The main process containing these
pthreads runs on the first core.

Here is issue #1:
When one of the threads core dumps, we enter do_coredump(). Another
thread in the same process, running on a different core, can core dump
as well (before the SIGKILL triggered by the first core dump is
delivered to it). This opens the way to entering do_coredump() more
than once. Once two threads have entered do_coredump(), one can kill
the other with SIGKILL, and the result is completely unpredictable:
there is no guarantee we will end up with two core files.

The Linux kernel does not seem to handle this at all. Adding a spinlock
within do_coredump() would prevent multiple entries into do_coredump().

I want to know whether the Linux kernel really does not handle the
above case, or whether I am missing something. Please clarify.

Issue #2:
Within do_coredump(), SIGKILL is sent to all threads in the process
other than the one running the core dump. There is no guarantee that
SIGKILL will be received immediately by all threads, which means the
state of the threads (particularly the per-thread backtrace) can be
quite different by then compared to the time at which the offending
thread initiated the core dump. This in turn means the generated core
dump will have per-thread backtraces that are not accurate.

Please confirm my understanding and advise on how this problem can be
solved.

Thanks,
Sudharsan
* Linux do_coredump() and SMP systems
  2015-02-17 13:41 Linux do_coredump() and SMP systems Sudharsan Vijayaraghavan
@ 2015-02-17 15:12 ` Greg KH
  [not found]       ` <CAP0SO-GBmpMS168SRNAdFmStknC=+4EuAENCvuiaX=mXRvr7hg@mail.gmail.com>
  0 siblings, 1 reply; 5+ messages in thread

From: Greg KH @ 2015-02-17 15:12 UTC (permalink / raw)
To: kernelnewbies

On Tue, Feb 17, 2015 at 07:11:55PM +0530, Sudharsan Vijayaraghavan wrote:
> Hi All,
>
> We are running 3.8 kernel.

That's pretty old and obsolete, why are you stuck with that version?

> I have a unique scenario, where we hit on several issues in do_coredump.
> We have a SMP system with thousands of cores, one pthread is tied to
> one core. The main process containing these pthreads runs in the first
> core.
>
> Here is the issue # 1
> When one of threads core dump, we enter into do_coredump(), now one
> other thread in same process running in a different
> core can as well core dump(before SIGKILL was delivered to it as a
> consequence of first core dump)
> This gives way to entering into do_coredump more than once.
> Once we have two guys entering do_coredump() one can kill other with SIGKILL
> the result is completely unpredictable. No guarantee we will have two
> core files generated in the end
>
> Linux kernel does not seem to handle it at all.
> Adding a spin lock within do_coredump() will solve the case of
> multiple entries into do_coredump()
>
> I want to know whether Linux kernel really does not handle the above
> case or am I missing something?

Odd, we should handle this just fine, try emailing the developers
responsible for this code and cc: the linux-kernel mailing list so they
can work it out.

thanks,

greg k-h
[parent not found: <CAP0SO-GBmpMS168SRNAdFmStknC=+4EuAENCvuiaX=mXRvr7hg@mail.gmail.com>]
* Linux do_coredump() and SMP systems
  [not found]       ` <CAP0SO-GBmpMS168SRNAdFmStknC=+4EuAENCvuiaX=mXRvr7hg@mail.gmail.com>
@ 2015-02-18  6:14 ` Sudharsan Vijayaraghavan
  2015-02-18 16:01   ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread

From: Sudharsan Vijayaraghavan @ 2015-02-18 6:14 UTC (permalink / raw)
To: kernelnewbies

We are doing a prototype and so many changes have gone into the kernel
that we are finding it difficult to upgrade to the latest version
immediately.

However, I ran through the code once again, and indeed the kernel
handles it: down_write(&mm->mmap_sem) in coredump_wait() makes sure the
second coredump is stopped, and core_waiters is returned negative.

I will debug further. Thanks for confirming that the kernel handles
this scenario.

On Tue, Feb 17, 2015 at 8:42 PM, Greg KH <greg@kroah.com> wrote:
> On Tue, Feb 17, 2015 at 07:11:55PM +0530, Sudharsan Vijayaraghavan wrote:
>> Hi All,
>>
>> We are running 3.8 kernel.
>
> That's pretty old and obsolete, why are you stuck with that version?
>
>> I have a unique scenario, where we hit on several issues in do_coredump.
>> We have a SMP system with thousands of cores, one pthread is tied to
>> one core. The main process containing these pthreads runs in the first
>> core.
>>
>> [...]
>>
>> I want to know whether Linux kernel really does not handle the above
>> case or am I missing something?
>
> Odd, we should handle this just fine, try emailing the developers
> responsible for this code and cc: the linux-kernel mailing list so they
> can work it out.
>
> thanks,
>
> greg k-h
* Linux do_coredump() and SMP systems
  2015-02-18  6:14 ` Sudharsan Vijayaraghavan
@ 2015-02-18 16:01   ` Greg KH
  2015-02-19 12:00     ` Sudharsan Vijayaraghavan
  0 siblings, 1 reply; 5+ messages in thread

From: Greg KH @ 2015-02-18 16:01 UTC (permalink / raw)
To: kernelnewbies

On Wed, Feb 18, 2015 at 11:44:32AM +0530, Sudharsan Vijayaraghavan wrote:
> We are doing prototype so much change have gone into kernel , we are
> finding it difficult to upgrade to latest immediately

What changes are you making to the kernel that you are sticking with
such an old version (3.8 is 2 years old now, and over 155 thousand
changes have happened to the kernel since then)?

> However I ran through the code once again, indeed kernel handles it
> down_write(&mm->mmap_sem); in coredump_wait() makes sure the second
> coredump is stopped and returns negative for core_waiters

Great, so it works now?

confused,

greg k-h
* Linux do_coredump() and SMP systems
  2015-02-18 16:01 ` Greg KH
@ 2015-02-19 12:00   ` Sudharsan Vijayaraghavan
  0 siblings, 0 replies; 5+ messages in thread

From: Sudharsan Vijayaraghavan @ 2015-02-19 12:00 UTC (permalink / raw)
To: kernelnewbies

Hi Greg,

There is a plan to move to 3.14; right now the focus is to iron out
existing issues.

Now, with regard to the core dump issue, we find that 10% of the time
we get stuck in coredump_wait():

    wait_for_completion(&core_state->startup);

I am analyzing exit_mm() to see what is going wrong here.

I have one other question, which I am curious about. In coredump_wait()
there is a loop that waits for each task to become inactive (i.e. no
longer running on any core):

    ptr = core_state->dumper.next;
    while (ptr != NULL) {
        pr_err("pid: %d %s() calling wait_task_inactive() for pid: %d\n",
               tsk->pid, __func__, ptr->task->pid);
        wait_task_inactive(ptr->task, 0);
        pr_err("pid: %d %s() wait_task_inactive() returned for pid: %d\n",
               tsk->pid, __func__, ptr->task->pid);
        ptr = ptr->next;
    }

There is a delay between the crash and the actual generation of the
core dump due to the above loop. In a multicore system it is quite
possible that other threads of the same process keep running on other
cores in the meantime; as a consequence, the address space, program
counter, etc. can change.

Given this, the core dump generated will not reflect the state of the
process (the registers and mm of the various threads) as it was at the
time of the crash. Is my understanding correct? I am just probing for a
way to get rid of this discrepancy.

Thanks,
Sudharsan

On Wed, Feb 18, 2015 at 9:31 PM, Greg KH <greg@kroah.com> wrote:
> On Wed, Feb 18, 2015 at 11:44:32AM +0530, Sudharsan Vijayaraghavan wrote:
>> We are doing prototype so much change have gone into kernel , we are
>> finding it difficult to upgrade to latest immediately
>
> What changes are you making to the kernel that you are sticking with
> such an old version (3.8 is 2 years old now, and over 155 thousand
> changes have happened to the kernel since then)?
>
>> However I ran through the code once again, indeed kernel handles it
>> down_write(&mm->mmap_sem); in coredump_wait() makes sure the second
>> coredump is stopped and returns negative for core_waiters
>
> Great, so it works now?
>
> confused,
>
> greg k-h
Thread overview: 5+ messages (newest: 2015-02-19 12:00 UTC)
2015-02-17 13:41 Linux do_coredump() and SMP systems Sudharsan Vijayaraghavan
2015-02-17 15:12 ` Greg KH
[not found] ` <CAP0SO-GBmpMS168SRNAdFmStknC=+4EuAENCvuiaX=mXRvr7hg@mail.gmail.com>
2015-02-18 6:14 ` Sudharsan Vijayaraghavan
2015-02-18 16:01 ` Greg KH
2015-02-19 12:00 ` Sudharsan Vijayaraghavan